CN111582220B - Bone point behavior recognition system based on shift map convolution neural network and recognition method thereof - Google Patents


Info

Publication number
CN111582220B
CN111582220B (application CN202010419839.4A)
Authority
CN
China
Prior art keywords
image
point
behavior
joint
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010419839.4A
Other languages
Chinese (zh)
Other versions
CN111582220A (en)
Inventor
张一帆 (Zhang Yifan)
程科 (Cheng Ke)
程健 (Cheng Jian)
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202010419839.4A
Publication of CN111582220A
Application granted
Publication of CN111582220B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a skeletal point behavior recognition system based on a shift graph convolutional neural network, comprising an image acquisition module, an image processing module, an extraction module and a behavior recognition module. The image acquisition module is used for acquiring behavior images; the image processing module is used for processing the behavior images acquired by the image acquisition module; the extraction module is used for extracting skeleton points from the images processed by the image processing module; and the behavior recognition module is used for recognizing and extracting the behavior features of the skeleton points extracted by the extraction module. The behavior recognition module is designed to perform a novel graph convolution for skeletal point behavior recognition that reduces the computational cost of graph convolution. Unlike traditional graph convolution, the shift graph convolution does not enlarge its receptive range by enlarging the convolution kernel; instead, graph features are shifted and spliced through a novel shift operation, so that the same or even higher recognition accuracy is achieved while the computational cost is significantly reduced and the computation speed improved, avoiding the growth of computational cost with kernel size that burdens traditional graph convolution.

Description

Bone point behavior recognition system based on shift map convolution neural network and recognition method thereof
Technical Field
The invention relates to a skeletal point behavior recognition system based on a shift graph convolutional neural network, in the field of general image data processing (G06T), and in particular in the field of motion analysis (G06T 7/20).
Background
In behavior recognition tasks, owing to limits on data volume and algorithms, behavior recognition models based on RGB images are often disturbed by viewpoint changes and complex backgrounds, so their generalization is insufficient and their robustness in practical applications is poor. Behavior recognition based on skeletal point data can better address this problem.
In skeletal point data, the human body is represented by the coordinates of a number of predefined key nodes in the camera coordinate system. Such data can conveniently be obtained from a depth camera or from various pose estimation algorithms.
In the conventional graph convolution method, however, the modeled convolution kernel covers only the neighborhood of a single point. In skeletal point behavior recognition tasks, some behaviors (e.g., clapping hands) require modeling the positional relationship of points that are physically far apart (e.g., the two hands), which requires increasing the kernel size of the graph convolution model. But the computational cost of graph convolution grows with the kernel size, so conventional graph convolution is computationally expensive.
Disclosure of Invention
The invention aims to provide a skeletal point behavior recognition system based on a shift graph convolutional neural network, so as to solve the above problems in the prior art.
The technical scheme is as follows: a skeletal point behavior recognition system based on a shift graph convolutional neural network, comprising:
an image acquisition module for acquiring behavior images;
an image processing module for processing the behavior images acquired by the image acquisition module;
an extraction module for extracting skeleton points from the images processed by the image processing module;
and a behavior recognition module for recognizing and extracting the behavior features of the skeleton points extracted by the extraction module.
In a further embodiment, the image acquisition module is based on an image acquisition device. The image acquisition device comprises cameras arranged at the vertices of an equilateral triangle and a rotating device arranged at the rear of each camera; the rotating device comprises a rotating shaft fixedly connected with the camera and a rotating motor sleeved on the rotating shaft.
In a further embodiment, the image acquisition module captures images of human behavior through the three cameras arranged in an equilateral triangle, installed at the front, rear and side of the subject. The behavior images acquired by the three cameras are displayed separately on the computer terminal, and the image processing module then compares and processes the images.
In a further embodiment, the image processing module is mainly configured to process the human behavior image acquired by the image acquisition module into a human edge map. Using the Kirsch edge detection operator, the edges of the image are detected by traversing the pixels of the image with 3×3 convolution templates: the gray values of the pixels neighboring each pixel are examined one by one, and the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the gray values of the other five pixels is calculated. The convolution templates are as follows:
Templates 1–4 (the three +5 weights rotate clockwise around the border; the centre weight is 0 and the remaining border weights are −3):

 5  5  5    -3  5  5    -3 -3  5    -3 -3 -3
-3  0 -3    -3  0  5    -3  0  5    -3  0  5
-3 -3 -3    -3 -3 -3    -3 -3  5    -3  5  5

Templates 5–8:

-3 -3 -3    -3 -3 -3     5 -3 -3     5  5 -3
-3  0 -3     5  0 -3     5  0 -3     5  0 -3
 5  5  5     5  5 -3     5 -3 -3    -3 -3 -3
The eight convolution templates are used to process all pixels in the original image in turn; the edge strength of each pixel is calculated, the pixels are tested against a threshold, and the final edge points are extracted, completing edge detection.
The steps for detecting image edges with the Kirsch operator are as follows:
Step 1: obtain the data-area pointer of the original image;
Step 2: establish two buffers of the same size as the original image, mainly used to store copies of the original image; initialize both buffers with copies of the original image, denoted image 1 and image 2 respectively;
Step 3: set a Kirsch template for the convolution operation in each buffer, traverse the pixels of the copy images in the two buffers, perform the convolution operation pixel by pixel, compare the calculated results, store the larger value in image 1, and copy image 1 into buffer image 2;
Step 4: repeat step 3 with each of the remaining six templates in turn, so that finally the larger gray values from image 1 and image 2 are stored in buffer image 1;
Step 5: copy the processed image 1 back into the original image data, realizing the edge processing of the image by programming.
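As a minimal illustration of the Kirsch edge-detection procedure above, the following Python sketch builds the eight directional templates, takes the maximum response per pixel, and thresholds it. The mask construction, padding mode and threshold value are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# Eight directional Kirsch templates: three +5 weights rotate clockwise
# around the 3x3 border, the other five border weights are -3, centre 0.
def kirsch_masks():
    base = np.array([5, 5, 5, -3, -3, -3, -3, -3])
    # clockwise border positions, starting at the top-left corner
    pos = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    masks = []
    for r in range(8):
        m = np.zeros((3, 3), dtype=np.int64)
        for (i, j), w in zip(pos, np.roll(base, r)):
            m[i, j] = w
        masks.append(m)
    return masks

def kirsch_edges(img, threshold):
    """Edge strength = maximum template response at each pixel,
    then a binary edge map via thresholding."""
    h, w = img.shape
    padded = np.pad(img.astype(np.int64), 1, mode="edge")
    strength = np.zeros((h, w), dtype=np.int64)
    for m in kirsch_masks():
        for i in range(h):
            for j in range(w):
                resp = int(np.sum(padded[i:i + 3, j:j + 3] * m))
                strength[i, j] = max(strength[i, j], resp)
    return np.where(strength >= threshold, 255, 0)
```

Because every template's weights sum to zero, uniform regions produce zero response, while a step edge (e.g. a vertical intensity jump) responds strongly to the template whose +5 weights face the bright side.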
In a further embodiment, the extraction module is configured to extract the skeleton points of the image processed by the image processing module: when the image processing module has finished processing the image acquired by the image acquisition module, the pre-entered skeleton-point positions that best match the body shape of the acquired subject are matched onto the human edge map and then displayed on it.
In a further embodiment, the extraction module further includes a correction module. When the image acquisition module acquires human behavior images, people of different body sizes have skeletons of different sizes when performing the same group of actions, so the skeleton sizes need to be normalized to the same size;
first, one person's skeleton is selected as the reference skeleton. For the skeleton data of a given frame, the body centre point is selected as the root node. For each point directly connected to the root node, the vector from the root node to that point is computed and divided by its modulus, giving a unit direction vector (modulus 1). Multiplying this direction vector by the length of the corresponding limb in the reference skeleton and adding the root-node coordinates gives the corrected coordinates of the connected point, which are recorded as the normalized skeleton-point coordinates. The root node is then updated in the order of a breadth-first search, and the above steps are repeated until the values of all skeleton points have been corrected. The algorithm is as follows:
Input: the lengths l_i of the limbs in the reference skeleton, and the skeleton-point coordinates to be normalized;
Step 1: define r as the root-node coordinates;
Step 2: give r the initial value of the body-centre coordinates;
Step 3: for all limbs (s_i, t_i), proceed in the order of the breadth-first search strategy;
Step 4: compute the limb vector e_i = t_i − s_i;
Step 5: compute the unit direction d_i = e_i / |e_i|;
Step 6: compute t_i' = s_i' + l_i · d_i and save the value of t_i' to set A;
Step 7: return to step 3 until all limbs in the skeleton have been traversed;
Output: the skeleton-point coordinates stored in set A are the corrected coordinates;
where s_i and t_i denote the coordinate values of the start node and end node of the i-th limb, l_i denotes the length of the i-th limb in the reference skeleton, and s_i' denotes the already-corrected coordinates of the start node. Computing t_i' for every limb yields all corrected skeleton-point coordinates, scaling the skeleton size while the included angles between limbs are guaranteed to remain unchanged;
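The size-normalization procedure above can be sketched as follows. The toy 5-joint skeleton topology, limb ordering and reference lengths are illustrative assumptions, not the patent's joint set:

```python
import numpy as np
from collections import deque

# Toy skeleton: node 0 is the body-centre root; limbs are (start, end)
# pairs listed parent-first, so a breadth-first walk corrects each
# start node before its children are processed.
LIMBS = [(0, 1), (1, 2), (0, 3), (3, 4)]

def normalize_skeleton(joints, ref_lengths):
    """Rescale every limb to its reference length while keeping each
    limb's unit direction (hence the inter-limb angles) unchanged,
    walking the limbs breadth-first from the root."""
    out = joints.astype(float).copy()
    children = {}
    for k, (u, v) in enumerate(LIMBS):
        children.setdefault(u, []).append((k, v))
    queue = deque([0])  # start at the root node
    while queue:
        u = queue.popleft()
        for k, v in children.get(u, []):
            d = joints[v] - joints[u]
            d = d / np.linalg.norm(d)              # unit direction (modulus 1)
            out[v] = out[u] + ref_lengths[k] * d   # corrected end-node coordinate
            queue.append(v)
    return out
```

Because each corrected point is the corrected parent plus the reference length times the original unit direction, limb directions — and therefore the angles between limbs — are preserved exactly.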
when the included angle between the limbs changes, the included angle between vectors is selected to describe the bone points so as to avoid the bone point deviation when the included angle between the limbs changes;
the step of solving the human joint vector included angle is as follows:
Solving the angle of a certain joint point, firstly obtaining three joint points used for calculating the angle, capturing three-dimensional coordinate values of the joint point by using Kinect, constructing structural vectors among the three joint points of the component, and then solving the size of an included angle of the joint vectors by adopting an inverse cosine theorem;
find the angle of the first joint
Figure DEST_PATH_IMAGE028
As an example;
selecting other two joint points connected with the first joint, acquiring three-dimensional coordinate values of the joint points captured by Kinect, wherein the other two joint points are expressed as
Figure DEST_PATH_IMAGE030
Figure DEST_PATH_IMAGE032
The first joint point is expressed as
Figure DEST_PATH_IMAGE034
Constructing an inter-articular structure vector from a first articulation point to
Figure 583064DEST_PATH_IMAGE030
Point vector->
Figure DEST_PATH_IMAGE036
=
Figure DEST_PATH_IMAGE038
First node to->
Figure 167629DEST_PATH_IMAGE032
Point vector->
Figure DEST_PATH_IMAGE040
=
Figure DEST_PATH_IMAGE042
Figure 148354DEST_PATH_IMAGE032
Point to Point->
Figure 97856DEST_PATH_IMAGE030
Vector of (2) is
Figure DEST_PATH_IMAGE044
Calculating vectors
Figure 200417DEST_PATH_IMAGE036
Sum vector->
Figure 650990DEST_PATH_IMAGE040
Included angle->
Figure 814118DEST_PATH_IMAGE028
Size of:
Figure DEST_PATH_IMAGE046
wherein ,
Figure 872203DEST_PATH_IMAGE028
in order to make the joint vector included angle representation more accurate, according to the importance ranking of joint angles in the course of action, selecting representative joint angles for representation, and correcting the bone point position by size normalization and angle correction.
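The law-of-cosines step above can be sketched in a few lines of Python; the coordinate triples in the usage are illustrative:

```python
import math

def joint_angle(p1, p2, p3):
    """Angle at joint p1 between the structure vectors p1->p2 and
    p1->p3, recovered via the law of cosines and arccos, in degrees."""
    a = math.dist(p1, p2)  # |p1 p2|
    b = math.dist(p1, p3)  # |p1 p3|
    c = math.dist(p2, p3)  # side opposite the joint angle
    cos_theta = (a * a + b * b - c * c) / (2.0 * a * b)
    # clamp against floating-point drift before taking arccos
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

For instance, with p2 and p3 on perpendicular axes through p1 the function returns 90 degrees; the same result could be obtained from the dot product of the two vectors, but the law-of-cosines form matches the patent's description.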
In a further embodiment, the behavior recognition module is mainly configured to recognize and extract the behavior features of the skeleton points. Neighboring behavior features are shifted and spliced according to the adjacency relation of the graph, and after splicing only a single 1×1 convolution is needed to obtain the computed behavior features. For a graph with n nodes, let the feature dimension be c, so that the feature map F has size n × c. A node v has n_v adjacent nodes, and the set of adjacent nodes is B(v) = {b_1, …, b_{n_v}}. For node v, the shift-graph module divides the node's features equally into n_v + 1 parts: the first part retains the node's own features, and the following n_v parts are shifted in from the features of its neighbor nodes. Expressed mathematically:
F̃_v = F_v[0 : c/(n_v+1)] ∥ F_{b_1}[c/(n_v+1) : 2c/(n_v+1)] ∥ … ∥ F_{b_{n_v}}[n_v·c/(n_v+1) : c]
where the subscript [i : j] denotes Python-style slicing along the feature dimension, and the double vertical lines ∥ denote splicing (concatenation) along the feature dimension.
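A minimal NumPy sketch of the shift-and-splice operation described above. The near-equal channel partition (used when c is not divisible by n_v + 1) and the neighbor ordering are illustrative assumptions; in the full layer a learned 1×1 convolution over channels would follow:

```python
import numpy as np

def shift_graph_features(F, neighbors):
    """Split each node's c channels into n_v + 1 near-equal parts:
    part 0 keeps the node's own features, part i is taken (Python
    slicing along the channel axis) from its i-th neighbour, and the
    parts are spliced back in place."""
    n, c = F.shape
    out = np.empty_like(F)
    for v in range(n):
        sources = [v] + list(neighbors[v])            # self first, then neighbours
        k = len(sources)
        bounds = [c * i // k for i in range(k + 1)]   # channel partition boundaries
        for i, u in enumerate(sources):
            out[v, bounds[i]:bounds[i + 1]] = F[u, bounds[i]:bounds[i + 1]]
    return out
```

The shift itself involves no multiplications, only memory movement; all the learned computation is deferred to the single 1×1 convolution, which is why the receptive range grows without the kernel (and the cost) growing.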
A recognition method of a bone point behavior recognition system based on a shift graph convolution neural network comprises the following steps:
Step 1: the image acquisition module first controls the cameras to rotate and acquires images of the human behavior features; the rotating motor turns the rotating shaft, which in turn rotates the camera, adjusting the camera's position;
Step 2: the image acquisition module captures images of human behavior through the three cameras arranged in an equilateral triangle, installed at the front, rear and side of the subject; the behavior images acquired by the three cameras are displayed separately on the computer terminal, so that the image processing module can compare and process the images;
Step 3: the image processing module is mainly used to process the human behavior image acquired by the image acquisition module into a human edge map. Using the Kirsch edge detection operator, the edges of the image are detected by traversing the pixels of the image with 3×3 convolution templates: the gray values of the pixels neighboring each pixel are examined one by one, and the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the gray values of the other five pixels is calculated;
the eight convolution templates are used to process all pixels in the original image in turn; the edge strength of each pixel is calculated, the pixels are tested against a threshold, and the final edge points are extracted, completing edge detection;
the steps for detecting image edges with the Kirsch operator are as follows:
(1) obtain the data-area pointer of the original image;
(2) establish two buffers of the same size as the original image, mainly used to store copies of the original image; initialize both buffers with copies of the original image, denoted image 1 and image 2 respectively;
(3) set a Kirsch template for the convolution operation in each buffer, traverse the pixels of the copy images in the two buffers, perform the convolution operation pixel by pixel, compare the calculated results, store the larger value in image 1, and copy image 1 into buffer image 2;
(4) repeat (3) with each of the remaining six templates in turn, so that finally the larger gray values from image 1 and image 2 are stored in buffer image 1;
(5) copy the processed image 1 back into the original image data, realizing the edge processing of the image by programming;
Step 4: when the processing of the human behavior feature image is finished, the extraction module extracts the skeleton points of the image processed by the image processing module: the pre-entered skeleton-point positions that best match the body shape of the acquired subject are matched onto the human edge map and then displayed on it;
Step 5: when the skeleton-point extraction is completed, the positions of the skeleton points are corrected by the correction module. When the image acquisition module acquires human behavior images, people of different body sizes have skeletons of different sizes when performing the same group of actions, so the three-dimensional coordinates of the skeleton points differ and the skeleton sizes need to be normalized to the same size. First, one person's skeleton is selected as the reference skeleton. For the skeleton data of a given frame, the body centre point is selected as the root node. For each point directly connected to the root node, the vector from the root node to that point is computed and divided by its modulus, giving a unit direction vector (modulus 1); multiplying this direction vector by the length of the corresponding limb in the reference skeleton and adding the root-node coordinates gives the corrected coordinates of the connected point, which are recorded as the normalized skeleton-point coordinate values. The root node is then updated in the order of a breadth-first search algorithm, and these steps are repeated until the values of all skeleton points have been corrected. This correction scales the skeleton size while the included angles between limbs are guaranteed to remain unchanged;
When the included angle between the limbs changes, the included angle between vectors is selected to describe the bone points so as to avoid the bone point deviation when the included angle between the limbs changes;
the steps for solving the included angle of a human joint vector are as follows:
to solve the angle at a given joint point, first obtain the three joint points used to calculate the angle and capture their three-dimensional coordinate values with a Kinect; construct the structure vectors between the three joint points; then solve the included angle of the joint vectors using the law of cosines.
Take solving the angle θ of the first joint as an example. Select the two other joint points connected to the first joint and obtain the three-dimensional coordinate values of the joint points captured by the Kinect. Denote the two other joint points by p2 and p3, and the first joint point by p1. Construct the inter-joint structure vectors: the vector from the first joint point to p2 is a = p2 − p1, the vector from the first joint point to p3 is b = p3 − p1, and the vector from p3 to p2 is c = p2 − p3. The included angle θ between the vectors a and b is calculated as:
θ = arccos( (|a|² + |b|² − |c|²) / (2·|a|·|b|) )
where θ is the included angle of the joint vectors. To make the joint-vector-angle representation more accurate, representative joint angles are selected for the representation according to the importance ranking of the joint angles during the action, and the skeleton-point positions are corrected by size normalization together with angle correction;
Step 6: after the correction of the skeleton points is completed, the behavior recognition module recognizes the behaviors of the skeleton points. Neighboring behavior features are shifted and spliced according to the adjacency relation of the graph, and after splicing only a single 1×1 convolution is needed to obtain the computed behavior features. For a graph with n nodes, let the feature dimension be c, so that the feature map F has size n × c. A node v has n_v adjacent nodes, and the set of adjacent nodes is B(v) = {b_1, …, b_{n_v}}. For node v, the shift-graph module divides the node's features equally into n_v + 1 parts: the first part retains the node's own features, and the following n_v parts are shifted in from the features of its neighbor nodes. Expressed mathematically:
F̃_v = F_v[0 : c/(n_v+1)] ∥ F_{b_1}[c/(n_v+1) : 2c/(n_v+1)] ∥ … ∥ F_{b_{n_v}}[n_v·c/(n_v+1) : c]
where the subscript [i : j] denotes Python-style slicing along the feature dimension, and the double vertical lines ∥ denote splicing (concatenation) along the feature dimension, whereby the skeleton-point behavior features are identified.
The beneficial effects are as follows: the invention discloses a skeletal point behavior recognition system based on a shift graph convolutional neural network. By designing the behavior recognition module to recognize skeletal point behaviors, the computational cost of graph convolution is significantly reduced. Unlike traditional graph convolution, the shift graph convolution does not enlarge its receptive range by enlarging the convolution kernel; instead, graph features are shifted and spliced through a novel shift operation, so that the same or even higher recognition accuracy is achieved while the computational cost is markedly reduced and the computation speed improved, thereby avoiding the growth of computational cost with kernel size that makes traditional graph convolution expensive.
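The claimed cost reduction can be illustrated with a back-of-the-envelope count. The joint count, channel width and kernel-partition number below are illustrative assumptions, not figures from the patent:

```python
# Rough multiply-accumulate comparison for one layer on a skeleton graph
# with n joints and c input/output channels.
def traditional_gcn_flops(n, c, k):
    # k kernel partitions, each applying a c x c weight matrix per node
    return k * n * c * c

def shift_gcn_flops(n, c):
    # shifting is pure memory movement; only one 1x1 conv over channels
    return n * c * c

n, c = 25, 64  # assumed: 25 joints, 64 channels
print(traditional_gcn_flops(n, c, 3) / shift_gcn_flops(n, c))  # → 3.0
```

Under these assumptions the shift formulation costs a factor of k less in multiplies, and the gap widens as the kernel (partition count) grows, which matches the qualitative claim above.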
Drawings
FIG. 1 is a diagram of the shift graph convolution for skeletal point behavior recognition of the present invention.
FIG. 2 is a schematic diagram of a local graph of the present invention.
FIG. 3 is a schematic diagram of a non-local graph of the present invention.
FIG. 4 is a diagram of traditional graph convolution for recognizing skeletal point behaviors.
FIG. 5 is a table comparing the accuracy and computational complexity of the shift graph convolution with conventional graph convolution methods.
Detailed Description
The reason for this problem (the large computational cost of traditional graph convolution) is that, in the traditional graph convolution method, the modeled convolution kernel covers only the neighborhood of a single point. In skeletal point behavior recognition tasks, however, some behaviors (e.g., clapping hands) require modeling the positional relationship of points that are physically far apart (e.g., the two hands), which requires increasing the kernel size of the graph convolution model. Since the computational cost of graph convolution grows with the kernel size, traditional graph convolution is computationally expensive; the behavior recognition module of this design performs behavior recognition on skeleton points while significantly reducing the computational cost of graph convolution.
A skeletal point behavior recognition system based on a shift graph convolutional neural network comprises: an image acquisition module for acquiring behavior images; an image processing module for processing the behavior images acquired by the image acquisition module; a skeleton-point extraction module for extracting skeleton points from the images processed by the image processing module; and a behavior recognition module for recognizing and extracting the behavior features of the skeleton points extracted by the extraction module.
The present invention does not prescribe the method of skeleton-point extraction. There are many methods for extracting human skeleton points, for example: capturing images with a camera and then obtaining the skeleton points with an algorithm; obtaining them directly from a Kinect camera; or having the person wear acceleration sensors so that the bone positions are obtained directly. The invention concerns how behavior recognition is performed once the skeleton points have been acquired, and is not limited to any particular skeleton-point extraction method. In this embodiment, however, a correction module is provided to correct the recognized image, and the image acquisition device is correspondingly modified to capture images from multiple angles.
The image acquisition module is based on an image acquisition device, the image acquisition device comprises a camera which is arranged in an equilateral triangle shape, and a rotating device which is arranged at the tail part of the camera, the rotating device comprises a rotating shaft which is fixedly connected with the camera, and a rotating motor which is sleeved with the rotating shaft.
The image acquisition module captures human body behaviors through the three groups of cameras arranged in an equilateral triangle; the behavior images acquired by the three groups of cameras, installed at the front, rear and side, are displayed separately on the computer terminal so that the image processing module can compare and process the images.
The image processing module is mainly used for processing the human behavior image acquired by the image acquisition module into a human edge image. When detecting image edges with the Kirsch edge detection operator, 3×3 convolution templates are used to traverse the pixels in the image, the gray values of the neighborhood around each pixel are examined one by one, and the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the gray values of the other five pixels is calculated. The convolution templates are as follows:
templates 1 to 4:
K1 = [  5  5  5 ; -3  0 -3 ; -3 -3 -3 ]    K2 = [  5  5 -3 ;  5  0 -3 ; -3 -3 -3 ]
K3 = [  5 -3 -3 ;  5  0 -3 ;  5 -3 -3 ]    K4 = [ -3 -3 -3 ;  5  0 -3 ;  5  5 -3 ]
templates 5 to 8:
K5 = [ -3 -3 -3 ; -3  0 -3 ;  5  5  5 ]    K6 = [ -3 -3 -3 ; -3  0  5 ; -3  5  5 ]
K7 = [ -3 -3  5 ; -3  0  5 ; -3 -3  5 ]    K8 = [ -3  5  5 ; -3  0  5 ; -3 -3 -3 ]
All pixels in the original image are processed in turn with the eight convolution templates, the edge strength of each pixel is computed, edge pixels are detected by thresholding, and the final edge points are extracted to complete edge detection. The Kirsch operator detects image edges in the following steps:
step 1, acquire the data area pointer of the original image;
step 2, establish two buffer areas the same size as the original image, used to store copies of the original image; initialize both to copies of the original image, denoted image 1 and image 2;
step 3, set a Kirsch template for convolution in each buffer, traverse the pixels of the two copies, convolve them one by one, compare the computed results, store the larger value in image 1, and copy image 1 into buffer image 2;
step 4, repeat step 3 with each of the remaining six templates in turn, so that the larger gray values from image 1 and image 2 are finally stored in buffer image 1;
step 5, copy the processed image 1 back into the original image data, implementing the image edge processing in program.
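The Kirsch procedure above (eight directional templates, keep the per-pixel maximum response, then threshold) can be sketched in NumPy. This is a minimal illustration, not the patent's implementation: the function name `kirsch_edges` and the threshold value are assumptions, and a running per-pixel maximum replaces the two-buffer comparison loop.

```python
import numpy as np

# The eight standard 3x3 Kirsch compass templates (one per direction).
KIRSCH = [np.array(m) for m in [
    [[ 5,  5,  5], [-3, 0, -3], [-3, -3, -3]],   # N
    [[ 5,  5, -3], [ 5, 0, -3], [-3, -3, -3]],   # NW
    [[ 5, -3, -3], [ 5, 0, -3], [ 5, -3, -3]],   # W
    [[-3, -3, -3], [ 5, 0, -3], [ 5,  5, -3]],   # SW
    [[-3, -3, -3], [-3, 0, -3], [ 5,  5,  5]],   # S
    [[-3, -3, -3], [-3, 0,  5], [-3,  5,  5]],   # SE
    [[-3, -3,  5], [-3, 0,  5], [-3, -3,  5]],   # E
    [[-3,  5,  5], [-3, 0,  5], [-3, -3, -3]],   # NE
]]

def kirsch_edges(img, threshold=2500):
    """Edge strength = max response over the 8 templates; then threshold."""
    h, w = img.shape
    strength = np.zeros((h, w), dtype=np.int64)
    padded = np.pad(img.astype(np.int64), 1, mode='edge')
    for k in KIRSCH:
        # Correlate one template with the image and keep the running
        # per-pixel maximum, mirroring the "keep the larger gray value"
        # comparison between the two buffers.
        resp = np.zeros((h, w), dtype=np.int64)
        for dy in range(3):
            for dx in range(3):
                resp += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
        strength = np.maximum(strength, resp)
    return (strength >= threshold).astype(np.uint8) * 255
```

On a uniform region all eight responses are zero (each mask's coefficients sum to zero), so only intensity transitions survive the threshold.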
The extraction module extracts skeleton points from the image processed by the image processing module: when the image processing module has finished processing the image acquired by the image acquisition module, pre-recorded skeleton points matched to the body type closest to the captured subject are located on the human edge map, and the matched skeleton points are then displayed on the human edge map.
The extraction module further comprises a correction module. When the image acquisition module acquires human behavior images, skeleton sizes differ because human body sizes differ, and when people of different body types perform the same set of actions, the three-dimensional coordinates of the skeleton points differ because of the different skeleton sizes; the skeleton sizes therefore need to be normalized to the same size.
First, one person's skeleton is selected as the reference skeleton. For a given frame of skeleton data, the body center point is selected as the root node; the vector from the root node to each point directly connected to it is computed, each vector is divided by its modular length to obtain its direction vector (of modular length 1), the direction vector is multiplied by the length of the corresponding limb in the reference skeleton, and the result is added to the coordinates of the root node to obtain the corrected coordinates of the connected point, which are recorded as the normalized coordinates of the corresponding skeleton point. The root node is then updated in breadth-first search order and the steps are repeated until the values of all skeleton points have been corrected. The algorithm is as follows:
input: the limb lengths R_i in the reference skeleton; the skeleton point coordinates to be normalized;
the first step: define p as the root node coordinates;
the second step: give p the initial value p_0 (the body center point);
the third step: for all limbs l_i = (u_i, v_i), execute in breadth-first search order;
the fourth step: calculate the limb vector d_i = v_i - u_i;
the fifth step: calculate the unit direction vector e_i = d_i / |d_i|;
the sixth step: calculate v_i' = p + R_i * e_i, and save the value of v_i' to set A;
the seventh step: return to the third step until all limbs in the skeleton have been traversed;
output: the skeleton point coordinates stored in set A are the corrected coordinates;
wherein l_i denotes the i-th limb, R_i denotes the length of the i-th limb in the reference skeleton, and (u_i, v_i) denote the coordinate values of the start node and the end node of the i-th limb; calculating v_i' for all skeleton points yields all corrected skeleton point coordinates, scaling the skeleton to the reference size while keeping the included angles between limbs unchanged;
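The normalization procedure above can be sketched in Python. The helper name `normalize_skeleton` and its dict-based limb representation are illustrative assumptions, not the patent's prescribed data layout; the sketch walks the limbs in breadth-first order, rescales each limb's direction vector to the reference length R_i, and accumulates the corrected coordinates.

```python
import numpy as np
from collections import deque

def normalize_skeleton(joints, limbs, ref_lengths, root=0):
    """Rescale every limb to its reference length while keeping limb
    directions (and hence the angles between limbs) unchanged.

    joints:      dict node -> 3D coordinate
    limbs:       dict (u, v) -> limb index i, with u nearer the root
    ref_lengths: dict limb index i -> reference length R_i
    """
    adj = {}
    for (u, v), i in limbs.items():
        adj.setdefault(u, []).append((v, i))
    out = {root: np.asarray(joints[root], dtype=float)}
    queue = deque([root])                  # breadth-first traversal order
    while queue:
        p = queue.popleft()
        for child, i in adj.get(p, []):
            d = np.asarray(joints[child], float) - np.asarray(joints[p], float)
            e = d / np.linalg.norm(d)      # unit direction vector, |e| = 1
            out[child] = out[p] + ref_lengths[i] * e   # v_i' = p + R_i * e_i
            queue.append(child)
    return out
```

Because each child is corrected relative to its already-corrected parent, the update order must follow the breadth-first traversal, exactly as the algorithm's third step requires.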
When the included angle between limbs changes, the included angle between vectors is used to describe the skeleton points, so as to avoid skeleton point deviation caused by the change of the limb angle.
The steps for solving the included angle of a human joint vector are as follows:
to solve the angle at a given joint point, first obtain the three joint points used to calculate the angle and capture their three-dimensional coordinate values with Kinect; construct the structure vectors between the three joint points, and then solve the included angle of the joint vectors using the law of cosines (arccosine);
Take solving the angle θ1 of the first joint as an example. Select the other two joint points connected with the first joint and acquire the three-dimensional coordinate values of the joint points captured by Kinect; the other two joint points are expressed as S(sx, sy, sz) and W(wx, wy, wz), and the first joint point is expressed as E(ex, ey, ez). Construct the inter-joint structure vectors: the vector from the first joint point to point S is ES = (sx - ex, sy - ey, sz - ez), the vector from the first joint point to point W is EW = (wx - ex, wy - ey, wz - ez), and the vector from point W to point S is WS = (sx - wx, sy - wy, sz - wz). Calculate the size of the included angle θ1 between vector ES and vector EW:
θ1 = arccos( (|ES|^2 + |EW|^2 - |WS|^2) / (2·|ES|·|EW|) )
wherein θ1 is the included angle of the first joint. In order to make the joint vector angle representation more accurate, representative joint angles are selected for representation according to the importance ranking of joint angles during the action, and the skeleton point positions are then corrected by size normalization and angle correction.
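The joint-angle computation can be sketched directly from the law-of-cosines step above; `joint_angle` is a hypothetical helper name, and returning degrees rather than radians is an assumption.

```python
import numpy as np

def joint_angle(E, S, W):
    """Angle at joint E formed by points S and W, via the law of cosines:
    theta = arccos((|ES|^2 + |EW|^2 - |WS|^2) / (2 |ES| |EW|))."""
    E, S, W = (np.asarray(p, dtype=float) for p in (E, S, W))
    es = np.linalg.norm(S - E)   # |ES|
    ew = np.linalg.norm(W - E)   # |EW|
    ws = np.linalg.norm(S - W)   # |WS|
    cos_t = (es**2 + ew**2 - ws**2) / (2 * es * ew)
    # Clip guards against tiny floating-point overshoot outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
```

The same value could be obtained from the dot product of ES and EW; the law-of-cosines form uses the third vector WS, as the text constructs it.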
The behavior recognition module is mainly used for recognizing and extracting skeleton point behavior features: adjacent behavior features are shifted and spliced according to the adjacency relation of the graph, and after splicing only a single 1×1 convolution is needed to obtain the computed behavior features. For a graph of N nodes, let the feature dimension be C, so the feature size is [N, C]. Node v has n adjacent nodes, and the set of adjacent nodes is B_v = {B_v(1), ..., B_v(n)}. For the v-th node, the shift graph module divides its features equally into n+1 shares: the first share retains the node's own features, and the following n shares are shifted from its neighbor nodes' features. Expressed mathematically:
F'_v = F_v[0:c] || F_{B_v(1)}[c:2c] || ... || F_{B_v(n)}[n·c:(n+1)·c], with c = ⌊C/(n+1)⌋
wherein the subscript notation [a:b] is a Python slice, and the double vertical lines || denote splicing along the feature dimension. To understand the formula intuitively, take a graph of 7 nodes with 20-dimensional features as an example, as shown in fig. 2 and 3. Two cases are discussed:
1. the neighborhood of each point contains only its physically adjacent points; we call this the local design, shown in fig. 2;
2. the neighborhood of each point contains the entire human skeleton graph; we call this the non-local design, shown in fig. 3;
For both designs we take node 1 and node 2 as examples, explained in detail below.
In fig. 2, node 1 has 1 adjacent node (node 2), so its features are divided equally into 1+1=2 shares: the first share keeps node 1's own features, and the second share is shifted from node 2. Node 2 has 3 adjacent nodes (nodes 1, 3 and 4), so its features are divided equally into 3+1=4 shares: the first share keeps node 2's own features, and the following 3 shares are shifted from nodes 1, 3 and 4 respectively.
In fig. 3, for any node, all other nodes are adjacent to it, so features are shifted to the current node from all other nodes; examples for nodes 1 and 2 are shown in fig. 3. After shifting, the resulting features form a spiral pattern, the result of thoroughly mixing the features of different nodes. Experiments show that of the two shift graph convolution designs, the non-local design is more accurate on the behavior recognition task, since it fuses the features of different nodes better and enables effective feature fusion even when nodes are far apart.
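The two shift designs can be sketched in NumPy, assuming node features form an [N, C] array; the helper names `local_shift` and `non_local_shift` are illustrative, and a practical deployment would use the pointer-based C++/CUDA implementation.

```python
import numpy as np

def local_shift(F, neighbors):
    """Local design: node v's C channels are split into n+1 equal shares;
    share 0 keeps v's own features, share j comes from its j-th neighbor.
    F: [N, C] feature array; neighbors: list of neighbor-index lists."""
    N, C = F.shape
    out = F.copy()
    for v, nbrs in enumerate(neighbors):
        c = C // (len(nbrs) + 1)               # channels per share
        for j, u in enumerate(nbrs, start=1):
            out[v, j * c:(j + 1) * c] = F[u, j * c:(j + 1) * c]
    return out

def non_local_shift(F):
    """Non-local design: channel i of node v is taken from node (v+i) mod N,
    producing the spiral mixing pattern described for fig. 3."""
    N, C = F.shape
    out = np.empty_like(F)
    for i in range(C):
        out[:, i] = np.roll(F[:, i], -i)       # shift distance grows per channel
    return out
```

After either shift, a single 1×1 convolution (a per-node linear layer over the C channels) mixes the spliced features, which is where the computational saving over a full graph convolution comes from.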
It should be noted that, at the same recognition accuracy, the computational cost of the proposed shift graph convolution is more than 3 times lower than that of conventional graph convolution, which is very important for fast recognition. The method is faster, on the one hand, because of the convolution computations it saves (compare fig. 1 and fig. 4); on the other hand, the shift operation can be implemented with pointers in C++ or CUDA, and can therefore be deployed very efficiently on a CPU or GPU.
Our main experiments are shown in fig. 5. ST-GCN, Adaptive-GCN and Adaptive-NL GCN are three typical conventional GCN methods; our Shift GCN includes both the Local Shift GCN and Non-Local Shift GCN designs. As the table shows, the FLOPs (floating-point operations, a measure of computational complexity) of our method are more than 3 times lower than those of conventional graph convolution, which is very important for fast recognition, and our accuracy is also higher than that of the conventional graph convolution methods.
In addition, we compare conventional graph convolutions reduced to a single adjacency matrix, i.e. the models suffixed "1 A": their computation is comparable to ours, but their accuracy drops significantly. This means that reducing the computation of conventional graph convolution significantly reduces its accuracy, whereas our Shift GCN exceeds the accuracy of all previous algorithms at a small computational cost.
Description of the working principle: first, the image acquisition module controls the cameras to rotate so as to acquire the human behavior feature images; the rotating motor turns the rotating shaft, which in turn rotates the camera, adjusting the camera position. The image acquisition module captures human body behaviors through the three groups of cameras arranged in an equilateral triangle; the behavior images acquired by the three groups of cameras, installed at the front, rear and side, are displayed separately on the computer terminal so that the image processing module can compare and process the images. The image processing module is mainly used for processing the human behavior image acquired by the image acquisition module into a human edge image: when detecting image edges with the Kirsch edge detection operator, 3×3 convolution templates are used to traverse the pixels in the image, the gray values of the neighborhood around each pixel are examined one by one, and the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the gray values of the other five pixels is calculated; all pixels in the original image are processed in turn with the eight convolution templates, the edge strength of each pixel is computed, edge pixels are detected by thresholding, and the final edge points are extracted to complete edge detection. The Kirsch operator detects image edges in the following steps:
step 1, acquire the data area pointer of the original image;
step 2, establish two buffer areas the same size as the original image, used to store copies of the original image; initialize both to copies of the original image, denoted image 1 and image 2;
step 3, set a Kirsch template for convolution in each buffer, traverse the pixels of the two copies, convolve them one by one, compare the computed results, store the larger value in image 1, and copy image 1 into buffer image 2;
step 4, repeat step 3 with each of the remaining six templates in turn, so that the larger gray values from image 1 and image 2 are finally stored in buffer image 1;
step 5, copy the processed image 1 back into the original image data, implementing the image edge processing in program;
When the human behavior feature image processing is finished, the extraction module extracts skeleton points from the image processed by the image processing module: pre-recorded skeleton points matched to the body type closest to the captured subject are located on the human edge map, and the matched skeleton points are then displayed on the human edge map. When skeleton point extraction is finished, the correction module corrects the skeleton point positions: because skeleton sizes differ between people, the three-dimensional coordinates of the skeleton points differ when people of different body types perform the same set of actions, so the skeleton sizes are normalized to the same size. First, one person's skeleton is selected as the reference skeleton; for a given frame of skeleton data, the body center point is selected as the root node, the vector from the root node to each point directly connected to it is computed, each vector is divided by its modular length to obtain its direction vector (of modular length 1), the direction vector is multiplied by the length of the corresponding limb in the reference skeleton, and the result is added to the coordinates of the root node to obtain the corrected coordinates of the connected point, which are recorded as the normalized coordinates of the corresponding skeleton point; the root node is then updated in breadth-first search order and the steps are repeated until the values of all skeleton points have been corrected. The correction method scales the estimated size while keeping the included angles between limbs unchanged. When the included angle between limbs changes, the included angle between vectors is used to describe the skeleton points, so as to avoid skeleton point deviation caused by the change of the limb angle. The steps for solving the included angle of a human joint vector are as follows: to solve the angle at a given joint point, first obtain the three joint points used to calculate the angle and capture their three-dimensional coordinate values with Kinect; construct the structure vectors between the three joint points, and then solve the included angle of the joint vectors using the law of cosines (arccosine). In order to make the joint vector angle representation more accurate, representative joint angles are selected for representation according to the importance ranking of joint angles during the action, and the skeleton point positions are then corrected by size normalization and angle correction. After skeleton point correction is completed, the behavior recognition module recognizes the skeleton point behaviors: adjacent behavior features are shifted and spliced according to the adjacency relation of the graph, and after splicing only a single 1×1 convolution is needed to obtain the computed behavior features.
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the specific details of the above embodiments, and various equivalent changes can be made to the technical solutions of the present invention within the scope of the technical concept of the present invention, and these equivalent changes all fall within the scope of the present invention.

Claims (5)

1. A shift graph convolution neural network-based skeletal point behavior recognition system, comprising:
the behavior recognition module is used for recognizing and extracting the behavior characteristics of the bone points extracted by the extraction module;
the behavior recognition module is mainly used for recognizing and extracting skeleton point behavior features, shifting and splicing adjacent behavior features according to the adjacency relation of the graph, and obtaining the computed behavior features by performing only one 1×1 convolution after splicing, wherein for a graph of N nodes the feature dimension is set as C and the feature size is [N, C], node v has n adjacent nodes, and the set of adjacent nodes is B_v = {B_v(1), ..., B_v(n)};
for the v-th node, the shift graph module uniformly divides its features into n+1 shares, the first share retains its own features, and the following n shares are shifted from its neighbor node features, expressed mathematically as follows:
F'_v = F_v[0:c] || F_{B_v(1)}[c:2c] || ... || F_{B_v(n)}[n·c:(n+1)·c], with c = ⌊C/(n+1)⌋;
wherein the subscript notation [a:b] is a Python slice, and the double vertical lines || denote splicing along the feature dimension;
The system also comprises an image acquisition module for acquiring the behavior image;
the image acquisition module is based on an image acquisition device, the image acquisition device comprises a camera which is arranged in an equilateral triangle shape, and a rotating device which is arranged at the tail part of the camera, the rotating device comprises a rotating shaft which is fixedly connected with the camera, and a rotating motor which is sleeved with the rotating shaft;
the system also comprises an image processing module for processing the behavior image acquired by the image acquisition module to perform image processing;
the image processing module is mainly used for processing the human behavior image acquired by the image acquisition module into a human edge image; when detecting image edges through the Kirsch edge detection operator, a 3×3 convolution template is used to traverse the pixels in the image, the gray values of the neighborhood around each pixel are examined one by one, and the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the gray values of the other five pixels is calculated;
the skeleton point extraction module is used for extracting skeleton points from the image processed by the image processing module; when the image processing module has finished processing the image acquired by the image acquisition module, pre-recorded skeleton points matched to the body type closest to the captured subject are located on the human edge map, and the matched skeleton points are then displayed on the human edge map.
2. The shift-map-based convolution neural network skeletal point behavior recognition system of claim 1, wherein: the image acquisition module captures human body behaviors through three groups of cameras arranged in an equilateral triangle; the behavior images acquired by the three groups of cameras, installed at the front, rear and side, are displayed separately on the computer terminal so that the image processing module can compare and process the images.
3. The shift-map-based convolution neural network skeletal point behavior recognition system of claim 1, wherein: the convolution templates are as follows:
K1 = [  5  5  5 ; -3  0 -3 ; -3 -3 -3 ]    K2 = [  5  5 -3 ;  5  0 -3 ; -3 -3 -3 ]
K3 = [  5 -3 -3 ;  5  0 -3 ;  5 -3 -3 ]    K4 = [ -3 -3 -3 ;  5  0 -3 ;  5  5 -3 ]
K5 = [ -3 -3 -3 ; -3  0 -3 ;  5  5  5 ]    K6 = [ -3 -3 -3 ; -3  0  5 ; -3  5  5 ]
K7 = [ -3 -3  5 ; -3  0  5 ; -3 -3  5 ]    K8 = [ -3  5  5 ; -3  0  5 ; -3 -3 -3 ]
using the eight convolution templates to process all pixels in the original image in turn, calculating the edge strength of each pixel, detecting edge pixels through a threshold value, and extracting the final edge points to finish edge detection;
the Kirsch operator detects image edges in the following steps:
step 1, acquire the data area pointer of the original image;
step 2, establish two buffer areas the same size as the original image, used to store copies of the original image; initialize both to copies of the original image, denoted image 1 and image 2;
step 3, set a Kirsch template for convolution in each buffer, traverse the pixels of the two copies, convolve them one by one, compare the computed results, store the larger value in image 1, and copy image 1 into buffer image 2;
step 4, repeat step 3 with each of the remaining six templates in turn, so that the larger gray values from image 1 and image 2 are finally stored in buffer image 1;
step 5, copy the processed image 1 back into the original image data, implementing the image edge processing in program.
4. The shift-map-based convolution neural network skeletal point behavior recognition system of claim 1, wherein: the extraction module further comprises a correction module, wherein the correction module firstly selects a human skeleton as a reference skeleton, selects a body center point as a root node for certain frame skeleton data, calculates vectors from all points directly connected with the root node to the root node, uses the modular length of the vectors to obtain the direction vector of each vector by each vector, the modular length is 1, multiplies the length of the corresponding vector in the reference skeleton by the direction vector to obtain a vector, adds the vector to the coordinates of the root node to obtain the corrected coordinates of a point directly connected with the root node, records the coordinates of the connected point as the coordinates of the corresponding bone point after normalization, sequentially updates the coordinate values of the root node according to the sequence of breadth-first search algorithm, and repeats the steps until the values of all the bone points are corrected, and the algorithm is as follows:
input: the limb lengths R_i in the reference skeleton; the skeleton point coordinate values to be normalized;
the first step: define p as the root node coordinates;
the second step: give p the initial value p_0;
the third step: for all limbs l_i = (u_i, v_i), execute in breadth-first search order;
the fourth step: calculate the limb vector d_i = v_i - u_i;
the fifth step: calculate the unit direction vector e_i = d_i / |d_i|;
the sixth step: calculate v_i' = p + R_i * e_i, and save the value of v_i' to set A;
the seventh step: return to the third step until all limbs in the skeleton have been traversed;
output: the skeleton point coordinates stored in set A are the corrected coordinates;
wherein l_i represents the i-th limb, R_i represents the length of the i-th limb in the reference skeleton, and (u_i, v_i) represent the coordinate values of the start node and the end node of the i-th limb; calculating v_i' for all skeleton points yields all corrected skeleton point coordinates, scaling the estimated size while keeping the included angles between limbs unchanged;
when the included angle between the limbs changes, the included angle between vectors is selected to describe the bone points so as to avoid the bone point deviation when the included angle between the limbs changes;
the step of solving the human joint vector included angle is as follows:
solving the angle of a certain joint point: first obtain the three joint points used to calculate the angle and capture their three-dimensional coordinate values with Kinect, construct the structure vectors between the three joint points, and then solve the size of the included angle of the joint vectors using the law of cosines (arccosine);
take solving the angle θ1 of the first joint as an example;
select the other two joint points connected with the first joint and acquire the three-dimensional coordinate values of the joint points captured by Kinect, wherein the other two joint points are expressed as S(sx, sy, sz) and W(wx, wy, wz), and the first joint point is expressed as E(ex, ey, ez);
construct the inter-joint structure vectors: the vector from the first joint point to point S(sx, sy, sz) is ES = (sx - ex, sy - ey, sz - ez), the vector from the first joint point to point W(wx, wy, wz) is EW = (wx - ex, wy - ey, wz - ez), and the vector from point W(wx, wy, wz) to point S(sx, sy, sz) is WS = (sx - wx, sy - wy, sz - wz);
calculate the size of the included angle θ1 between vector ES and vector EW:
θ1 = arccos( (|ES|^2 + |EW|^2 - |WS|^2) / (2·|ES|·|EW|) );
wherein θ1 is the included angle of the first joint; in order to make the joint vector angle representation more accurate, representative joint angles are selected for representation according to the importance ranking of joint angles during the action, and the skeleton point positions are corrected by size normalization and angle correction.
5. A method for identifying skeletal point behavior based on the shift map convolution neural network system according to claim 1, characterized by comprising the following steps:
Step 1, firstly, controlling a camera to rotate through an image acquisition module, and further acquiring a human behavior characteristic image; the rotation motor rotates to drive the rotation shaft to rotate, so that the rotation shaft drives the camera to rotate, and the position of the camera is adjusted;
step 2, the image acquisition module captures human body behaviors through three groups of cameras arranged in an equilateral triangle; the behavior images acquired by the three groups of cameras, installed at the front, rear and side, are displayed separately on the computer terminal so that the image processing module can compare and process the images;
step 3, the image processing module is mainly used for processing the human behavior image acquired by the image acquisition module into a human edge image; traversing pixel points in an image by using a 3*3 convolution template when detecting the edge of the image through a Krisch edge detection operator, examining pixel gray values of adjacent areas around each pixel point one by one, and calculating the gray weighted sum difference value of the gray weights of three adjacent pixels and the gray weighted sum of the other five pixels;
using eight convolution templates to sequentially process all pixels in an original image, calculating to obtain edge strength of the pixels, detecting the pixels through a threshold value, extracting a final edge point, and finishing edge detection;
The steps for detecting image edges with the Kirsch operator are as follows:
step (1), acquiring the data area pointer of the original image;
step (2), establishing two buffer areas of the same size as the original image, used to store copies of the original image; both buffers are initialized to copies of the original image, denoted image 1 and image 2 respectively;
step (3), setting a Kirsch template for the convolution operation in each buffer area, traversing the pixels of the copy images in the two areas, performing the convolution operation pixel by pixel, comparing the two calculated results, storing the larger value in image 1, and copying image 1 into buffer image 2;
step (4), repeating step (3) with each of the remaining six templates in turn, so that the larger gray value over all eight templates is finally obtained from image 1 and image 2 and stored in buffer image 1;
copying the processed image 1 back into the original image data, thereby realizing the edge processing of the image programmatically;
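The Kirsch edge-detection procedure of step 3 can be sketched as follows. This is a minimal NumPy sketch, not the patent's implementation: the function name, the edge-replicating border handling, and the threshold value are assumptions; the eight templates weight three adjacent border pixels +5 and the other five −3, and the "keep the larger value" comparisons of steps (3)-(4) become a running maximum.

```python
import numpy as np

def kirsch_edges(image, threshold=255):
    """Detect edges with the eight 3x3 Kirsch templates: each template
    weights three adjacent border pixels +5 and the remaining five -3;
    the edge strength at a pixel is the maximum response over the eight
    orientations, then a threshold extracts the final edge points."""
    # Clockwise ring of the eight border positions of a 3x3 template.
    border = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    kernels = []
    for r in range(8):
        k = np.zeros((3, 3))  # center weight stays 0
        for (i, j), w in zip(border, np.roll([5, 5, 5, -3, -3, -3, -3, -3], r)):
            k[i, j] = w
        kernels.append(k)
    img = image.astype(np.float64)
    padded = np.pad(img, 1, mode="edge")  # replicate borders (an assumption)
    h, w = img.shape
    strength = np.zeros_like(img)
    for k in kernels:
        resp = np.zeros_like(img)
        for di in range(3):
            for dj in range(3):
                resp += k[di, dj] * padded[di:di + h, dj:dj + w]
        strength = np.maximum(strength, resp)  # keep the larger value, as in steps (3)-(4)
    return (strength > threshold).astype(np.uint8) * 255  # threshold to final edge map
```

A flat region gives zero response (the template weights sum to 0), so only gray-level transitions survive the threshold.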
step 4, when the human body behavior feature image processing is finished, the extraction module extracts skeleton points from the image processed by the image processing module; once the image processing module finishes processing the image acquired by the image acquisition module, the skeleton points entered in advance that best match the body shape of the subject in the most recently acquired image are positioned on the human body edge map, and the matched skeleton points are then displayed on the human body edge map;
Step 5, when the skeleton point extraction is finished, the correction module corrects the skeleton point positions. When the image acquisition module acquires a human body behavior image, a human skeleton is first selected as the reference skeleton. For a given frame of skeleton data, the body center point is selected as the root node, and the vectors from the root node to all points directly connected to it are calculated; each vector is divided by its modulus to obtain a direction vector of modulus 1; the direction vector is multiplied by the length of the corresponding bone in the reference skeleton, and the resulting vector is added to the coordinates of the root node to obtain the corrected coordinates of the point directly connected to the root node; the coordinates of that connected point are recorded as the normalized coordinates of the corresponding skeleton point. The root node is then updated in the order given by a breadth-first search, and the above steps are repeated until the values of all skeleton points have been corrected. This correction method scales the estimated size while ensuring that the included angles between limbs remain unchanged;
when the included angle between limbs changes, the included angles between vectors are selected to describe the skeleton points, so as to avoid skeleton point deviation when the included angle between limbs changes;
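The size normalization of step 5 can be sketched as follows: a breadth-first traversal from the root node that rescales each bone to its reference length while preserving its direction. This is a minimal sketch under assumed data structures (the function name, the dict-based joint and bone-length representations, and the edge list are illustrative, not the patent's).

```python
import numpy as np
from collections import deque

def normalize_skeleton(coords, edges, ref_lengths, root=0):
    """Rescale an estimated skeleton to reference bone lengths while
    keeping every inter-limb angle unchanged (step-5 size normalization).

    coords      : dict joint_id -> np.array([x, y, z]) estimated coordinates
    edges       : list of (joint, joint) bone connections
    ref_lengths : dict (joint, joint) -> bone length in the reference skeleton
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    corrected = {root: coords[root].copy()}  # the root node keeps its position
    queue = deque([root])                    # breadth-first order, as in step 5
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v in corrected:
                continue
            vec = coords[v] - coords[u]              # vector from current root to child
            direction = vec / np.linalg.norm(vec)    # direction vector, modulus 1
            length = ref_lengths.get((u, v), ref_lengths.get((v, u)))
            corrected[v] = corrected[u] + length * direction  # reference length * direction
            queue.append(v)
    return corrected
```

Because only the bone lengths are replaced and the directions are kept, every included angle between limbs is unchanged, which is exactly the invariant step 5 requires.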
The steps for solving the included angle between human joint vectors are as follows:
to solve the angle of a certain joint point, the three joint points used to calculate the angle are first obtained; their three-dimensional coordinate values are captured with Kinect, the structure vectors between the three joint points are constructed, and the size of the included angle of the joint vectors is then solved with the arccosine form of the law of cosines;
taking the angle θ1 of the first joint as an example:
the other two joint points connected to the first joint are selected and their three-dimensional coordinate values captured by Kinect are acquired; the other two joint points are denoted S(sx, sy, sz) and W(wx, wy, wz), and the first joint point is denoted E(ex, ey, ez);
constructing the inter-joint structure vectors: the vector from the first joint point E to point S(sx, sy, sz) is

ES = (sx − ex, sy − ey, sz − ez), with modulus |ES| = √((sx − ex)² + (sy − ey)² + (sz − ez)²)

the vector from the first joint point E to point W(wx, wy, wz) is

EW = (wx − ex, wy − ey, wz − ez), with modulus |EW| = √((wx − ex)² + (wy − ey)² + (wz − ez)²)

and the vector from point W(wx, wy, wz) to point S(sx, sy, sz) is

WS = (sx − wx, sy − wy, sz − wz), with modulus |WS| = √((sx − wx)² + (sy − wy)² + (sz − wz)²)

The size of the included angle θ1 between vector ES and vector EW is then calculated from the law of cosines:

θ1 = arccos((|ES|² + |EW|² − |WS|²) / (2·|ES|·|EW|))
wherein θ1 is the included angle of the joint vectors; to make the joint vector angle representation more accurate, representative joint angles are selected according to the ranking of the importance of the joint angles during the behavior, and the skeleton point positions are corrected through size normalization and angle correction;
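The joint-angle calculation above reduces to a few lines. This is a minimal sketch (the function name is illustrative); it takes the three Kinect-captured coordinates and applies the law-of-cosines arccosine, clamping the cosine against floating-point round-off, an added safeguard not stated in the source.

```python
import numpy as np

def joint_angle(E, S, W):
    """Included angle at joint E formed by the two connected joints S and W,
    from the law of cosines: theta = arccos((|ES|^2 + |EW|^2 - |WS|^2) / (2|ES||EW|)).
    Returns the angle in degrees."""
    E, S, W = (np.asarray(p, dtype=float) for p in (E, S, W))
    es = np.linalg.norm(S - E)  # |ES|
    ew = np.linalg.norm(W - E)  # |EW|
    ws = np.linalg.norm(S - W)  # |WS|
    cos_theta = (es**2 + ew**2 - ws**2) / (2 * es * ew)
    # clip guards against round-off pushing the cosine slightly outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
```

For example, with E at the origin and S, W on two perpendicular axes the function returns a right angle.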
Step 6, when the skeleton point correction is completed, the behavior recognition module performs behavior recognition on the skeleton points. Adjacent behavior features are shifted and spliced according to the adjacency relation of the graph, and after splicing, the behavior features can be obtained with only a single 1×1 convolution. For a graph with N nodes, let the feature dimension be C, so that the feature size is [N, C]; a node v with n adjacent nodes has the neighbor set

B(v) = {v1, v2, …, vn}

For the v-th node, the shift-graph module uniformly divides its feature into n + 1 shares; the first share keeps the node's own feature, and the following n shares are shifted in from the features of its neighbor nodes, expressed mathematically as follows:

F̃v = Fv[0:c] ‖ Fv1[c:2c] ‖ Fv2[2c:3c] ‖ … ‖ Fvn[nc:C], with c = ⌊C/(n + 1)⌋

where the subscript slices follow the Python indexing convention and the double vertical line ‖ denotes splicing along the feature dimension; the skeleton point behavior features are thereby identified.
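The shift-and-splice operation of step 6 can be sketched as follows. This is a minimal NumPy sketch, not the patented network: the function names are illustrative, the last share is assumed to absorb the remainder when C is not divisible by n + 1, and the 1×1 convolution over node features is written as the plain [N, C] × [C, C′] matrix product it is equivalent to.

```python
import numpy as np

def shift_features(F, neighbors):
    """Spatial feature shift on a graph: each node keeps the first share of
    its own feature and fills the following n shares from slices of its
    n neighbors' features (the '||' splicing along the feature dimension).

    F         : [N, C] node feature matrix
    neighbors : neighbors[v] = list of the n neighbor indices of node v
    """
    N, C = F.shape
    out = np.empty_like(F)
    for v in range(N):
        nbrs = neighbors[v]
        c = C // (len(nbrs) + 1)      # size of each of the n + 1 shares
        parts = [F[v, 0:c]]           # first share: the node keeps its own feature
        for i, u in enumerate(nbrs, start=1):
            lo = i * c
            hi = (i + 1) * c if i < len(nbrs) else C  # last share absorbs the remainder
            parts.append(F[u, lo:hi])  # share shifted in from neighbor u
        out[v] = np.concatenate(parts)  # splice along the feature dimension
    return out

def conv1x1(F, W):
    """A 1x1 convolution over [N, C] node features is a per-node linear map."""
    return F @ W
```

Usage: `conv1x1(shift_features(F, neighbors), W)` gives the spliced-then-convolved behavior features, replacing per-neighbor convolutions with one cheap 1×1 convolution.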
CN202010419839.4A 2020-05-18 2020-05-18 Bone point behavior recognition system based on shift map convolution neural network and recognition method thereof Active CN111582220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010419839.4A CN111582220B (en) 2020-05-18 2020-05-18 Bone point behavior recognition system based on shift map convolution neural network and recognition method thereof

Publications (2)

Publication Number Publication Date
CN111582220A CN111582220A (en) 2020-08-25
CN111582220B true CN111582220B (en) 2023-05-26

Family

ID=72123047


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112009717B (en) * 2020-08-31 2022-08-02 南京迪沃航空技术有限公司 Airport bulk cargo loader, machine leaning anti-collision system for bulk cargo loader and anti-collision method of machine leaning anti-collision system
CN113158782B (en) * 2021-03-10 2024-03-26 浙江工业大学 Multi-person concurrent interaction behavior understanding method based on single-frame image
CN113627409B (en) * 2021-10-13 2022-03-15 南通力人健身器材有限公司 Body-building action recognition monitoring method and system
CN114187653A (en) * 2021-11-16 2022-03-15 复旦大学 Behavior identification method based on multi-stream fusion graph convolution network
CN114463840B (en) * 2021-12-31 2024-08-02 北京工业大学 Skeleton-based shift chart convolution network human body behavior recognition method
JP7485154B1 (en) 2023-05-19 2024-05-16 トヨタ自動車株式会社 Video Processing System

Citations (3)

Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN109522793A (en) * 2018-10-10 2019-03-26 华南理工大学 More people's unusual checkings and recognition methods based on machine vision
CN111340011A (en) * 2020-05-18 2020-06-26 中国科学院自动化研究所南京人工智能芯片创新研究院 Self-adaptive time sequence shift neural network time sequence behavior identification method and system


Non-Patent Citations (1)

Title
Multimodal action recognition based on a deep learning framework; Han Minjie; Computer and Modernization (Issue 07); full text *


Similar Documents

Publication Publication Date Title
CN111582220B (en) Bone point behavior recognition system based on shift map convolution neural network and recognition method thereof
WO2021253788A1 (en) Three-dimensional human body model construction method and apparatus
CN110363817B (en) Target pose estimation method, electronic device, and medium
US11080833B2 (en) Image manipulation using deep learning techniques in a patch matching operation
CN109919971B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111696196B (en) Three-dimensional face model reconstruction method and device
CN107953329A (en) Object identification and Attitude estimation method, apparatus and mechanical arm grasping system
CN108428224B (en) Animal body surface temperature detection method and device based on convolutional neural network
CN108010082B (en) Geometric matching method
CN112819875B (en) Monocular depth estimation method and device and electronic equipment
JP6951913B2 (en) Classification model generator, image data classification device and their programs
CN113744142B (en) Image restoration method, electronic device and storage medium
KR101593316B1 (en) Method and apparatus for recontructing 3-dimension model using stereo camera
CN111582204A (en) Attitude detection method and apparatus, computer device and storage medium
CN109784353B (en) Method, device and storage medium for processor implementation
CN116843834A (en) Three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment
CN114757984A (en) Scene depth estimation method and device of light field camera
CN117934308A (en) Lightweight self-supervision monocular depth estimation method based on graph convolution network
CN111339969B (en) Human body posture estimation method, device, equipment and storage medium
CN115630660B (en) Barcode positioning method and device based on convolutional neural network
CN107240149A (en) Object dimensional model building method based on image procossing
CN111461141B (en) Equipment pose calculating method and device
CN114723973A (en) Image feature matching method and device for large-scale change robustness
JP7489247B2 (en) PROGRAM, INFORMATION PROCESSING METHOD, INFORMATION PROCESSING APPARATUS AND MODEL GENERATION METHOD
CN111416938B (en) Augmented reality close-shooting method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant