CN114299614A - Behavior identification method based on skeleton joint combination geometric features - Google Patents
- Publication number: CN114299614A
- Application number: CN202111623361.8A
- Authority: CN (China)
- Legal status: Pending
Abstract
The application is suitable for the technical field of video data processing and provides a behavior recognition method based on combined geometric features of skeleton joints, aiming to improve the accuracy and efficiency of behavior recognition. The method mainly comprises the following steps: dividing a skeleton into X joint combinations, wherein X is a positive integer greater than 1; describing the spatial position features of each joint combination; describing the relative geometric features of each joint combination; combining the spatial position feature and the relative geometric feature of each joint combination into the geometric feature of that joint combination; respectively inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification, and respectively outputting the possible probabilities of a plurality of behaviors; and comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function and outputting the behavior with the highest probability as the target behavior.
Description
Technical Field
The application belongs to the technical field of video data processing, and particularly relates to a behavior identification method based on combined geometric features of skeleton joints.
Background
In the prior art, behavior recognition based on a skeleton sequence is mainly realized as behavior recognition based on the spatial features of all joints: all joints are arranged into a whole according to their serial numbers and used as the behavior feature description of an individual to realize behavior classification. A skeleton sequence is derived from video data; each frame contains the joints of multiple body parts (head, hands, legs and spine), and each joint is numbered starting from 1. In the current skeleton-sequence behavior identification method, all joints are ordered by joint number, the skeleton sequence data are converted into color RGB pictures as the behavior feature description, and a deep network extracts features from these pictures to achieve behavior classification and recognition.
However, in studying the prior art of behavior recognition based on skeleton sequences, the applicant found that directly arranging all joints to describe behavior features suffers from low recognition efficiency and insufficient accuracy. For example, in a hand-clapping motion only the joints of the arms change, while the spatial positions of the other joints may remain unchanged and therefore contribute little to the feature description.
Disclosure of Invention
The application aims to provide a behavior recognition method based on combined geometric features of skeleton joints, so as to improve the accuracy and efficiency of behavior recognition.
The application provides a behavior identification method based on a skeleton joint combination geometric feature, which comprises the following steps:
dividing a skeleton into X joint combinations, wherein X is a positive integer greater than 1;
describing the spatial position features of each of the joint combinations;
describing relative geometric features of each of the joint combinations;
combining the spatial position feature and the relative geometric feature of each joint combination into the geometric feature of that joint combination;
inputting the geometric features of each joint combination into a trained deep convolution network for feature extraction and classification, and outputting possible probabilities of a plurality of behaviors respectively;
and comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain a target behavior.
Optionally, X is 3; the skeleton is a human body skeleton; the dividing of the skeleton into X joint combinations comprises:
removing invalid joints in the human skeleton according to a preset rule to obtain residual joints;
dividing the remaining joints into 3 joint combinations P1, P2, P3, wherein the P1 comprises joints of the left arm and the right arm of the remaining joints, the P2 comprises joints of the left arm and the right leg of the remaining joints, and P3 comprises joints of the right arm and the left leg of the remaining joints.
Optionally, each joint combination includes a joint serial number set of joints, a spatial connection relationship of the joints, and a spatial connection order of the joints;
the P1, the P2, the P3 are respectively described as:
wherein the H represents a set of joint sequence numbers of the left arm and the right arm;
the LG represents a set of joint numbers for the left arm and the right leg;
the RG represents a set of joint numbers of the right arm and the left leg;
said JmA joint with a sequence number m;
said JnIndicates a joint with the serial number n;
said JkThe joint with the number k is shown.
Optionally, the describing the spatial position feature of each joint combination includes:
describing the spatial location features of the P1, the P2 and the P3, respectively, as the sets of three-dimensional joint coordinates over time, wherein:
the spatial location feature of the P1 consists of the three-dimensional coordinates of each joint with serial number i in the P1 at each time t, with T representing the time length;
the spatial location feature of the P2 consists of the three-dimensional coordinates of each joint with serial number j in the P2 at each time t;
the spatial location feature of the P3 consists of the three-dimensional coordinates of each joint with serial number k in the P3 at each time t.
Optionally, after the spatial location features of the P1, the P2 and the P3 are respectively described, the method specifically further includes:
normalizing the x-axis coordinates in the spatial location features of the P1 to first X values between 0 and 255;
representing the first X values as a 224 × 112 picture by a linear interpolation method;
normalizing the y-axis coordinates in the spatial location features of the P1 to first Y values between 0 and 255, and representing the first Y values as a 224 × 112 picture by a linear interpolation method;
normalizing the z-axis coordinates in the spatial location features of the P1 to first Z values between 0 and 255, and representing the first Z values as a 224 × 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first X values, a 224 × 112 picture represented by the first Y values, and a 224 × 112 picture represented by the first Z values as the Red, Green and Blue channels of a first RGB picture, respectively, to obtain the first RGB picture;
describing the spatial location features of the P1 using the first RGB picture;
normalizing the x-axis coordinates in the spatial location features of the P2 to second X values between 0 and 255, and representing the second X values as a 224 × 112 picture by a linear interpolation method;
normalizing the y-axis coordinates in the spatial location features of the P2 to second Y values between 0 and 255, and representing the second Y values as a 224 × 112 picture by a linear interpolation method;
normalizing the z-axis coordinates in the spatial location features of the P2 to second Z values between 0 and 255, and representing the second Z values as a 224 × 112 picture by a linear interpolation method;
regarding the three 224 × 112 pictures represented by the second X, Y and Z values as the Red, Green and Blue channels of a second RGB picture, respectively, to obtain the second RGB picture;
describing the spatial location features of the P2 using the second RGB picture;
normalizing the x-axis coordinates in the spatial location features of the P3 to third X values between 0 and 255, and representing the third X values as a 224 × 112 picture by a linear interpolation method;
normalizing the y-axis coordinates in the spatial location features of the P3 to third Y values between 0 and 255, and representing the third Y values as a 224 × 112 picture by a linear interpolation method;
normalizing the z-axis coordinates in the spatial location features of the P3 to third Z values between 0 and 255, and representing the third Z values as a 224 × 112 picture by a linear interpolation method;
regarding the three 224 × 112 pictures represented by the third X, Y and Z values as the Red, Green and Blue channels of a third RGB picture, respectively, to obtain the third RGB picture;
describing the spatial location features of the P3 using the third RGB picture.
Optionally, the describing the relative geometric features of each of the joint combinations comprises:
describing the relative geometric features of the P1 using the relative position features of the P1, wherein the relative position feature of an arm joint Ji at time t is the change of its three-dimensional coordinates from time t - 1 to time t, and the arm joints comprise the left arm joints and the right arm joints;
calculating, in the P2, the relative distance d21 formed between the left arm joints, the relative distance d22 between the left arm joints and the right leg joints, and the relative distance d23 of the left arm joints and right leg joints relative to a preset origin;
regarding the relative distance d21, the relative distance d22 and the relative distance d23 as the relative geometric features of the P2;
calculating, in the P3, the relative distance d31 formed between the right arm joints, the relative distance d32 between the right arm joints and the left leg joints, and the relative distance d33 of the right arm joints and left leg joints relative to the preset origin;
regarding the relative distance d31, the relative distance d32 and the relative distance d33 as the relative geometric features of the P3.
Optionally, after describing the relative geometric features of the P1, the P2 and the P3 respectively, the method further includes:
normalizing the x-axis components in the relative geometric features of the P1 to first X relative values between 0 and 255;
representing the first X relative values as a 224 × 112 picture by a linear interpolation method;
normalizing the y-axis components in the relative geometric features of the P1 to first Y relative values between 0 and 255, and representing the first Y relative values as a 224 × 112 picture by a linear interpolation method;
normalizing the z-axis components in the relative geometric features of the P1 to first Z relative values between 0 and 255, and representing the first Z relative values as a 224 × 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first X relative values, a 224 × 112 picture represented by the first Y relative values, and a 224 × 112 picture represented by the first Z relative values as the Red, Green and Blue channels of a fourth RGB picture, respectively, to obtain the fourth RGB picture;
describing the relative geometric features of the P1 using the fourth RGB picture;
normalizing the relative distance d21 in the relative geometry of the P2 to a first d21 value between 0 and 255;
representing the first d21 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d22 in the relative geometry of the P2 to a first d22 value between 0 and 255;
representing the first d22 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d23 in the relative geometry of the P2 to a first d23 value between 0 and 255;
representing the first d23 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d21 value, a 224 × 112 picture represented by the first d22 value, and a 224 × 112 picture represented by the first d23 value as Red, Green, and Blue channels of a fifth RGB picture, respectively, obtaining a fifth RGB picture;
describing relative geometric features of the P2 using the fifth RGB picture;
normalizing the relative distance d31 in the relative geometry of the P3 to a first d31 value between 0 and 255;
representing the first d31 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d32 in the relative geometry of the P3 to a first d32 value between 0 and 255;
representing the first d32 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d33 in the relative geometry of the P3 to a first d33 value between 0 and 255;
representing the first d33 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d31 value, a 224 × 112 picture represented by the first d32 value, and a 224 × 112 picture represented by the first d33 value as Red, Green, and Blue channels of a sixth RGB picture, respectively, to obtain the sixth RGB picture;
the relative geometry of the P3 is described using the sixth RGB picture.
Optionally, the combining of the spatial position feature and the relative geometric feature of each joint combination into its geometric feature comprises:
combining the first RGB picture and the fourth RGB picture into a 224 × 224 first geometric feature RGB picture corresponding to the P1;
combining the second RGB picture and the fifth RGB picture into a 224 × 224 second geometric feature RGB picture corresponding to the P2;
combining the third RGB picture and the sixth RGB picture into a 224 × 224 third geometric feature RGB picture corresponding to the P3.
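A minimal sketch of this combining step, assuming each RGB picture is held as a list of rows of pixel tuples (the function name and row-list representation are illustrative, not from the patent):

```python
def combine_pictures(spatial_pic, relative_pic):
    """Concatenate two pictures of equal height along the width, e.g. a
    224 x 112 spatial-feature picture and a 224 x 112 relative-geometry
    picture into one 224 x 224 geometric feature picture."""
    assert len(spatial_pic) == len(relative_pic)  # same number of rows
    return [s_row + r_row for s_row, r_row in zip(spatial_pic, relative_pic)]
```

Concatenating along the width is one natural reading of "combining" two 224 × 112 pictures into a 224 × 224 one.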
Optionally, the respectively inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification, and respectively outputting possible probabilities of a plurality of behaviors, includes:
and respectively inputting the first geometric feature RGB picture, the second geometric feature RGB picture and the third geometric feature RGB picture into a trained convolutional neural network Resnet-50 for feature extraction, and respectively outputting possible probabilities of a plurality of behaviors.
Optionally, the possible probabilities of the several behaviors include: the posterior probability b1 of behaviors extracted from the first geometric feature RGB picture, the posterior probability b2 of behaviors extracted from the second geometric feature RGB picture, and the posterior probability b3 of behaviors extracted from the third geometric feature RGB picture.
The comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain a target behavior, including:
P(L|S) = η1·b1 + η2·b2 + η3·b3
where η1, η2 and η3 are preset weights;
the posterior probability vector P(L|S) represents the probability that the skeleton sequence S belongs to the behavior label class L;
label=Find(max(P(L|S)))
the label represents the target behavior.
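The fusion function above can be sketched as a weighted sum of the three posterior vectors followed by an argmax; the equal default weights here are only a placeholder assumption, since the patent leaves the η values as preset parameters:

```python
def fuse_and_label(b1, b2, b3, etas=(1 / 3, 1 / 3, 1 / 3)):
    """P(L|S) = eta1*b1 + eta2*b2 + eta3*b3; the target behavior `label`
    is the class index with the highest fused probability."""
    p = [etas[0] * x + etas[1] * y + etas[2] * z
         for x, y, z in zip(b1, b2, b3)]
    return p.index(max(p)), p  # (label, posterior probability vector)
```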
According to the technical scheme, the embodiment of the application has the following advantages:
therefore, in the behavior identification method based on the geometric features of the skeleton joint combination, the skeleton is divided into X joint combinations, all joints of the skeleton are not recognized as a whole as in the prior art, the division into the X joint combinations is more beneficial to improving the classification precision of behaviors, the geometric feature basis of each joint combination in the recognition and classification process is determined, the analysis data volume is reduced, the trained deep convolutional network is used for carrying out feature extraction and classification to obtain the possible probabilities of a plurality of behaviors, then the possible probabilities of the plurality of behaviors are comprehensively analyzed according to the preset fusion function, the behavior with the highest possible probability is output, and the target behavior is obtained. Because the data volume of the joint combination is smaller than that of all joints of the whole skeleton, the occupied computing resources are less, and the behavior recognition efficiency based on the geometrical characteristics of the skeleton joint combination is improved; and because the spatial position characteristics and the relative geometric characteristics of each joint combination are described, the behavior identification accuracy based on the geometric characteristics of the skeleton joint combination is improved.
Drawings
FIG. 1 is a schematic flow chart illustrating an embodiment of a behavior recognition method based on a skeleton joint combination geometric feature according to the present application;
FIG. 2 is a schematic view of one embodiment of the present application illustrating the division of joints of a human skeleton into different joint combinations;
FIG. 3 is a schematic view of one embodiment of the relative geometry of the joint assembly of the present application;
FIG. 4 is a schematic diagram of an embodiment of feature extraction and classification of 3 joint combinations P1, P2 and P3 by the deep convolutional network of the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the behavior recognition device based on the geometric features of the human skeleton joint combination according to the present application;
FIG. 6 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Detailed Description
In order to clearly identify the various behaviors expressed by the joint states of the skeleton, the applicant analyzed prior-art behavior identification methods based on the human body skeleton and found that directly arranging all skeleton joints according to joint numbers to describe behavior features has at least the following problems and defects:
1. Joints with adjacent serial numbers may have no spatial correlation in the human skeleton. For example, the joint representing the head may be numbered adjacent to a joint of the hand, yet the two joints are not adjacent in spatial position, which hinders the identification of behavior features. The applicant therefore considers that joints with spatial correlation should be combined according to their spatial correlation in the skeleton.
2. The prior art describes the features of all joints and therefore cannot focus on the local joints in which the behavior actually occurs. For the motion of "clapping hands", for example, only the joints of the hands change; the spatial positions of the leg, head and other joints do not change and contribute little to the feature description. When a behavior is expressed by some joints of the skeleton, the features representing that behavior should be gathered at those key joints.
3. The prior art methods do not take into account the relative spatial relationship between symmetric joints. Most human motions, such as "running", "clapping", "kicking" and "javelin throwing", are performed by the hand or leg joints, and the relative spatial relationship between the left and right hand or leg joints is crucial to distinguishing behaviors; it should therefore be considered in the behavior feature description.
Based on the above understanding, the following embodiments take the recognition of human behavior as an example. It is worth noting that the behavior recognition method based on combined geometric features of skeleton joints is applicable not only to recognizing human behaviors but also, under specific conditions, to recognizing the behaviors of other objects, particularly animals, and outputting the corresponding recognized target behaviors.
Referring to fig. 1, an embodiment of a behavior recognition method based on a skeleton joint combination geometric feature of the present application includes:
101. The skeleton is divided into X joint combinations, wherein X is a positive integer greater than 1.
First, the present application needs to know the states of all joints of the skeleton, each of which can simply be regarded as a joint point. This step obtains the spatial position coordinates of all joint points of the skeleton, the connection relations of all joint points, the spatial connection order of all joint points, and the like. On this basis, the joint points of the skeleton are divided into X joint combinations, where X is a positive integer greater than 1.
Specifically, the skeleton is a human body skeleton, and the invalid joints of the human body skeleton are removed according to a preset rule to obtain the remaining joints. An invalid joint here is a joint irrelevant to the identification of a specific behavior. For example, behaviors such as "running", "clapping", "kicking" and "gesture" are mainly formed by movement combinations of the joints of the four limbs, so the spine joints, waist joints and the like can be regarded as invalid joints. It should be noted that when different behaviors are identified, the joints regarded as invalid may differ, and the specific invalid joints are not limited here. Removing the invalid joints in this step reduces the amount of joint computation, increases the calculation speed, and thus improves the efficiency of the behavior identification method based on combined geometric features of skeleton joints.
In one embodiment, please refer to FIG. 2, in which the left side shows a schematic diagram of the human skeleton joints. The human skeleton mainly comprises 20 joints: the left shoulder joint 1, right shoulder joint 2, vertebral joint 3, lumbar joint 4, left hip joint 5, right hip joint 6, caudal vertebra joint 7, left elbow joint 8, right elbow joint 9, left wrist joint 10, right wrist joint 11, left finger joint 12, right finger joint 13, left knee joint 14, right knee joint 15, left ankle joint 16, right ankle joint 17, left toe joint 18, right toe joint 19 and head joint 20. In order to identify behaviors such as "running", "clapping", "kicking" and "gestures", this scheme regards the vertebral joint 3, the lumbar joint 4, the caudal vertebra joint 7 and the head joint 20 as invalid joints and removes them to obtain the remaining joints, as shown in the right diagram of FIG. 2. The right diagram of FIG. 2 divides the human skeleton into 3 joint combinations P1, P2 and P3, wherein joint combination P1 includes the joints of the left arm (left finger joint 12, left wrist joint 10, left elbow joint 8, left shoulder joint 1) and the right arm (right finger joint 13, right wrist joint 11, right elbow joint 9, right shoulder joint 2) among the remaining joints; joint combination P2 includes the joints of the left arm (left finger joint 12, left wrist joint 10, left elbow joint 8, left shoulder joint 1) and the right leg (right hip joint 6, right knee joint 15, right ankle joint 17, right toe joint 19); and joint combination P3 includes the joints of the right arm (right finger joint 13, right wrist joint 11, right elbow joint 9, right shoulder joint 2) and the left leg (left hip joint 5, left knee joint 14, left ankle joint 16, left toe joint 18).
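A minimal sketch of this division, assuming the joint numbering of FIG. 2 as listed above (the list names and the `divide_skeleton` helper are illustrative, not from the patent):

```python
# Joint numbers per FIG. 2 (assumed): invalid joints are the vertebral (3),
# lumbar (4), caudal vertebra (7) and head (20) joints.
LEFT_ARM = [12, 10, 8, 1]    # left finger, wrist, elbow, shoulder
RIGHT_ARM = [13, 11, 9, 2]   # right finger, wrist, elbow, shoulder
LEFT_LEG = [5, 14, 16, 18]   # left hip, knee, ankle, toe
RIGHT_LEG = [6, 15, 17, 19]  # right hip, knee, ankle, toe
INVALID = {3, 4, 7, 20}

def divide_skeleton():
    """Divide the remaining joints into the combinations P1, P2, P3."""
    p1 = LEFT_ARM + RIGHT_ARM   # H:  left arm + right arm
    p2 = LEFT_ARM + RIGHT_LEG   # LG: left arm + right leg
    p3 = RIGHT_ARM + LEFT_LEG   # RG: right arm + left leg
    for combo in (p1, p2, p3):  # no combination may contain an invalid joint
        assert not INVALID & set(combo)
    return p1, p2, p3
```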
Further, each of the above 3 joint combinations may also be described by the joint serial number set of its joints, the spatial connection relationships of the joints, and the spatial connection order of the joints, where P1, P2 and P3 are respectively described as follows:
where H represents the set of joint serial numbers of the left arm and the right arm; LG represents the set of joint serial numbers of the left arm and the right leg; RG represents the set of joint serial numbers of the right arm and the left leg; Jm denotes the joint with serial number m; Jn denotes the joint with serial number n; and Jk denotes the joint with serial number k. The combination state of each of the 3 joint combinations can be expressed more accurately through these descriptions.
102. Spatial position features of each joint combination are described.
Describing the spatial position features of the joint combinations divided in step 101: the spatial position feature of a joint combination is described by the three-dimensional space coordinates of each joint in the combination, that is, the set of the three-dimensional coordinates of all joints in the combination at each time instant. On this basis, the spatial position features formed by all the joints in the three joint combinations P1, P2 and P3 are described as follows:
the spatial position feature of P1 consists of the three-dimensional coordinates of the joint with serial number i in joint combination P1 at time t, with T representing the time length; that of P2 consists of the three-dimensional coordinates of the joint with serial number j in P2 at time t; and that of P3 consists of the three-dimensional coordinates of the joint with serial number k in P3 at time t.
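As a sketch, the spatial position feature of one combination can be gathered as the T × N × 3 coordinate set described above; the dict-based frame format and the function name here are assumptions for illustration:

```python
def spatial_position_features(skeleton_seq, combo):
    """skeleton_seq: list over time of {joint_number: (x, y, z)} mappings.
    combo: the joint numbers of one joint combination.
    Returns a T x N x 3 nested list holding the coordinates of every joint
    in the combination at every time instant."""
    return [[frame[j] for j in combo] for frame in skeleton_seq]
```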
Furthermore, the spatial position features of the joint combinations P1, P2 and P3 may also be converted into RGB pictures for description; RGB pictures are better suited to recognition and classification by a neural network model.
For the joint combination P1, the three coordinates are respectively normalized so that they can be regarded as the Red, Green and Blue channel pictures of the first RGB picture. Taking the x-axis coordinates of joint combination P1 as an example, the x-axis coordinates in the spatial position features of P1 are normalized to first X values between 0 and 255, and the first X values are then represented as a 224 × 112 picture by a linear interpolation method. Similarly, the y-axis coordinates in the spatial position features of P1 are normalized, with reference to the above normalization procedure, to first Y values between 0 and 255, which are represented as a 224 × 112 picture by linear interpolation; and the z-axis coordinates are normalized to first Z values between 0 and 255, which are likewise represented as a 224 × 112 picture by linear interpolation. Finally, the 224 × 112 pictures represented by the first X, Y and Z values are regarded as the Red, Green and Blue channels of the first RGB picture, respectively, to obtain the first RGB picture, so that the spatial position feature of the joint combination P1 is described by the first RGB picture.
For the joint combination P2, the three coordinates are likewise normalized so that they can be regarded as the Red, Green and Blue channel pictures of the second RGB picture. The x-axis coordinates in the spatial position features of P2 are normalized, with reference to the above normalization procedure, to second X values between 0 and 255, and the second X values are represented as a 224 × 112 picture by the linear interpolation method; the y-axis and z-axis coordinates are normalized in the same way to second Y values and second Z values, each represented as a 224 × 112 picture by linear interpolation. Finally, the 224 × 112 pictures represented by the second X, Y and Z values are regarded as the Red, Green and Blue channels of the second RGB picture, respectively, to obtain the second RGB picture, so that the spatial position feature of the joint combination P2 can be described by the second RGB picture.
For the joint combination P3, the three coordinates are also normalized so that they can be regarded as the Red, Green and Blue channel pictures of the third RGB picture. The x-axis, y-axis and z-axis coordinates in the spatial position features of P3 are normalized, with reference to the above normalization procedure, to third X, Y and Z values between 0 and 255, and each is represented as a 224 × 112 picture by the linear interpolation method. Finally, the three 224 × 112 pictures are regarded as the Red, Green and Blue channels of the third RGB picture, respectively, to obtain the third RGB picture, so that the spatial position feature of the joint combination P3 can be described by the third RGB picture.
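The normalization-and-interpolation pipeline described for P1, P2 and P3 can be sketched as follows. The min-max scaling to [0, 255] is an assumption consistent with the text (the original normalization formula is not reproduced in this version of the document), and the helper names and the joints-by-time channel layout are illustrative:

```python
def resample_linear(row, width):
    """Linearly interpolate a 1-D sequence to `width` samples."""
    if width == 1 or len(row) == 1:
        return [row[0]] * width
    out = []
    for c in range(width):
        pos = c * (len(row) - 1) / (width - 1)
        i = min(int(pos), len(row) - 2)
        frac = pos - i
        out.append(row[i] * (1 - frac) + row[i + 1] * frac)
    return out

def to_channel(series, height=224, width=112):
    """series: one coordinate axis as N per-joint time series.
    Returns a height x width channel with values min-max scaled to [0, 255]."""
    lo = min(v for s in series for v in s)
    hi = max(v for s in series for v in s)
    span = (hi - lo) or 1.0
    rows = [[255.0 * (v - lo) / span for v in resample_linear(s, width)]
            for s in series]
    # stretch the N joint rows to `height` rows, again by linear interpolation
    cols = [resample_linear(list(col), height) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def to_rgb(x_ch, y_ch, z_ch):
    """Treat the three channels as the R, G, B planes of one picture."""
    return [list(zip(rr, gg, bb)) for rr, gg, bb in zip(x_ch, y_ch, z_ch)]
```

Applying `to_channel` to the x, y and z coordinate series of one joint combination and stacking the results with `to_rgb` yields one 224 × 112 RGB picture per combination.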
103. The relative geometry of each joint combination is described.
Describing the relative geometric features of the joint combinations divided in step 101: beyond the spatial position features of the joints, the relative geometric features between and within joint combinations are also very important for behavior recognition. The relative geometric features of different joints are highly significant for representing behaviors: in a "clapping" behavior, the relative distance between the symmetric joints of the left arm and the right arm becomes small; in a "picking up" behavior, the relative distances of the arm joints (of both the left and right arms) with respect to the leg joints likewise become small, and these distances are important for distinguishing different behaviors. In addition, the relative position features of the joint points at adjacent moments are also important for behavior representation, particularly for motions with different motion amplitudes, such as "walking" and "running", in which the relative position changes of the joints at adjacent moments differ in amplitude. Referring to FIG. 3, the present application describes two kinds of relative geometric features: the first is the relative distance feature between different joint combinations, as shown in the left diagram of FIG. 3; the second is the relative position feature within the same joint combination, as shown in the right diagram of FIG. 3.
For joint combination P1, the present embodiment describes the relative geometry of joint combination P1 using the relative position features of joint combination P1:

f_i^t = J_i^t − J_i^(t−1)

where J_i^t denotes the three-dimensional coordinates of arm joint J_i at time t (the arm joints include the left-arm joints and the right-arm joints); J_i^(t−1) denotes the three-dimensional coordinates of arm joint J_i at time t − 1; and f_i^t denotes the relative position feature of arm joint J_i at time t. The relative position features f_i^t are regarded as the relative geometric features of joint combination P1.
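The relative position feature described above is a frame-to-frame displacement of each arm joint; a minimal sketch (the function name and the array layout are assumptions for illustration) might be:

```python
import numpy as np

def relative_position_features(arm_xyz):
    """arm_xyz: (num_frames, num_joints, 3) coordinates of the arm joints
    (left and right arm) of joint combination P1.  Returns the displacement
    of each joint relative to the previous frame,
    shape (num_frames - 1, num_joints, 3)."""
    arm_xyz = np.asarray(arm_xyz, dtype=np.float64)
    return arm_xyz[1:] - arm_xyz[:-1]
```

For a uniform motion the displacement is constant across frames, while "walking" versus "running" would produce displacements of different magnitude, which is exactly the distinction the patent attributes to this feature.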
For joint combination P2 and joint combination P3, the following calculation formulas are referenced:

d1 = ‖J_m − J_n‖,  d2 = ‖J_m − J_k‖,  d3 = ‖J_m − O‖    (2)

where d1 denotes the relative distance formed between different joints of the arm (joints J_m and J_n), d2 denotes the relative distance between an arm joint J_m and a leg joint J_k, and d3 denotes the relative distance of a joint from the origin O.
For joint combination P2, the relative distances between the individual joints in joint combination P2 are taken as the relative geometric features of joint combination P2. In joint combination P2, the relative distance d21 formed between the left-arm joints, the relative distance d22 between the left-arm joints and the right-leg joints, and the relative distance d23 of the left-arm and right-leg joints from a preset origin are calculated. Specifically, the relative distance d21 formed between the left-arm joints in joint combination P2 is obtained by the calculation formula for d1 above; the relative distance d22 between the left-arm joints and the right-leg joints in joint combination P2 is obtained by the calculation formula for d2 above; and the relative distance d23 of the left-arm and right-leg joints in joint combination P2 from the preset origin is obtained by the calculation formula for d3 above. The relative distance d21, the relative distance d22 and the relative distance d23 are regarded as the relative geometric features of joint combination P2.
For joint combination P3, the relative distances between the individual joints in joint combination P3 are taken as the relative geometric features of joint combination P3. In joint combination P3, the relative distance d31 formed between the right-arm joints, the relative distance d32 between the right-arm joints and the left-leg joints, and the relative distance d33 of the right-arm and left-leg joints from a preset origin are calculated. Specifically, the relative distance d31 formed between the right-arm joints in joint combination P3 is obtained by the calculation formula for d1 above; the relative distance d32 between the right-arm joints and the left-leg joints in joint combination P3 is obtained by the calculation formula for d2 above; and the relative distance d33 of the right-arm and left-leg joints in joint combination P3 from the preset origin is obtained by the calculation formula for d3 above. The relative distance d31, the relative distance d32 and the relative distance d33 are regarded as the relative geometric features of joint combination P3.
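Under the assumption that d1, d2 and d3 are Euclidean distances (the patent's formula (2) is not reproduced in this text, so this reading is an inference), the three distance families for a single frame can be sketched as follows; the function names, shapes and default origin are illustrative:

```python
import numpy as np

def pairwise_distances(joints_a, joints_b):
    """Euclidean distances between every joint in joints_a and every joint
    in joints_b for one frame; shapes (Na, 3) and (Nb, 3) -> (Na, Nb)."""
    diff = joints_a[:, None, :] - joints_b[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def relative_distance_features(arm, leg, origin=np.zeros(3)):
    """Sketch of the three distance families used for P2/P3:
    d1: distances between different joints of the arm,
    d2: distances between arm joints and leg joints,
    d3: distances of the arm joints from a preset origin."""
    d1 = pairwise_distances(arm, arm)           # within-arm distances
    d2 = pairwise_distances(arm, leg)           # arm-to-leg distances
    d3 = np.linalg.norm(arm - origin, axis=-1)  # arm-to-origin distances
    return d1, d2, d3
```

Computed per frame over a sequence, these distances form the d21/d22/d23 (or d31/d32/d33) time series that the following paragraphs convert into picture channels.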
Furthermore, the relative geometric features of joint combination P1, joint combination P2 and joint combination P3 can also be converted into RGB pictures for description; RGB pictures are better suited to recognition and classification by the neural network model.
For the relative geometric features of joint combination P1, the three coordinate components are normalized separately so that they can be treated as the Red, Green and Blue channel pictures of a fourth RGB picture. Specifically, the X-axis components of the relative geometric features of joint combination P1 are normalized to first X relative values between 0 and 255 (the normalization can follow the formula described above), and the first X relative values are represented as a 224 × 112 picture by linear interpolation. Similarly, the Y-axis components are normalized to first Y relative values between 0 and 255 and represented as a 224 × 112 picture by linear interpolation, and the Z-axis components are normalized to first Z relative values between 0 and 255 and represented as a 224 × 112 picture by linear interpolation. Finally, the 224 × 112 picture represented by the first X relative values, the 224 × 112 picture represented by the first Y relative values and the 224 × 112 picture represented by the first Z relative values are regarded as the Red, Green and Blue channels of the fourth RGB picture, respectively, to obtain the fourth RGB picture, so that the relative geometric features of joint combination P1 can be described by the fourth RGB picture.
For the relative geometric features of joint combination P2, the relative distance d21 is normalized to a first d21 value between 0 and 255 (the normalization can follow the formula described above), and the first d21 values are represented as a 224 × 112 picture by linear interpolation; the relative distance d22 is normalized to a first d22 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation; and the relative distance d23 is normalized to a first d23 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation. The 224 × 112 picture represented by the first d21 values, the 224 × 112 picture represented by the first d22 values and the 224 × 112 picture represented by the first d23 values are regarded as the Red, Green and Blue channels of a fifth RGB picture, respectively, to obtain the fifth RGB picture, so that the relative geometric features of joint combination P2 can be described by the fifth RGB picture.
For the relative geometric features of joint combination P3, the relative distance d31 is normalized to a first d31 value between 0 and 255 (the normalization can follow the formula described above), and the first d31 values are represented as a 224 × 112 picture by linear interpolation; the relative distance d32 is normalized to a first d32 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation; and the relative distance d33 is normalized to a first d33 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation. The 224 × 112 picture represented by the first d31 values, the 224 × 112 picture represented by the first d32 values and the 224 × 112 picture represented by the first d33 values are regarded as the Red, Green and Blue channels of a sixth RGB picture, respectively, to obtain the sixth RGB picture, so that the relative geometric features of joint combination P3 can be described by the sixth RGB picture.
In a further embodiment, for joint combination P1, the relative distances between the individual joints of joint combination P1 may also be used as the relative geometric features of joint combination P1. In joint combination P1, the relative distance d11 formed between the arm joints, the relative distance d12 between the arm joints and the joints of both legs (left-leg and right-leg joints), and the relative distance d13 of the arm joints from a preset origin are calculated. Specifically, the relative distance d11 formed between the arm joints in joint combination P1 is obtained by the calculation formula for d1 above; the relative distance d12 between the arm joints and the joints of both legs is obtained by the calculation formula for d2 above; and the relative distance d13 of the arm joints in joint combination P1 from the preset origin is obtained by the calculation formula for d3 above. The relative distance d11, the relative distance d12 and the relative distance d13 are regarded as the relative geometric features of joint combination P1.
Further, for these relative geometric features of joint combination P1, the relative distance d11 is normalized to a first d11 value between 0 and 255 (the normalization can follow the formula described above), and the first d11 values are represented as a 224 × 112 picture by linear interpolation; the relative distance d12 is normalized to a first d12 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation; and the relative distance d13 is normalized to a first d13 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation. The 224 × 112 picture represented by the first d11 values, the 224 × 112 picture represented by the first d12 values and the 224 × 112 picture represented by the first d13 values are regarded as the Red, Green and Blue channels of a seventh RGB picture, respectively, to obtain the seventh RGB picture, so that the relative geometric features of joint combination P1 can be described by the seventh RGB picture.
104. The spatial position characteristic and the relative geometric characteristic of each joint combination are respectively combined and described as the geometric characteristic of each joint combination.
The spatial position characteristic of each joint combination described in the step 102 and the relative geometric characteristic of each joint combination described in the step 103 are combined to describe the geometric characteristic of each joint combination.
For example, referring to fig. 4, the first RGB picture and the fourth RGB picture are combined to form a first geometric feature RGB picture, corresponding to a 224 × 224 length vector, that describes joint combination P1; the second RGB picture and the fifth RGB picture are combined to form a second geometric feature RGB picture, corresponding to a 224 × 224 length vector, that describes joint combination P2; and the third RGB picture and the sixth RGB picture are combined to form a third geometric feature RGB picture, corresponding to a 224 × 224 length vector, that describes joint combination P3. Of course, the first RGB picture and the seventh RGB picture may instead be combined to form another first geometric feature RGB picture, corresponding to a 224 × 224 length vector, that describes joint combination P1.
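Since each spatial-feature picture and each relative-feature picture is 224 × 112 and the combined geometric feature picture is 224 × 224, side-by-side concatenation is one natural joining scheme; the patent only states the resulting size, so this particular scheme is an assumption:

```python
import numpy as np

def combine_feature_pictures(spatial_rgb, relative_rgb):
    """Join a 224x112 spatial-feature RGB picture and a 224x112
    relative-feature RGB picture side by side into one 224x224
    geometric feature RGB picture."""
    assert spatial_rgb.shape == (224, 112, 3)
    assert relative_rgb.shape == (224, 112, 3)
    return np.concatenate([spatial_rgb, relative_rgb], axis=1)  # (224, 224, 3)
```

The 224 × 224 × 3 result matches the standard input size of ImageNet-style networks such as the Resnet-50 used in the next step.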
105. And respectively inputting the geometric features of each joint combination into a trained deep convolution network for feature extraction and classification, and respectively outputting the possible probabilities of a plurality of behaviors.
It can be understood that, in the embodiment of the present application, feature extraction is performed by the trained convolutional neural network Resnet-50. The training sample pictures for the convolutional neural network Resnet-50 are processed in the same way as in step 101, step 102, step 103 and step 104 above, so that the trained network acquires the ability to recognize the spatial position features, relative geometric features and geometric features of each joint combination. The behaviors expressed by the training sample pictures are labeled manually for classification, which improves the efficiency and accuracy of training and yields a trained convolutional neural network Resnet-50 capable of recognizing and classifying the specific labeled behaviors. The training process of the convolutional neural network Resnet-50 is mature prior art and is not described here again.
Referring to fig. 4, in this step, the first geometric feature RGB picture, the second geometric feature RGB picture, and the third geometric feature RGB picture in step 104 are respectively input to the trained convolutional neural network Resnet-50 for feature extraction, and possible probabilities of a plurality of behaviors are respectively output.
106. And comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain the target behavior.
For example, the possible probabilities of the several behaviors obtained in step 105 include: the posterior probability b1 of the behavior extracted from the first geometric feature RGB picture, the posterior probability b2 of the behavior extracted from the second geometric feature RGB picture, and the posterior probability b3 of the behavior extracted from the third geometric feature RGB picture. Then, according to the following formula:
P(L|S)=η1b1+η2b2+η3b3
where η1, η2 and η3 are preset weights, and the posterior probability vector P(L|S) represents the probability that the skeleton belongs to the action label class L; then label = Find(max(P(L|S))), where label represents the target behavior. For example, the target behavior may be one of the label classes "clap", "pick up", "walk", "run", etc.
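The fusion step reduces to a weighted sum of the three posterior probability vectors followed by an arg-max; the uniform weights and the label list in this sketch are placeholders, not values specified by the patent:

```python
import numpy as np

def fuse_and_classify(b1, b2, b3, labels, weights=(1/3, 1/3, 1/3)):
    """Fuse three per-picture posterior probability vectors with preset
    weights eta1..eta3 and return the most probable behavior label,
    i.e. label = Find(max(P(L|S)))."""
    eta1, eta2, eta3 = weights
    p = eta1 * np.asarray(b1) + eta2 * np.asarray(b2) + eta3 * np.asarray(b3)
    return labels[int(np.argmax(p))], p
```

With probabilities that agree on a class across two of the three pictures, the fused vector favors that class even if the third picture votes elsewhere, which is the intended effect of combining the joint-combination views.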
After the above description of the embodiments of the present application, this embodiment also lists the recognition results on two skeleton-based behavior databases, NTU RGB+D60 and Northwestern-UCLA. NTU RGB+D60 contains 56880 behavior samples; 40 subjects took part in the recording of 60 action classes, each subject repeating the actions in front of three cameras at different angles, so there are two validation modes: cross-view (CV) validation and cross-subject (CS) validation. The Northwestern-UCLA (N-UCLA) dataset contains 1494 samples of 10 action types, each action type being performed 6 times by each of 10 different subjects; the dataset was captured with 3 cameras forming multiple viewing angles, and cross-view (CV) validation is carried out. The behavior recognition technology based on the combined geometric features of skeleton joints in the embodiment of the present application is compared on the two datasets NTU RGB+D60 and N-UCLA with the current behavior recognition technology that connects all skeleton joints by serial number; the recognition results are shown in Table 1 below:
TABLE 1
Table 1 above compares the recognition accuracy of the behavior recognition method based on the combined geometric features of skeletal joints provided in the embodiment of the present application with that of the prior art. The results show that the present technical scheme achieves a substantial improvement in accuracy over the prior art, demonstrating its superiority.
Compared with the existing behavior recognition technology based on all joint space characteristics, the technical scheme of the behavior recognition method based on the skeleton joint combination geometric characteristics overcomes the following defects and achieves the advantages:
1. The prior art realizes behavior feature description based on all joint spatial features and neglects the spatial connection features between joints. In the present application, joints are connected in sequence according to their spatial connection features rather than according to joint-point serial numbers, so that the spatial relationships between different joints are embodied.
2. The prior art realizes behavior feature description based on all joint spatial features and does not highlight the key information of behaviors. The joint-combination technology provided by the present application prominently describes the key positions and key joints where behaviors occur, removes joints that contribute little to behavior feature description, focuses the description on the arm and leg joints, divides them into three joint combinations, and comprehensively considers the influence of different joint combinations on behavior differences.
3. The prior art lacks a description of the relative features of different joints, including relative spatial features and relative temporal features, which are important for behavior representation. In the present application, the relative distance features between different joints and the relative temporal features of the arm joints jointly form the joint-combination geometric features together with the joint spatial features, expressing both the spatial description and the temporal characteristics of the human skeleton and greatly improving the accuracy of behavior recognition.
The foregoing embodiment describes the behavior recognition method based on the geometric features of the skeleton joint combination, and the following describes the behavior recognition device based on the geometric features of the skeleton joint combination, please refer to fig. 5, which includes:
a dividing unit 501, configured to divide a skeleton into X joint combinations, where X is a positive integer greater than 1;
a first description unit 502, configured to describe spatial position characteristics of each joint combination;
a second description unit 503 for describing relative geometric features of each of the joint combinations;
a joint description unit 504, configured to describe the spatial position feature and the relative geometric feature of each joint combination as a geometric feature of each joint combination respectively;
a classification output unit 505, configured to input the geometric features of each joint combination into a trained deep convolution network for feature extraction and classification, and output possible probabilities of a plurality of behaviors;
and the fusion analysis unit 506 is configured to perform comprehensive analysis on the possible probabilities of the behaviors according to a preset fusion function, and output the behavior with the highest possible probability to obtain a target behavior.
Optionally, X is 3; the skeleton is a human body skeleton; when the skeleton is divided into X joint combinations, the dividing unit 501 is specifically configured to:
removing invalid joints in the human skeleton according to a preset rule to obtain residual joints;
dividing the remaining joints into 3 joint combinations P1, P2, P3, wherein the P1 comprises joints of the left arm and the right arm of the remaining joints, the P2 comprises joints of the left arm and the right leg of the remaining joints, and P3 comprises joints of the right arm and the left leg of the remaining joints.
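The partition performed by the dividing unit can be represented as sets of joint serial numbers; the numbers below are hypothetical (the actual numbering depends on the skeleton format, e.g. a Kinect-style 25-joint skeleton), and only the set structure is taken from the text:

```python
# Hypothetical joint serial numbers for each limb; invalid joints
# (e.g. torso joints removed by the preset rule) are simply omitted.
LEFT_ARM  = {5, 6, 7}
RIGHT_ARM = {9, 10, 11}
LEFT_LEG  = {13, 14, 15}
RIGHT_LEG = {17, 18, 19}

P1 = LEFT_ARM | RIGHT_ARM   # H:  left arm + right arm
P2 = LEFT_ARM | RIGHT_LEG   # LG: left arm + right leg
P3 = RIGHT_ARM | LEFT_LEG   # RG: right arm + left leg
```

Note that P2 and P3 are disjoint by construction, while each arm appears in two combinations, matching the overlap the patent exploits when fusing the three views.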
Optionally, each joint combination includes a joint serial number set of joints, a spatial connection relationship of the joints, and a spatial connection order of the joints;
the dividing unit 501 describes the P1, the P2, and the P3 as:
wherein the H represents a set of joint sequence numbers of the left arm and the right arm;
the LG represents a set of joint numbers for the left arm and the right leg;
the RG represents a set of joint numbers of the right arm and the left leg;
the Jm denotes the joint with serial number m;
the Jn denotes the joint with serial number n;
the Jk denotes the joint with serial number k.
Optionally, when the first description unit 502 describes the spatial position feature of each joint combination, it is specifically configured to:
spatial location features describing the P1, P2, and P3, respectively, are:
the above denotes the three-dimensional coordinates of the joint with serial number i in the P1 at time t, where t represents time;
the above denotes the three-dimensional coordinates of the joint with serial number j in the P2 at time t;
the above denotes the three-dimensional coordinates of the joint with serial number k in the P3 at time t.
Optionally, the first description unit 502 is further specifically configured to:
normalizing the X-axis coordinates in the spatial position feature of the P1 to a first X value between 0 and 255;
representing the first X numerical value as a 224X 112 picture by a linear interpolation method;
representing the first Y value as a 224 x 112 picture by a linear interpolation method;
representing the first Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the first X value, a 224 × 112 picture represented by the first Y value, and a 224 × 112 picture represented by the first Z value as Red, Green, and Blue channels of a first RGB picture to obtain a first RGB picture;
describing spatial location features of the P1 using the first RGB picture;
representing the second X numerical value as a 224X 112 picture by a linear interpolation method;
representing the second Y value as a 224 x 112 picture by a linear interpolation method;
representing the second Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the second X value, a 224 × 112 picture represented by the second Y value, and a 224 × 112 picture represented by the second Z value as Red, Green, and Blue channels of a second RGB picture to obtain a second RGB picture;
describing spatial location features of the P2 using the second RGB picture;
representing the third X numerical value as a 224X 112 picture by a linear interpolation method;
representing the third Y value as a 224 x 112 picture by a linear interpolation method;
representing the third Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the third X value, a 224 × 112 picture represented by the third Y value, and a 224 × 112 picture represented by the third Z value as Red, Green, and Blue channels of a third RGB picture, so as to obtain a third RGB picture;
the spatial position feature of the P3 is described using the third RGB picture.
Optionally, when describing the relative geometric features of each joint combination, the second description unit 503 is specifically configured to:
the relative geometric features of the P1 are described using the relative positional features of the P1:
wherein the above denotes the three-dimensional coordinates of the arm joint Ji at time t, the arm joints comprising the left arm joint and the right arm joint;
calculating a relative distance d21 formed between the left arm joint, a relative distance d22 between the left arm joint and the right leg joint, a relative distance d23 between the left arm joint and right leg joint relative to the preset origin in the P2;
regarding the relative distance d21, the relative distance d22, the relative distance d23 as relative geometric features of the P2;
calculating a relative distance d31 formed between the right arm joint, a relative distance d32 between the right arm joint and the left leg joint, a relative distance d33 between the right arm joint and left leg joint relative to the preset origin in the P3;
the relative distance d31, the relative distance d32, the relative distance d33 are considered to be relative geometric features of the P3.
Optionally, the second describing unit 503 is further configured to:
normalizing the X-axis components of the relative geometric features of the P1 to a first X relative value between 0 and 255;
representing the first X relative value as a 224X 112 picture by a linear interpolation method;
representing the first Y relative value as a 224 x 112 picture by a linear interpolation method;
representing the first Z relative value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the first X relative value, a 224 × 112 picture represented by the first Y relative value, and a 224 × 112 picture represented by the first Z relative value as Red, Green, and Blue channels of a fourth RGB picture, so as to obtain a fourth RGB picture;
describing relative geometric features of the P1 using the fourth RGB picture;
normalizing the relative distance d21 in the relative geometry of the P2 to a first d21 value between 0 and 255;
representing the first d21 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d22 in the relative geometry of the P2 to a first d22 value between 0 and 255;
representing the first d22 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d23 in the relative geometry of the P2 to a first d23 value between 0 and 255;
representing the first d23 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d21 value, a 224 × 112 picture represented by the first d22 value, and a 224 × 112 picture represented by the first d23 value as Red, Green, and Blue channels of a fifth RGB picture, respectively, obtaining a fifth RGB picture;
describing relative geometric features of the P2 using the fifth RGB picture;
normalizing the relative distance d31 in the relative geometry of the P3 to a first d31 value between 0 and 255;
representing the first d31 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d32 in the relative geometry of the P3 to a first d32 value between 0 and 255;
representing the first d32 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d33 in the relative geometry of the P3 to a first d33 value between 0 and 255;
representing the first d33 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d31 value, a 224 × 112 picture represented by the first d32 value, and a 224 × 112 picture represented by the first d33 value as Red, Green, and Blue channels of a sixth RGB picture, respectively, to obtain the sixth RGB picture;
the relative geometry of the P3 is described using the sixth RGB picture.
Optionally, when the combination description unit 504 describes the spatial position feature and the relative geometric feature of each joint combination as geometric features, it is specifically configured to:
combining the first RGB picture and the fourth RGB picture to form a first geometric feature RGB picture corresponding to the 224 x 224 length vector of the P1;
combining the second RGB picture with the fifth RGB picture to form a second geometric feature RGB picture corresponding to the 224 × 224 length vector of P2;
combining the third RGB picture with the sixth RGB picture to form a third geometric feature RGB picture corresponding to the 224 x 224 length vector of P3.
Optionally, the classification output unit 505 is specifically configured to, when the geometric features of each joint combination are input into a trained deep convolutional network for feature extraction and classification, and possible probabilities of a plurality of behaviors are output respectively:
and respectively inputting the first geometric feature RGB picture, the second geometric feature RGB picture and the third geometric feature RGB picture into a trained convolutional neural network Resnet-50 for feature extraction, and respectively outputting possible probabilities of a plurality of behaviors.
Optionally, the possible probabilities of the several behaviors include: the posterior probability b1 of the behavior extracted from the first geometric feature RGB picture, the posterior probability b2 of the behavior extracted from the second geometric feature RGB picture, and the posterior probability b3 of the behavior extracted from the third geometric feature RGB picture;
The fusion analysis unit 506 is specifically configured to, when performing comprehensive analysis on the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain a target behavior:
P(L|S)=η1b1+η2b2+η3b3
the η1, η2 and η3 are preset weights;
the posterior probability vector P (L | S) represents the probability that the skeleton belongs to the action label class L;
label=Find(max(P(L|S)))
the label represents the target behavior.
The operation performed by the behavior recognition device based on the geometric features of the skeleton joint combination in the embodiment of the present application is similar to the operation performed in fig. 1, and is not repeated here.
Referring to fig. 6, a computer device in an embodiment of the present application is described below. An embodiment of the computer device in the embodiment of the present application includes: the computer device 600 may include one or more processors (CPUs) 601 and a memory 602, where one or more applications or data are stored in the memory 602. The memory 602 is volatile storage or persistent storage. The program stored in the memory 602 may include one or more modules, each of which may include a series of instruction operations on the computer device. Further, the processor 601 may communicate with the memory 602 to execute a series of instruction operations in the memory 602 on the computer device 600. The computer device 600 may also include one or more wireless network interfaces 603, one or more input-output interfaces 604, and/or one or more operating systems, such as Windows Server, Mac OS, Unix, Linux, FreeBSD, etc. The processor 601 may perform the operations performed in the embodiment shown in fig. 1, and details thereof are not described here again.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (10)
1. A behavior identification method based on a skeleton joint combination geometric feature is characterized by comprising the following steps:
dividing a skeleton into X joint combinations, wherein X is a positive integer greater than 1;
describing the spatial position feature of each of the joint combinations;
describing relative geometric features of each of the joint combinations;
describing the spatial position feature and the relative geometric feature of each joint combination as the geometric feature of each joint combination respectively;
inputting the geometric features of each joint combination into a trained deep convolution network for feature extraction and classification, and outputting possible probabilities of a plurality of behaviors respectively;
and comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain a target behavior.
2. The behavior recognition method according to claim 1, wherein X is 3; the skeleton is a human body skeleton; the dividing of the skeleton into X joint combinations comprises:
removing invalid joints in the human skeleton according to a preset rule to obtain residual joints;
dividing the remaining joints into 3 joint combinations P1, P2 and P3, wherein the P1 comprises the left-arm and right-arm joints among the remaining joints, the P2 comprises the left-arm and right-leg joints among the remaining joints, and the P3 comprises the right-arm and left-leg joints among the remaining joints.
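As a non-limiting illustration, the partition of claim 2 might be sketched as follows; the joint index values are hypothetical (a 25-joint Kinect-style skeleton is assumed), since the claim does not fix a numbering scheme:

```python
# Hypothetical joint indices; the publication does not specify a numbering.
LEFT_ARM  = [4, 5, 6, 7]
RIGHT_ARM = [8, 9, 10, 11]
LEFT_LEG  = [12, 13, 14, 15]
RIGHT_LEG = [16, 17, 18, 19]

def partition_joints():
    """Split the remaining joints into the three combinations of claim 2."""
    P1 = LEFT_ARM + RIGHT_ARM   # both arms
    P2 = LEFT_ARM + RIGHT_LEG   # left arm + right leg
    P3 = RIGHT_ARM + LEFT_LEG   # right arm + left leg
    return P1, P2, P3
```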
3. The behavior recognition method according to claim 2, wherein each of the joint combinations includes a joint number set of joints, a spatial connection relationship of the joints, a spatial connection order of the joints;
the P1, the P2, the P3 are respectively described as:
wherein the H represents a set of joint sequence numbers of the left arm and the right arm;
the LG represents a set of joint numbers for the left arm and the right leg;
the RG represents a set of joint numbers of the right arm and the left leg;
the Jm represents the joint with serial number m;
the Jn represents the joint with serial number n;
the Jk represents the joint with serial number k.
4. The behavior recognition method according to claim 3, wherein the describing the spatial position feature of each of the joint combinations comprises:
the spatial position features of the P1, the P2 and the P3 are respectively described as:
the above-mentioned expression represents the three-dimensional coordinates of the joint with serial number i in the P1 at time t, and T represents the time length;
the above-mentioned expression represents the three-dimensional coordinates of the joint with serial number j in the P2 at time t;
5. The behavior recognition method according to claim 4, further comprising, after describing the spatial location features of the P1, the P2 and the P3 respectively:
normalizing the X, Y and Z coordinates in the spatial position feature of the P1 to a first X value, a first Y value and a first Z value between 0 and 255, respectively;
representing the first X numerical value as a 224X 112 picture by a linear interpolation method;
representing the first Y value as a 224 x 112 picture by a linear interpolation method;
representing the first Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the first X value, a 224 × 112 picture represented by the first Y value, and a 224 × 112 picture represented by the first Z value as Red, Green, and Blue channels of a first RGB picture to obtain a first RGB picture;
describing spatial location features of the P1 using the first RGB picture;
representing the second X numerical value as a 224X 112 picture by a linear interpolation method;
representing the second Y value as a 224 x 112 picture by a linear interpolation method;
representing the second Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the second X value, a 224 × 112 picture represented by the second Y value, and a 224 × 112 picture represented by the second Z value as Red, Green, and Blue channels of a second RGB picture to obtain a second RGB picture;
describing spatial location features of the P2 using the second RGB picture;
representing the third X numerical value as a 224X 112 picture by a linear interpolation method;
representing the third Y value as a 224 x 112 picture by a linear interpolation method;
representing the third Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the third X value, a 224 × 112 picture represented by the third Y value, and a 224 × 112 picture represented by the third Z value as Red, Green, and Blue channels of a third RGB picture, so as to obtain a third RGB picture;
the spatial position feature of the P3 is described using the third RGB picture.
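A minimal Python sketch of the picture construction in claim 5, assuming NumPy and reading "linear interpolation" as separable 1-D interpolation along each axis (one plausible interpretation; the publication does not spell out the resampling scheme):

```python
import numpy as np

def normalize_0_255(a):
    """Map an array linearly onto [0, 255]."""
    a = np.asarray(a, dtype=np.float64)
    lo, hi = a.min(), a.max()
    if hi == lo:
        return np.zeros_like(a)
    return (a - lo) / (hi - lo) * 255.0

def resize_linear(a, out_h=224, out_w=112):
    """Resize a 2-D array (time x joints) to out_h x out_w by separable
    1-D linear interpolation."""
    a = np.asarray(a, dtype=np.float64)
    h, w = a.shape
    rows = np.array([np.interp(np.linspace(0, w - 1, out_w),
                               np.arange(w), r) for r in a])
    cols = np.array([np.interp(np.linspace(0, h - 1, out_h),
                               np.arange(h), c) for c in rows.T]).T
    return cols

def coords_to_rgb(xyz):
    """xyz: coordinates of one joint combination, shape (T, J, 3).
    Returns a 224x112x3 uint8 picture whose R/G/B channels encode the
    normalized X/Y/Z coordinate maps, as in claim 5."""
    channels = [resize_linear(normalize_0_255(xyz[:, :, c])) for c in range(3)]
    return np.stack(channels, axis=-1).astype(np.uint8)
```

The same routine would produce the second and third RGB pictures from the coordinates of P2 and P3.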
6. The behavior recognition method of claim 5, wherein the describing relative geometric features of each of the joint combinations comprises:
the relative geometric features of the P1 are described using the relative positional features of the P1:
wherein the above-mentioned expression indicates the three-dimensional coordinates of the arm joint Ji at time t, the arm joints comprising the left arm joints and the right arm joints;
calculating, in the P2, a relative distance d21 formed among the left arm joints, a relative distance d22 between the left arm joints and the right leg joints, and a relative distance d23 of the left arm joints and the right leg joints relative to the preset origin;
regarding the relative distance d21, the relative distance d22, the relative distance d23 as relative geometric features of the P2;
calculating, in the P3, a relative distance d31 formed among the right arm joints, a relative distance d32 between the right arm joints and the left leg joints, and a relative distance d33 of the right arm joints and the left leg joints relative to the preset origin;
the relative distance d31, the relative distance d32, the relative distance d33 are considered to be relative geometric features of the P3.
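The distance computation of claim 6 admits several readings; the sketch below is an assumption rather than the claimed method, reducing each distance to a per-frame mean (the publication does not spell out the reduction):

```python
import numpy as np

def relative_distances(arm, leg, origin=np.zeros(3)):
    """Per-frame relative distances for one joint combination.
    arm: (T, A, 3) arm-joint coordinates; leg: (T, B, 3) leg-joint coordinates.
    Returns per frame: mean pairwise distance among the arm joints,
    mean arm-to-leg distance, and mean distance of all joints from the
    preset origin. Aggregation by mean is an assumption."""
    # mean pairwise distance among arm joints
    diff = arm[:, :, None, :] - arm[:, None, :, :]   # (T, A, A, 3)
    d1 = np.linalg.norm(diff, axis=-1).mean(axis=(1, 2))
    # mean arm-to-leg distance
    cross = arm[:, :, None, :] - leg[:, None, :, :]  # (T, A, B, 3)
    d2 = np.linalg.norm(cross, axis=-1).mean(axis=(1, 2))
    # mean distance of all joints from the origin
    allj = np.concatenate([arm, leg], axis=1)        # (T, A+B, 3)
    d3 = np.linalg.norm(allj - origin, axis=-1).mean(axis=1)
    return d1, d2, d3
```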
7. The behavior recognition method according to claim 6, further comprising, after describing relative geometric features of the P1, the P2 and the P3, respectively:
normalizing the X, Y and Z components of the relative geometric features of the P1 to a first X relative value, a first Y relative value and a first Z relative value between 0 and 255, respectively;
representing the first X relative value as a 224X 112 picture by a linear interpolation method;
representing the first Y relative value as a 224 x 112 picture by a linear interpolation method;
representing the first Z relative value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the first X relative value, a 224 × 112 picture represented by the first Y relative value, and a 224 × 112 picture represented by the first Z relative value as Red, Green, and Blue channels of a fourth RGB picture, so as to obtain a fourth RGB picture;
describing relative geometric features of the P1 using the fourth RGB picture;
normalizing the relative distance d21 in the relative geometry of the P2 to a first d21 value between 0 and 255;
representing the first d21 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d22 in the relative geometry of the P2 to a first d22 value between 0 and 255;
representing the first d22 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d23 in the relative geometry of the P2 to a first d23 value between 0 and 255;
representing the first d23 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d21 value, a 224 × 112 picture represented by the first d22 value, and a 224 × 112 picture represented by the first d23 value as Red, Green, and Blue channels of a fifth RGB picture, respectively, obtaining a fifth RGB picture;
describing relative geometric features of the P2 using the fifth RGB picture;
normalizing the relative distance d31 in the relative geometry of the P3 to a first d31 value between 0 and 255;
representing the first d31 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d32 in the relative geometry of the P3 to a first d32 value between 0 and 255;
representing the first d32 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d33 in the relative geometry of the P3 to a first d33 value between 0 and 255;
representing the first d33 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d31 value, a 224 × 112 picture represented by the first d32 value, and a 224 × 112 picture represented by the first d33 value as Red, Green, and Blue channels of a sixth RGB picture, respectively, to obtain the sixth RGB picture;
the relative geometry of the P3 is described using the sixth RGB picture.
8. The behavior recognition method according to claim 7, wherein the describing, in combination, the spatial position feature and the relative geometric feature of each of the joint combinations as the geometric feature of each joint combination comprises:
combining the first RGB picture with the fourth RGB picture to form a 224 × 224 first geometric feature RGB picture corresponding to the P1;
combining the second RGB picture with the fifth RGB picture to form a 224 × 224 second geometric feature RGB picture corresponding to the P2;
combining the third RGB picture with the sixth RGB picture to form a 224 × 224 third geometric feature RGB picture corresponding to the P3.
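Given the picture sizes involved, the combination of claim 8 amounts to concatenating each 224 × 112 pair along the width axis; a minimal sketch:

```python
import numpy as np

def combine_feature_pictures(pos_img, geo_img):
    """Concatenate a 224x112 spatial-position RGB picture with a 224x112
    relative-geometry RGB picture along the width axis, yielding the
    224x224 geometric-feature RGB picture fed to the network."""
    assert pos_img.shape == geo_img.shape == (224, 112, 3)
    return np.concatenate([pos_img, geo_img], axis=1)  # -> (224, 224, 3)
```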
9. The behavior recognition method according to claim 8, wherein the inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification, and respectively outputting the possible probabilities of a plurality of behaviors, comprises:
and respectively inputting the first geometric feature RGB picture, the second geometric feature RGB picture and the third geometric feature RGB picture into a trained convolutional neural network Resnet-50 for feature extraction, and respectively outputting possible probabilities of a plurality of behaviors.
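The ResNet-50 backbone itself is out of scope here; the sketch below only illustrates the final classification stage that turns a pooled feature vector into the posterior probabilities referred to in claim 9 (the layer shapes are illustrative, not taken from the publication):

```python
import numpy as np

def softmax(logits):
    """Convert network logits into posterior class probabilities."""
    z = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify(features, weights, bias):
    """Final fully connected layer + softmax of a ResNet-50-style classifier.
    features: (D,) pooled feature vector; weights: (C, D); bias: (C,)."""
    return softmax(weights @ features + bias)
```

Each of the three branch networks produces one such posterior vector (b1, b2, b3), which the fusion step then combines.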
10. The behavior recognition method of claim 9, wherein the possible probabilities of the several behaviors comprise: a posterior probability b1 of the behavior extracted from the first geometric feature RGB picture, a posterior probability b2 of the behavior extracted from the second geometric feature RGB picture, and a posterior probability b3 of the behavior extracted from the third geometric feature RGB picture;
The comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain a target behavior, including:
P(L|S) = η1·b1 + η2·b2 + η3·b3
wherein η1, η2 and η3 are preset weights;
the posterior probability vector P (L | S) represents the probability that the skeleton belongs to the action label class L;
label=Find(max(P(L|S)))
the label represents the target behavior.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111623361.8A CN114299614A (en) | 2021-12-28 | 2021-12-28 | Behavior identification method based on skeleton joint combination geometric features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114299614A true CN114299614A (en) | 2022-04-08 |
Family
ID=80971390
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||