CN114299614A - Behavior identification method based on skeleton joint combination geometric features - Google Patents

Behavior identification method based on skeleton joint combination geometric features

Info

Publication number
CN114299614A
CN114299614A
Authority
CN
China
Prior art keywords
picture, joint, value, relative, RGB
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111623361.8A
Other languages
Chinese (zh)
Inventor
刘星 (Liu Xing)
顾礼 (Gu Li)
高波 (Gao Bo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Information Technology
Original Assignee
Shenzhen Institute of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Information Technology
Priority to CN202111623361.8A
Publication of CN114299614A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of video data processing and provides a behavior recognition method based on combined geometric features of skeleton joints, aiming to improve the accuracy and efficiency of behavior recognition based on those features. The method mainly comprises the following steps: dividing a skeleton into X joint combinations, where X is a positive integer greater than 1; describing the spatial position features of each joint combination; describing the relative geometric features of each joint combination; combining the spatial position features and relative geometric features of each joint combination into the geometric features of that combination; inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification, and outputting the probabilities of several behaviors; and analyzing those probabilities according to a preset fusion function, outputting the behavior with the highest probability to obtain the target behavior.

Description

Behavior identification method based on skeleton joint combination geometric features
Technical Field
The application belongs to the technical field of video data processing, and particularly relates to a behavior identification method based on combined geometric features of skeleton joints.
Background
In the prior art, behavior recognition based on skeleton sequences mainly relies on the spatial features of all joints: all joints are arranged into a whole according to their serial numbers and used as the individual's behavior feature description to realize behavior classification. A skeleton sequence is derived from video data; each frame contains the joints of several body parts (head, hands, legs and spine), each joint numbered starting from 1. Current skeleton-sequence methods sort all joints by joint number, convert the skeleton sequence data into a color RGB picture as the behavior feature description, and extract features with a deep network to classify and recognize the behavior.
However, for behavior recognition based on skeleton sequences, the applicant has found that directly arranging all joints to describe behavior features suffers from low recognition efficiency and insufficient accuracy. For example, in a "clapping" motion only the arm joints change; the spatial positions of the other joints may remain unchanged, so those joints contribute little to the feature description.
Disclosure of Invention
The application aims to provide a behavior recognition method based on the combined geometric features of skeleton joints, so as to improve the accuracy and efficiency of behavior recognition based on those features.
The application provides a behavior identification method based on a skeleton joint combination geometric feature, which comprises the following steps:
dividing a skeleton into X joint combinations, wherein X is a positive integer greater than 1;
describing the spatial position features of each of the joint combinations;
describing the relative geometric features of each of the joint combinations;
combining, for each joint combination, its spatial position features and relative geometric features into the geometric features of that joint combination;
respectively inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification, and respectively outputting the probabilities of several behaviors;
and analyzing the probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest probability to obtain the target behavior.
Optionally, X is 3; the skeleton is a human body skeleton; the dividing of the skeleton into X joint combinations comprises:
removing invalid joints in the human skeleton according to a preset rule to obtain the remaining joints;
dividing the remaining joints into 3 joint combinations P1, P2 and P3, wherein P1 comprises the joints of the left arm and the right arm among the remaining joints, P2 comprises the joints of the left arm and the right leg, and P3 comprises the joints of the right arm and the left leg.
Optionally, each joint combination includes a set of joint serial numbers, the spatial connection relationship of the joints, and the spatial connection order of the joints;
P1, P2 and P3 are respectively described as:

$$P_1 = \{J_m \mid m \in H\}$$
$$P_2 = \{J_n \mid n \in LG\}$$
$$P_3 = \{J_k \mid k \in RG\}$$

where H denotes the set of joint serial numbers of the left arm and the right arm; LG denotes the set of joint serial numbers of the left arm and the right leg; RG denotes the set of joint serial numbers of the right arm and the left leg; and $J_m$, $J_n$ and $J_k$ denote the joints with serial numbers m, n and k, respectively.
Optionally, the describing the spatial position features of each joint combination includes describing the spatial position features of P1, P2 and P3 respectively as:

$$S_{P_1} = \{(x_i^t, y_i^t, z_i^t) \mid i \in H,\ 1 \le t \le T\}$$
$$S_{P_2} = \{(x_j^t, y_j^t, z_j^t) \mid j \in LG,\ 1 \le t \le T\}$$
$$S_{P_3} = \{(x_k^t, y_k^t, z_k^t) \mid k \in RG,\ 1 \le t \le T\}$$

where $(x_i^t, y_i^t, z_i^t)$ denotes the three-dimensional coordinates at time t of the joint with serial number i in P1, and T denotes the time length; $(x_j^t, y_j^t, z_j^t)$ denotes the three-dimensional coordinates at time t of the joint with serial number j in P2; and $(x_k^t, y_k^t, z_k^t)$ denotes the three-dimensional coordinates at time t of the joint with serial number k in P3.
Optionally, after the spatial position features of P1, P2 and P3 are respectively described, the method specifically further includes:
normalizing the x-coordinates $x_i^t$ in the spatial position feature of P1 to first X values between 0 and 255, and representing the first X values as a 224×112 picture by linear interpolation;
normalizing the y-coordinates $y_i^t$ in the spatial position feature of P1 to first Y values between 0 and 255, and representing the first Y values as a 224×112 picture by linear interpolation;
normalizing the z-coordinates $z_i^t$ in the spatial position feature of P1 to first Z values between 0 and 255, and representing the first Z values as a 224×112 picture by linear interpolation;
regarding the 224×112 pictures represented by the first X, Y and Z values as the Red, Green and Blue channels of a first RGB picture, respectively, to obtain the first RGB picture, and describing the spatial position feature of P1 using the first RGB picture;
normalizing, in the same way, $x_j^t$, $y_j^t$ and $z_j^t$ in the spatial position feature of P2 to second X, Y and Z values between 0 and 255, representing each as a 224×112 picture by linear interpolation, and regarding the three pictures as the Red, Green and Blue channels of a second RGB picture to obtain the second RGB picture, which describes the spatial position feature of P2;
normalizing $x_k^t$, $y_k^t$ and $z_k^t$ in the spatial position feature of P3 to third X, Y and Z values between 0 and 255, representing each as a 224×112 picture by linear interpolation, and regarding the three pictures as the Red, Green and Blue channels of a third RGB picture to obtain the third RGB picture, which describes the spatial position feature of P3.
Optionally, the describing the relative geometric features of each joint combination comprises:
describing the relative geometric features of P1 using the relative position features of P1:

$$\Delta p_i^t = p_i^t - p_i^{t-1}$$

where $p_i^t$ denotes the three-dimensional coordinates of arm joint $J_i$ at time t, the arm joints comprising the left-arm joints and the right-arm joints; $p_i^{t-1}$ denotes the three-dimensional coordinates of arm joint $J_i$ at time t-1; and $\Delta p_i^t$ denotes the relative position feature of arm joint $J_i$ at time t;
regarding $\Delta p_i^t$ as the relative geometric feature of P1;
calculating, in P2, the relative distance d21 formed between the left-arm joints, the relative distance d22 between the left-arm joints and the right-leg joints, and the relative distance d23 of the left-arm and right-leg joints relative to a preset origin, and regarding d21, d22 and d23 as the relative geometric features of P2;
calculating, in P3, the relative distance d31 formed between the right-arm joints, the relative distance d32 between the right-arm joints and the left-leg joints, and the relative distance d33 of the right-arm and left-leg joints relative to the preset origin, and regarding d31, d32 and d33 as the relative geometric features of P3.
Optionally, after the relative geometric features of P1, P2 and P3 are respectively described, the method further includes:
normalizing the x-components $\Delta x_i^t$ in the relative geometric feature of P1 to first X relative values between 0 and 255, and representing them as a 224×112 picture by linear interpolation;
normalizing the y-components $\Delta y_i^t$ in the relative geometric feature of P1 to first Y relative values between 0 and 255, and representing them as a 224×112 picture by linear interpolation;
normalizing the z-components $\Delta z_i^t$ in the relative geometric feature of P1 to first Z relative values between 0 and 255, and representing them as a 224×112 picture by linear interpolation;
regarding the 224×112 pictures represented by the first X, Y and Z relative values as the Red, Green and Blue channels of a fourth RGB picture, respectively, to obtain the fourth RGB picture, and describing the relative geometric feature of P1 using the fourth RGB picture;
normalizing the relative distances d21, d22 and d23 in the relative geometric features of P2 to first d21, d22 and d23 values between 0 and 255, representing each as a 224×112 picture by linear interpolation, and regarding the three pictures as the Red, Green and Blue channels of a fifth RGB picture to obtain the fifth RGB picture, which describes the relative geometric features of P2;
normalizing the relative distances d31, d32 and d33 in the relative geometric features of P3 to first d31, d32 and d33 values between 0 and 255, representing each as a 224×112 picture by linear interpolation, and regarding the three pictures as the Red, Green and Blue channels of a sixth RGB picture to obtain the sixth RGB picture, which describes the relative geometric features of P3.
Optionally, combining the spatial position feature and the relative geometric feature of each joint combination into its geometric feature comprises:
combining the first RGB picture and the fourth RGB picture to form a first geometric-feature RGB picture of size 224×224 corresponding to P1;
combining the second RGB picture and the fifth RGB picture to form a second geometric-feature RGB picture of size 224×224 corresponding to P2;
combining the third RGB picture and the sixth RGB picture to form a third geometric-feature RGB picture of size 224×224 corresponding to P3.
Optionally, the respectively inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification, and respectively outputting the probabilities of several behaviors, includes:
respectively inputting the first, second and third geometric-feature RGB pictures into a trained convolutional neural network Resnet-50 for feature extraction, and respectively outputting the probabilities of several behaviors.
Optionally, the probabilities of the several behaviors include: the posterior probability $b_1$ of the behavior extracted from the first geometric-feature RGB picture, the posterior probability $b_2$ of the behavior extracted from the second geometric-feature RGB picture, and the posterior probability $b_3$ of the behavior extracted from the third geometric-feature RGB picture.
The analyzing of these probabilities according to a preset fusion function, and outputting the behavior with the highest probability to obtain the target behavior, includes:

$$P(L \mid S) = \eta_1 b_1 + \eta_2 b_2 + \eta_3 b_3$$

where $\eta_1$, $\eta_2$ and $\eta_3$ are preset weights, and the posterior probability vector $P(L \mid S)$ represents the probability that the skeleton belongs to action label class L;

$$label = \mathrm{Find}(\max(P(L \mid S)))$$

where label denotes the target behavior.
According to the technical scheme, the embodiment of the application has the following advantages:
therefore, in the behavior identification method based on the geometric features of the skeleton joint combination, the skeleton is divided into X joint combinations, all joints of the skeleton are not recognized as a whole as in the prior art, the division into the X joint combinations is more beneficial to improving the classification precision of behaviors, the geometric feature basis of each joint combination in the recognition and classification process is determined, the analysis data volume is reduced, the trained deep convolutional network is used for carrying out feature extraction and classification to obtain the possible probabilities of a plurality of behaviors, then the possible probabilities of the plurality of behaviors are comprehensively analyzed according to the preset fusion function, the behavior with the highest possible probability is output, and the target behavior is obtained. Because the data volume of the joint combination is smaller than that of all joints of the whole skeleton, the occupied computing resources are less, and the behavior recognition efficiency based on the geometrical characteristics of the skeleton joint combination is improved; and because the spatial position characteristics and the relative geometric characteristics of each joint combination are described, the behavior identification accuracy based on the geometric characteristics of the skeleton joint combination is improved.
Drawings
FIG. 1 is a schematic flow chart illustrating an embodiment of a behavior recognition method based on a skeleton joint combination geometric feature according to the present application;
FIG. 2 is a schematic view of one embodiment of the present application illustrating the division of joints of a human skeleton into different joint combinations;
FIG. 3 is a schematic view of one embodiment of the relative geometry of the joint assembly of the present application;
FIG. 4 is a schematic diagram of an embodiment of feature extraction and classification of 3 joint combinations P1, P2 and P3 by the deep convolutional network of the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the behavior recognition device based on the geometric features of the human skeleton joint combination according to the present application;
FIG. 6 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Detailed Description
In order to clearly identify the various behaviors expressed by the joint states of a skeleton, the applicant analyzed prior-art behavior recognition methods based on the human skeleton and found that directly arranging all skeleton joints by joint number to describe behavior features has at least the following problems and defects:
1. Joints with adjacent serial numbers need not be spatially correlated in the human skeleton. For example, the joint representing the head may be numbered adjacently to a joint of the hand even though the two are not adjacent in spatial position, which hinders the identification of behavior features. The applicant therefore considers that joints should be combined according to their spatial correlation within the skeleton.
2. The prior art describes the features of all joints, so it cannot focus on the local joints where behavior actually changes. In a "clapping" motion, only the hand joints change; the spatial positions of the legs, head and other joints do not change and contribute little to the feature description. When some joints of the main skeleton represent a behavior, the features representing that behavior should be gathered at the key joints of the skeleton.
3. Prior-art methods do not take into account the relative spatial relationship between symmetric joints. Most human motions, such as "running", "clapping", "kicking" and "javelin throwing", are performed by the hand or leg joints, and the relative spatial relationship between the left and right hand or leg joints is crucial for distinguishing behaviors; it should be considered in the behavior feature description.
Based on the above understanding, the following embodiments take the recognition of human behavior as an example. It is worth noting that the behavior recognition method based on combined geometric features of skeleton joints is not only suitable for recognizing human behaviors; under specific conditions it can also be used to recognize the behaviors of other objects, particularly animals, and to output the corresponding target behavior.
Referring to fig. 1, an embodiment of a behavior recognition method based on a skeleton joint combination geometric feature of the present application includes:
101. the skeleton is divided into X joint combinations, wherein X is a positive integer larger than 1.
First, the states of all joints of the skeleton, which can be treated simply as joint points, must be known: this step obtains the spatial position coordinates of all joint points of the skeleton, their connection relations, their spatial connection order, and so on. On this basis, the step divides the joint points of the skeleton into X joint combinations, where X is a positive integer greater than 1.
Specifically, the skeleton is a human skeleton, and the invalid joints of the human skeleton are removed according to a preset rule to obtain the remaining joints. An invalid joint is one irrelevant to the identification of a specific behavior. For example, behaviors such as "running", "clapping", "kicking" and "gesturing" are formed mainly by combined movements of the limb joints, so the spine joints, waist joint and the like can be treated as invalid joints; note that for different behaviors the corresponding invalid joints may differ, and the specific joints counted as invalid are not limited here. Removing the invalid joints in this step reduces the amount of joint computation, speeds up calculation, and improves the efficiency of the behavior recognition method based on combined geometric features of skeleton joints.
In one embodiment, please refer to fig. 2, in which the left side shows a schematic diagram of the human skeleton joints. The human skeleton mainly comprises twenty numbered joints: left shoulder joint 1, right shoulder joint 2, vertebral joint 3, lumbar joint 4, left hip joint 5, right hip joint 6, caudal vertebra joint 7, left elbow joint 8, right elbow joint 9, left wrist joint 10, right wrist joint 11, left finger joint 12, right finger joint 13, left knee joint 14, right knee joint 15, left ankle joint 16, right ankle joint 17, left toe joint 18, right toe joint 19 and head joint 20. To identify behaviors such as "running", "clapping", "kicking" and "gesturing", this scheme treats the vertebral joint 3, the lumbar joint 4, the caudal vertebra joint 7 and the head joint 20 as invalid joints and removes them to obtain the remaining joints, as shown in the right diagram of fig. 2. The right diagram divides the human skeleton into 3 joint combinations P1, P2 and P3: joint combination P1 includes the joints of the left arm (left finger joint 12, left wrist joint 10, left elbow joint 8, left shoulder joint 1) and the right arm (right finger joint 13, right wrist joint 11, right elbow joint 9, right shoulder joint 2); joint combination P2 includes the joints of the left arm (left finger joint 12, left wrist joint 10, left elbow joint 8, left shoulder joint 1) and the right leg (right hip joint 6, right knee joint 15, right ankle joint 17, right toe joint 19); joint combination P3 includes the joints of the right arm (right finger joint 13, right wrist joint 11, right elbow joint 9, right shoulder joint 2) and the left leg (left hip joint 5, left knee joint 14, left ankle joint 16, left toe joint 18).
Further, each of the above 3 joint combinations may include a set of joint serial numbers, the spatial connection relationship of the joints, and the spatial connection order of the joints, where P1, P2 and P3 are respectively described as:

$$P_1 = \{J_m \mid m \in H\}$$
$$P_2 = \{J_n \mid n \in LG\}$$
$$P_3 = \{J_k \mid k \in RG\}$$

where H denotes the set of joint serial numbers of the left arm and the right arm; LG denotes the set of joint serial numbers of the left arm and the right leg; RG denotes the set of joint serial numbers of the right arm and the left leg; and $J_m$, $J_n$ and $J_k$ denote the joints with serial numbers m, n and k. These formulas express the combination state of each of the 3 joint combinations more precisely.
102. Spatial position features of each joint combination are described.
This step describes the spatial position features of the joint combinations divided in step 101. The spatial position feature of a joint combination is described by the three-dimensional space coordinates of each of its joints at each time; the set of the three-dimensional coordinates of all joints in the combination serves as the spatial position feature of that combination. On this basis, the spatial position features formed by all the joints in the three joint combinations P1, P2 and P3 are described as follows:
$$S_{P_1} = \{(x_i^t, y_i^t, z_i^t) \mid i \in H,\ 1 \le t \le T\}$$
$$S_{P_2} = \{(x_j^t, y_j^t, z_j^t) \mid j \in LG,\ 1 \le t \le T\}$$
$$S_{P_3} = \{(x_k^t, y_k^t, z_k^t) \mid k \in RG,\ 1 \le t \le T\}$$

where $(x_i^t, y_i^t, z_i^t)$ denotes the three-dimensional coordinates at time t of the joint with serial number i in joint combination P1, and T denotes the time length; $(x_j^t, y_j^t, z_j^t)$ denotes the three-dimensional coordinates at time t of the joint with serial number j in P2; and $(x_k^t, y_k^t, z_k^t)$ denotes the three-dimensional coordinates at time t of the joint with serial number k in P3.
Furthermore, the spatial position feature of the joint combination P1, the spatial position feature of the joint combination P2, and the spatial position feature of the joint combination P3 may also be converted into RGB pictures for description, and the RGB pictures are more suitable for recognition and classification of a neural network model.
For joint combination P1, the three coordinates $x_i^t$, $y_i^t$ and $z_i^t$ in its spatial position feature are normalized separately so that they can be regarded as the Red, Green and Blue channel pictures of the first RGB picture. Taking the x-axis coordinates of joint combination P1 as an example, $x_i^t$ is normalized to a first X value between 0 and 255:

$$\hat{x}_i^t = 255 \times \frac{x_i^t - x_{\min}}{x_{\max} - x_{\min}}$$

where $\hat{x}_i^t$ denotes the normalized x-axis coordinate, i.e., the first X value; the first X values are then represented as a 224×112 picture by linear interpolation. Similarly, $y_i^t$ in the spatial position feature of P1 is normalized, with reference to the formula for $\hat{x}_i^t$, to a first Y value between 0 and 255, and the first Y values are represented as a 224×112 picture by linear interpolation; $z_i^t$ is likewise normalized to a first Z value between 0 and 255 and the first Z values are represented as a 224×112 picture by linear interpolation. Finally, the 224×112 pictures represented by the first X, Y and Z values are regarded as the Red, Green and Blue channels of the first RGB picture, respectively, yielding the first RGB picture, which describes the spatial position feature of joint combination P1.
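The normalize-interpolate-stack pipeline above can be sketched as follows. This is a minimal sketch, assuming min-max scaling to the stated 0-255 range and OpenCV's bilinear resize for the linear interpolation; the (T, J) input layout is an assumption.

```python
import numpy as np
import cv2  # used here only for its bilinear (linear-interpolation) resize

def coords_to_rgb(xyz):
    """xyz: array of shape (T, J, 3), the per-frame 3-D coordinates of one
    joint combination. Returns a 224x112 three-channel picture whose R, G, B
    channels come from the x, y, z coordinates respectively."""
    channels = []
    for c in range(3):
        v = xyz[:, :, c].astype(np.float32)                  # (T, J) grid
        v = 255.0 * (v - v.min()) / (v.max() - v.min() + 1e-8)
        # Linear interpolation stretches the grid to height 224, width 112.
        channels.append(cv2.resize(v, (112, 224), interpolation=cv2.INTER_LINEAR))
    return np.stack(channels, axis=-1).astype(np.uint8)      # (224, 112, 3)
```

The same function serves P1, P2 and P3, producing the first, second and third RGB pictures described next.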
For joint combination P2, the three coordinates $x_j^t$, $y_j^t$ and $z_j^t$ in its spatial position feature are normalized separately so that they can be regarded as the Red, Green and Blue channel pictures of the second RGB picture. With reference to the normalization formula for $\hat{x}_i^t$ above, $x_j^t$ is normalized to a second X value between 0 and 255 and represented as a 224×112 picture by linear interpolation; $y_j^t$ is normalized to a second Y value and represented as a 224×112 picture; and $z_j^t$ is normalized to a second Z value and represented as a 224×112 picture. Finally, the 224×112 pictures represented by the second X, Y and Z values are regarded as the Red, Green and Blue channels of the second RGB picture, respectively, yielding the second RGB picture, which describes the spatial position feature of joint combination P2.
For joint combination P3, the three coordinates $x_k^t$, $y_k^t$ and $z_k^t$ in its spatial position feature are normalized separately so that they can be regarded as the Red, Green and Blue channel pictures of the third RGB picture. With reference to the same normalization formula, $x_k^t$ is normalized to a third X value between 0 and 255 and represented as a 224×112 picture by linear interpolation; $y_k^t$ is normalized to a third Y value and represented as a 224×112 picture; and $z_k^t$ is normalized to a third Z value and represented as a 224×112 picture. Finally, the 224×112 pictures represented by the third X, Y and Z values are regarded as the Red, Green and Blue channels of the third RGB picture, respectively, yielding the third RGB picture, which describes the spatial position feature of joint combination P3.
103. The relative geometry of each joint combination is described.
This step describes the relative geometric features of the joint combinations divided in step 101; beyond the spatial position features of the joints, the relative geometric features between joints are also very important for behavior recognition. The relative geometry of different joints is highly significant for representing behaviors. In a "clapping" behavior, the relative distance between the symmetric joints of the left and right arms becomes small; in a "pick up" behavior, the relative distances of the arm joints (of both arms) with respect to the leg joints likewise become small, and such distances are important for distinguishing different behaviors. In addition, the relative position of a joint point at adjacent times also matters for behavior representation, particularly for motions of differing amplitude such as "walking" and "running", whose joints change position by different amounts between adjacent times. Referring to FIG. 3, the present application therefore describes two kinds of relative geometric features: the first is the relative distance feature between different joints, as shown in the left diagram of FIG. 3; the second is the relative position feature of the same joint combination over time, as shown in the right diagram of FIG. 3.
For joint combination P1, this embodiment describes the relative geometric feature of P1 using its relative position features:

$$\Delta p_i^t = p_i^t - p_i^{t-1}$$

where $p_i^t$ denotes the three-dimensional coordinates of arm joint $J_i$ at time t, the arm joints comprising the left-arm and right-arm joints; $p_i^{t-1}$ denotes the three-dimensional coordinates of arm joint $J_i$ at time t-1; and $\Delta p_i^t$ denotes the relative position feature of arm joint $J_i$ at time t. $\Delta p_i^t$ is regarded as the relative geometric feature of joint combination P1.
for joint combination P2 and joint combination P3, the following calculation formula is referenced:
Figure BDA0003439019690000146
Figure BDA0003439019690000147
Figure BDA0003439019690000148
wherein d is1Indicating the relative distance formed between the different joints of the arm, d2Indicating the relative distance between the arm and the leg joint, d3Representing the relative distance of the joint from the origin.
For joint combination P2, the relative distances between the individual joints in joint combination P2 are taken as the relative geometric features of joint combination P2. The relative distance d21 formed between the left arm joints, the relative distance d22 between the left arm joints and the right leg joints, and the relative distance d23 between the left arm joints and the right leg joints with respect to a preset origin are calculated in the joint combination P2. Specifically, the relative distance d21 formed between the left arm joints in the joint combination P2 is set according to the above d1The specific relative distance d21 is obtained through calculation by the formula (2); the relative distance d22 between the left arm joint and the right leg joint in the joint combination P2 is determined according to the above d2The specific relative distance d22 is obtained through calculation by the formula (2); d23 the relative distance between the left arm joint and the right leg joint in the joint combination P2 relative to the preset origin is d3The specific relative distance d23 is obtained through calculation by the formula (2); the relative distance d21, the relative distance d22, and the relative distance d23 are considered to be relative geometric features of the joint combination P2.
For joint combination P3, the relative distances between the individual joints in joint combination P3 are taken as the relative geometric features of joint combination P3. The relative distance d31 formed between the right arm joints, the relative distance d32 between the right arm joints and the left leg joints, and the relative distance d33 between the right arm joints and the left leg joints with respect to a preset origin are calculated in the joint combination P3. Specifically, the relative distance d31 between the right arm joints in the joint combination P3 is set according to the above d1The specific relative distance d31 is obtained through calculation by the formula (2); the relative distance d32 between the right arm joint and the left leg joint in the joint combination P3 is determined according to the above d2Is calculated by the formulaCalculating a specific relative distance d 32; d33 the relative distance between the right arm joint and the left leg joint in the joint combination P3 relative to the preset origin is d3The specific relative distance d33 is obtained through calculation by the formula (2); the relative distance d31, the relative distance d32, and the relative distance d33 are considered to be relative geometric features of the joint combination P3.
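A compact sketch of both kinds of relative geometric features follows; the Euclidean distance and the array layout are assumptions consistent with the formulas above.

```python
import numpy as np

def relative_position(p):
    """p: (T, J, 3) coordinates of the arm joints in P1.
    Returns the relative position feature Δp_i^t = p_i^t - p_i^(t-1)."""
    return p[1:] - p[:-1]                                    # (T-1, J, 3)

def relative_distances(arm, leg, origin=np.zeros(3)):
    """arm, leg: (T, J, 3) coordinates of one arm and one leg of the
    combination. d1: pairwise distances within the arm joints; d2:
    arm-to-leg distances; d3: distances of all joints to the preset
    origin (the origin choice here is an assumption)."""
    d1 = np.linalg.norm(arm[:, :, None, :] - arm[:, None, :, :], axis=-1)
    d2 = np.linalg.norm(arm[:, :, None, :] - leg[:, None, :, :], axis=-1)
    d3 = np.linalg.norm(np.concatenate([arm, leg], axis=1) - origin, axis=-1)
    return d1, d2, d3
```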
Furthermore, the relative geometric features of the joint combination P1, the relative geometric features of the joint combination P2 and the relative geometric features of the joint combination P3 can be converted into RGB pictures for description, and the RGB pictures are more suitable for the recognition and classification of the neural network model.
For the relative geometric feature of joint combination P1, the three components $\Delta x_i^t$, $\Delta y_i^t$ and $\Delta z_i^t$ of $\Delta p_i^t$ are normalized separately so that they can be regarded as the Red, Green and Blue channel pictures of the fourth RGB picture. With reference to the normalization formula for $\hat{x}_i^t$ above, $\Delta x_i^t$ is normalized to a first X relative value between 0 and 255 and represented as a 224×112 picture by linear interpolation; $\Delta y_i^t$ is normalized to a first Y relative value and represented as a 224×112 picture; and $\Delta z_i^t$ is normalized to a first Z relative value and represented as a 224×112 picture. Finally, the 224×112 pictures represented by the first X, Y and Z relative values are regarded as the Red, Green and Blue channels of the fourth RGB picture, respectively, yielding the fourth RGB picture, which describes the relative geometric feature of joint combination P1.
For the relative geometric features of joint combination P2, the relative distance d21 is normalized, again with reference to the formula for $\hat{x}_i^t$, to a first d21 value between 0 and 255 and represented as a 224×112 picture by linear interpolation; d22 is normalized to a first d22 value and represented as a 224×112 picture; and d23 is normalized to a first d23 value and represented as a 224×112 picture. The 224×112 pictures represented by the first d21, d22 and d23 values are regarded as the Red, Green and Blue channels of the fifth RGB picture, respectively, yielding the fifth RGB picture, which describes the relative geometric features of joint combination P2.
For the relative geometric features of joint combination P3, the relative distance d31 is normalized in the same way to a first d31 value between 0 and 255 and represented as a 224×112 picture by linear interpolation; d32 is normalized to a first d32 value and represented as a 224×112 picture; and d33 is normalized to a first d33 value and represented as a 224×112 picture. The 224×112 pictures represented by the first d31, d32 and d33 values are regarded as the Red, Green and Blue channels of the sixth RGB picture, respectively, yielding the sixth RGB picture, which describes the relative geometric features of joint combination P3.
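Under the same assumptions as the coords_to_rgb sketch above, the distance features map to an RGB picture like this; flattening each frame's distances into one row is an assumed layout.

```python
import numpy as np
import cv2

def distances_to_rgb(d21, d22, d23):
    """Each d: an array whose first axis is time. The three normalized,
    interpolated 224x112 pictures become the R, G, B channels of the
    fifth (or, for d31/d32/d33, the sixth) RGB picture."""
    channels = []
    for d in (d21, d22, d23):
        v = d.reshape(d.shape[0], -1).astype(np.float32)     # (T, features)
        v = 255.0 * (v - v.min()) / (v.max() - v.min() + 1e-8)
        channels.append(cv2.resize(v, (112, 224), interpolation=cv2.INTER_LINEAR))
    return np.stack(channels, axis=-1).astype(np.uint8)      # (224, 112, 3)
```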
In a further embodiment, for joint combination P1, the relative distances between the individual joints of P1 may also be used as its relative geometric features. In joint combination P1 are calculated: the relative distance d11 formed between the arm joints, by the formula of $d_1$; the relative distance d12 between the arm joints and the two-leg joints (left-leg and right-leg joints), by the formula of $d_2$; and the relative distance d13 of the arm joints from a preset origin, by the formula of $d_3$. The distances d11, d12 and d13 are regarded as relative geometric features of joint combination P1.
Further, for these relative geometric features of P1, the relative distance d11 is normalized, with reference to the formula for $\hat{x}_i^t$ above, to a first d11 value between 0 and 255 and represented as a 224×112 picture by linear interpolation; d12 is normalized to a first d12 value and represented as a 224×112 picture; and d13 is normalized to a first d13 value and represented as a 224×112 picture. The three 224×112 pictures are regarded as the Red, Green and Blue channels of the seventh RGB picture, respectively, yielding the seventh RGB picture, which describes the relative geometric features of joint combination P1.
104. The spatial position characteristic and the relative geometric characteristic of each joint combination are respectively combined and described as the geometric characteristic of each joint combination.
The spatial position characteristic of each joint combination described in the step 102 and the relative geometric characteristic of each joint combination described in the step 103 are combined to describe the geometric characteristic of each joint combination.
For example, referring to fig. 4, the first RGB picture and the fourth RGB picture are combined to form the first geometric-feature RGB picture, a 224×224 picture describing joint combination P1; the second RGB picture and the fifth RGB picture are combined to form the second geometric-feature RGB picture describing joint combination P2; and the third RGB picture and the sixth RGB picture are combined to form the third geometric-feature RGB picture describing joint combination P3, as sketched below. Of course, the first RGB picture may instead be combined with the seventh RGB picture to form an alternative first geometric-feature RGB picture describing joint combination P1.
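A one-function sketch of the combination step; side-by-side concatenation of the two 224×112 pictures is an assumption, since the patent states only that the pair is combined into one 224×224 picture.

```python
import numpy as np

def combine_features(spatial_rgb, relative_rgb):
    """spatial_rgb, relative_rgb: (224, 112, 3) uint8 pictures.
    Returns one (224, 224, 3) geometric-feature RGB picture."""
    assert spatial_rgb.shape == relative_rgb.shape == (224, 112, 3)
    return np.concatenate([spatial_rgb, relative_rgb], axis=1)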
105. The geometric features of each joint combination are respectively input into a trained deep convolutional network for feature extraction and classification, and the probabilities of several behaviors are respectively output.
It can be understood that the embodiment of the present application uses a trained convolutional neural network Resnet-50 for feature extraction. The training sample pictures for the Resnet-50 must be processed with steps 101 through 104 above, so that the trained network can recognize the spatial position features, relative geometric features and combined geometric features of each joint combination. The behaviors expressed by the training sample pictures are label-classified with manual assistance, which improves the efficiency and accuracy of training and yields a trained Resnet-50 capable of recognizing and classifying specific labeled behaviors. The training process of the convolutional neural network Resnet-50 is mature prior art and is not described again here.
Referring to fig. 4, in this step the first, second and third geometric-feature RGB pictures from step 104 are respectively input into the trained convolutional neural network Resnet-50 for feature extraction, and the probabilities of several behaviors are respectively output.
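A minimal PyTorch sketch of this per-combination classification step; the class count, checkpoint path and preprocessing are assumptions (the patent specifies only a trained Resnet-50 per geometric-feature picture).

```python
import numpy as np
import torch
from torchvision.models import resnet50

NUM_CLASSES = 60                        # e.g. the NTU RGB+D 60 label set
model = resnet50()
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
model.load_state_dict(torch.load("resnet50_p1.pth"))   # hypothetical weights
model.eval()

def behavior_probabilities(picture: np.ndarray) -> torch.Tensor:
    """picture: (224, 224, 3) uint8 geometric-feature RGB picture.
    Returns a posterior probability vector over the behavior classes."""
    x = torch.from_numpy(picture).float().permute(2, 0, 1).unsqueeze(0) / 255.0
    with torch.no_grad():
        return torch.softmax(model(x), dim=1).squeeze(0)
```

One such network per joint combination yields the posteriors $b_1$, $b_2$ and $b_3$ fused in step 106.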
106. The probabilities of the several behaviors are analyzed according to a preset fusion function, and the behavior with the highest probability is output to obtain the target behavior.
For example, the probabilities obtained in step 105 include: the posterior probability $b_1$ of the behavior extracted from the first geometric-feature RGB picture, the posterior probability $b_2$ of the behavior extracted from the second geometric-feature RGB picture, and the posterior probability $b_3$ of the behavior extracted from the third geometric-feature RGB picture. They are fused according to the following formula:

$$P(L \mid S) = \eta_1 b_1 + \eta_2 b_2 + \eta_3 b_3$$

where $\eta_1$, $\eta_2$ and $\eta_3$ are preset weights, and the posterior probability vector $P(L \mid S)$ represents the probability that the skeleton belongs to action label class L. Then $label = \mathrm{Find}(\max(P(L \mid S)))$, where label denotes the target behavior, for example one of the label classes "clap", "pick up", "walk" or "run".
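The fusion reduces to a weighted sum and an argmax; equal weights below are an illustrative assumption, as the patent requires only that they be preset.

```python
import torch

def fuse_and_label(b1, b2, b3, etas=(1 / 3, 1 / 3, 1 / 3)):
    """b1, b2, b3: posterior probability vectors from the three networks.
    Implements P(L|S) = η1·b1 + η2·b2 + η3·b3 and label = argmax."""
    p = etas[0] * b1 + etas[1] * b2 + etas[2] * b3
    return int(torch.argmax(p))        # index of the target behavior
```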
Following the above description of the embodiments, this embodiment also reports recognition results on two skeleton-based behavior databases, NTU RGB+D 60 and Northwestern-UCLA. NTU RGB+D 60 contains 56880 behavior samples; 40 subjects took part in shooting 60 action types, each subject repeating the actions in front of cameras at three different angles, so there are two validation modes: cross-validation by viewpoint (Cross-View, CV) and cross-validation by subject (Cross-Subject, CS). The Northwestern-UCLA (N-UCLA) dataset contains 1494 action samples of 10 action types, each performed 6 times by each of 10 different subjects; the dataset was captured by 3 cameras forming multiple viewpoints, and cross-validation is performed by viewpoint (Cross-View, CV). The behavior recognition technology based on combined geometric features of skeleton joints in this embodiment is compared, on NTU RGB+D 60 and N-UCLA, with the current technology that connects all skeleton joints by serial number; the recognition results are shown in Table 1 below:
[Table 1: recognition-accuracy comparison between the proposed method and the prior art on NTU RGB+D 60 (CS and CV) and N-UCLA; the table is provided as an image in the original patent.]
Table 1 above compares the accuracy of the behavior recognition method based on combined geometric features of skeletal joints provided in this embodiment with that of prior-art behavior recognition. The results show that the proposed method improves markedly on the prior art in accuracy, demonstrating its superiority.
Compared with the existing behavior recognition technology based on the spatial features of all joints, the technical scheme of the behavior recognition method based on combined geometric features of skeleton joints overcomes the following defects of the prior art and achieves the corresponding advantages:
1. The prior art describes behavior features from the spatial features of all joints and neglects the spatial connection features between joints. The present application connects joints in order based on their spatial connection features rather than by joint-point serial number, so the spatial characteristics among different joints are embodied.
2. The prior art describes behavior features from all joints and does not highlight the key information of a behavior. The joint-combination technique provided by the application prominently describes the key positions and key joints where a behavior occurs, removes joints that contribute little to the feature description, focuses the description on the arm and leg joints, divides them into three joint combinations, and comprehensively considers the influence of different joint combinations on behavior differences.
3. The prior art lacks a description of the relative characteristics of different joints, both relative spatial and relative temporal, which are important for representing behavior. The joint-combination geometric features of the application are formed jointly by the relative distance features between different joints, the relative temporal features of the arm joints and the spatial features of the joints, expressing both the spatial and the temporal characteristics of the human skeleton and greatly improving the accuracy of behavior recognition.
The foregoing embodiments describe the behavior recognition method based on skeleton joint combination geometric features. The following describes the corresponding behavior recognition device; referring to fig. 5, the device includes:
a dividing unit 501, configured to divide a skeleton into X joint combinations, where X is a positive integer greater than 1;
a first description unit 502, configured to describe spatial position characteristics of each joint combination;
a second description unit 503 for describing relative geometric features of each of the joint combinations;
a combination description unit 504, configured to combine the spatial position feature and the relative geometric feature of each joint combination into the geometric feature of that joint combination;
a classification output unit 505, configured to input the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification, and to output the probabilities of a plurality of candidate behaviors;
and a fusion analysis unit 506, configured to comprehensively analyze the probabilities of the candidate behaviors according to a preset fusion function and to output the behavior with the highest probability as the target behavior.
Optionally, X is 3; the skeleton is a human body skeleton; when the skeleton is divided into X joint combinations, the dividing unit 501 is specifically configured to:
removing invalid joints in the human skeleton according to a preset rule to obtain the remaining joints;
dividing the remaining joints into 3 joint combinations P1, P2, and P3, where P1 contains the left-arm and right-arm joints among the remaining joints, P2 contains the left-arm and right-leg joints, and P3 contains the right-arm and left-leg joints.
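For illustration only (this sketch is not part of the patent disclosure), the joint-combination split can be expressed in a few lines of Python; the joint serial numbers below are hypothetical placeholders, since the patent does not fix a specific numbering:

```python
import numpy as np

# Hypothetical joint serial numbers; the actual numbering depends on the
# skeleton format of the capture device and is not fixed by the patent.
LEFT_ARM  = [4, 5, 6, 7]
RIGHT_ARM = [8, 9, 10, 11]
LEFT_LEG  = [12, 13, 14, 15]
RIGHT_LEG = [16, 17, 18, 19]

def divide_joints(skeleton):
    """skeleton: (T, num_joints, 3) array of joint coordinates over T frames.
    Returns the three joint combinations P1, P2, P3 described above."""
    H  = LEFT_ARM + RIGHT_ARM    # P1: left arm + right arm
    LG = LEFT_ARM + RIGHT_LEG    # P2: left arm + right leg
    RG = RIGHT_ARM + LEFT_LEG    # P3: right arm + left leg
    return skeleton[:, H], skeleton[:, LG], skeleton[:, RG]
```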
Optionally, each joint combination includes a joint serial number set of joints, a spatial connection relationship of the joints, and a spatial connection order of the joints;
the dividing unit 501 describes the P1, the P2, and the P3 as:
P1 = {J_m → J_n → … → J_k | m, n, k ∈ H}
P2 = {J_m → J_n → … → J_k | m, n, k ∈ LG}
P3 = {J_m → J_n → … → J_k | m, n, k ∈ RG}
where H denotes the joint serial number set of the left arm and the right arm; LG denotes the joint serial number set of the left arm and the right leg; RG denotes the joint serial number set of the right arm and the left leg; and J_m, J_n, and J_k denote the joints with serial numbers m, n, and k, respectively.
Optionally, when the first description unit 502 describes the spatial position feature of each joint combination, it is specifically configured to:
describe the spatial position features of the P1, the P2, and the P3, respectively, as:
F_P1 = {J_i^t | i ∈ H, 1 ≤ t ≤ T}
F_P2 = {J_j^t | j ∈ LG, 1 ≤ t ≤ T}
F_P3 = {J_k^t | k ∈ RG, 1 ≤ t ≤ T}
where J_i^t = (x_i^t, y_i^t, z_i^t) denotes the three-dimensional coordinates of the joint with serial number i in P1 at time t, and T represents the time length; J_j^t denotes the three-dimensional coordinates of the joint with serial number j in P2 at time t; and J_k^t denotes the three-dimensional coordinates of the joint with serial number k in P3 at time t.
Optionally, the first description unit 502 is further specifically configured to:
normalize the x-coordinate sequence {x_i^t | i ∈ H, 1 ≤ t ≤ T} in the spatial position feature of the P1 to a first X value between 0 and 255, and represent the first X value as a 224 × 112 picture by linear interpolation;
normalize the y-coordinate sequence {y_i^t} in the spatial position feature of the P1 to a first Y value between 0 and 255, and represent the first Y value as a 224 × 112 picture by linear interpolation;
normalize the z-coordinate sequence {z_i^t} in the spatial position feature of the P1 to a first Z value between 0 and 255, and represent the first Z value as a 224 × 112 picture by linear interpolation;
take the 224 × 112 pictures represented by the first X value, the first Y value, and the first Z value as the Red, Green, and Blue channels, respectively, to obtain a first RGB picture, and describe the spatial position feature of the P1 using the first RGB picture;
normalize, in the same way, the x-, y-, and z-coordinate sequences {x_j^t}, {y_j^t}, {z_j^t} (j ∈ LG) in the spatial position feature of the P2 to second X, Y, and Z values between 0 and 255, represent each as a 224 × 112 picture by linear interpolation, take the three pictures as the Red, Green, and Blue channels to obtain a second RGB picture, and describe the spatial position feature of the P2 using the second RGB picture;
normalize, in the same way, the x-, y-, and z-coordinate sequences {x_k^t}, {y_k^t}, {z_k^t} (k ∈ RG) in the spatial position feature of the P3 to third X, Y, and Z values between 0 and 255, represent each as a 224 × 112 picture by linear interpolation, take the three pictures as the Red, Green, and Blue channels to obtain a third RGB picture, and describe the spatial position feature of the P3 using the third RGB picture.
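The channel-construction procedure above can be sketched as follows. This is a minimal illustration that assumes min-max normalization (the text only requires mapping the values into the range 0 to 255) and uses scipy's order-1 (linear) interpolation for the 224 × 112 resizing; the helper names are ours:

```python
import numpy as np
from scipy.ndimage import zoom  # order=1 gives linear interpolation

def to_channel(values, out_shape=(224, 112)):
    """Normalize a 2-D coordinate sequence to 0..255 and resize it by linear
    interpolation into one picture channel."""
    v = np.asarray(values, dtype=np.float64)
    v = 255.0 * (v - v.min()) / (v.max() - v.min() + 1e-8)      # normalize to [0, 255]
    factors = (out_shape[0] / v.shape[0], out_shape[1] / v.shape[1])
    return zoom(v, factors, order=1).astype(np.uint8)

def spatial_rgb_picture(P):
    """P: (T, num_joints, 3) coordinates of one joint combination
    -> 224x112x3 RGB picture encoding its spatial position feature."""
    r = to_channel(P[:, :, 0])  # x coordinates -> Red channel
    g = to_channel(P[:, :, 1])  # y coordinates -> Green channel
    b = to_channel(P[:, :, 2])  # z coordinates -> Blue channel
    return np.stack([r, g, b], axis=-1)
```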
Optionally, when describing the relative geometric features of each joint combination, the second description unit 503 is specifically configured to:
describe the relative geometric feature of the P1 using the relative position features of the P1:
ΔJ_i^t = J_i^t − J_i^{t−1}
where J_i^t denotes the three-dimensional coordinates of the arm joint J_i at time t, the arm joints comprising the left-arm joints and the right-arm joints; J_i^{t−1} denotes the three-dimensional coordinates of the arm joint J_i at time t−1; and ΔJ_i^t denotes the relative position feature of the arm joint J_i at time t;
take {ΔJ_i^t} as the relative geometric feature of the P1;
calculate, in the P2, the relative distances d21 formed among the left-arm joints, the relative distances d22 between the left-arm joints and the right-leg joints, and the relative distances d23 of the left-arm and right-leg joints relative to a preset origin;
take the relative distances d21, d22, and d23 as the relative geometric feature of the P2;
calculate, in the P3, the relative distances d31 formed among the right-arm joints, the relative distances d32 between the right-arm joints and the left-leg joints, and the relative distances d33 of the right-arm and left-leg joints relative to the preset origin;
take the relative distances d31, d32, and d33 as the relative geometric feature of the P3.
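As an illustrative sketch of the quantities just defined (the distance pairings follow the text above; the helper names are ours):

```python
import numpy as np

def relative_position(P):
    """Frame-to-frame displacement ΔJ_i^t = J_i^t - J_i^{t-1} of every joint.
    P: (T, num_joints, 3) -> (T-1, num_joints, 3)."""
    return P[1:] - P[:-1]

def pairwise_distances(A, B):
    """Per-frame relative distances between every joint in A and every joint in B.
    A: (T, m, 3), B: (T, n, 3) -> (T, m, n)."""
    return np.linalg.norm(A[:, :, None, :] - B[:, None, :, :], axis=-1)

def distances_to_origin(P, origin=(0.0, 0.0, 0.0)):
    """Per-frame distance of every joint to a preset origin.
    P: (T, m, 3) -> (T, m)."""
    return np.linalg.norm(P - np.asarray(origin), axis=-1)
```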
Optionally, the second description unit 503 is further configured to:
normalize the sequences Δx_i^t, Δy_i^t, and Δz_i^t in the relative geometric feature of the P1 to first X, Y, and Z relative values between 0 and 255, respectively, and represent each as a 224 × 112 picture by linear interpolation;
take the 224 × 112 pictures represented by the first X, Y, and Z relative values as the Red, Green, and Blue channels, respectively, to obtain a fourth RGB picture, and describe the relative geometric feature of the P1 using the fourth RGB picture;
normalize the relative distances d21, d22, and d23 in the relative geometric feature of the P2 to first d21, d22, and d23 values between 0 and 255, respectively, and represent each as a 224 × 112 picture by linear interpolation;
take the 224 × 112 pictures represented by the first d21, d22, and d23 values as the Red, Green, and Blue channels, respectively, to obtain a fifth RGB picture, and describe the relative geometric feature of the P2 using the fifth RGB picture;
normalize the relative distances d31, d32, and d33 in the relative geometric feature of the P3 to first d31, d32, and d33 values between 0 and 255, respectively, and represent each as a 224 × 112 picture by linear interpolation;
take the 224 × 112 pictures represented by the first d31, d32, and d33 values as the Red, Green, and Blue channels, respectively, to obtain a sixth RGB picture, and describe the relative geometric feature of the P3 using the sixth RGB picture.
Optionally, when combining the spatial position feature and the relative geometric feature of each joint combination into its geometric feature, the combination description unit 504 is specifically configured to:
combine the first RGB picture with the fourth RGB picture to form a first geometric-feature RGB picture of size 224 × 224 corresponding to the P1;
combine the second RGB picture with the fifth RGB picture to form a second geometric-feature RGB picture of size 224 × 224 corresponding to the P2;
combine the third RGB picture with the sixth RGB picture to form a third geometric-feature RGB picture of size 224 × 224 corresponding to the P3.
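The patent does not spell out how the two pictures are combined, but since 112 + 112 = 224, placing them side by side is a natural reading; a minimal sketch under that assumption:

```python
import numpy as np

def geometric_feature_picture(spatial_rgb, relative_rgb):
    """Combine a 224x112x3 spatial picture and a 224x112x3 relative picture
    into one 224x224x3 geometric-feature picture (side-by-side assumption)."""
    assert spatial_rgb.shape == (224, 112, 3) and relative_rgb.shape == (224, 112, 3)
    return np.concatenate([spatial_rgb, relative_rgb], axis=1)
```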
Optionally, when inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification and outputting the probabilities of a plurality of candidate behaviors, the classification output unit 505 is specifically configured to:
input the first, second, and third geometric-feature RGB pictures respectively into a trained convolutional neural network Resnet-50 for feature extraction, and respectively output the probabilities of a plurality of candidate behaviors.
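A minimal sketch of one per-combination classifier using the standard torchvision ResNet-50; the class count of 60 is an assumption borrowed from NTU RGB+D60 (the patent does not fix this number), and the softmax output stands in for the posterior probabilities:

```python
import torch
import torchvision

num_classes = 60  # assumption: matches NTU RGB+D60; not fixed by the patent
model = torchvision.models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
model.eval()

def classify(picture):
    """picture: uint8 numpy array of shape (224, 224, 3)
    -> probability vector over the behavior classes."""
    x = torch.from_numpy(picture).float().permute(2, 0, 1) / 255.0  # HWC -> CHW
    with torch.no_grad():
        logits = model(x.unsqueeze(0))                              # (1, num_classes)
    return torch.softmax(logits, dim=1).squeeze(0).numpy()
```

In practice one such network would be trained per joint combination, giving the three posterior probability vectors fused below.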
Optionally, the probabilities of the candidate behaviors include: a posterior probability b_1 of the behavior extracted from the first geometric-feature RGB picture, a posterior probability b_2 of the behavior extracted from the second geometric-feature RGB picture, and a posterior probability b_3 of the behavior extracted from the third geometric-feature RGB picture.
When comprehensively analyzing the probabilities of the candidate behaviors according to the preset fusion function and outputting the behavior with the highest probability as the target behavior, the fusion analysis unit 506 is specifically configured to compute:
P(L|S) = η_1 b_1 + η_2 b_2 + η_3 b_3
where η_1, η_2, and η_3 are preset weights, and the posterior probability vector P(L|S) represents the probability that the skeleton belongs to the action label class L;
label = Find(max(P(L|S)))
where label represents the target behavior.
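The fusion step reduces to a weighted sum followed by an argmax; a minimal sketch, assuming equal preset weights (the patent leaves the weight values to the implementer):

```python
import numpy as np

def fuse(b1, b2, b3, etas=(1.0 / 3, 1.0 / 3, 1.0 / 3)):
    """P(L|S) = η_1 b_1 + η_2 b_2 + η_3 b_3; returns label = Find(max(P(L|S)))."""
    P = etas[0] * np.asarray(b1) + etas[1] * np.asarray(b2) + etas[2] * np.asarray(b3)
    return int(np.argmax(P))
```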
The operation performed by the behavior recognition device based on the geometric features of the skeleton joint combination in the embodiment of the present application is similar to the operation performed in fig. 1, and is not repeated here.
Referring to fig. 6, a computer device in an embodiment of the present application is described below. The computer device 600 may include one or more processors (CPUs) 601 and a memory 602, where one or more applications or data are stored in the memory 602; the memory 602 may be volatile storage or persistent storage. The program stored in the memory 602 may include one or more modules, each of which may include a series of instruction operations on the computer device. Further, the processor 601 may communicate with the memory 602 to execute the series of instruction operations in the memory 602 on the computer device 600. The computer device 600 may also include one or more wireless network interfaces 603, one or more input/output interfaces 604, and/or one or more operating systems, such as Windows Server, Mac OS, Unix, Linux, FreeBSD, etc. The processor 601 may perform the operations performed in the embodiment shown in fig. 1, and details are not described herein again.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A behavior identification method based on a skeleton joint combination geometric feature is characterized by comprising the following steps:
dividing a skeleton into X joint combinations, wherein X is a positive integer greater than 1;
describing the spatial position feature of each of the joint combinations;
describing relative geometric features of each of the joint combinations;
combining the spatial position feature and the relative geometric feature of each joint combination into the geometric feature of that joint combination;
inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification, and respectively outputting the probabilities of a plurality of candidate behaviors;
and comprehensively analyzing the probabilities of the candidate behaviors according to a preset fusion function, and outputting the behavior with the highest probability to obtain a target behavior.
2. The behavior recognition method according to claim 1, wherein X is 3; the skeleton is a human body skeleton; the dividing of the skeleton into X joint combinations comprises:
removing invalid joints in the human skeleton according to a preset rule to obtain the remaining joints;
dividing the remaining joints into 3 joint combinations P1, P2, and P3, where P1 contains the left-arm and right-arm joints among the remaining joints, P2 contains the left-arm and right-leg joints, and P3 contains the right-arm and left-leg joints.
3. The behavior recognition method according to claim 2, wherein each of the joint combinations includes a joint number set of joints, a spatial connection relationship of the joints, a spatial connection order of the joints;
the P1, the P2, the P3 are respectively described as:
P1 = {J_m → J_n → … → J_k | m, n, k ∈ H}
P2 = {J_m → J_n → … → J_k | m, n, k ∈ LG}
P3 = {J_m → J_n → … → J_k | m, n, k ∈ RG}
where H denotes the joint serial number set of the left arm and the right arm; LG denotes the joint serial number set of the left arm and the right leg; RG denotes the joint serial number set of the right arm and the left leg; and J_m, J_n, and J_k denote the joints with serial numbers m, n, and k, respectively.
4. The behavior recognition method according to claim 3, wherein the describing the spatial position feature of each of the joint combinations comprises:
describing the spatial position features of the P1, the P2, and the P3, respectively, as:
F_P1 = {J_i^t | i ∈ H, 1 ≤ t ≤ T}
F_P2 = {J_j^t | j ∈ LG, 1 ≤ t ≤ T}
F_P3 = {J_k^t | k ∈ RG, 1 ≤ t ≤ T}
where J_i^t = (x_i^t, y_i^t, z_i^t) denotes the three-dimensional coordinates of the joint with serial number i in P1 at time t, and T represents the time length; J_j^t denotes the three-dimensional coordinates of the joint with serial number j in P2 at time t; and J_k^t denotes the three-dimensional coordinates of the joint with serial number k in P3 at time t.
5. The behavior recognition method according to claim 4, further comprising, after describing the spatial location features of the P1, the P2 and the P3 respectively:
normalizing the x-coordinate sequence {x_i^t | i ∈ H, 1 ≤ t ≤ T} in the spatial position feature of the P1 to a first X value between 0 and 255, and representing the first X value as a 224 × 112 picture by linear interpolation;
normalizing the y-coordinate sequence {y_i^t} in the spatial position feature of the P1 to a first Y value between 0 and 255, and representing the first Y value as a 224 × 112 picture by linear interpolation;
normalizing the z-coordinate sequence {z_i^t} in the spatial position feature of the P1 to a first Z value between 0 and 255, and representing the first Z value as a 224 × 112 picture by linear interpolation;
taking the 224 × 112 pictures represented by the first X value, the first Y value, and the first Z value as the Red, Green, and Blue channels, respectively, to obtain a first RGB picture, and describing the spatial position feature of the P1 using the first RGB picture;
normalizing, in the same way, the x-, y-, and z-coordinate sequences {x_j^t}, {y_j^t}, {z_j^t} (j ∈ LG) in the spatial position feature of the P2 to second X, Y, and Z values between 0 and 255, representing each as a 224 × 112 picture by linear interpolation, taking the three pictures as the Red, Green, and Blue channels to obtain a second RGB picture, and describing the spatial position feature of the P2 using the second RGB picture;
normalizing, in the same way, the x-, y-, and z-coordinate sequences {x_k^t}, {y_k^t}, {z_k^t} (k ∈ RG) in the spatial position feature of the P3 to third X, Y, and Z values between 0 and 255, representing each as a 224 × 112 picture by linear interpolation, taking the three pictures as the Red, Green, and Blue channels to obtain a third RGB picture, and describing the spatial position feature of the P3 using the third RGB picture.
6. The behavior recognition method of claim 5, wherein the describing relative geometric features of each of the joint combinations comprises:
describing the relative geometric feature of the P1 using the relative position features of the P1:
ΔJ_i^t = J_i^t − J_i^{t−1}
wherein J_i^t denotes the three-dimensional coordinates of the arm joint J_i at time t, the arm joints comprising the left-arm joints and the right-arm joints; J_i^{t−1} denotes the three-dimensional coordinates of the arm joint J_i at time t−1; and ΔJ_i^t denotes the relative position feature of the arm joint J_i at time t;
taking {ΔJ_i^t} as the relative geometric feature of the P1;
calculating, in the P2, the relative distances d21 formed among the left-arm joints, the relative distances d22 between the left-arm joints and the right-leg joints, and the relative distances d23 of the left-arm and right-leg joints relative to a preset origin;
taking the relative distances d21, d22, and d23 as the relative geometric feature of the P2;
calculating, in the P3, the relative distances d31 formed among the right-arm joints, the relative distances d32 between the right-arm joints and the left-leg joints, and the relative distances d33 of the right-arm and left-leg joints relative to the preset origin;
taking the relative distances d31, d32, and d33 as the relative geometric feature of the P3.
7. The behavior recognition method according to claim 6, further comprising, after describing relative geometric features of the P1, the P2 and the P3, respectively:
normalizing the sequences Δx_i^t, Δy_i^t, and Δz_i^t in the relative geometric feature of the P1 to first X, Y, and Z relative values between 0 and 255, respectively, and representing each as a 224 × 112 picture by linear interpolation;
taking the 224 × 112 pictures represented by the first X, Y, and Z relative values as the Red, Green, and Blue channels, respectively, to obtain a fourth RGB picture, and describing the relative geometric feature of the P1 using the fourth RGB picture;
normalizing the relative distances d21, d22, and d23 in the relative geometric feature of the P2 to first d21, d22, and d23 values between 0 and 255, respectively, and representing each as a 224 × 112 picture by linear interpolation;
taking the 224 × 112 pictures represented by the first d21, d22, and d23 values as the Red, Green, and Blue channels, respectively, to obtain a fifth RGB picture, and describing the relative geometric feature of the P2 using the fifth RGB picture;
normalizing the relative distances d31, d32, and d33 in the relative geometric feature of the P3 to first d31, d32, and d33 values between 0 and 255, respectively, and representing each as a 224 × 112 picture by linear interpolation;
taking the 224 × 112 pictures represented by the first d31, d32, and d33 values as the Red, Green, and Blue channels, respectively, to obtain a sixth RGB picture, and describing the relative geometric feature of the P3 using the sixth RGB picture.
8. The behavior recognition method according to claim 7, wherein combining the spatial position feature and the relative geometric feature of each joint combination into its geometric feature comprises:
combining the first RGB picture with the fourth RGB picture to form a first geometric-feature RGB picture of size 224 × 224 corresponding to the P1;
combining the second RGB picture with the fifth RGB picture to form a second geometric-feature RGB picture of size 224 × 224 corresponding to the P2;
combining the third RGB picture with the sixth RGB picture to form a third geometric-feature RGB picture of size 224 × 224 corresponding to the P3.
9. The behavior recognition method according to claim 8, wherein inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification and respectively outputting the probabilities of a plurality of candidate behaviors comprises:
inputting the first, second, and third geometric-feature RGB pictures respectively into a trained convolutional neural network Resnet-50 for feature extraction, and respectively outputting the probabilities of a plurality of candidate behaviors.
10. The behavior recognition method according to claim 9, wherein the probabilities of the candidate behaviors comprise: a posterior probability b_1 of the behavior extracted from the first geometric-feature RGB picture, a posterior probability b_2 of the behavior extracted from the second geometric-feature RGB picture, and a posterior probability b_3 of the behavior extracted from the third geometric-feature RGB picture;
comprehensively analyzing the probabilities of the candidate behaviors according to a preset fusion function, and outputting the behavior with the highest probability to obtain a target behavior, comprises computing:
P(L|S) = η_1 b_1 + η_2 b_2 + η_3 b_3
wherein η_1, η_2, and η_3 are preset weights, and the posterior probability vector P(L|S) represents the probability that the skeleton belongs to the action label class L; and
label = Find(max(P(L|S)))
wherein label represents the target behavior.