CN114299614A - Behavior identification method based on skeleton joint combination geometric features - Google Patents
- Publication number: CN114299614A
- Application number: CN202111623361.8A
- Authority: CN (China)
- Legal status: Pending
Abstract
The application is suitable for the technical field of video data processing and provides a behavior recognition method based on combined geometric features of skeleton joints, aiming to improve the accuracy and efficiency of behavior recognition. The method mainly comprises the following steps: dividing a skeleton into X joint combinations, wherein X is a positive integer greater than 1; describing the spatial position features of each joint combination; describing the relative geometric features of each joint combination; combining the spatial position feature and the relative geometric feature of each joint combination into the geometric feature of that joint combination; respectively inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification, and respectively outputting the possible probabilities of a plurality of behaviors; and comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function and outputting the behavior with the highest probability as the target behavior.
Description
Technical Field
The application belongs to the technical field of video data processing, and particularly relates to a behavior identification method based on combined geometric features of skeleton joints.
Background
In the prior art, behavior recognition based on a skeleton sequence is mainly realized as behavior recognition based on the spatial features of all joints: all joints are arranged into a whole according to their serial numbers and used as the behavior feature description of an individual to realize behavior classification. A skeleton sequence is derived from video data; each frame contains the joints of multiple body parts (head, hands, legs and spine), and each joint is numbered starting from 1. In the current skeleton-sequence behavior identification method, all joints are ordered by joint number, the skeleton sequence data are converted into color RGB pictures as the behavior feature description, and a deep network extracts features from these pictures to achieve behavior classification and recognition.
However, in studying the prior art of behavior recognition based on skeleton sequences, the applicant found that directly arranging all joints to describe behavior features suffers from low recognition efficiency and insufficient accuracy. For example, in a hand-clapping motion only the joints of the arms change, while the spatial positions of the other joints may remain unchanged and therefore contribute little to the feature description.
Disclosure of Invention
The application aims to provide a behavior recognition method based on combined geometric features of skeleton joints, so as to improve the accuracy and efficiency of behavior recognition.
The application provides a behavior identification method based on a skeleton joint combination geometric feature, which comprises the following steps:
dividing a skeleton into X joint combinations, wherein X is a positive integer greater than 1;
describing the spatial position features of each of the joint combinations;
describing relative geometric features of each of the joint combinations;
combining the spatial position feature and the relative geometric feature of each joint combination into the geometric feature of that joint combination;
inputting the geometric features of each joint combination into a trained deep convolution network for feature extraction and classification, and outputting possible probabilities of a plurality of behaviors respectively;
and comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain a target behavior.
Optionally, X is 3; the skeleton is a human body skeleton; the dividing of the skeleton into X joint combinations comprises:
removing invalid joints in the human skeleton according to a preset rule to obtain residual joints;
dividing the remaining joints into 3 joint combinations P1, P2, P3, wherein the P1 comprises joints of the left arm and the right arm of the remaining joints, the P2 comprises joints of the left arm and the right leg of the remaining joints, and P3 comprises joints of the right arm and the left leg of the remaining joints.
Optionally, each joint combination includes a joint serial number set of joints, a spatial connection relationship of the joints, and a spatial connection order of the joints;
the P1, the P2, the P3 are respectively described as:
wherein the H represents a set of joint sequence numbers of the left arm and the right arm;
the LG represents a set of joint numbers for the left arm and the right leg;
the RG represents a set of joint numbers of the right arm and the left leg;
said JmA joint with a sequence number m;
said JnIndicates a joint with the serial number n;
said JkThe joint with the number k is shown.
Optionally, the describing the spatial position feature of each joint combination includes:
describing the spatial location features of the P1, the P2 and the P3, respectively, as the sets of three-dimensional joint coordinates over time, wherein:
the spatial location feature of the P1 consists of the three-dimensional coordinates of each joint with serial number i in the P1 at each time t, with T representing the time length;
the spatial location feature of the P2 consists of the three-dimensional coordinates of each joint with serial number j in the P2 at each time t;
the spatial location feature of the P3 consists of the three-dimensional coordinates of each joint with serial number k in the P3 at each time t.
Optionally, after the spatial location features of the P1, the P2 and the P3 are respectively described, the method specifically further includes:
normalizing the x-axis coordinates in the spatial location features of the P1 to first X values between 0 and 255;
representing the first X values as a 224 × 112 picture by a linear interpolation method;
normalizing the y-axis coordinates in the spatial location features of the P1 to first Y values between 0 and 255, and representing the first Y values as a 224 × 112 picture by a linear interpolation method;
normalizing the z-axis coordinates in the spatial location features of the P1 to first Z values between 0 and 255, and representing the first Z values as a 224 × 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first X values, a 224 × 112 picture represented by the first Y values, and a 224 × 112 picture represented by the first Z values as the Red, Green and Blue channels of a first RGB picture, respectively, to obtain the first RGB picture;
describing the spatial location features of the P1 using the first RGB picture;
normalizing the x-axis coordinates in the spatial location features of the P2 to second X values between 0 and 255, and representing the second X values as a 224 × 112 picture by a linear interpolation method;
normalizing the y-axis coordinates in the spatial location features of the P2 to second Y values between 0 and 255, and representing the second Y values as a 224 × 112 picture by a linear interpolation method;
normalizing the z-axis coordinates in the spatial location features of the P2 to second Z values between 0 and 255, and representing the second Z values as a 224 × 112 picture by a linear interpolation method;
regarding the three 224 × 112 pictures represented by the second X, Y and Z values as the Red, Green and Blue channels of a second RGB picture, respectively, to obtain the second RGB picture;
describing the spatial location features of the P2 using the second RGB picture;
normalizing the x-axis coordinates in the spatial location features of the P3 to third X values between 0 and 255, and representing the third X values as a 224 × 112 picture by a linear interpolation method;
normalizing the y-axis coordinates in the spatial location features of the P3 to third Y values between 0 and 255, and representing the third Y values as a 224 × 112 picture by a linear interpolation method;
normalizing the z-axis coordinates in the spatial location features of the P3 to third Z values between 0 and 255, and representing the third Z values as a 224 × 112 picture by a linear interpolation method;
regarding the three 224 × 112 pictures represented by the third X, Y and Z values as the Red, Green and Blue channels of a third RGB picture, respectively, to obtain the third RGB picture;
describing the spatial location features of the P3 using the third RGB picture.
Optionally, the describing the relative geometric features of each of the joint combinations comprises:
describing the relative geometric features of the P1 using the relative position features of the P1, wherein the relative position feature of an arm joint Ji at time t is the change of its three-dimensional coordinates from time t - 1 to time t, and the arm joints comprise the left arm joints and the right arm joints;
calculating, in the P2, the relative distance d21 formed between the left arm joints, the relative distance d22 between the left arm joints and the right leg joints, and the relative distance d23 of the left arm joints and right leg joints relative to a preset origin;
regarding the relative distance d21, the relative distance d22 and the relative distance d23 as the relative geometric features of the P2;
calculating, in the P3, the relative distance d31 formed between the right arm joints, the relative distance d32 between the right arm joints and the left leg joints, and the relative distance d33 of the right arm joints and left leg joints relative to the preset origin;
regarding the relative distance d31, the relative distance d32 and the relative distance d33 as the relative geometric features of the P3.
Optionally, after describing the relative geometric features of the P1, the P2 and the P3 respectively, the method further includes:
normalizing the x-axis components in the relative geometric features of the P1 to first X relative values between 0 and 255;
representing the first X relative values as a 224 × 112 picture by a linear interpolation method;
normalizing the y-axis components in the relative geometric features of the P1 to first Y relative values between 0 and 255, and representing the first Y relative values as a 224 × 112 picture by a linear interpolation method;
normalizing the z-axis components in the relative geometric features of the P1 to first Z relative values between 0 and 255, and representing the first Z relative values as a 224 × 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first X relative values, a 224 × 112 picture represented by the first Y relative values, and a 224 × 112 picture represented by the first Z relative values as the Red, Green and Blue channels of a fourth RGB picture, respectively, to obtain the fourth RGB picture;
describing the relative geometric features of the P1 using the fourth RGB picture;
normalizing the relative distance d21 in the relative geometry of the P2 to a first d21 value between 0 and 255;
representing the first d21 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d22 in the relative geometry of the P2 to a first d22 value between 0 and 255;
representing the first d22 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d23 in the relative geometry of the P2 to a first d23 value between 0 and 255;
representing the first d23 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d21 value, a 224 × 112 picture represented by the first d22 value, and a 224 × 112 picture represented by the first d23 value as Red, Green, and Blue channels of a fifth RGB picture, respectively, obtaining a fifth RGB picture;
describing relative geometric features of the P2 using the fifth RGB picture;
normalizing the relative distance d31 in the relative geometry of the P3 to a first d31 value between 0 and 255;
representing the first d31 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d32 in the relative geometry of the P3 to a first d32 value between 0 and 255;
representing the first d32 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d33 in the relative geometry of the P3 to a first d33 value between 0 and 255;
representing the first d33 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d31 value, a 224 × 112 picture represented by the first d32 value, and a 224 × 112 picture represented by the first d33 value as Red, Green, and Blue channels of a sixth RGB picture, respectively, to obtain the sixth RGB picture;
the relative geometry of the P3 is described using the sixth RGB picture.
Optionally, the combining of the spatial position feature and the relative geometric feature of each joint combination into its geometric feature comprises:
combining the first RGB picture and the fourth RGB picture into a 224 × 224 first geometric feature RGB picture corresponding to the P1;
combining the second RGB picture and the fifth RGB picture into a 224 × 224 second geometric feature RGB picture corresponding to the P2;
combining the third RGB picture and the sixth RGB picture into a 224 × 224 third geometric feature RGB picture corresponding to the P3.
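A minimal sketch of this combining step, assuming each RGB picture is held as a list of rows of pixel tuples (the function name and row-list representation are illustrative, not from the patent):

```python
def combine_pictures(spatial_pic, relative_pic):
    """Concatenate two pictures of equal height along the width, e.g. a
    224 x 112 spatial-feature picture and a 224 x 112 relative-geometry
    picture into one 224 x 224 geometric feature picture."""
    assert len(spatial_pic) == len(relative_pic)  # same number of rows
    return [s_row + r_row for s_row, r_row in zip(spatial_pic, relative_pic)]
```

Concatenating along the width is one natural reading of "combining" two 224 × 112 pictures into a 224 × 224 one.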
Optionally, the respectively inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification, and respectively outputting possible probabilities of a plurality of behaviors, includes:
and respectively inputting the first geometric feature RGB picture, the second geometric feature RGB picture and the third geometric feature RGB picture into a trained convolutional neural network Resnet-50 for feature extraction, and respectively outputting possible probabilities of a plurality of behaviors.
Optionally, the possible probabilities of the several behaviors include: the posterior probability b1 of behaviors extracted from the first geometric feature RGB picture, the posterior probability b2 of behaviors extracted from the second geometric feature RGB picture, and the posterior probability b3 of behaviors extracted from the third geometric feature RGB picture.
The comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain a target behavior, including:
P(L|S) = η1·b1 + η2·b2 + η3·b3
where η1, η2 and η3 are preset weights;
the posterior probability vector P(L|S) represents the probability that the skeleton sequence S belongs to the behavior label class L;
label=Find(max(P(L|S)))
the label represents the target behavior.
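The fusion function above can be sketched as a weighted sum of the three posterior vectors followed by an argmax; the equal default weights here are only a placeholder assumption, since the patent leaves the η values as preset parameters:

```python
def fuse_and_label(b1, b2, b3, etas=(1 / 3, 1 / 3, 1 / 3)):
    """P(L|S) = eta1*b1 + eta2*b2 + eta3*b3; the target behavior `label`
    is the class index with the highest fused probability."""
    p = [etas[0] * x + etas[1] * y + etas[2] * z
         for x, y, z in zip(b1, b2, b3)]
    return p.index(max(p)), p  # (label, posterior probability vector)
```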
According to the technical scheme, the embodiment of the application has the following advantages:
therefore, in the behavior identification method based on the geometric features of the skeleton joint combination, the skeleton is divided into X joint combinations, all joints of the skeleton are not recognized as a whole as in the prior art, the division into the X joint combinations is more beneficial to improving the classification precision of behaviors, the geometric feature basis of each joint combination in the recognition and classification process is determined, the analysis data volume is reduced, the trained deep convolutional network is used for carrying out feature extraction and classification to obtain the possible probabilities of a plurality of behaviors, then the possible probabilities of the plurality of behaviors are comprehensively analyzed according to the preset fusion function, the behavior with the highest possible probability is output, and the target behavior is obtained. Because the data volume of the joint combination is smaller than that of all joints of the whole skeleton, the occupied computing resources are less, and the behavior recognition efficiency based on the geometrical characteristics of the skeleton joint combination is improved; and because the spatial position characteristics and the relative geometric characteristics of each joint combination are described, the behavior identification accuracy based on the geometric characteristics of the skeleton joint combination is improved.
Drawings
FIG. 1 is a schematic flow chart illustrating an embodiment of a behavior recognition method based on a skeleton joint combination geometric feature according to the present application;
FIG. 2 is a schematic view of one embodiment of the present application illustrating the division of joints of a human skeleton into different joint combinations;
FIG. 3 is a schematic view of one embodiment of the relative geometry of the joint assembly of the present application;
FIG. 4 is a schematic diagram of an embodiment of feature extraction and classification of 3 joint combinations P1, P2 and P3 by the deep convolutional network of the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the behavior recognition device based on the geometric features of the human skeleton joint combination according to the present application;
FIG. 6 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Detailed Description
In order to clearly identify the various behaviors expressed by the joint states of the skeleton, the applicant analyzed prior-art behavior identification methods based on the human body skeleton and found that directly arranging all skeleton joints according to joint numbers to describe behavior features has at least the following problems and defects:
1. Joints with adjacent serial numbers may have no spatial correlation in the human skeleton. For example, the joint representing the head may be numbered adjacent to a joint of the hand, yet the two joints are not adjacent in spatial position, which hinders the identification of behavior features. The applicant therefore considers that joints with spatial correlation should be combined according to their spatial correlation in the skeleton.
2. The prior art describes the features of all joints and therefore cannot focus on the local joints in which the behavior actually occurs. For the motion of "clapping hands", for example, only the joints of the hands change; the spatial positions of the leg, head and other joints do not change and contribute little to the feature description. When a behavior is expressed by some joints of the skeleton, the features representing that behavior should be gathered at those key joints.
3. The prior art methods do not take into account the relative spatial relationship between symmetric joints. Most human motions, such as "running", "clapping", "kicking" and "javelin throwing", are performed by the hand or leg joints, and the relative spatial relationship between the left and right hand or leg joints is crucial to distinguishing behaviors; it should therefore be considered in the behavior feature description.
Based on the above understanding, the following embodiments take the recognition of human behavior as an example. It is worth noting that the behavior recognition method based on combined geometric features of skeleton joints is applicable not only to recognizing human behaviors but also, under specific conditions, to recognizing the behaviors of other objects, particularly animals, and outputting the corresponding recognized target behaviors.
Referring to fig. 1, an embodiment of a behavior recognition method based on a skeleton joint combination geometric feature of the present application includes:
101. The skeleton is divided into X joint combinations, wherein X is a positive integer greater than 1.
First, the present application needs to know the states of all joints of the skeleton, each of which can simply be regarded as a joint point. This step obtains the spatial position coordinates of all joint points of the skeleton, the connection relations of all joint points, the spatial connection order of all joint points, and the like. On this basis, the joint points of the skeleton are divided into X joint combinations, where X is a positive integer greater than 1.
Specifically, the skeleton is a human body skeleton, and the invalid joints of the human body skeleton are removed according to a preset rule to obtain the remaining joints. An invalid joint here is a joint irrelevant to the identification of a specific behavior. For example, behaviors such as "running", "clapping", "kicking" and "gesture" are mainly formed by movement combinations of the joints of the four limbs, so the spine joints, waist joints and the like can be regarded as invalid joints. It should be noted that when different behaviors are identified, the joints regarded as invalid may differ, and the specific invalid joints are not limited here. Removing the invalid joints in this step reduces the amount of joint computation, increases the calculation speed, and thus improves the efficiency of the behavior identification method based on combined geometric features of skeleton joints.
In one embodiment, please refer to FIG. 2, in which the left side shows a schematic diagram of the human skeleton joints. The human skeleton mainly comprises 20 joints: the left shoulder joint 1, right shoulder joint 2, vertebral joint 3, lumbar joint 4, left hip joint 5, right hip joint 6, caudal vertebra joint 7, left elbow joint 8, right elbow joint 9, left wrist joint 10, right wrist joint 11, left finger joint 12, right finger joint 13, left knee joint 14, right knee joint 15, left ankle joint 16, right ankle joint 17, left toe joint 18, right toe joint 19 and head joint 20. In order to identify behaviors such as "running", "clapping", "kicking" and "gestures", this scheme regards the vertebral joint 3, the lumbar joint 4, the caudal vertebra joint 7 and the head joint 20 as invalid joints and removes them to obtain the remaining joints, as shown in the right diagram of FIG. 2. The right diagram of FIG. 2 divides the human skeleton into 3 joint combinations P1, P2 and P3, wherein joint combination P1 includes the joints of the left arm (left finger joint 12, left wrist joint 10, left elbow joint 8, left shoulder joint 1) and the right arm (right finger joint 13, right wrist joint 11, right elbow joint 9, right shoulder joint 2) among the remaining joints; joint combination P2 includes the joints of the left arm (left finger joint 12, left wrist joint 10, left elbow joint 8, left shoulder joint 1) and the right leg (right hip joint 6, right knee joint 15, right ankle joint 17, right toe joint 19); and joint combination P3 includes the joints of the right arm (right finger joint 13, right wrist joint 11, right elbow joint 9, right shoulder joint 2) and the left leg (left hip joint 5, left knee joint 14, left ankle joint 16, left toe joint 18).
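A minimal sketch of this division, assuming the joint numbering of FIG. 2 as listed above (the list names and the `divide_skeleton` helper are illustrative, not from the patent):

```python
# Joint numbers per FIG. 2 (assumed): invalid joints are the vertebral (3),
# lumbar (4), caudal vertebra (7) and head (20) joints.
LEFT_ARM = [12, 10, 8, 1]    # left finger, wrist, elbow, shoulder
RIGHT_ARM = [13, 11, 9, 2]   # right finger, wrist, elbow, shoulder
LEFT_LEG = [5, 14, 16, 18]   # left hip, knee, ankle, toe
RIGHT_LEG = [6, 15, 17, 19]  # right hip, knee, ankle, toe
INVALID = {3, 4, 7, 20}

def divide_skeleton():
    """Divide the remaining joints into the combinations P1, P2, P3."""
    p1 = LEFT_ARM + RIGHT_ARM   # H:  left arm + right arm
    p2 = LEFT_ARM + RIGHT_LEG   # LG: left arm + right leg
    p3 = RIGHT_ARM + LEFT_LEG   # RG: right arm + left leg
    for combo in (p1, p2, p3):  # no combination may contain an invalid joint
        assert not INVALID & set(combo)
    return p1, p2, p3
```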
Further, each of the above 3 joint combinations may also be described by the joint serial number set of its joints, the spatial connection relationships of the joints, and the spatial connection order of the joints, where P1, P2 and P3 are respectively described as follows:
where H represents the set of joint serial numbers of the left arm and the right arm; LG represents the set of joint serial numbers of the left arm and the right leg; RG represents the set of joint serial numbers of the right arm and the left leg; Jm denotes the joint with serial number m; Jn denotes the joint with serial number n; and Jk denotes the joint with serial number k. The combination state of each of the 3 joint combinations can be expressed more accurately through these descriptions.
102. Spatial position features of each joint combination are described.
Describing the spatial position features of the joint combinations divided in step 101: the spatial position feature of a joint combination is described by the three-dimensional space coordinates of each joint in the combination, that is, the set of the three-dimensional coordinates of all joints in the combination at each time instant. On this basis, the spatial position features formed by all the joints in the three joint combinations P1, P2 and P3 are described as follows:
the spatial position feature of P1 consists of the three-dimensional coordinates of the joint with serial number i in joint combination P1 at time t, with T representing the time length; that of P2 consists of the three-dimensional coordinates of the joint with serial number j in P2 at time t; and that of P3 consists of the three-dimensional coordinates of the joint with serial number k in P3 at time t.
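As a sketch, the spatial position feature of one combination can be gathered as the T × N × 3 coordinate set described above; the dict-based frame format and the function name here are assumptions for illustration:

```python
def spatial_position_features(skeleton_seq, combo):
    """skeleton_seq: list over time of {joint_number: (x, y, z)} mappings.
    combo: the joint numbers of one joint combination.
    Returns a T x N x 3 nested list holding the coordinates of every joint
    in the combination at every time instant."""
    return [[frame[j] for j in combo] for frame in skeleton_seq]
```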
Furthermore, the spatial position features of the joint combinations P1, P2 and P3 may also be converted into RGB pictures for description; RGB pictures are better suited to recognition and classification by a neural network model.
For the joint combination P1, the three coordinates are respectively normalized so that they can be regarded as the Red, Green and Blue channel pictures of the first RGB picture. Taking the x-axis coordinates of joint combination P1 as an example, the x-axis coordinates in the spatial position features of P1 are normalized to first X values between 0 and 255, and the first X values are then represented as a 224 × 112 picture by a linear interpolation method. Similarly, the y-axis coordinates in the spatial position features of P1 are normalized, with reference to the above normalization procedure, to first Y values between 0 and 255, which are represented as a 224 × 112 picture by linear interpolation; and the z-axis coordinates are normalized to first Z values between 0 and 255, which are likewise represented as a 224 × 112 picture by linear interpolation. Finally, the 224 × 112 pictures represented by the first X, Y and Z values are regarded as the Red, Green and Blue channels of the first RGB picture, respectively, to obtain the first RGB picture, so that the spatial position feature of the joint combination P1 is described by the first RGB picture.
For the joint combination P2, the three coordinates are likewise normalized so that they can be regarded as the Red, Green and Blue channel pictures of the second RGB picture. The x-axis coordinates in the spatial position features of P2 are normalized, with reference to the above normalization procedure, to second X values between 0 and 255, and the second X values are represented as a 224 × 112 picture by the linear interpolation method; the y-axis and z-axis coordinates are normalized in the same way to second Y values and second Z values, each represented as a 224 × 112 picture by linear interpolation. Finally, the 224 × 112 pictures represented by the second X, Y and Z values are regarded as the Red, Green and Blue channels of the second RGB picture, respectively, to obtain the second RGB picture, so that the spatial position feature of the joint combination P2 can be described by the second RGB picture.
For the joint combination P3, the three coordinates are also normalized so that they can be regarded as the Red, Green and Blue channel pictures of the third RGB picture. The x-axis, y-axis and z-axis coordinates in the spatial position features of P3 are normalized, with reference to the above normalization procedure, to third X, Y and Z values between 0 and 255, and each is represented as a 224 × 112 picture by the linear interpolation method. Finally, the three 224 × 112 pictures are regarded as the Red, Green and Blue channels of the third RGB picture, respectively, to obtain the third RGB picture, so that the spatial position feature of the joint combination P3 can be described by the third RGB picture.
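The normalization-and-interpolation pipeline described for P1, P2 and P3 can be sketched as follows. The min-max scaling to [0, 255] is an assumption consistent with the text (the original normalization formula is not reproduced in this version of the document), and the helper names and the joints-by-time channel layout are illustrative:

```python
def resample_linear(row, width):
    """Linearly interpolate a 1-D sequence to `width` samples."""
    if width == 1 or len(row) == 1:
        return [row[0]] * width
    out = []
    for c in range(width):
        pos = c * (len(row) - 1) / (width - 1)
        i = min(int(pos), len(row) - 2)
        frac = pos - i
        out.append(row[i] * (1 - frac) + row[i + 1] * frac)
    return out

def to_channel(series, height=224, width=112):
    """series: one coordinate axis as N per-joint time series.
    Returns a height x width channel with values min-max scaled to [0, 255]."""
    lo = min(v for s in series for v in s)
    hi = max(v for s in series for v in s)
    span = (hi - lo) or 1.0
    rows = [[255.0 * (v - lo) / span for v in resample_linear(s, width)]
            for s in series]
    # stretch the N joint rows to `height` rows, again by linear interpolation
    cols = [resample_linear(list(col), height) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def to_rgb(x_ch, y_ch, z_ch):
    """Treat the three channels as the R, G, B planes of one picture."""
    return [list(zip(rr, gg, bb)) for rr, gg, bb in zip(x_ch, y_ch, z_ch)]
```

Applying `to_channel` to the x, y and z coordinate series of one joint combination and stacking the results with `to_rgb` yields one 224 × 112 RGB picture per combination.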
103. The relative geometry of each joint combination is described.
Describing the relative geometric features of the joint combinations divided in step 101: beyond the spatial position features of the joints, the relative geometric features between and within joint combinations are also very important for behavior recognition. The relative geometric features of different joints are highly significant for representing behaviors: in a "clapping" behavior, the relative distance between the symmetric joints of the left arm and the right arm becomes small; in a "picking up" behavior, the relative distances of the arm joints (of both the left and right arms) with respect to the leg joints likewise become small, and these distances are important for distinguishing different behaviors. In addition, the relative position features of the joint points at adjacent moments are also important for behavior representation, particularly for motions with different motion amplitudes, such as "walking" and "running", in which the relative position changes of the joints at adjacent moments differ in amplitude. Referring to FIG. 3, the present application describes two kinds of relative geometric features: the first is the relative distance feature between different joint combinations, as shown in the left diagram of FIG. 3; the second is the relative position feature within the same joint combination, as shown in the right diagram of FIG. 3.
For joint combination P1, the present embodiment describes the relative geometry of joint combination P1 using the relative position features of joint combination P1:

f_i^t = J_i^t − J_i^(t−1)

where J_i^t denotes the three-dimensional coordinates of arm joint J_i at time t (the arm joints include the left-arm joints and the right-arm joints); J_i^(t−1) denotes the three-dimensional coordinates of arm joint J_i at time t − 1; and f_i^t denotes the relative position feature of arm joint J_i at time t. The relative position features f_i^t are regarded as the relative geometric features of joint combination P1.
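The relative position feature described above is a frame-to-frame displacement of each arm joint; a minimal sketch (the function name and the array layout are assumptions for illustration) might be:

```python
import numpy as np

def relative_position_features(arm_xyz):
    """arm_xyz: (num_frames, num_joints, 3) coordinates of the arm joints
    (left and right arm) of joint combination P1.  Returns the displacement
    of each joint relative to the previous frame,
    shape (num_frames - 1, num_joints, 3)."""
    arm_xyz = np.asarray(arm_xyz, dtype=np.float64)
    return arm_xyz[1:] - arm_xyz[:-1]
```

For a uniform motion the displacement is constant across frames, while "walking" versus "running" would produce displacements of different magnitude, which is exactly the distinction the patent attributes to this feature.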
For joint combination P2 and joint combination P3, the following calculation formulas are referenced:

d1 = ‖J_m − J_n‖,  d2 = ‖J_m − J_k‖,  d3 = ‖J_m − O‖    (2)

where d1 denotes the relative distance formed between different joints of the arm (joints J_m and J_n), d2 denotes the relative distance between an arm joint J_m and a leg joint J_k, and d3 denotes the relative distance of a joint from the origin O.
For joint combination P2, the relative distances between the individual joints in joint combination P2 are taken as the relative geometric features of joint combination P2. In joint combination P2, the relative distance d21 formed between the left-arm joints, the relative distance d22 between the left-arm joints and the right-leg joints, and the relative distance d23 of the left-arm and right-leg joints from a preset origin are calculated. Specifically, the relative distance d21 formed between the left-arm joints in joint combination P2 is obtained by the calculation formula for d1 above; the relative distance d22 between the left-arm joints and the right-leg joints in joint combination P2 is obtained by the calculation formula for d2 above; and the relative distance d23 of the left-arm and right-leg joints in joint combination P2 from the preset origin is obtained by the calculation formula for d3 above. The relative distance d21, the relative distance d22 and the relative distance d23 are regarded as the relative geometric features of joint combination P2.
For joint combination P3, the relative distances between the individual joints in joint combination P3 are taken as the relative geometric features of joint combination P3. In joint combination P3, the relative distance d31 formed between the right-arm joints, the relative distance d32 between the right-arm joints and the left-leg joints, and the relative distance d33 of the right-arm and left-leg joints from a preset origin are calculated. Specifically, the relative distance d31 formed between the right-arm joints in joint combination P3 is obtained by the calculation formula for d1 above; the relative distance d32 between the right-arm joints and the left-leg joints in joint combination P3 is obtained by the calculation formula for d2 above; and the relative distance d33 of the right-arm and left-leg joints in joint combination P3 from the preset origin is obtained by the calculation formula for d3 above. The relative distance d31, the relative distance d32 and the relative distance d33 are regarded as the relative geometric features of joint combination P3.
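Under the assumption that d1, d2 and d3 are Euclidean distances (the patent's formula (2) is not reproduced in this text, so this reading is an inference), the three distance families for a single frame can be sketched as follows; the function names, shapes and default origin are illustrative:

```python
import numpy as np

def pairwise_distances(joints_a, joints_b):
    """Euclidean distances between every joint in joints_a and every joint
    in joints_b for one frame; shapes (Na, 3) and (Nb, 3) -> (Na, Nb)."""
    diff = joints_a[:, None, :] - joints_b[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def relative_distance_features(arm, leg, origin=np.zeros(3)):
    """Sketch of the three distance families used for P2/P3:
    d1: distances between different joints of the arm,
    d2: distances between arm joints and leg joints,
    d3: distances of the arm joints from a preset origin."""
    d1 = pairwise_distances(arm, arm)           # within-arm distances
    d2 = pairwise_distances(arm, leg)           # arm-to-leg distances
    d3 = np.linalg.norm(arm - origin, axis=-1)  # arm-to-origin distances
    return d1, d2, d3
```

Computed per frame over a sequence, these distances form the d21/d22/d23 (or d31/d32/d33) time series that the following paragraphs convert into picture channels.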
Furthermore, the relative geometric features of joint combination P1, joint combination P2 and joint combination P3 can also be converted into RGB pictures for description; RGB pictures are better suited to recognition and classification by the neural network model.
For the relative geometric features of joint combination P1, the three coordinate components are normalized separately so that they can be treated as the Red, Green and Blue channel pictures of a fourth RGB picture. Specifically, the X-axis components of the relative geometric features of joint combination P1 are normalized to first X relative values between 0 and 255 (the normalization can follow the formula described above), and the first X relative values are represented as a 224 × 112 picture by linear interpolation. Similarly, the Y-axis components are normalized to first Y relative values between 0 and 255 and represented as a 224 × 112 picture by linear interpolation, and the Z-axis components are normalized to first Z relative values between 0 and 255 and represented as a 224 × 112 picture by linear interpolation. Finally, the 224 × 112 picture represented by the first X relative values, the 224 × 112 picture represented by the first Y relative values and the 224 × 112 picture represented by the first Z relative values are regarded as the Red, Green and Blue channels of the fourth RGB picture, respectively, to obtain the fourth RGB picture, so that the relative geometric features of joint combination P1 can be described by the fourth RGB picture.
For the relative geometric features of joint combination P2, the relative distance d21 is normalized to a first d21 value between 0 and 255 (the normalization can follow the formula described above), and the first d21 values are represented as a 224 × 112 picture by linear interpolation; the relative distance d22 is normalized to a first d22 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation; and the relative distance d23 is normalized to a first d23 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation. The 224 × 112 picture represented by the first d21 values, the 224 × 112 picture represented by the first d22 values and the 224 × 112 picture represented by the first d23 values are regarded as the Red, Green and Blue channels of a fifth RGB picture, respectively, to obtain the fifth RGB picture, so that the relative geometric features of joint combination P2 can be described by the fifth RGB picture.
For the relative geometric features of joint combination P3, the relative distance d31 is normalized to a first d31 value between 0 and 255 (the normalization can follow the formula described above), and the first d31 values are represented as a 224 × 112 picture by linear interpolation; the relative distance d32 is normalized to a first d32 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation; and the relative distance d33 is normalized to a first d33 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation. The 224 × 112 picture represented by the first d31 values, the 224 × 112 picture represented by the first d32 values and the 224 × 112 picture represented by the first d33 values are regarded as the Red, Green and Blue channels of a sixth RGB picture, respectively, to obtain the sixth RGB picture, so that the relative geometric features of joint combination P3 can be described by the sixth RGB picture.
In a further embodiment, for joint combination P1, the relative distances between the individual joints of joint combination P1 may also be used as the relative geometric features of joint combination P1. In joint combination P1, the relative distance d11 formed between the arm joints, the relative distance d12 between the arm joints and the joints of both legs (left-leg and right-leg joints), and the relative distance d13 of the arm joints from a preset origin are calculated. Specifically, the relative distance d11 formed between the arm joints in joint combination P1 is obtained by the calculation formula for d1 above; the relative distance d12 between the arm joints and the joints of both legs is obtained by the calculation formula for d2 above; and the relative distance d13 of the arm joints in joint combination P1 from the preset origin is obtained by the calculation formula for d3 above. The relative distance d11, the relative distance d12 and the relative distance d13 are regarded as the relative geometric features of joint combination P1.
Further, for these relative geometric features of joint combination P1, the relative distance d11 is normalized to a first d11 value between 0 and 255 (the normalization can follow the formula described above), and the first d11 values are represented as a 224 × 112 picture by linear interpolation; the relative distance d12 is normalized to a first d12 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation; and the relative distance d13 is normalized to a first d13 value between 0 and 255 and represented as a 224 × 112 picture by linear interpolation. The 224 × 112 picture represented by the first d11 values, the 224 × 112 picture represented by the first d12 values and the 224 × 112 picture represented by the first d13 values are regarded as the Red, Green and Blue channels of a seventh RGB picture, respectively, to obtain the seventh RGB picture, so that the relative geometric features of joint combination P1 can be described by the seventh RGB picture.
104. The spatial position characteristic and the relative geometric characteristic of each joint combination are respectively combined and described as the geometric characteristic of each joint combination.
The spatial position characteristic of each joint combination described in the step 102 and the relative geometric characteristic of each joint combination described in the step 103 are combined to describe the geometric characteristic of each joint combination.
For example, referring to fig. 4, the first RGB picture and the fourth RGB picture are combined to form a first geometric feature RGB picture, corresponding to a 224 × 224 length vector, that describes joint combination P1; the second RGB picture and the fifth RGB picture are combined to form a second geometric feature RGB picture, corresponding to a 224 × 224 length vector, that describes joint combination P2; and the third RGB picture and the sixth RGB picture are combined to form a third geometric feature RGB picture, corresponding to a 224 × 224 length vector, that describes joint combination P3. Of course, the first RGB picture and the seventh RGB picture may instead be combined to form another first geometric feature RGB picture, corresponding to a 224 × 224 length vector, that describes joint combination P1.
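Since each spatial-feature picture and each relative-feature picture is 224 × 112 and the combined geometric feature picture is 224 × 224, side-by-side concatenation is one natural joining scheme; the patent only states the resulting size, so this particular scheme is an assumption:

```python
import numpy as np

def combine_feature_pictures(spatial_rgb, relative_rgb):
    """Join a 224x112 spatial-feature RGB picture and a 224x112
    relative-feature RGB picture side by side into one 224x224
    geometric feature RGB picture."""
    assert spatial_rgb.shape == (224, 112, 3)
    assert relative_rgb.shape == (224, 112, 3)
    return np.concatenate([spatial_rgb, relative_rgb], axis=1)  # (224, 224, 3)
```

The 224 × 224 × 3 result matches the standard input size of ImageNet-style networks such as the Resnet-50 used in the next step.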
105. And respectively inputting the geometric features of each joint combination into a trained deep convolution network for feature extraction and classification, and respectively outputting the possible probabilities of a plurality of behaviors.
It can be understood that, in the embodiment of the present application, feature extraction is performed by the trained convolutional neural network Resnet-50. The training sample pictures for the convolutional neural network Resnet-50 are processed in the same way as in step 101, step 102, step 103 and step 104 above, so that the trained network acquires the ability to recognize the spatial position features, relative geometric features and geometric features of each joint combination. The behaviors expressed by the training sample pictures are labeled manually for classification, which improves the efficiency and accuracy of training and yields a trained convolutional neural network Resnet-50 capable of recognizing and classifying the specific labeled behaviors. The training process of the convolutional neural network Resnet-50 is mature prior art and is not described here again.
Referring to fig. 4, in this step, the first geometric feature RGB picture, the second geometric feature RGB picture, and the third geometric feature RGB picture in step 104 are respectively input to the trained convolutional neural network Resnet-50 for feature extraction, and possible probabilities of a plurality of behaviors are respectively output.
106. And comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain the target behavior.
For example, the possible probabilities of the several behaviors obtained in step 105 include: the posterior probability b1 of the behavior extracted from the first geometric feature RGB picture, the posterior probability b2 of the behavior extracted from the second geometric feature RGB picture, and the posterior probability b3 of the behavior extracted from the third geometric feature RGB picture. Then, according to the following formula:
P(L|S)=η1b1+η2b2+η3b3
where η1, η2 and η3 are preset weights, and the posterior probability vector P(L|S) represents the probability that the skeleton belongs to the action label class L; then label = Find(max(P(L|S))), where label represents the target behavior. For example, the target behavior may be one of the label classes "clap", "pick up", "walk", "run", etc.
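The fusion step reduces to a weighted sum of the three posterior probability vectors followed by an arg-max; the uniform weights and the label list in this sketch are placeholders, not values specified by the patent:

```python
import numpy as np

def fuse_and_classify(b1, b2, b3, labels, weights=(1/3, 1/3, 1/3)):
    """Fuse three per-picture posterior probability vectors with preset
    weights eta1..eta3 and return the most probable behavior label,
    i.e. label = Find(max(P(L|S)))."""
    eta1, eta2, eta3 = weights
    p = eta1 * np.asarray(b1) + eta2 * np.asarray(b2) + eta3 * np.asarray(b3)
    return labels[int(np.argmax(p))], p
```

With probabilities that agree on a class across two of the three pictures, the fused vector favors that class even if the third picture votes elsewhere, which is the intended effect of combining the joint-combination views.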
After the above description of the embodiments of the present application, this embodiment also lists the recognition results on two skeleton-based behavior databases, NTU RGB+D60 and Northwestern-UCLA. NTU RGB+D60 contains 56880 behavior samples; 40 subjects took part in the recording of 60 action classes, each subject repeating the actions in front of three cameras at different angles, so there are two validation modes: cross-view (CV) validation and cross-subject (CS) validation. The Northwestern-UCLA (N-UCLA) dataset contains 1494 samples of 10 action types, each action type being performed 6 times by each of 10 different subjects; the dataset was captured with 3 cameras forming multiple viewing angles, and cross-view (CV) validation is carried out. The behavior recognition technology based on the combined geometric features of skeleton joints in the embodiment of the present application is compared on the two datasets NTU RGB+D60 and N-UCLA with the current behavior recognition technology that connects all skeleton joints by serial number; the recognition results are shown in Table 1 below:
TABLE 1
Table 1 above compares the recognition accuracy of the behavior recognition method based on the combined geometric features of skeletal joints provided in the embodiment of the present application with that of the prior art. The results show that the present technical scheme achieves a substantial improvement in accuracy over the prior art, demonstrating its superiority.
Compared with the existing behavior recognition technology based on all joint space characteristics, the technical scheme of the behavior recognition method based on the skeleton joint combination geometric characteristics overcomes the following defects and achieves the advantages:
1. The prior art realizes behavior feature description based on all joint spatial features and neglects the spatial connection features between joints. In the present application, joints are connected in sequence according to their spatial connection features rather than according to joint-point serial numbers, so that the spatial relationships between different joints are embodied.
2. The prior art realizes behavior feature description based on all joint spatial features and does not highlight the key information of behaviors. The joint-combination technology provided by the present application prominently describes the key positions and key joints where behaviors occur, removes joints that contribute little to behavior feature description, focuses the description on the arm and leg joints, divides them into three joint combinations, and comprehensively considers the influence of different joint combinations on behavior differences.
3. The prior art lacks a description of the relative features of different joints, including relative spatial features and relative temporal features, which are important for behavior representation. In the present application, the relative distance features between different joints and the relative temporal features of the arm joints jointly form the joint-combination geometric features together with the joint spatial features, expressing both the spatial description and the temporal characteristics of the human skeleton and greatly improving the accuracy of behavior recognition.
The foregoing embodiment describes the behavior recognition method based on the geometric features of the skeleton joint combination, and the following describes the behavior recognition device based on the geometric features of the skeleton joint combination, please refer to fig. 5, which includes:
a dividing unit 501, configured to divide a skeleton into X joint combinations, where X is a positive integer greater than 1;
a first description unit 502, configured to describe spatial position characteristics of each joint combination;
a second description unit 503 for describing relative geometric features of each of the joint combinations;
a joint description unit 504, configured to describe the spatial position feature and the relative geometric feature of each joint combination as a geometric feature of each joint combination respectively;
a classification output unit 505, configured to input the geometric features of each joint combination into a trained deep convolution network for feature extraction and classification, and output possible probabilities of a plurality of behaviors;
and the fusion analysis unit 506 is configured to perform comprehensive analysis on the possible probabilities of the behaviors according to a preset fusion function, and output the behavior with the highest possible probability to obtain a target behavior.
Optionally, X is 3; the skeleton is a human body skeleton; when the skeleton is divided into X joint combinations, the dividing unit 501 is specifically configured to:
removing invalid joints in the human skeleton according to a preset rule to obtain residual joints;
dividing the remaining joints into 3 joint combinations P1, P2, P3, wherein the P1 comprises joints of the left arm and the right arm of the remaining joints, the P2 comprises joints of the left arm and the right leg of the remaining joints, and P3 comprises joints of the right arm and the left leg of the remaining joints.
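The partition performed by the dividing unit can be represented as sets of joint serial numbers; the numbers below are hypothetical (the actual numbering depends on the skeleton format, e.g. a Kinect-style 25-joint skeleton), and only the set structure is taken from the text:

```python
# Hypothetical joint serial numbers for each limb; invalid joints
# (e.g. torso joints removed by the preset rule) are simply omitted.
LEFT_ARM  = {5, 6, 7}
RIGHT_ARM = {9, 10, 11}
LEFT_LEG  = {13, 14, 15}
RIGHT_LEG = {17, 18, 19}

P1 = LEFT_ARM | RIGHT_ARM   # H:  left arm + right arm
P2 = LEFT_ARM | RIGHT_LEG   # LG: left arm + right leg
P3 = RIGHT_ARM | LEFT_LEG   # RG: right arm + left leg
```

Note that P2 and P3 are disjoint by construction, while each arm appears in two combinations, matching the overlap the patent exploits when fusing the three views.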
Optionally, each joint combination includes a joint serial number set of joints, a spatial connection relationship of the joints, and a spatial connection order of the joints;
the dividing unit 501 describes the P1, the P2, and the P3 as:
wherein the H represents a set of joint sequence numbers of the left arm and the right arm;
the LG represents a set of joint numbers for the left arm and the right leg;
the RG represents a set of joint numbers of the right arm and the left leg;
the Jm denotes the joint with serial number m;
the Jn denotes the joint with serial number n;
the Jk denotes the joint with serial number k.
Optionally, when the first description unit 502 describes the spatial position feature of each joint combination, it is specifically configured to:
spatial location features describing the P1, P2, and P3, respectively, are:
the above denotes the three-dimensional coordinates of the joint with serial number i in the P1 at time t, where t represents time;
the above denotes the three-dimensional coordinates of the joint with serial number j in the P2 at time t;
the above denotes the three-dimensional coordinates of the joint with serial number k in the P3 at time t.
Optionally, the first description unit 502 is further specifically configured to:
normalizing the X-axis coordinates in the spatial position feature of the P1 to a first X value between 0 and 255;
representing the first X numerical value as a 224X 112 picture by a linear interpolation method;
representing the first Y value as a 224 x 112 picture by a linear interpolation method;
representing the first Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the first X value, a 224 × 112 picture represented by the first Y value, and a 224 × 112 picture represented by the first Z value as Red, Green, and Blue channels of a first RGB picture to obtain a first RGB picture;
describing spatial location features of the P1 using the first RGB picture;
representing the second X numerical value as a 224X 112 picture by a linear interpolation method;
representing the second Y value as a 224 x 112 picture by a linear interpolation method;
representing the second Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the second X value, a 224 × 112 picture represented by the second Y value, and a 224 × 112 picture represented by the second Z value as Red, Green, and Blue channels of a second RGB picture to obtain a second RGB picture;
describing spatial location features of the P2 using the second RGB picture;
representing the third X numerical value as a 224X 112 picture by a linear interpolation method;
representing the third Y value as a 224 x 112 picture by a linear interpolation method;
representing the third Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the third X value, a 224 × 112 picture represented by the third Y value, and a 224 × 112 picture represented by the third Z value as Red, Green, and Blue channels of a third RGB picture, so as to obtain a third RGB picture;
the spatial position feature of the P3 is described using the third RGB picture.
Optionally, when describing the relative geometric features of each joint combination, the second description unit 503 is specifically configured to:
the relative geometric features of the P1 are described using the relative positional features of the P1:
wherein the above denotes the three-dimensional coordinates of the arm joint Ji at time t, the arm joints comprising the left arm joint and the right arm joint;
calculating a relative distance d21 formed between the left arm joint, a relative distance d22 between the left arm joint and the right leg joint, a relative distance d23 between the left arm joint and right leg joint relative to the preset origin in the P2;
regarding the relative distance d21, the relative distance d22, the relative distance d23 as relative geometric features of the P2;
calculating a relative distance d31 formed between the right arm joint, a relative distance d32 between the right arm joint and the left leg joint, a relative distance d33 between the right arm joint and left leg joint relative to the preset origin in the P3;
the relative distance d31, the relative distance d32, the relative distance d33 are considered to be relative geometric features of the P3.
Optionally, the second describing unit 503 is further configured to:
normalizing the X-axis components of the relative geometric features of the P1 to a first X relative value between 0 and 255;
representing the first X relative value as a 224X 112 picture by a linear interpolation method;
representing the first Y relative value as a 224 x 112 picture by a linear interpolation method;
representing the first Z relative value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the first X relative value, a 224 × 112 picture represented by the first Y relative value, and a 224 × 112 picture represented by the first Z relative value as Red, Green, and Blue channels of a fourth RGB picture, so as to obtain a fourth RGB picture;
describing relative geometric features of the P1 using the fourth RGB picture;
normalizing the relative distance d21 in the relative geometry of the P2 to a first d21 value between 0 and 255;
representing the first d21 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d22 in the relative geometry of the P2 to a first d22 value between 0 and 255;
representing the first d22 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d23 in the relative geometry of the P2 to a first d23 value between 0 and 255;
representing the first d23 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d21 value, a 224 × 112 picture represented by the first d22 value, and a 224 × 112 picture represented by the first d23 value as Red, Green, and Blue channels of a fifth RGB picture, respectively, obtaining a fifth RGB picture;
describing relative geometric features of the P2 using the fifth RGB picture;
normalizing the relative distance d31 in the relative geometry of the P3 to a first d31 value between 0 and 255;
representing the first d31 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d32 in the relative geometry of the P3 to a first d32 value between 0 and 255;
representing the first d32 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d33 in the relative geometry of the P3 to a first d33 value between 0 and 255;
representing the first d33 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d31 value, a 224 × 112 picture represented by the first d32 value, and a 224 × 112 picture represented by the first d33 value as Red, Green, and Blue channels of a sixth RGB picture, respectively, to obtain the sixth RGB picture;
the relative geometry of the P3 is described using the sixth RGB picture.
Optionally, when the combination description unit 504 describes the spatial position feature and the relative geometric feature of each joint combination as geometric features, it is specifically configured to:
combining the first RGB picture and the fourth RGB picture to form a first geometric feature RGB picture corresponding to the 224 x 224 length vector of the P1;
combining the second RGB picture with the fifth RGB picture to form a second geometric feature RGB picture corresponding to the 224 × 224 length vector of P2;
combining the third RGB picture with the sixth RGB picture to form a third geometric feature RGB picture corresponding to the 224 x 224 length vector of P3.
Optionally, the classification output unit 505 is specifically configured to, when the geometric features of each joint combination are input into a trained deep convolutional network for feature extraction and classification, and possible probabilities of a plurality of behaviors are output respectively:
and respectively inputting the first geometric feature RGB picture, the second geometric feature RGB picture and the third geometric feature RGB picture into a trained convolutional neural network Resnet-50 for feature extraction, and respectively outputting possible probabilities of a plurality of behaviors.
Optionally, the possible probabilities of the several behaviors include: the posterior probability b1 of the behavior extracted from the first geometric feature RGB picture, the posterior probability b2 of the behavior extracted from the second geometric feature RGB picture, and the posterior probability b3 of the behavior extracted from the third geometric feature RGB picture;
The fusion analysis unit 506 is specifically configured to, when performing comprehensive analysis on the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain a target behavior:
P(L|S)=η1b1+η2b2+η3b3
the η1, η2 and η3 are preset weights;
the posterior probability vector P (L | S) represents the probability that the skeleton belongs to the action label class L;
label=Find(max(P(L|S)))
the label represents the target behavior.
The operation performed by the behavior recognition device based on the geometric features of the skeleton joint combination in the embodiment of the present application is similar to the operation performed in fig. 1, and is not repeated here.
Referring to fig. 6, a computer device in an embodiment of the present application is described below. An embodiment of the computer device in the embodiment of the present application includes: the computer device 600 may include one or more processors (CPUs) 601 and a memory 602, where one or more applications or data are stored in the memory 602. The memory 602 is volatile storage or persistent storage. The program stored in the memory 602 may include one or more modules, each of which may include a series of instruction operations on the computer device. Further, the processor 601 may communicate with the memory 602 to execute a series of instruction operations in the memory 602 on the computer device 600. The computer device 600 may also include one or more wireless network interfaces 603, one or more input-output interfaces 604, and/or one or more operating systems, such as Windows Server, Mac OS, Unix, Linux, FreeBSD, etc. The processor 601 may perform the operations performed in the embodiment shown in fig. 1, and details thereof are not described here again.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (10)
1. A behavior identification method based on a skeleton joint combination geometric feature is characterized by comprising the following steps:
dividing a skeleton into X joint combinations, wherein X is a positive integer greater than 1;
describing the spatial position feature of each of the joint combinations;
describing relative geometric features of each of the joint combinations;
describing the spatial position feature and the relative geometric feature of each joint combination as the geometric feature of each joint combination respectively;
inputting the geometric features of each joint combination into a trained deep convolution network for feature extraction and classification, and outputting possible probabilities of a plurality of behaviors respectively;
and comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain a target behavior.
2. The behavior recognition method according to claim 1, wherein X is 3; the skeleton is a human body skeleton; the dividing of the skeleton into X joint combinations comprises:
removing invalid joints in the human skeleton according to a preset rule to obtain residual joints;
dividing the remaining joints into 3 joint combinations P1, P2 and P3, wherein the P1 comprises the left-arm and right-arm joints among the remaining joints, the P2 comprises the left-arm and right-leg joints among the remaining joints, and the P3 comprises the right-arm and left-leg joints among the remaining joints.
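As a non-limiting illustration, the partition of claim 2 might be sketched as follows; the joint index values are hypothetical (a 25-joint Kinect-style skeleton is assumed), since the claim does not fix a numbering scheme:

```python
# Hypothetical joint indices; the publication does not specify a numbering.
LEFT_ARM  = [4, 5, 6, 7]
RIGHT_ARM = [8, 9, 10, 11]
LEFT_LEG  = [12, 13, 14, 15]
RIGHT_LEG = [16, 17, 18, 19]

def partition_joints():
    """Split the remaining joints into the three combinations of claim 2."""
    P1 = LEFT_ARM + RIGHT_ARM   # both arms
    P2 = LEFT_ARM + RIGHT_LEG   # left arm + right leg
    P3 = RIGHT_ARM + LEFT_LEG   # right arm + left leg
    return P1, P2, P3
```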
3. The behavior recognition method according to claim 2, wherein each of the joint combinations includes a joint number set of joints, a spatial connection relationship of the joints, a spatial connection order of the joints;
the P1, the P2, the P3 are respectively described as:
wherein the H represents a set of joint sequence numbers of the left arm and the right arm;
the LG represents a set of joint numbers for the left arm and the right leg;
the RG represents a set of joint numbers of the right arm and the left leg;
the Jm represents the joint with serial number m;
the Jn represents the joint with serial number n;
the Jk represents the joint with serial number k.
4. The behavior recognition method according to claim 3, wherein the describing the spatial position feature of each of the joint combinations comprises:
the spatial position features of the P1, the P2 and the P3 are respectively described as:
the above-mentioned expression represents the three-dimensional coordinates of the joint with serial number i in the P1 at time t, and T represents the time length;
the above-mentioned expression represents the three-dimensional coordinates of the joint with serial number j in the P2 at time t;
5. The behavior recognition method according to claim 4, further comprising, after describing the spatial location features of the P1, the P2 and the P3 respectively:
normalizing the X, Y and Z coordinates in the spatial position feature of the P1 to a first X value, a first Y value and a first Z value between 0 and 255, respectively;
representing the first X numerical value as a 224X 112 picture by a linear interpolation method;
representing the first Y value as a 224 x 112 picture by a linear interpolation method;
representing the first Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the first X value, a 224 × 112 picture represented by the first Y value, and a 224 × 112 picture represented by the first Z value as Red, Green, and Blue channels of a first RGB picture to obtain a first RGB picture;
describing spatial location features of the P1 using the first RGB picture;
representing the second X numerical value as a 224X 112 picture by a linear interpolation method;
representing the second Y value as a 224 x 112 picture by a linear interpolation method;
representing the second Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the second X value, a 224 × 112 picture represented by the second Y value, and a 224 × 112 picture represented by the second Z value as Red, Green, and Blue channels of a second RGB picture to obtain a second RGB picture;
describing spatial location features of the P2 using the second RGB picture;
representing the third X numerical value as a 224X 112 picture by a linear interpolation method;
representing the third Y value as a 224 x 112 picture by a linear interpolation method;
representing the third Z value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the third X value, a 224 × 112 picture represented by the third Y value, and a 224 × 112 picture represented by the third Z value as Red, Green, and Blue channels of a third RGB picture, so as to obtain a third RGB picture;
the spatial position feature of the P3 is described using the third RGB picture.
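A minimal Python sketch of the picture construction in claim 5, assuming NumPy and reading "linear interpolation" as separable 1-D interpolation along each axis (one plausible interpretation; the publication does not spell out the resampling scheme):

```python
import numpy as np

def normalize_0_255(a):
    """Map an array linearly onto [0, 255]."""
    a = np.asarray(a, dtype=np.float64)
    lo, hi = a.min(), a.max()
    if hi == lo:
        return np.zeros_like(a)
    return (a - lo) / (hi - lo) * 255.0

def resize_linear(a, out_h=224, out_w=112):
    """Resize a 2-D array (time x joints) to out_h x out_w by separable
    1-D linear interpolation."""
    a = np.asarray(a, dtype=np.float64)
    h, w = a.shape
    rows = np.array([np.interp(np.linspace(0, w - 1, out_w),
                               np.arange(w), r) for r in a])
    cols = np.array([np.interp(np.linspace(0, h - 1, out_h),
                               np.arange(h), c) for c in rows.T]).T
    return cols

def coords_to_rgb(xyz):
    """xyz: coordinates of one joint combination, shape (T, J, 3).
    Returns a 224x112x3 uint8 picture whose R/G/B channels encode the
    normalized X/Y/Z coordinate maps, as in claim 5."""
    channels = [resize_linear(normalize_0_255(xyz[:, :, c])) for c in range(3)]
    return np.stack(channels, axis=-1).astype(np.uint8)
```

The same routine would produce the second and third RGB pictures from the coordinates of P2 and P3.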
6. The behavior recognition method of claim 5, wherein the describing relative geometric features of each of the joint combinations comprises:
the relative geometric features of the P1 are described using the relative positional features of the P1:
wherein the above-mentioned expression indicates the three-dimensional coordinates of the arm joint Ji at time t, the arm joints comprising the left arm joints and the right arm joints;
calculating, in the P2, a relative distance d21 formed among the left arm joints, a relative distance d22 between the left arm joints and the right leg joints, and a relative distance d23 of the left arm joints and the right leg joints relative to the preset origin;
regarding the relative distance d21, the relative distance d22, the relative distance d23 as relative geometric features of the P2;
calculating, in the P3, a relative distance d31 formed among the right arm joints, a relative distance d32 between the right arm joints and the left leg joints, and a relative distance d33 of the right arm joints and the left leg joints relative to the preset origin;
the relative distance d31, the relative distance d32, the relative distance d33 are considered to be relative geometric features of the P3.
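The distance computation of claim 6 admits several readings; the sketch below is an assumption rather than the claimed method, reducing each distance to a per-frame mean (the publication does not spell out the reduction):

```python
import numpy as np

def relative_distances(arm, leg, origin=np.zeros(3)):
    """Per-frame relative distances for one joint combination.
    arm: (T, A, 3) arm-joint coordinates; leg: (T, B, 3) leg-joint coordinates.
    Returns per frame: mean pairwise distance among the arm joints,
    mean arm-to-leg distance, and mean distance of all joints from the
    preset origin. Aggregation by mean is an assumption."""
    # mean pairwise distance among arm joints
    diff = arm[:, :, None, :] - arm[:, None, :, :]   # (T, A, A, 3)
    d1 = np.linalg.norm(diff, axis=-1).mean(axis=(1, 2))
    # mean arm-to-leg distance
    cross = arm[:, :, None, :] - leg[:, None, :, :]  # (T, A, B, 3)
    d2 = np.linalg.norm(cross, axis=-1).mean(axis=(1, 2))
    # mean distance of all joints from the origin
    allj = np.concatenate([arm, leg], axis=1)        # (T, A+B, 3)
    d3 = np.linalg.norm(allj - origin, axis=-1).mean(axis=1)
    return d1, d2, d3
```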
7. The behavior recognition method according to claim 6, further comprising, after describing relative geometric features of the P1, the P2 and the P3, respectively:
normalizing the X, Y and Z components of the relative geometric features of the P1 to a first X relative value, a first Y relative value and a first Z relative value between 0 and 255, respectively;
representing the first X relative value as a 224X 112 picture by a linear interpolation method;
representing the first Y relative value as a 224 x 112 picture by a linear interpolation method;
representing the first Z relative value as a 224 x 112 picture by a linear interpolation method;
respectively regarding a 224 × 112 picture represented by the first X relative value, a 224 × 112 picture represented by the first Y relative value, and a 224 × 112 picture represented by the first Z relative value as Red, Green, and Blue channels of a fourth RGB picture, so as to obtain a fourth RGB picture;
describing relative geometric features of the P1 using the fourth RGB picture;
normalizing the relative distance d21 in the relative geometry of the P2 to a first d21 value between 0 and 255;
representing the first d21 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d22 in the relative geometry of the P2 to a first d22 value between 0 and 255;
representing the first d22 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d23 in the relative geometry of the P2 to a first d23 value between 0 and 255;
representing the first d23 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d21 value, a 224 × 112 picture represented by the first d22 value, and a 224 × 112 picture represented by the first d23 value as Red, Green, and Blue channels of a fifth RGB picture, respectively, obtaining a fifth RGB picture;
describing relative geometric features of the P2 using the fifth RGB picture;
normalizing the relative distance d31 in the relative geometry of the P3 to a first d31 value between 0 and 255;
representing the first d31 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d32 in the relative geometry of the P3 to a first d32 value between 0 and 255;
representing the first d32 numerical value as a 224 x 112 picture by a linear interpolation method;
normalizing the relative distance d33 in the relative geometry of the P3 to a first d33 value between 0 and 255;
representing the first d33 numerical value as a 224 x 112 picture by a linear interpolation method;
regarding a 224 × 112 picture represented by the first d31 value, a 224 × 112 picture represented by the first d32 value, and a 224 × 112 picture represented by the first d33 value as Red, Green, and Blue channels of a sixth RGB picture, respectively, to obtain the sixth RGB picture;
the relative geometry of the P3 is described using the sixth RGB picture.
8. The behavior recognition method according to claim 7, wherein the describing, in combination, the spatial position feature and the relative geometric feature of each of the joint combinations as the geometric feature of each joint combination comprises:
combining the first RGB picture with the fourth RGB picture to form a 224 × 224 first geometric feature RGB picture corresponding to the P1;
combining the second RGB picture with the fifth RGB picture to form a 224 × 224 second geometric feature RGB picture corresponding to the P2;
combining the third RGB picture with the sixth RGB picture to form a 224 × 224 third geometric feature RGB picture corresponding to the P3.
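Given the picture sizes involved, the combination of claim 8 amounts to concatenating each 224 × 112 pair along the width axis; a minimal sketch:

```python
import numpy as np

def combine_feature_pictures(pos_img, geo_img):
    """Concatenate a 224x112 spatial-position RGB picture with a 224x112
    relative-geometry RGB picture along the width axis, yielding the
    224x224 geometric-feature RGB picture fed to the network."""
    assert pos_img.shape == geo_img.shape == (224, 112, 3)
    return np.concatenate([pos_img, geo_img], axis=1)  # -> (224, 224, 3)
```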
9. The behavior recognition method according to claim 8, wherein the inputting the geometric features of each joint combination into a trained deep convolutional network for feature extraction and classification, and respectively outputting the possible probabilities of a plurality of behaviors, comprises:
and respectively inputting the first geometric feature RGB picture, the second geometric feature RGB picture and the third geometric feature RGB picture into a trained convolutional neural network Resnet-50 for feature extraction, and respectively outputting possible probabilities of a plurality of behaviors.
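The ResNet-50 backbone itself is out of scope here; the sketch below only illustrates the final classification stage that turns a pooled feature vector into the posterior probabilities referred to in claim 9 (the layer shapes are illustrative, not taken from the publication):

```python
import numpy as np

def softmax(logits):
    """Convert network logits into posterior class probabilities."""
    z = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify(features, weights, bias):
    """Final fully connected layer + softmax of a ResNet-50-style classifier.
    features: (D,) pooled feature vector; weights: (C, D); bias: (C,)."""
    return softmax(weights @ features + bias)
```

Each of the three branch networks produces one such posterior vector (b1, b2, b3), which the fusion step then combines.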
10. The behavior recognition method of claim 9, wherein the possible probabilities of the several behaviors comprise: a posterior probability b1 of the behavior extracted from the first geometric feature RGB picture, a posterior probability b2 of the behavior extracted from the second geometric feature RGB picture, and a posterior probability b3 of the behavior extracted from the third geometric feature RGB picture;
The comprehensively analyzing the possible probabilities of the behaviors according to a preset fusion function, and outputting the behavior with the highest possible probability to obtain a target behavior, including:
P(L|S) = η1·b1 + η2·b2 + η3·b3
wherein η1, η2 and η3 are preset weights;
the posterior probability vector P (L | S) represents the probability that the skeleton belongs to the action label class L;
label=Find(max(P(L|S)))
the label represents the target behavior.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111623361.8A CN114299614A (en) | 2021-12-28 | 2021-12-28 | Behavior identification method based on skeleton joint combination geometric features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114299614A true CN114299614A (en) | 2022-04-08 |
Family
ID=80971390
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||