CN105550667A

CN105550667A - Skeleton information action feature extraction method based on a stereo camera

Info

Publication number: CN105550667A
Application number: CN201610047866.7A
Authority: CN
Inventors: 陈启军; 尹晓川; 韩云
Original assignee: Tongji University
Current assignee: Suzhou Tongqi Artificial Intelligence Technology Co ltd
Priority date: 2016-01-25
Filing date: 2016-01-25
Publication date: 2016-05-04
Anticipated expiration: 2036-01-25
Also published as: CN105550667B

Abstract

The invention relates to a skeleton information action feature extraction method based on a stereo camera. The method comprises the following steps: (1) obtaining the coordinates of each joint point of a human body with a stereo camera; (2) determining a human body coordinate system; (3) calculating a first position matrix from the camera coordinate system to the human body coordinate system, the first position matrix comprising a rotating part matrix and a translating part matrix; (4) calculating a second position matrix from each joint point of the human body to the camera coordinate system; (5) calculating the position of each joint point of the human body relative to the human body coordinate system; (6) smoothing the rotation-translation matrix between two adjacent frames and accumulating the variations between adjacent frames to obtain the translation variation and the rotation variation of the human body coordinate system from the beginning of an action to the current moment; and (7) generating the feature vector of the current moment. Compared with the prior art, the method is highly accurate and convenient to apply.

Description

Skeleton information action feature extraction method based on a stereo camera

Technical Field

The invention relates to a skeleton information action feature extraction method, in particular to a skeleton information action feature extraction method based on a stereo camera.

Background

Human motion characteristics are studied in biomedical engineering, physiotherapy, medical diagnosis and rehabilitation. Detecting them is widely needed in places such as nursing homes and hospitals, and it also has many applications in fields such as security and battlefield reconnaissance. Driven by advances in motion performance analysis, visual surveillance and biometrics, methods for extracting and analyzing different human motions have attracted wide attention.

At present, the most common way to detect the motion characteristics of a human body is to use a sequence of visual images. However, visual perception of human movement is affected by distance, lighting changes, clothing changes and occlusion between parts of the body, all of which degrade detection performance. Radar, an electromagnetic sensor, is also commonly used to detect human motion characteristics: it works by day and by night, has a long operating range, and can penetrate walls and the ground. However, conventional radar operates at a low frequency, at which the micro-Doppler effect of human motion is very weak, so human motion characteristics are difficult to detect with high resolution.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and to provide a method, highly accurate and convenient to apply, for representing action features from skeleton information extracted by a stereo camera.

The purpose of the invention can be realized by the following technical scheme: a skeleton information action feature extraction method based on a stereo camera comprises the following steps:

(1) acquiring coordinates of each joint point of the human body by using a stereo camera;

(2) determining a human body coordinate system;

(3) calculating a first position matrix from a camera coordinate system to a human body coordinate system, wherein the first position matrix comprises a rotating part matrix and a translating part matrix;

(4) calculating a second position matrix from each joint point of the human body to a camera coordinate system;

(5) calculating the relative position of each joint point of the human body relative to a human body coordinate system, and taking the relative position as a part of the required characteristics of motion recognition;

(6) smoothing the rotation-translation matrix between two adjacent frames, and accumulating the variation between two adjacent frames to obtain the translation variation and the rotation variation of the human body coordinate system from the beginning of the action to the current moment;

(7) generating a feature vector of the current moment.

The human body coordinate system is defined as follows: its origin O is the intersection of the line connecting the left shoulder L and the right shoulder R of the human body with the body's axis of symmetry; its x-axis is the ray from the origin O to the left shoulder L; its y-axis is the ray from the origin O to the body center T; and its z-axis is perpendicular to the plane of the x-axis and y-axis and satisfies the right-hand rule.

The step (3) is specifically as follows: let the coordinates of the left shoulder L, the right shoulder R and the body center T acquired by the stereo camera be (^CX_L, ^CY_L, ^CZ_L), (^CX_R, ^CY_R, ^CZ_R) and (^CX_T, ^CY_T, ^CZ_T), respectively, and denote the rotating part matrix by ^O_CR. The first position matrix is then

{}^{O}P = {}^{O}_{C}R \, {}^{C}P + {}^{O}P_{CORG},

where ^CP and ^OP are the coordinates of a joint point in the camera coordinate system and in the human body coordinate system, respectively, and ^OP_CORG = [^CO_x, ^CO_y, ^CO_z]^T is the translating part matrix, with

{}^{C}O_{x} = \frac{{}^{C}X_{L} + {}^{C}X_{R}}{2}, \quad {}^{C}O_{y} = \frac{{}^{C}Y_{L} + {}^{C}Y_{R}}{2}, \quad {}^{C}O_{z} = \frac{{}^{C}Z_{L} + {}^{C}Z_{R} + {}^{C}Z_{T}}{3}.
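The origin formulas above are plain averages of the camera-frame landmark coordinates. A minimal Python sketch follows; the function name `body_origin` and the joint coordinates (metres, camera frame) are hypothetical and only illustrate the arithmetic.

```python
def body_origin(left_shoulder, right_shoulder, body_center):
    """Origin O of the human body frame, expressed in camera coordinates.

    x and y are the shoulder midpoint; z averages the two shoulders and the
    body center T, following the formulas for ^CO_x, ^CO_y and ^CO_z.
    """
    xl, yl, zl = left_shoulder
    xr, yr, zr = right_shoulder
    _, _, zt = body_center
    return ((xl + xr) / 2.0, (yl + yr) / 2.0, (zl + zr + zt) / 3.0)


# Hypothetical camera-frame landmarks (metres), for illustration only.
origin = body_origin((0.20, 0.40, 2.00), (-0.20, 0.40, 2.10), (0.00, 0.00, 2.30))
print(origin)
```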

The rotating part matrix ^O_CR is calculated according to formula (1):

\operatorname{diag}\left(|\overrightarrow{RL}|, \; |\overrightarrow{TO}|, \; |\overrightarrow{RL} \times \overrightarrow{RT}|\right) = {}^{O}_{C}R \begin{pmatrix} \overrightarrow{RL} & \overrightarrow{TO} & \overrightarrow{OZ} \end{pmatrix} \qquad (1)

where \overrightarrow{RL} is the vector from the right shoulder to the left shoulder of the human body, \overrightarrow{TO} is the vector from the body center T to the origin O of the human body coordinate system, \overrightarrow{RT} is the vector from the right shoulder to the body center, and \overrightarrow{OZ} is the vector along the z-axis.
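Formula (1) can be solved for the rotating part matrix by multiplying with the inverse of the matrix whose columns are the three vectors. The sketch below does this with plain Python lists. The text does not spell out \overrightarrow{OZ}, so the sketch assumes it is \overrightarrow{RL} × \overrightarrow{TO}, the normal that makes the three columns right-handed; this is an assumption, as are the helper and function names. The result is generally not exactly orthogonal, which is what step (4) addresses.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _norm(a):
    return math.sqrt(_dot(a, a))


def rotating_part_matrix(left, right, center):
    """Solve formula (1) for the (generally non-orthogonal) rotating part matrix.

    left, right, center: camera-frame coordinates of L, R and T.
    Returns a 3x3 matrix as a list of rows.
    """
    # Origin O from the averaging formulas of step (3).
    origin = [(left[0] + right[0]) / 2.0, (left[1] + right[1]) / 2.0,
              (left[2] + right[2] + center[2]) / 3.0]
    rl = _sub(left, right)      # right shoulder -> left shoulder
    to = _sub(origin, center)   # body center T -> origin O
    rt = _sub(center, right)    # right shoulder -> body center
    oz = _cross(rl, to)         # ASSUMPTION: OZ taken as the right-handed normal RL x TO
    d = (_norm(rl), _norm(to), _norm(_cross(rl, rt)))  # diagonal on the left of (1)
    # With M = [RL TO OZ] (as columns), the rows of M^-1 are
    # (TO x OZ, OZ x RL, RL x TO) / det(M); then R = diag(d) * M^-1.
    det = _dot(rl, _cross(to, oz))
    inv_rows = (_cross(to, oz), _cross(oz, rl), _cross(rl, to))
    return [[d[i] * v / det for v in inv_rows[i]] for i in range(3)]


# Made-up, axis-aligned pose: the result is approximately the identity matrix.
print(rotating_part_matrix((0.2, 0.0, 2.0), (-0.2, 0.0, 2.0), (0.0, -0.5, 2.0)))
```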

The step (4) is specifically as follows: the rotating part matrix in the first position matrix is orthogonalized to obtain the second position matrix ^O_CR':

{}^{O}_{C}R' = U^{T} \operatorname{diag}\left(\lambda_{1}^{-1/2}, \lambda_{2}^{-1/2}, \lambda_{3}^{-1/2}\right) U \, {}^{O}_{C}R

where ^O_CR is the rotating part matrix, and U and λ_1, λ_2, λ_3 satisfy

{}^{O}_{C}R \, {}^{O}_{C}R^{T} = U^{T} \operatorname{diag}\left(\lambda_{1}, \lambda_{2}, \lambda_{3}\right) U,

i.e. λ_1, λ_2, λ_3 are the eigenvalues of ^O_CR ^O_CR^T and the rows of the orthogonal matrix U are the corresponding eigenvectors.

Because the rotating part matrix is computed from acquired point-cloud data at only three points (the left shoulder, the right shoulder and the body center), it is not necessarily orthogonal, and it therefore has to be orthogonalized.
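The orthogonalization formula needs an eigen-decomposition of the symmetric matrix R R^T. The sketch below obtains it with cyclic Jacobi rotations in plain Python, then applies U^T diag(λ^{-1/2}) U R exactly as written. The Jacobi routine is our choice of eigen-solver (any symmetric eigen-solver would do), and all function names and matrix values are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def jacobi_eigen(s, sweeps=50):
    """Eigenvalues and eigenvectors of a symmetric 3x3 matrix (cyclic Jacobi).

    Returns (lams, v) with s = v * diag(lams) * v^T; eigenvectors are the
    columns of v.
    """
    a = [row[:] for row in s]
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        if sum(a[p][q] ** 2 for p in range(3) for q in range(3) if p != q) < 1e-30:
            break
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if a[p][q] == 0.0:
                continue
            # Plane rotation J chosen so that (J^T a J)[p][q] == 0.
            theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
            c, s_ = math.cos(theta), math.sin(theta)
            j = [[1.0 if i == k else 0.0 for k in range(3)] for i in range(3)]
            j[p][p], j[q][q], j[p][q], j[q][p] = c, c, s_, -s_
            a = matmul(transpose(j), matmul(a, j))
            v = matmul(v, j)
    return [a[i][i] for i in range(3)], v


def orthogonalize(r):
    """Step (4): R' = U^T diag(lam^-1/2) U R, where R R^T = U^T diag(lam) U."""
    lams, v = jacobi_eigen(matmul(r, transpose(r)))
    u = transpose(v)  # rows of U are the eigenvectors of R R^T
    d = [[lams[i] ** -0.5 if i == j else 0.0 for j in range(3)] for i in range(3)]
    return matmul(matmul(matmul(transpose(u), d), u), r)


# A slightly non-orthogonal rotating part matrix (made-up values).
r = [[1.00, 0.10, 0.00], [0.00, 0.90, 0.05], [0.02, 0.00, 1.10]]
r_orth = orthogonalize(r)
print(matmul(r_orth, transpose(r_orth)))  # close to the identity matrix
```

A matrix that is already orthogonal passes through unchanged, since R R^T is then the identity and every λ_i equals 1.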

The step (5) is specifically as follows: the coordinates of each joint point of the human body are transformed into the human body coordinate system, giving the position of each joint point relative to that coordinate system; this removes the influence of viewpoint changes on action recognition. However, for motion sequences in which the limbs move only slightly, the change of the human body coordinate system itself during the motion must also be included in order to distinguish actions such as rotation and jumping.
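A minimal sketch of the coordinate transformation in step (5), assuming an orthogonalized rotating part matrix and the origin O in camera coordinates are already available. In the first position matrix the translating term is added after rotating; for the origin O itself to map to (0, 0, 0), that term has to equal −^O_CR·^CO, so the sketch computes R(P − O) directly. Treat this sign convention as our reading, and the joint names and function name as hypothetical.

```python
def to_body_frame(joints_cam, rot, origin):
    """Express camera-frame joint coordinates in the human body frame.

    ASSUMPTION: computed as rot * (p - origin), i.e. the translating term is
    taken as -rot * origin so that O itself maps to (0, 0, 0).
    joints_cam: dict mapping a joint name to its (x, y, z) in the camera frame.
    rot: orthogonalized rotating part matrix (list of rows); origin: ^CO.
    """
    body = {}
    for name, p in joints_cam.items():
        d = [p[k] - origin[k] for k in range(3)]
        body[name] = tuple(sum(rot[i][k] * d[k] for k in range(3)) for i in range(3))
    return body


# Hypothetical joint names and coordinates, for illustration only.
rot = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
joints = {"left_hand": (0.1, 0.5, 2.2), "right_hand": (0.1, -0.5, 2.2)}
print(to_body_frame(joints, rot, (0.0, 0.0, 2.0)))
```

Because the same joints expressed from a different camera viewpoint produce a different rotating part matrix but the same body-frame result, these relative positions are the viewpoint-independent part of the feature.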

The smoothing treatment specifically comprises taking the arithmetic average of the obtained rotation and translation matrices between two adjacent frames. Because the human body is treated as a rigid body, its motion changes smoothly between two adjacent frames without sudden jumps, so the changes obtained between adjacent frames can be smoothed in this way.
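The text describes the smoothing only as an arithmetic average of the rotation and translation matrices between adjacent frames. The sketch below reads that as a two-tap moving average over consecutive inter-frame increments, taken element by element; the window length, the handling of the first increment and the name `smooth_increments` are all assumptions.

```python
def smooth_increments(rotations, translations):
    """Average each inter-frame increment with the previous one (two-tap mean).

    rotations[k]: 3x3 rotation (list of rows) from frame k to frame k+1.
    translations[k]: matching 3-vector. The first increment has no
    predecessor and is averaged with itself, i.e. kept unchanged.
    """
    sm_rot, sm_trans = [], []
    for k in range(len(rotations)):
        j = max(k - 1, 0)
        sm_rot.append([[(rotations[j][i][c] + rotations[k][i][c]) / 2.0
                        for c in range(3)] for i in range(3)])
        sm_trans.append([(translations[j][i] + translations[k][i]) / 2.0
                         for i in range(3)])
    return sm_rot, sm_trans


eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
_, trans = smooth_increments([eye] * 3, [[0.0, 0.0, 0.0], [0.02, 0.0, 0.0], [0.04, 0.0, 0.0]])
print(trans)
```

An element-wise mean of two rotation matrices is only approximately a rotation; for the small inter-frame changes the text assumes it stays close to one, and the step (4) orthogonalization could be reapplied if needed.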

The step (7) is specifically as follows: the relative positions from step (5), together with the translation variation and the rotation variation from step (6), are taken as the feature vector of the current moment.
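The feature vector of step (7) is a concatenation of the step (5) relative positions with the step (6) translation and rotation variations. The sketch assumes a fixed joint order (the `JOINT_ORDER` names are hypothetical) and represents the rotation variation by three Euler angles, which is also an assumption.

```python
# Hypothetical joint ordering; the text does not fix which joints or order.
JOINT_ORDER = ("head", "left_elbow", "left_hand", "right_elbow", "right_hand")


def feature_vector(rel_positions, translation, rotation, order=JOINT_ORDER):
    """Concatenate step (5) and step (6) outputs into one flat feature vector.

    rel_positions: dict joint name -> (x, y, z) in the human body frame.
    translation: accumulated (dx, dy, dz) since the start of the action.
    rotation: accumulated rotation, here as three Euler angles (an assumption).
    """
    vec = []
    for name in order:
        vec.extend(rel_positions[name])
    vec.extend(translation)
    vec.extend(rotation)
    return vec


rel = {name: (0.0, 0.0, 0.0) for name in JOINT_ORDER}
v = feature_vector(rel, (0.1, 0.0, 0.0), (0.0, 0.2, 0.0))
print(len(v))  # 5 joints * 3 + 3 + 3 = 21
```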

Compared with the prior art, the invention has the following advantages:

(1) according to the invention, a human body coordinate system is established, so that the scene information does not need to be modeled, and the extra error caused by calculating a horizontal or vertical reference coordinate system is avoided;

(2) after the human body is identified, calculating the position of each joint point relative to the human body coordinate system removes the influence of viewpoint changes on action recognition;

(3) in actual tests, accumulating the variation between two adjacent frames reduced the error and improved the accuracy of action recognition;

(4) the stereo camera used in the invention compensates for the loss of spatial information in a conventional monocular camera, is increasingly widely used because it is cheaper than binocular cameras and TOF cameras, and improves the precision and capability of skeleton recognition.

Drawings

Fig. 1 is a flowchart of a skeleton information action feature extraction method based on a stereo camera according to the present application;

Fig. 2 is a schematic diagram of the human body coordinate system;

Fig. 3 is a schematic diagram of the transformation of the human body from point A at time t to point B at time t+1.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments.

As shown in Fig. 1, a skeleton information action feature extraction method based on a stereo camera includes the following steps:

(1) Acquiring the coordinates of each joint point of the human body by using a stereo camera.

(2) Determining the human body coordinate system: its origin O is the intersection of the line connecting the left shoulder L and the right shoulder R of the human body with the body's axis of symmetry; its x-axis is the ray from the origin O to the left shoulder L; its y-axis is the ray from the origin O to the body center T; and its z-axis is perpendicular to the plane of the x-axis and y-axis and satisfies the right-hand rule, as shown in Fig. 2.

(3) Calculating a first position matrix from the camera coordinate system to the human body coordinate system, the first position matrix comprising a rotating part matrix and a translating part matrix. Let the coordinates of the left shoulder L, the right shoulder R and the body center T acquired by the stereo camera be (^CX_L, ^CY_L, ^CZ_L), (^CX_R, ^CY_R, ^CZ_R) and (^CX_T, ^CY_T, ^CZ_T), respectively, and denote the rotating part matrix by ^O_CR. The first position matrix is then

{}^{O}P = {}^{O}_{C}R \, {}^{C}P + {}^{O}P_{CORG},

where the translating part matrix ^OP_CORG = [^CO_x, ^CO_y, ^CO_z]^T is given by

{}^{C}O_{x} = \frac{{}^{C}X_{L} + {}^{C}X_{R}}{2}, \quad {}^{C}O_{y} = \frac{{}^{C}Y_{L} + {}^{C}Y_{R}}{2}, \quad {}^{C}O_{z} = \frac{{}^{C}Z_{L} + {}^{C}Z_{R} + {}^{C}Z_{T}}{3}.

The rotating part matrix ^O_CR is calculated according to formula (1):

\operatorname{diag}\left(|\overrightarrow{RL}|, \; |\overrightarrow{TO}|, \; |\overrightarrow{RL} \times \overrightarrow{RT}|\right) = {}^{O}_{C}R \begin{pmatrix} \overrightarrow{RL} & \overrightarrow{TO} & \overrightarrow{OZ} \end{pmatrix} \qquad (1)
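Because the right-hand side of formula (1) is the rotating part matrix multiplied by the 3x3 matrix whose columns are \overrightarrow{RL}, \overrightarrow{TO} and \overrightarrow{OZ}, the matrix can be written out explicitly whenever those three vectors are not coplanar, so that this column matrix is invertible:

{}^{O}_{C}R = \operatorname{diag}\left(|\overrightarrow{RL}|, \; |\overrightarrow{TO}|, \; |\overrightarrow{RL} \times \overrightarrow{RT}|\right) \begin{pmatrix} \overrightarrow{RL} & \overrightarrow{TO} & \overrightarrow{OZ} \end{pmatrix}^{-1}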

(4) Because the rotating part matrix is computed from acquired point-cloud data at only three points (the left shoulder, the right shoulder and the body center), it is not necessarily orthogonal, so it is orthogonalized to obtain the second position matrix ^O_CR', a more accurate position matrix:

{}^{O}_{C}R' = U^{T} \operatorname{diag}\left(\lambda_{1}^{-1/2}, \lambda_{2}^{-1/2}, \lambda_{3}^{-1/2}\right) U \, {}^{O}_{C}R

where ^O_CR is the rotating part matrix, and U and λ_1, λ_2, λ_3 satisfy

{}^{O}_{C}R \, {}^{O}_{C}R^{T} = U^{T} \operatorname{diag}\left(\lambda_{1}, \lambda_{2}, \lambda_{3}\right) U.
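The eigenvectors of the symmetric matrix R R^T (writing R for ^O_CR) can be chosen orthonormal, so U is orthogonal, and the λ_i are positive whenever R is invertible. The product U^T diag(λ^{-1/2}) U is then the symmetric inverse square root of R R^T, and a short check confirms that the result is orthogonal:

{}^{O}_{C}R' = \left(R R^{T}\right)^{-1/2} R, \qquad {}^{O}_{C}R' \left({}^{O}_{C}R'\right)^{T} = \left(R R^{T}\right)^{-1/2} R R^{T} \left(R R^{T}\right)^{-1/2} = I

This is the orthogonal factor of the polar decomposition R = (R R^T)^{1/2} ^O_CR', which is also the orthogonal matrix closest to R in the Frobenius norm.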

(5) The coordinates of each joint point of the human body are transformed into the human body coordinate system, giving the position of each joint point relative to that coordinate system, and these relative positions are used as part of the features required for action recognition; this removes the influence of viewpoint changes on action recognition. However, for motion sequences in which the limbs move only slightly, the change of the human body coordinate system itself during the motion must also be included in order to distinguish actions such as rotation and jumping.

(6) Because the human body is treated as a rigid body, its motion between two adjacent frames is gentle and free of sudden changes, so the rotation-translation matrix obtained between two adjacent frames is smoothed. Specifically, the smoothing takes the arithmetic average of the obtained rotation and translation matrices between two adjacent frames. After smoothing, the variations between adjacent frames are accumulated to obtain the translation variation and the rotation variation of the human body coordinate system from the beginning of the action to the current moment.

Assume that the human body is located at point A at time t and at point B at time t+1, as shown in Fig. 3, and let γ, β and α be the rotation angles of the human body about the x-axis, the y-axis and the z-axis, respectively. With the rotation matrix

R_{A}^{B} = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix},

the Euler angles are obtained as

\alpha = \tan^{-1}\left(\frac{r_{21}}{r_{11}}\right), \quad \beta = \tan^{-1}\left(-\frac{r_{31}}{\sqrt{r_{11}^{2} + r_{21}^{2}}}\right), \quad \gamma = \tan^{-1}\left(\frac{r_{32}}{r_{33}}\right).
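With the formulas as written, the angles correspond to the factorization R = Rz(α) Ry(β) Rx(γ), so α is the angle about the z-axis and γ the angle about the x-axis. The sketch below uses `math.atan2` instead of a plain arctangent so that the signs of both arguments select the quadrant; `rot_zyx` is a hypothetical helper included only to check the round trip, and both function names are illustrative.

```python
import math


def rot_zyx(alpha, beta, gamma):
    """R = Rz(alpha) * Ry(beta) * Rx(gamma), written out element by element."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [[ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
            [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
            [-sb, cb * sg, cb * cg]]


def euler_from_rotation(r):
    """Euler angles from the formulas above, using atan2 for the quadrant.

    Valid while cos(beta) > 0 (|beta| < 90 degrees), which holds for the
    small inter-frame rotations considered here.
    """
    alpha = math.atan2(r[1][0], r[0][0])                       # r21 / r11
    beta = math.atan2(-r[2][0], math.hypot(r[0][0], r[1][0]))  # -r31 / sqrt(r11^2 + r21^2)
    gamma = math.atan2(r[2][1], r[2][2])                       # r32 / r33
    return alpha, beta, gamma


print(euler_from_rotation(rot_zyx(0.03, -0.01, 0.02)))  # approx. (0.03, -0.01, 0.02)
```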

These angles describe the change between two adjacent frames and are therefore small; for action recognition they must be accumulated to obtain the translation and rotation changes of the human body coordinate system from the start of the action to the current moment.
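One plausible reading of the accumulation step, assumed here rather than taken from the text, is a running sum of the smoothed per-frame translation and Euler-angle increments. The name `accumulate` and the increment values are hypothetical.

```python
def accumulate(increments):
    """Running totals of smoothed per-frame increments.

    Each increment is (dx, dy, dz, d_alpha, d_beta, d_gamma): the translation
    and Euler-angle change between two adjacent frames. Returns one running
    total per frame, i.e. the change since the start of the action.
    """
    total = [0.0] * 6
    history = []
    for inc in increments:
        total = [t + d for t, d in zip(total, inc)]
        history.append(tuple(total))
    return history


# Made-up increments: 1 cm of drift and 0.05 rad of alpha per frame.
steps = [(0.01, 0.0, 0.0, 0.05, 0.0, 0.0)] * 3
print(accumulate(steps)[-1])  # roughly (0.03, 0.0, 0.0, 0.15, 0.0, 0.0)
```

Summing Euler angles only approximates composing the rotations, which is acceptable when each per-frame rotation is small; chaining the 3x3 matrices and extracting the angles at the end is the exact alternative.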

(7) Taking the relative positions from step (5), together with the translation variation and the rotation variation from step (6), as the feature vector of the current moment.

Claims

1. A skeleton information action feature extraction method based on a stereo camera is characterized by comprising the following steps:

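(1) acquiring coordinates of each joint point of the human body by using a stereo camera;

(2) determining a human body coordinate system;

(3) calculating a first position matrix from a camera coordinate system to a human body coordinate system, wherein the first position matrix comprises a rotating part matrix and a translating part matrix;

(4) calculating a second position matrix from each joint point of the human body to a camera coordinate system;

(5) calculating the relative position of each joint point of the human body relative to a human body coordinate system;

(6) smoothing the rotation-translation matrix between two adjacent frames, and accumulating the variation between two adjacent frames to obtain the translation variation and the rotation variation of the human body coordinate system from the beginning of the action to the current moment;

(7) generating a feature vector of the current moment.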

2. The method as claimed in claim 1, wherein an origin O of the human body coordinate system is an intersection point of a connecting line of a left shoulder L and a right shoulder R of the human body and a human body symmetry axis, an x-axis of the human body coordinate system is a ray from the origin O to the left shoulder L, a y-axis of the human body coordinate system is a ray from the origin O to a human body center T, and a z-axis of the human body coordinate system is perpendicular to a plane where the x-axis and the y-axis are located and satisfies a right-hand rule.

3. The method for extracting skeleton information action features based on a stereo camera according to claim 1, wherein the step (3) specifically comprises: letting the coordinates of the left shoulder L, the right shoulder R and the body center T of the human body acquired by the stereo camera be (^CX_L, ^CY_L, ^CZ_L), (^CX_R, ^CY_R, ^CZ_R) and (^CX_T, ^CY_T, ^CZ_T), respectively, and denoting the rotating part matrix by ^O_CR, the first position matrix being ^OP = ^O_CR ^CP + ^OP_CORG, wherein ^OP_CORG = [^CO_x, ^CO_y, ^CO_z]^T is the translating part matrix, and ^CP and ^OP are the coordinates of each joint point of the human body in the camera coordinate system and in the human body coordinate system, respectively.

4. The method according to claim 1, wherein the rotating part matrix ^O_CR is calculated according to formula (1):

\operatorname{diag}\left(|\overrightarrow{RL}|, \; |\overrightarrow{TO}|, \; |\overrightarrow{RL} \times \overrightarrow{RT}|\right) = {}^{O}_{C}R \begin{pmatrix} \overrightarrow{RL} & \overrightarrow{TO} & \overrightarrow{OZ} \end{pmatrix} \qquad (1)

5. The method for extracting skeleton information action features based on a stereo camera according to claim 1, wherein the step (4) specifically comprises: orthogonalizing the rotating part matrix in the first position matrix to obtain the second position matrix ^O_CR':

{}^{O}_{C}R' = U^{T} \operatorname{diag}\left(\lambda_{1}^{-1/2}, \lambda_{2}^{-1/2}, \lambda_{3}^{-1/2}\right) U \, {}^{O}_{C}R

wherein ^O_CR is the rotating part matrix, and U and λ_1, λ_2, λ_3 satisfy

{}^{O}_{C}R \, {}^{O}_{C}R^{T} = U^{T} \operatorname{diag}\left(\lambda_{1}, \lambda_{2}, \lambda_{3}\right) U.

6. The method for extracting skeleton information action features based on a stereo camera according to claim 1, wherein the step (5) specifically comprises: converting the coordinates of each joint point of the human body into the human body coordinate system through coordinate transformation to obtain the relative position of each joint point of the human body with respect to the human body coordinate system.

7. The method for extracting skeleton information action features based on a stereo camera according to claim 1, wherein the smoothing process specifically comprises: taking the arithmetic average of the obtained rotation and translation matrices between two adjacent frames.

8. The method for extracting skeleton information action features based on a stereo camera according to claim 1, wherein the step (7) specifically comprises: taking the relative position from the step (5) and the translation variation and the rotation variation from the step (6) as the feature vector of the current moment.