CN112101102A - Method for acquiring 3D limb movement in RGB video based on artificial intelligence - Google Patents

Method for acquiring 3D limb movement in RGB video based on artificial intelligence

Info

Publication number
CN112101102A
CN112101102A (application CN202010789617.1A)
Authority
CN
China
Prior art keywords
human body
limb
acquiring
artificial intelligence
rgb video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010789617.1A
Other languages
Chinese (zh)
Inventor
方浩树
何书廉
刘烨斌
陆晓飞
徐阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yiyun Zhixing Shenzhen Technology Co ltd
Original Assignee
Yiyun Zhixing Shenzhen Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yiyun Zhixing Shenzhen Technology Co ltd filed Critical Yiyun Zhixing Shenzhen Technology Co ltd
Priority to CN202010789617.1A priority Critical patent/CN112101102A/en
Publication of CN112101102A publication Critical patent/CN112101102A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00: Animation
    • G06T13/20: 3D [Three Dimensional] animation
    • G06T13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of human body action recognition and acquisition, in particular to a method for acquiring 3D limb actions in RGB (red, green and blue) videos based on artificial intelligence. In the method, a server receives RGB video information containing a human body, calculates the position of the human body from the video, standardizes the human body information, and derives feature data from the position of the human body and of its key points in the video. The feature data are then input into a locally stored deep learning model, trained on a large amount of RGB video data containing human bodies collected for this method, which outputs three-dimensional values corresponding to the limb key points. Finally, these three-dimensional values are automatically optimized into the final result, so that more detailed limb actions are output.

Description

Method for acquiring 3D limb movement in RGB video based on artificial intelligence
Technical Field
The invention relates to the technical field of human body action recognition and acquisition, in particular to a method for acquiring 3D limb actions in RGB (red, green and blue) videos based on artificial intelligence.
Background
With the development of computer vision technology, motion recognition using video acquisition equipment has become a research focus. In existing action recognition methods, data such as joint positions are extracted from a video stream and input into a three-layer bidirectional long short-term memory (LSTM) recurrent neural network, which extracts the dynamic features of the data. The extracted dynamic features are then input into a classifier network, which finally obtains the action category corresponding to the video stream data.
At present, video analysis technology based on deep learning is developing rapidly: pose estimation, motion tracking, facial feature point detection and other computer vision algorithms can extract a large amount of important information from videos and images. For recognizing limb actions from video, however, existing technology generally outputs only coarse information (such as "standing" or "sitting") as an action label, and cannot output more detailed limb actions.
Disclosure of Invention
To solve these problems, the invention provides a method for acquiring 3D limb actions in RGB (red, green and blue) videos based on artificial intelligence. Aiming at the practical recognition of limb actions, it develops a deep learning model that directly analyzes the limb actions in an RGB video and outputs three-dimensional values corresponding to the limb key points to express detailed limb actions.
In order to achieve the purpose, the invention adopts the technical scheme that: a method for acquiring 3D limb actions in an RGB video based on artificial intelligence comprises the following algorithm steps:
S1, receiving RGB video information containing a human body at the server end;
S2, calculating the position of the human body from the video: taking each frame out of the video, temporarily storing it in an image format, and inputting each picture into a human body key point detection system to obtain the X and Y coordinates of the key points;
S3, detecting human body feature points from the video: extracting human body features based on the obtained key point coordinates, and grouping the feature points by body part;
S4, standardizing the human body information: performing data standardization on each feature point group;
S5, extracting feature data of the human body information: the standardized feature point groups become the feature data;
S6, inputting the feature data into a locally stored deep learning model;
S7, calculating the three-dimensional values corresponding to the limb key points with the deep learning model;
and S8, automatically optimizing the output three-dimensional values corresponding to the limb key points.
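The frame-by-frame extraction in S2 can be sketched as a small generator. This is an illustrative sketch, not the patent's implementation: the `capture` argument only needs a `read() -> (ok, frame)` method, so in practice an OpenCV `cv2.VideoCapture` opened on the uploaded video could be passed in, and each yielded frame would then go to the human body key point detection system.

```python
def extract_frames(capture, step=1):
    """Yield every `step`-th frame from a video capture object (cf. S2).

    `capture` is any object with a read() -> (ok, frame) method, such as
    a cv2.VideoCapture; iteration stops when read() reports no more frames.
    """
    idx = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if idx % step == 0:
            yield frame  # temporarily held in image form for key point detection
        idx += 1
```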
Further, in S1, the user uploads the video to the server via the network interface, and the human body information received by the server is the human body information selected by the user.
Further, in S1, the RGB video including the human body is acquired by shooting or locally acquiring.
Wherein, in S3, the human body part includes a left arm, a right arm, a left leg, a right leg, a torso, and a head.
Further, in S4, taking P = {p1, p2, …, pn} as the set of all n feature points, the normalized feature point group P' is calculated as follows:
Q=P/(max(P)–min(P))
P’=Q-mean(Q)。
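The two formulas above can be checked with a small worked example. The values of the feature point group P below are purely illustrative:

```python
import numpy as np

# Hypothetical feature point group P of n = 4 values
P = np.array([2.0, 4.0, 6.0, 10.0])

Q = P / (P.max() - P.min())   # Q = P / (max(P) - min(P))
P_prime = Q - Q.mean()        # P' = Q - mean(Q)
```

The result is zero-centred and has unit range, so feature point groups from bodies of different sizes and image positions become comparable before entering the model.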
Further, in S7, the feature data P' is input and the three-dimensional values bs = P' × M + b corresponding to the limb key points are calculated, where M and b are respectively a convolution kernel parameter and a bias layer parameter of the deep network, obtained from the deep learning training process.
Further, in S7, the deep learning model learns the correlation between the feature data of the human body information and the three-dimensional values corresponding to the limb key points in the training data using the multi-layer neural network.
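The output computation bs = P' × M + b in S7 is an affine map. The sketch below shows only that map; the sizes (34 input features, 17 limb key points) and the random parameters are illustrative assumptions, since in the method M and b come from the deep learning training process:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_keypoints = 34, 17                    # assumed sizes
M = rng.normal(size=(n_features, 3 * n_keypoints))  # convolution kernel parameter (weight)
b = rng.normal(size=3 * n_keypoints)                # bias layer parameter

P_prime = rng.normal(size=n_features)   # standardized feature data from S4-S5
bs = P_prime @ M + b                    # flattened three-dimensional values
xyz = bs.reshape(n_keypoints, 3)        # one (X, Y, Z) triple per limb key point
```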
The invention has the following beneficial effects: the method receives RGB video information containing a human body through a server, calculates the position of the human body from the video, standardizes the human body information, and derives feature data from the position of the human body and of its key points in the video. The feature data are then input into a locally stored deep learning model, trained on a large amount of RGB video data containing human bodies collected for this method, which outputs the three-dimensional values corresponding to the limb key points. Finally, these three-dimensional values are automatically optimized into the final result, so that more detailed limb actions are output.
Drawings
Fig. 1 is a block flow diagram of the present embodiment.
Detailed Description
The present invention will be described in further detail below with reference to the detailed description and the accompanying drawings. The present application may be embodied in many different forms and is not limited to the embodiment described herein; the following detailed description is provided to facilitate a more thorough understanding of the present disclosure.
Referring to fig. 1, the invention relates to a method for acquiring 3D limb movement in RGB video based on artificial intelligence, comprising the following algorithm steps:
S1, receiving RGB video information containing a human body at the server end: the user uploads the video to the server through a network interface (for example, a website using the HTTP hypertext transfer protocol); the human body information received by the server is the human body information selected by the user; the RGB video containing the human body is obtained by shooting or from local storage;
S2, calculating the position of the human body from the video: taking each frame out of the video, temporarily storing it in an image format, and inputting each picture into a human body key point detection system to obtain the X and Y coordinates of the key points;
S3, detecting human body feature points from the video: extracting human body features based on the obtained key point coordinates and grouping the feature points by body part, the groups covering the left arm, right arm, left leg, right leg, torso and head;
S4, standardizing the human body information: performing data standardization on each feature point group; taking P = {p1, p2, …, pn} as the set of all n feature points, the normalized feature point group P' is calculated as follows:
Q=P/(max(P)–min(P))
P’=Q-mean(Q);
S5, extracting feature data of the human body information: the standardized feature point groups become the feature data;
S6, inputting the feature data into a locally stored deep learning model;
S7, calculating the three-dimensional values corresponding to the limb key points with the deep learning model: the model uses a multilayer neural network to learn, from training data, the correlation between the feature data of the human body information and the three-dimensional values corresponding to the limb key points; the feature data P' is input and the three-dimensional values bs = P' × M + b are calculated, where M and b are respectively a convolution kernel parameter and a bias layer parameter of the deep network, obtained from the deep learning training process;
and S8, automatically optimizing the three-dimensional numerical values corresponding to the output limb key points.
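The S1 to S8 flow above can be summarized as a short pipeline skeleton. This is a hedged sketch, not the patent's implementation: the three callables stand in for components the text leaves abstract, namely the human body key point detection system (S2), the locally stored deep learning model (S6 and S7), and the automatic optimization of the output (S8).

```python
import numpy as np

def normalize(points):
    """S4: Q = P / (max(P) - min(P)); P' = Q - mean(Q)."""
    q = points / (points.max() - points.min())
    return q - q.mean()

def acquire_3d_limb_actions(frames, detect_keypoints, predict_3d, optimize):
    """Sketch of S1-S8 for one uploaded video.

    frames: iterable of decoded RGB frames (S1-S2).
    detect_keypoints: frame -> (n, 2) array of X/Y key point coordinates (S2).
    predict_3d: 1-D feature vector -> per-frame 3D values (S6-S7).
    optimize: list of per-frame results -> final result (S8).
    """
    per_frame = []
    for frame in frames:                                      # frame by frame
        kpts_xy = np.asarray(detect_keypoints(frame), float)  # S2
        feats = normalize(kpts_xy).ravel()                    # S3-S5: feature data
        per_frame.append(predict_3d(feats))                   # S6-S7: 3D values
    return optimize(per_frame)                                # S8
```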
In summary, the method for acquiring 3D limb actions in RGB video of the present embodiment mainly comprises the following steps: calculating the position of the human body from the video; detecting human body key points from the video; extracting feature data from the human body key points; and inputting the feature data into the deep learning model to calculate the three-dimensional values corresponding to the limb key points. The deep learning model of this embodiment uses a multilayer neural network to learn, from training data, the correlation between the feature data of the human body key points and the three-dimensional values corresponding to the limb key points. In addition, this embodiment collects a large amount of RGB video data containing human bodies and labels each video segment with the three-dimensional values corresponding to the limb key points, for training the deep learning model.
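The automatic optimization of the per-frame three-dimensional values (step S8) is not specified in the text. One plausible realization, offered purely as an assumption, is a moving-average filter over time that suppresses frame-to-frame jitter in the predicted key point coordinates:

```python
import numpy as np

def smooth_sequence(seq, window=5):
    """Hypothetical stand-in for S8: moving-average smoothing of the
    per-frame three-dimensional key point values.

    seq: array-like of shape (n_frames, n_keypoints, 3).
    window: size of the centred averaging window (truncated at the ends).
    """
    seq = np.asarray(seq, dtype=float)
    half = window // 2
    out = np.empty_like(seq)
    for t in range(len(seq)):
        lo, hi = max(0, t - half), min(len(seq), t + half + 1)
        out[t] = seq[lo:hi].mean(axis=0)  # average over the centred window
    return out
```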
Compared with the prior art, the method for acquiring 3D limb actions in RGB video of this embodiment proceeds as follows: first, the small changes and actions of the human body are analyzed through the limb images in the RGB video, and the limb actions are recognized with a deep learning model; when analyzing the small changes of the limb parts within a limb action, the human body key point information is acquired and its feature code is extracted; the extracted feature code is then used as the input of the deep learning model; finally, the deep learning model analyzes the received feature code and calculates the three-dimensional values corresponding to the limb key points as feedback. The recognition process uses the RGB video directly, without additional hardware such as a depth camera or a particular brand of smartphone, and outputs detailed three-dimensional values corresponding to the limb key points to express detailed actions, so the method can be applied to movies, 3D animation, virtual characters, and the like.
It should be further noted that, unless otherwise explicitly stated or limited, terms such as "obtaining," "extracting," "outputting," and the like are to be construed broadly, and specific meanings of the above terms in the present application will be understood by those skilled in the art according to specific situations.
The above embodiments are merely illustrative of the preferred embodiments of the present invention, and not restrictive, and various changes and modifications to the technical solutions of the present invention may be made by those skilled in the art without departing from the spirit of the present invention, and the technical solutions of the present invention are intended to fall within the scope of the present invention defined by the appended claims.

Claims (7)

1. A method for acquiring 3D limb actions in RGB video based on artificial intelligence, characterized in that the algorithm comprises the following steps:
S1, receiving RGB video information containing a human body at the server end;
S2, calculating the position of the human body from the video: taking each frame out of the video, temporarily storing it in an image format, and inputting each picture into a human body key point detection system to obtain the X and Y coordinates of the key points;
S3, detecting human body feature points from the video: extracting human body features based on the obtained key point coordinates, and grouping the feature points by body part;
S4, standardizing the human body information: performing data standardization on each feature point group;
S5, extracting feature data of the human body information: the standardized feature point groups become the feature data;
S6, inputting the feature data into a locally stored deep learning model;
S7, calculating the three-dimensional values corresponding to the limb key points with the deep learning model;
and S8, automatically optimizing the output three-dimensional values corresponding to the limb key points.
2. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S1, the user uploads the video to the server via the network interface, and the human body information received by the server is the human body information selected by the user.
3. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S1, the RGB video including the human body is acquired by shooting or locally acquiring.
4. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: at S3, the human body part includes a left arm, a right arm, a left leg, a right leg, a torso, and a head.
5. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S4, taking P = {p1, p2, …, pn} as the set of all n feature points, the normalized feature point group P' is calculated as follows:
Q=P/(max(P)–min(P))
P'=Q-mean(Q).
6. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S7, the feature data P' is input to calculate the three-dimensional values bs = P' × M + b corresponding to the limb key points, where M and b are respectively a convolution kernel parameter and a bias layer parameter of the deep network, obtained from the deep learning training process.
7. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S7, the deep learning model learns the correlation between the feature data of the human body information and the three-dimensional numerical values corresponding to the limb key points in the training data using the multilayer neural network.
CN202010789617.1A 2020-08-07 2020-08-07 Method for acquiring 3D limb movement in RGB video based on artificial intelligence Pending CN112101102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010789617.1A CN112101102A (en) 2020-08-07 2020-08-07 Method for acquiring 3D limb movement in RGB video based on artificial intelligence


Publications (1)

Publication Number Publication Date
CN112101102A 2020-12-18

Family

ID=73752698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010789617.1A Pending CN112101102A (en) 2020-08-07 2020-08-07 Method for acquiring 3D limb movement in RGB video based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN112101102A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460945A * 2020-03-25 2020-07-28 Yiyun Zhixing (Shenzhen) Technology Co., Ltd. Algorithm for acquiring 3D expression in RGB video based on artificial intelligence
CN111488824A (en) * 2020-04-09 2020-08-04 北京百度网讯科技有限公司 Motion prompting method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN110135249B Human behavior identification method based on time attention mechanism and LSTM
Du et al. Representation learning of temporal dynamics for skeleton-based action recognition
Zhou et al. Activity analysis, summarization, and visualization for indoor human activity monitoring
WO2019174439A1 (en) Image recognition method and apparatus, and terminal and storage medium
KR102174595B1 (en) System and method for identifying faces in unconstrained media
CN110889672B (en) Student card punching and class taking state detection system based on deep learning
CN112418095A (en) Facial expression recognition method and system combined with attention mechanism
Murtaza et al. Analysis of face recognition under varying facial expression: a survey.
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
CN111770299B (en) Method and system for real-time face abstract service of intelligent video conference terminal
CN109635727A (en) A kind of facial expression recognizing method and device
Nguyen et al. Static hand gesture recognition using artificial neural network
KR101563297B1 (en) Method and apparatus for recognizing action in video
Rao et al. Sign Language Recognition System Simulated for Video Captured with Smart Phone Front Camera.
CN110458235B (en) Motion posture similarity comparison method in video
CN110633624A (en) Machine vision human body abnormal behavior identification method based on multi-feature fusion
CN113255522A (en) Personalized motion attitude estimation and analysis method and system based on time consistency
CN111898571A (en) Action recognition system and method
CN112489129A (en) Pose recognition model training method and device, pose recognition method and terminal equipment
CN112906520A (en) Gesture coding-based action recognition method and device
CN114120389A (en) Network training and video frame processing method, device, equipment and storage medium
CN111460945A (en) Algorithm for acquiring 3D expression in RGB video based on artificial intelligence
Megalingam Human action recognition: A review
CN110348395B (en) Skeleton behavior identification method based on space-time relationship
CN116229507A (en) Human body posture detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 717, building r2-a, Gaoxin industrial village, No. 020, Gaoxin South seventh Road, Gaoxin community, Yuehai street, Nanshan District, Shenzhen, Guangdong

Applicant after: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.

Address before: 518000 1403a-1005, east block, Coast Building, No. 15, Haide Third Road, Haizhu community, Yuehai street, Nanshan District, Shenzhen, Guangdong

Applicant before: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.