CN112101102A - Method for acquiring 3D limb movement in RGB video based on artificial intelligence - Google Patents

Method for acquiring 3D limb movement in RGB video based on artificial intelligence

Info

Publication number
CN112101102A
CN112101102A (application CN202010789617.1A)
Authority
CN
China
Prior art keywords
human body
limb
acquiring
artificial intelligence
rgb video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010789617.1A
Other languages
Chinese (zh)
Inventor
方浩树
何书廉
刘烨斌
陆晓飞
徐阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yiyun Zhixing Shenzhen Technology Co ltd
Original Assignee
Yiyun Zhixing Shenzhen Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yiyun Zhixing Shenzhen Technology Co ltd filed Critical Yiyun Zhixing Shenzhen Technology Co ltd
Priority to CN202010789617.1A priority Critical patent/CN112101102A/en
Publication of CN112101102A publication Critical patent/CN112101102A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00: Animation
    • G06T13/20: 3D [Three Dimensional] animation
    • G06T13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of human body action recognition and acquisition, in particular to a method for acquiring 3D limb actions in RGB (red, green and blue) videos based on artificial intelligence. In the method, a server receives RGB video information containing a human body, calculates the position of the human body from the video, standardizes the human body information, and derives feature data from the position of the human body and of its key points in the video. The feature data are then input into a locally stored deep learning model, trained on a large amount of RGB video data containing human bodies collected for this method, which outputs three-dimensional values corresponding to the limb key points. Finally, these three-dimensional values are automatically optimized into the final result, so that more detailed limb actions are output.

Description

Method for acquiring 3D limb movement in RGB video based on artificial intelligence
Technical Field
The invention relates to the technical field of human body action recognition and acquisition, in particular to a method for acquiring 3D limb actions in RGB (red, green and blue) videos based on artificial intelligence.
Background
With the development of computer vision technology, motion recognition using video acquisition equipment has become a research focus. In existing action recognition methods, data such as joint positions are extracted from a video stream and input into a three-layer bidirectional long short-term memory (LSTM) recurrent neural network, which extracts the dynamic features of the data. The extracted dynamic features are then input into a classifier network, which finally obtains the action category corresponding to the video stream data.
At present, video analysis technology based on deep learning is developing rapidly: pose estimation, motion tracking, facial feature point detection and other computer vision algorithms can extract a large amount of important information from videos and images. For recognizing limb actions from video, however, existing technology generally outputs only coarse information (such as "standing" or "sitting") as an action label, and cannot output more detailed limb actions.
Disclosure of Invention
To solve these problems, the invention provides a method for acquiring 3D limb actions in RGB (red, green and blue) videos based on artificial intelligence. Aiming at the practical recognition of limb actions, it develops a deep learning model that directly analyzes the limb actions in an RGB video and outputs three-dimensional values corresponding to the limb key points to express detailed limb actions.
In order to achieve the purpose, the invention adopts the technical scheme that: a method for acquiring 3D limb actions in an RGB video based on artificial intelligence comprises the following algorithm steps:
S1, receiving RGB video information containing a human body at the server end;
S2, calculating the position of the human body from the video: taking each frame out of the video, temporarily storing it in an image format, and inputting each picture into a human body key point detection system to obtain the X and Y coordinates of the key points;
S3, detecting human body feature points from the video: extracting human body features based on the obtained key point coordinates, and grouping the feature points by body part;
S4, standardizing the human body information: performing data standardization on each feature point group;
S5, extracting feature data of the human body information: the standardized feature point groups become the feature data;
S6, inputting the feature data into a locally stored deep learning model;
S7, calculating the three-dimensional values corresponding to the limb key points with the deep learning model;
and S8, automatically optimizing the output three-dimensional values corresponding to the limb key points.
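The frame-by-frame extraction in S2 can be sketched as a small generator. This is an illustrative sketch, not the patent's implementation: the `capture` argument only needs a `read() -> (ok, frame)` method, so in practice an OpenCV `cv2.VideoCapture` opened on the uploaded video could be passed in, and each yielded frame would then go to the human body key point detection system.

```python
def extract_frames(capture, step=1):
    """Yield every `step`-th frame from a video capture object (cf. S2).

    `capture` is any object with a read() -> (ok, frame) method, such as
    a cv2.VideoCapture; iteration stops when read() reports no more frames.
    """
    idx = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if idx % step == 0:
            yield frame  # temporarily held in image form for key point detection
        idx += 1
```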
Further, in S1, the user uploads the video to the server via the network interface, and the human body information received by the server is the human body information selected by the user.
Further, in S1, the RGB video including the human body is acquired by shooting or locally acquiring.
Wherein, in S3, the human body part includes a left arm, a right arm, a left leg, a right leg, a torso, and a head.
Further, in S4, taking P = {p1, p2, …, pn} as the set of all n feature points, the normalized feature point group P' is calculated as follows:
Q=P/(max(P)–min(P))
P’=Q-mean(Q)。
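The two formulas above can be checked with a small worked example. The values of the feature point group P below are purely illustrative:

```python
import numpy as np

# Hypothetical feature point group P of n = 4 values
P = np.array([2.0, 4.0, 6.0, 10.0])

Q = P / (P.max() - P.min())   # Q = P / (max(P) - min(P))
P_prime = Q - Q.mean()        # P' = Q - mean(Q)
```

The result is zero-centred and has unit range, so feature point groups from bodies of different sizes and image positions become comparable before entering the model.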
Further, in S7, the feature data P' is input and the three-dimensional values bs = P' × M + b corresponding to the limb key points are calculated, where M and b are respectively a convolution kernel parameter and a bias layer parameter of the deep network, obtained from the deep learning training process.
Further, in S7, the deep learning model learns the correlation between the feature data of the human body information and the three-dimensional values corresponding to the limb key points in the training data using the multi-layer neural network.
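The output computation bs = P' × M + b in S7 is an affine map. The sketch below shows only that map; the sizes (34 input features, 17 limb key points) and the random parameters are illustrative assumptions, since in the method M and b come from the deep learning training process:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_keypoints = 34, 17                    # assumed sizes
M = rng.normal(size=(n_features, 3 * n_keypoints))  # convolution kernel parameter (weight)
b = rng.normal(size=3 * n_keypoints)                # bias layer parameter

P_prime = rng.normal(size=n_features)   # standardized feature data from S4-S5
bs = P_prime @ M + b                    # flattened three-dimensional values
xyz = bs.reshape(n_keypoints, 3)        # one (X, Y, Z) triple per limb key point
```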
The invention has the following beneficial effects: the method receives RGB video information containing a human body through a server, calculates the position of the human body from the video, standardizes the human body information, and derives feature data from the position of the human body and of its key points in the video. The feature data are then input into a locally stored deep learning model, trained on a large amount of RGB video data containing human bodies collected for this method, which outputs the three-dimensional values corresponding to the limb key points. Finally, these three-dimensional values are automatically optimized into the final result, so that more detailed limb actions are output.
Drawings
Fig. 1 is a block flow diagram of the present embodiment.
Detailed Description
The present invention will be described in further detail below with reference to the detailed description and the accompanying drawings. The present application may be embodied in many different forms and is not limited to the embodiment described herein; the following detailed description is provided to facilitate a more thorough understanding of the present disclosure.
Referring to fig. 1, the invention relates to a method for acquiring 3D limb movement in RGB video based on artificial intelligence, comprising the following algorithm steps:
S1, receiving RGB video information containing a human body at the server end: the user uploads the video to the server through a network interface (for example, a website using the HTTP hypertext transfer protocol); the human body information received by the server is the human body information selected by the user; the RGB video containing the human body is obtained by shooting or from local storage;
S2, calculating the position of the human body from the video: taking each frame out of the video, temporarily storing it in an image format, and inputting each picture into a human body key point detection system to obtain the X and Y coordinates of the key points;
S3, detecting human body feature points from the video: extracting human body features based on the obtained key point coordinates and grouping the feature points by body part, the groups covering the left arm, right arm, left leg, right leg, torso and head;
S4, standardizing the human body information: performing data standardization on each feature point group; taking P = {p1, p2, …, pn} as the set of all n feature points, the normalized feature point group P' is calculated as follows:
Q=P/(max(P)–min(P))
P’=Q-mean(Q);
S5, extracting feature data of the human body information: the standardized feature point groups become the feature data;
S6, inputting the feature data into a locally stored deep learning model;
S7, calculating the three-dimensional values corresponding to the limb key points with the deep learning model: the model uses a multilayer neural network to learn, from training data, the correlation between the feature data of the human body information and the three-dimensional values corresponding to the limb key points; the feature data P' is input and the three-dimensional values bs = P' × M + b are calculated, where M and b are respectively a convolution kernel parameter and a bias layer parameter of the deep network, obtained from the deep learning training process;
and S8, automatically optimizing the three-dimensional numerical values corresponding to the output limb key points.
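The S1 to S8 flow above can be summarized as a short pipeline skeleton. This is a hedged sketch, not the patent's implementation: the three callables stand in for components the text leaves abstract, namely the human body key point detection system (S2), the locally stored deep learning model (S6 and S7), and the automatic optimization of the output (S8).

```python
import numpy as np

def normalize(points):
    """S4: Q = P / (max(P) - min(P)); P' = Q - mean(Q)."""
    q = points / (points.max() - points.min())
    return q - q.mean()

def acquire_3d_limb_actions(frames, detect_keypoints, predict_3d, optimize):
    """Sketch of S1-S8 for one uploaded video.

    frames: iterable of decoded RGB frames (S1-S2).
    detect_keypoints: frame -> (n, 2) array of X/Y key point coordinates (S2).
    predict_3d: 1-D feature vector -> per-frame 3D values (S6-S7).
    optimize: list of per-frame results -> final result (S8).
    """
    per_frame = []
    for frame in frames:                                      # frame by frame
        kpts_xy = np.asarray(detect_keypoints(frame), float)  # S2
        feats = normalize(kpts_xy).ravel()                    # S3-S5: feature data
        per_frame.append(predict_3d(feats))                   # S6-S7: 3D values
    return optimize(per_frame)                                # S8
```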
In summary, the method for acquiring 3D limb actions in RGB video of the present embodiment mainly comprises the following steps: calculating the position of the human body from the video; detecting human body key points from the video; extracting feature data from the human body key points; and inputting the feature data into the deep learning model to calculate the three-dimensional values corresponding to the limb key points. The deep learning model of this embodiment uses a multilayer neural network to learn, from training data, the correlation between the feature data of the human body key points and the three-dimensional values corresponding to the limb key points. In addition, this embodiment collects a large amount of RGB video data containing human bodies and labels each video segment with the three-dimensional values corresponding to the limb key points, for training the deep learning model.
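The automatic optimization of the per-frame three-dimensional values (step S8) is not specified in the text. One plausible realization, offered purely as an assumption, is a moving-average filter over time that suppresses frame-to-frame jitter in the predicted key point coordinates:

```python
import numpy as np

def smooth_sequence(seq, window=5):
    """Hypothetical stand-in for S8: moving-average smoothing of the
    per-frame three-dimensional key point values.

    seq: array-like of shape (n_frames, n_keypoints, 3).
    window: size of the centred averaging window (truncated at the ends).
    """
    seq = np.asarray(seq, dtype=float)
    half = window // 2
    out = np.empty_like(seq)
    for t in range(len(seq)):
        lo, hi = max(0, t - half), min(len(seq), t + half + 1)
        out[t] = seq[lo:hi].mean(axis=0)  # average over the centred window
    return out
```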
Compared with the prior art, the method for acquiring 3D limb actions in RGB video of this embodiment proceeds as follows: first, the small changes and actions of the human body are analyzed through the limb images in the RGB video, and the limb actions are recognized with a deep learning model; when analyzing the small changes of the limb parts within a limb action, the human body key point information is acquired and its feature code is extracted; the extracted feature code is then used as the input of the deep learning model; finally, the deep learning model analyzes the received feature code and calculates the three-dimensional values corresponding to the limb key points as feedback. The recognition process uses the RGB video directly, without additional hardware such as a depth camera or a particular brand of smartphone, and outputs detailed three-dimensional values corresponding to the limb key points to express detailed actions, so the method can be applied to movies, 3D animation, virtual characters, and the like.
It should be further noted that, unless otherwise explicitly stated or limited, terms such as "obtaining," "extracting," "outputting," and the like are to be construed broadly, and specific meanings of the above terms in the present application will be understood by those skilled in the art according to specific situations.
The above embodiments are merely illustrative of the preferred embodiments of the present invention, and not restrictive, and various changes and modifications to the technical solutions of the present invention may be made by those skilled in the art without departing from the spirit of the present invention, and the technical solutions of the present invention are intended to fall within the scope of the present invention defined by the appended claims.

Claims (7)

1. A method for acquiring 3D limb actions in RGB video based on artificial intelligence, characterized in that the algorithm comprises the following steps:
S1, receiving RGB video information containing a human body at the server end;
S2, calculating the position of the human body from the video: taking each frame out of the video, temporarily storing it in an image format, and inputting each picture into a human body key point detection system to obtain the X and Y coordinates of the key points;
S3, detecting human body feature points from the video: extracting human body features based on the obtained key point coordinates, and grouping the feature points by body part;
S4, standardizing the human body information: performing data standardization on each feature point group;
S5, extracting feature data of the human body information: the standardized feature point groups become the feature data;
S6, inputting the feature data into a locally stored deep learning model;
S7, calculating the three-dimensional values corresponding to the limb key points with the deep learning model;
and S8, automatically optimizing the output three-dimensional values corresponding to the limb key points.
2. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S1, the user uploads the video to the server via the network interface, and the human body information received by the server is the human body information selected by the user.
3. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S1, the RGB video including the human body is acquired by shooting or locally acquiring.
4. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: at S3, the human body part includes a left arm, a right arm, a left leg, a right leg, a torso, and a head.
5. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S4, taking P = {p1, p2, …, pn} as the set of all n feature points, the normalized feature point group P' is calculated as follows:
Q=P/(max(P)–min(P))
P'=Q-mean(Q).
6. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S7, the feature data P' is input to calculate the three-dimensional values bs = P' × M + b corresponding to the limb key points, where M and b are respectively a convolution kernel parameter and a bias layer parameter of the deep network, obtained from the deep learning training process.
7. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S7, the deep learning model learns the correlation between the feature data of the human body information and the three-dimensional numerical values corresponding to the limb key points in the training data using the multilayer neural network.
CN202010789617.1A 2020-08-07 2020-08-07 Method for acquiring 3D limb movement in RGB video based on artificial intelligence Pending CN112101102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010789617.1A CN112101102A (en) 2020-08-07 2020-08-07 Method for acquiring 3D limb movement in RGB video based on artificial intelligence


Publications (1)

Publication Number Publication Date
CN112101102A 2020-12-18

Family

ID=73752698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010789617.1A Pending CN112101102A (en) 2020-08-07 2020-08-07 Method for acquiring 3D limb movement in RGB video based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN112101102A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460945A * 2020-03-25 2020-07-28 Yiyun Zhixing (Shenzhen) Technology Co., Ltd. Algorithm for acquiring 3D expression in RGB video based on artificial intelligence
CN111488824A (en) * 2020-04-09 2020-08-04 北京百度网讯科技有限公司 Motion prompting method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN110135249B Human behavior identification method based on time attention mechanism and LSTM
Du et al. Representation learning of temporal dynamics for skeleton-based action recognition
Zhou et al. Activity analysis, summarization, and visualization for indoor human activity monitoring
WO2019174439A1 (en) Image recognition method and apparatus, and terminal and storage medium
KR102174595B1 (en) System and method for identifying faces in unconstrained media
CN110889672B (en) Student card punching and class taking state detection system based on deep learning
CN112418095A (en) Facial expression recognition method and system combined with attention mechanism
Murtaza et al. Analysis of face recognition under varying facial expression: a survey.
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
CN111770299B (en) Method and system for real-time face abstract service of intelligent video conference terminal
CN109635727A (en) A kind of facial expression recognizing method and device
Nguyen et al. Static hand gesture recognition using artificial neural network
KR101563297B1 (en) Method and apparatus for recognizing action in video
Rao et al. Sign Language Recognition System Simulated for Video Captured with Smart Phone Front Camera.
CN110458235B (en) Motion posture similarity comparison method in video
CN110633624A (en) Machine vision human body abnormal behavior identification method based on multi-feature fusion
CN113255522A (en) Personalized motion attitude estimation and analysis method and system based on time consistency
CN111898571A (en) Action recognition system and method
CN112489129A (en) Pose recognition model training method and device, pose recognition method and terminal equipment
CN112906520A (en) Gesture coding-based action recognition method and device
CN114120389A (en) Network training and video frame processing method, device, equipment and storage medium
CN111460945A (en) Algorithm for acquiring 3D expression in RGB video based on artificial intelligence
Megalingam Human action recognition: A review
CN110348395B (en) Skeleton behavior identification method based on space-time relationship
CN116229507A (en) Human body posture detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 717, building r2-a, Gaoxin industrial village, No. 020, Gaoxin South seventh Road, Gaoxin community, Yuehai street, Nanshan District, Shenzhen, Guangdong

Applicant after: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.

Address before: 518000 1403a-1005, east block, Coast Building, No. 15, Haide Third Road, Haizhu community, Yuehai street, Nanshan District, Shenzhen, Guangdong

Applicant before: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.