CN111460945A - Algorithm for acquiring 3D expression in RGB video based on artificial intelligence - Google Patents

Info

Publication number
CN111460945A
CN111460945A (application CN202010215726.2A)
Authority
CN
China
Prior art keywords
deep learning
rgb video
face
expression
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010215726.2A
Other languages
Chinese (zh)
Inventor
高立艳
何書廉
陆晓飞
徐阳
刘烨斌
方浩树
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yiyun Zhixing (Shenzhen) Technology Co., Ltd.
Original Assignee
Yiyun Zhixing (Shenzhen) Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yiyun Zhixing Shenzhen Technology Co ltd filed Critical Yiyun Zhixing Shenzhen Technology Co ltd
Priority to CN202010215726.2A priority Critical patent/CN111460945A/en
Publication of CN111460945A publication Critical patent/CN111460945A/en
Pending legal-status Critical Current

Classifications

    • G06V 40/161 — Human faces: detection; localisation; normalisation
    • G06N 3/045 — Neural networks: combinations of networks
    • G06V 20/40 — Scenes; scene-specific elements in video content
    • G06V 40/168 — Human faces: feature extraction; face representation
    • G06V 40/174 — Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an algorithm for acquiring 3D expression in RGB video based on artificial intelligence, which comprises the following steps: S1, the server receives RGB video information containing human faces; S2, the position of the face is calculated from the video; S3, face feature points are detected in the video; S4, the face information is standardized; S5, feature data of the face information are extracted; S6, the feature data are input into a locally stored deep learning model; S7, the deep learning model calculates the Blend Shape values; and S8, the output Blend Shape values are automatically optimized. The invention has the advantages that it does not require excessive hardware equipment, can output detailed Blend Shape values, and can be applied to 3D animation production.

Description

Algorithm for acquiring 3D expression in RGB video based on artificial intelligence
Technical Field
The invention relates to the field of expression recognition, in particular to an algorithm for acquiring a 3D expression in an RGB video based on artificial intelligence.
Background
With the progress of science and technology, video analysis techniques based on deep learning have developed rapidly, for example pose estimation, motion tracking, and face feature point detection; computer vision algorithms can now extract a large amount of important information from videos and images.
For recognizing facial expressions from video, current technology generally outputs only coarse information, such as the labels joy, anger, sorrow, and happiness, or is bound to the API development software of a particular brand of smartphone, such as ARKit for Apple phones.
Disclosure of Invention
The technical problem the invention aims to solve is to provide an algorithm for acquiring 3D expression in RGB video based on artificial intelligence that does not require excessive hardware equipment, can output detailed Blend Shape values, and can be applied to 3D animation production.
In order to solve the above technical problem, the invention provides the following technical scheme: an algorithm for acquiring 3D expression in RGB video based on artificial intelligence, comprising the following steps:
S1, the user uploads the video to a server through a network interface, and the server receives RGB video information containing human faces;
S2, each frame is taken out of the video and temporarily stored in an image format, and each image is input into the Dlib face key point detection system to obtain the X and Y coordinates of the key points;
S3, face features are extracted based on the obtained key point coordinates, and feature point groups are distinguished by the different parts of the face;
S4, the data of each feature point group are normalized; taking P = {p1, p2, ..., pn} as the set of all n feature points, the normalized feature point group P' is calculated by the following formulas:
Q = P / (max(P) - min(P))
P' = Q - mean(Q);
S5, feature data of the face information are extracted: the normalized feature point groups form the different feature data;
S6, the feature data are input into a locally stored deep learning model;
S7, the deep learning model calculates the Blend Shape value: given the input feature data P', the model computes the Blend Shape value bs by the formula:
bs = P' * M + b
where M and b are obtained from the deep learning training process;
and S8, the output Blend Shape value is automatically optimized.
Further, the face information received by the server in S1 is the face information selected by the user.
Further, the feature point group in S3 includes a left eyebrow, a right eyebrow, a left eye, a right eye, a nose, and a mouth.
Further, the deep learning model in S7 uses a multi-layer neural network to learn the correlation between the feature data of the face information and the Blend Shape values in the training data.
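A minimal NumPy sketch of such a multi-layer network: one hidden ReLU layer followed by a final linear layer matching the patent's bs = P' * M + b. All layer sizes and weights here are illustrative placeholders (the patent does not specify them), not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: 136 values for 68 (x, y) landmarks in,
# 52 Blend Shape coefficients out.
n_in, n_hidden, n_out = 136, 64, 52

W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))  # hidden-layer weights
b1 = np.zeros(n_hidden)                            # hidden-layer bias
M = rng.normal(scale=0.1, size=(n_hidden, n_out))  # final-layer weights
b = np.zeros(n_out)                                # final-layer bias

def predict_blend_shapes(p_prime):
    """Map normalized feature data P' to Blend Shape values bs."""
    h = np.maximum(p_prime @ W1 + b1, 0.0)  # hidden layer with ReLU
    return h @ M + b                        # final layer: bs = h * M + b

bs = predict_blend_shapes(np.zeros(n_in))
```

In training, M, b, W1, and b1 would be fitted to pairs of feature data and ground-truth Blend Shape values; here they are random for illustration.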
Compared with the prior art, the invention has the following advantages: the server receives RGB video information containing human faces; the position of the face and the positions of the face feature points are calculated from the video; the face information is standardized and feature data are calculated from it; the feature data are input into a locally stored deep learning model, trained with a large amount of RGB video data containing human faces collected for the invention; the deep learning model outputs Blend Shape values, which are finally optimized automatically to give the final result. Throughout facial expression recognition, the RGB video is used directly, without other hardware such as a depth camera or a particular brand of smartphone, and the detailed Blend Shape values output by the method express detailed expressions and can be applied to the production of movies, 3D animations, and virtual characters.
Drawings
FIG. 1 is a flow chart of an algorithm for obtaining 3D expressions in RGB video based on artificial intelligence.
Detailed Description
Examples
S1, the user uploads the video to a server through a network interface (for example, a website using the HTTP hypertext transfer protocol), and the server receives RGB video information containing human faces;
S2, each frame is taken out of the video and temporarily stored in an image format, and each image is input into the Dlib face key point detection system to obtain the X and Y coordinates of the key points;
S3, face features are extracted based on the obtained key point coordinates, and feature point groups are distinguished by the different parts of the face; the feature point groups comprise a left eyebrow, a right eyebrow, a left eye, a right eye, a nose, and a mouth;
S4, the data of each feature point group are normalized; taking P = {p1, p2, ..., pn} as the set of all n feature points, the normalized feature point group P' is calculated by the following formulas:
Q = P / (max(P) - min(P))
P' = Q - mean(Q);
S5, feature data of the face information are extracted: the normalized feature point groups form the different feature data;
S6, the feature data are input into a locally stored deep learning model;
S7, the deep learning model calculates the Blend Shape value: given the input feature data P', the model computes the Blend Shape value bs by the formula:
bs = P' * M + b
where M and b are obtained from the deep learning training process;
and S8, the output Blend Shape value is automatically optimized.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. All such modifications and variations are intended to be included herein within the scope of this disclosure and the present invention.

Claims (4)

1. An algorithm for acquiring 3D expression in RGB video based on artificial intelligence, characterized by comprising the following steps:
S1, the user uploads the video to a server through a network interface, and the server receives RGB video information containing human faces;
S2, each frame is taken out of the video and temporarily stored in an image format, and each image is input into the Dlib face key point detection system to obtain the X and Y coordinates of the key points;
S3, face features are extracted based on the obtained key point coordinates, and feature point groups are distinguished by the different parts of the face;
S4, the data of each feature point group are normalized; taking P = {p1, p2, ..., pn} as the set of all n feature points, the normalized feature point group P' is calculated by the following formulas:
Q = P / (max(P) - min(P))
P' = Q - mean(Q);
S5, feature data of the face information are extracted: the normalized feature point groups form the different feature data;
S6, the feature data are input into a locally stored deep learning model;
S7, the deep learning model calculates the Blend Shape value: given the input feature data P', the model computes the Blend Shape value bs by the formula:
bs = P' * M + b
where M and b are obtained from the deep learning training process;
and S8, the output Blend Shape value is automatically optimized.
2. The algorithm for acquiring 3D expression in RGB video based on artificial intelligence as claimed in claim 1, wherein: the face information received by the server in S1 is the face information selected by the user.
3. The algorithm for acquiring 3D expression in RGB video based on artificial intelligence as claimed in claim 1, wherein: the feature point group in S3 includes a left eyebrow, a right eyebrow, a left eye, a right eye, a nose, and a mouth.
4. The algorithm for acquiring 3D expression in RGB video based on artificial intelligence as claimed in claim 1, wherein: the deep learning model in S7 uses a multi-layer neural network to learn the correlation between the feature data of the face information and the Blend Shape values in the training data.
CN202010215726.2A 2020-03-25 2020-03-25 Algorithm for acquiring 3D expression in RGB video based on artificial intelligence Pending CN111460945A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010215726.2A CN111460945A (en) 2020-03-25 2020-03-25 Algorithm for acquiring 3D expression in RGB video based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN111460945A true CN111460945A (en) 2020-07-28

Family

ID=71685673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010215726.2A Pending CN111460945A (en) 2020-03-25 2020-03-25 Algorithm for acquiring 3D expression in RGB video based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN111460945A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217454A * 2014-08-21 2014-12-17 Institute of Computing Technology, Chinese Academy of Sciences Video-driven facial animation generation method
CN104794444A * 2015-04-16 2015-07-22 Zhangying Information Technology Co., Ltd. Facial expression recognition method in instant video and electronic equipment
CN104951743A * 2015-03-04 2015-09-30 Soochow University Facial expression analysis method based on the active shape model algorithm
CN106778563A * 2016-12-02 2017-05-31 Jiangsu University A fast arbitrary-pose facial expression recognition method based on spatially coherent features
CN107610209A * 2017-08-17 2018-01-19 Shanghai Jiao Tong University Facial expression synthesis method, device, storage medium and computer equipment
KR20180037419A * 2016-10-04 2018-04-12 Daegu Gyeongbuk Institute of Science and Technology Apparatus for age and gender estimation using region-SIFT and a discriminant SVM classifier, and method thereof
CN108363973A * 2018-02-07 2018-08-03 University of Electronic Science and Technology of China An unconstrained 3D expression transfer method
CN108805040A * 2018-05-24 2018-11-13 Fudan University A block-based occluded face recognition algorithm
CN108876879A * 2017-05-12 2018-11-23 Tencent Technology (Shenzhen) Co., Ltd. Method, apparatus, computer device and storage medium for realizing facial animation
CN109493403A * 2018-11-13 2019-03-19 Beijing Zhongke Jianing Technology Co., Ltd. A method for realizing facial animation based on action unit expression mapping
CN110415323A * 2019-07-30 2019-11-05 Chengdu Digital Sky Technology Co., Ltd. A method, device and storage medium for obtaining fusion deformation coefficients

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Wenru: "Introduction to Python Deep Learning from Scratch", Huazhong University of Science and Technology Press, pages 111-114 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101102A * 2020-08-07 2020-12-18 Yiyun Zhixing (Shenzhen) Technology Co., Ltd. Method for acquiring 3D limb movement in RGB video based on artificial intelligence
CN112101306A * 2020-11-10 2020-12-18 Chengdu Dishi Technology Co., Ltd. Fine facial expression capturing method and device based on RGB image
CN112101306B * 2020-11-10 2021-02-09 Chengdu Dishi Technology Co., Ltd. Fine facial expression capturing method and device based on RGB image

Similar Documents

Publication Publication Date Title
CN110569795B (en) Image identification method and device and related equipment
Zhang et al. Facial: Synthesizing dynamic talking face with implicit attribute learning
US9805255B2 (en) Temporal fusion of multimodal data from multiple data acquisition systems to automatically recognize and classify an action
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
CN111770299B (en) Method and system for real-time face abstract service of intelligent video conference terminal
KR101887637B1 (en) Robot system
CN112418095A (en) Facial expression recognition method and system combined with attention mechanism
CN107333071A (en) Video processing method and device, electronic equipment and storage medium
CN110555896B (en) Image generation method and device and storage medium
CN111563417A (en) Pyramid structure convolutional neural network-based facial expression recognition method
Meng et al. Weakly supervised semantic segmentation by a class-level multiple group cosegmentation and foreground fusion strategy
CN111108508B (en) Face emotion recognition method, intelligent device and computer readable storage medium
CN111460945A (en) Algorithm for acquiring 3D expression in RGB video based on artificial intelligence
Thuseethan et al. Complex emotion profiling: An incremental active learning based approach with sparse annotations
CN111680550A (en) Emotion information identification method and device, storage medium and computer equipment
CN110866962A (en) Virtual portrait and expression synchronization method based on convolutional neural network
CN112257513A (en) Training method, translation method and system for sign language video translation model
CN108399358B (en) Expression display method and system for video chat
CN113449564A (en) Behavior image classification method based on human body local semantic knowledge
Kumar et al. Facial emotion recognition and detection using cnn
CN112784631A (en) Method for recognizing face emotion based on deep neural network
Renjith et al. Indian sign language recognition: A comparative analysis using cnn and rnn models
Praneel et al. Malayalam Sign Language Character Recognition System
CN112101102A (en) Method for acquiring 3D limb movement in RGB video based on artificial intelligence
Ptucha et al. Fusion of static and temporal predictors for unconstrained facial expression recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 717, building r2-a, Gaoxin industrial village, No. 020, Gaoxin South seventh Road, Gaoxin community, Yuehai street, Nanshan District, Shenzhen, Guangdong

Applicant after: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.

Address before: 518000 1403a-1005, east block, Coast Building, No. 15, Haide Third Road, Haizhu community, Yuehai street, Nanshan District, Shenzhen, Guangdong

Applicant before: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.
