CN111460945A - Algorithm for acquiring 3D expression in RGB video based on artificial intelligence - Google Patents

Info

Publication number
CN111460945A
CN111460945A (application CN202010215726.2A)
Authority
CN
China
Prior art keywords
deep learning
rgb video
face
expression
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010215726.2A
Other languages
Chinese (zh)
Inventor
高立艳
何書廉
陆晓飞
徐阳
刘烨斌
方浩树
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yiyun Zhixing (Shenzhen) Technology Co., Ltd.
Original Assignee
Yiyun Zhixing (Shenzhen) Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yiyun Zhixing Shenzhen Technology Co ltd filed Critical Yiyun Zhixing Shenzhen Technology Co ltd
Priority to CN202010215726.2A priority Critical patent/CN111460945A/en
Publication of CN111460945A publication Critical patent/CN111460945A/en
Pending legal-status Critical Current

Classifications

    • G06V 40/161 — Human faces: detection; localisation; normalisation
    • G06N 3/045 — Neural networks: combinations of networks
    • G06V 20/40 — Scenes; scene-specific elements in video content
    • G06V 40/168 — Human faces: feature extraction; face representation
    • G06V 40/174 — Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an algorithm for acquiring 3D expression in RGB video based on artificial intelligence, which comprises the following steps: S1, the server receives RGB video information containing human faces; S2, the position of the face is calculated from the video; S3, face feature points are detected in the video; S4, the face information is standardized; S5, feature data of the face information are extracted; S6, the feature data are input into a locally stored deep learning model; S7, the deep learning model calculates the Blend Shape values; and S8, the output Blend Shape values are automatically optimized. The invention has the advantages that it does not require excessive hardware equipment, can output detailed Blend Shape values, and can be applied to 3D animation production.

Description

Algorithm for acquiring 3D expression in RGB video based on artificial intelligence
Technical Field
The invention relates to the field of expression recognition, in particular to an algorithm for acquiring a 3D expression in an RGB video based on artificial intelligence.
Background
With the progress of science and technology, video analysis techniques based on deep learning have developed rapidly, for example pose estimation, motion tracking, and face feature point detection; computer vision algorithms can now extract a large amount of important information from videos and images.
For recognizing facial expressions from video, current technology generally outputs only coarse information, such as the labels joy, anger, sorrow, and happiness, or is bound to the API development software of a particular brand of smartphone, such as ARKit for Apple phones.
Disclosure of Invention
The technical problem the invention aims to solve is to provide an algorithm for acquiring 3D expression in RGB video based on artificial intelligence that does not require excessive hardware equipment, can output detailed Blend Shape values, and can be applied to 3D animation production.
In order to solve the above technical problem, the invention provides the following technical scheme: an algorithm for acquiring 3D expression in RGB video based on artificial intelligence, comprising the following steps:
S1, the user uploads the video to a server through a network interface, and the server receives RGB video information containing human faces;
S2, each frame is taken out of the video and temporarily stored in an image format, and each image is input into the Dlib face key point detection system to obtain the X and Y coordinates of the key points;
S3, face features are extracted based on the obtained key point coordinates, and feature point groups are distinguished by the different parts of the face;
S4, the data of each feature point group are normalized; taking P = {p1, p2, ..., pn} as the set of all n feature points, the normalized feature point group P' is calculated by the following formulas:
Q = P / (max(P) - min(P))
P' = Q - mean(Q);
S5, feature data of the face information are extracted: the normalized feature point groups form the different feature data;
S6, the feature data are input into a locally stored deep learning model;
S7, the deep learning model calculates the Blend Shape value: given the input feature data P', the model computes the Blend Shape value bs by the formula:
bs = P' * M + b
where M and b are obtained from the deep learning training process;
and S8, the output Blend Shape value is automatically optimized.
Further, the face information received by the server in S1 is the face information selected by the user.
Further, the feature point group in S3 includes a left eyebrow, a right eyebrow, a left eye, a right eye, a nose, and a mouth.
Further, the deep learning model in S7 uses a multi-layer neural network to learn the correlation between the feature data of the face information and the Blend Shape values in the training data.
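A minimal NumPy sketch of such a multi-layer network: one hidden ReLU layer followed by a final linear layer matching the patent's bs = P' * M + b. All layer sizes and weights here are illustrative placeholders (the patent does not specify them), not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: 136 values for 68 (x, y) landmarks in,
# 52 Blend Shape coefficients out.
n_in, n_hidden, n_out = 136, 64, 52

W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))  # hidden-layer weights
b1 = np.zeros(n_hidden)                            # hidden-layer bias
M = rng.normal(scale=0.1, size=(n_hidden, n_out))  # final-layer weights
b = np.zeros(n_out)                                # final-layer bias

def predict_blend_shapes(p_prime):
    """Map normalized feature data P' to Blend Shape values bs."""
    h = np.maximum(p_prime @ W1 + b1, 0.0)  # hidden layer with ReLU
    return h @ M + b                        # final layer: bs = h * M + b

bs = predict_blend_shapes(np.zeros(n_in))
```

In training, M, b, W1, and b1 would be fitted to pairs of feature data and ground-truth Blend Shape values; here they are random for illustration.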
Compared with the prior art, the invention has the following advantages: the server receives RGB video information containing human faces; the position of the face and the positions of the face feature points are calculated from the video; the face information is standardized and feature data are calculated from it; the feature data are input into a locally stored deep learning model, trained with a large amount of RGB video data containing human faces collected for the invention; the deep learning model outputs Blend Shape values, which are finally optimized automatically to give the final result. Throughout facial expression recognition, the RGB video is used directly, without other hardware such as a depth camera or a particular brand of smartphone, and the detailed Blend Shape values output by the method express detailed expressions and can be applied to the production of movies, 3D animations, and virtual characters.
Drawings
FIG. 1 is a flow chart of an algorithm for obtaining 3D expressions in RGB video based on artificial intelligence.
Detailed Description
Examples
S1, the user uploads the video to a server through a network interface (for example, a website using the HTTP hypertext transfer protocol), and the server receives RGB video information containing human faces;
S2, each frame is taken out of the video and temporarily stored in an image format, and each image is input into the Dlib face key point detection system to obtain the X and Y coordinates of the key points;
S3, face features are extracted based on the obtained key point coordinates, and feature point groups are distinguished by the different parts of the face; the feature point groups comprise a left eyebrow, a right eyebrow, a left eye, a right eye, a nose, and a mouth;
S4, the data of each feature point group are normalized; taking P = {p1, p2, ..., pn} as the set of all n feature points, the normalized feature point group P' is calculated by the following formulas:
Q = P / (max(P) - min(P))
P' = Q - mean(Q);
S5, feature data of the face information are extracted: the normalized feature point groups form the different feature data;
S6, the feature data are input into a locally stored deep learning model;
S7, the deep learning model calculates the Blend Shape value: given the input feature data P', the model computes the Blend Shape value bs by the formula:
bs = P' * M + b
where M and b are obtained from the deep learning training process;
and S8, the output Blend Shape value is automatically optimized.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. All such modifications and variations are intended to be included herein within the scope of this disclosure and the present invention.

Claims (4)

1. An algorithm for acquiring 3D expression in RGB video based on artificial intelligence, characterized by comprising the following steps:
S1, the user uploads the video to a server through a network interface, and the server receives RGB video information containing human faces;
S2, each frame is taken out of the video and temporarily stored in an image format, and each image is input into the Dlib face key point detection system to obtain the X and Y coordinates of the key points;
S3, face features are extracted based on the obtained key point coordinates, and feature point groups are distinguished by the different parts of the face;
S4, the data of each feature point group are normalized; taking P = {p1, p2, ..., pn} as the set of all n feature points, the normalized feature point group P' is calculated by the following formulas:
Q = P / (max(P) - min(P))
P' = Q - mean(Q);
S5, feature data of the face information are extracted: the normalized feature point groups form the different feature data;
S6, the feature data are input into a locally stored deep learning model;
S7, the deep learning model calculates the Blend Shape value: given the input feature data P', the model computes the Blend Shape value bs by the formula:
bs = P' * M + b
where M and b are obtained from the deep learning training process;
and S8, the output Blend Shape value is automatically optimized.
2. The algorithm for acquiring 3D expression in RGB video based on artificial intelligence as claimed in claim 1, wherein: the face information received by the server in S1 is the face information selected by the user.
3. The algorithm for acquiring 3D expression in RGB video based on artificial intelligence as claimed in claim 1, wherein: the feature point group in S3 includes a left eyebrow, a right eyebrow, a left eye, a right eye, a nose, and a mouth.
4. The algorithm for acquiring 3D expression in RGB video based on artificial intelligence as claimed in claim 1, wherein: the deep learning model in S7 uses a multi-layer neural network to learn the correlation between the feature data of the face information and the Blend Shape values in the training data.
CN202010215726.2A 2020-03-25 2020-03-25 Algorithm for acquiring 3D expression in RGB video based on artificial intelligence Pending CN111460945A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010215726.2A CN111460945A (en) 2020-03-25 2020-03-25 Algorithm for acquiring 3D expression in RGB video based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN111460945A true CN111460945A (en) 2020-07-28

Family

ID=71685673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010215726.2A Pending CN111460945A (en) 2020-03-25 2020-03-25 Algorithm for acquiring 3D expression in RGB video based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN111460945A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217454A * 2014-08-21 2014-12-17 Institute of Computing Technology, Chinese Academy of Sciences Video-driven facial animation generation method
CN104794444A * 2015-04-16 2015-07-22 Zhangying Information Technology Co., Ltd. Facial expression recognition method in instant video and electronic equipment
CN104951743A * 2015-03-04 2015-09-30 Soochow University Facial expression analysis method based on the active shape model algorithm
CN106778563A * 2016-12-02 2017-05-31 Jiangsu University A fast arbitrary-pose facial expression recognition method based on spatially coherent features
CN107610209A * 2017-08-17 2018-01-19 Shanghai Jiao Tong University Facial expression synthesis method, device, storage medium and computer equipment
KR20180037419A * 2016-10-04 2018-04-12 Daegu Gyeongbuk Institute of Science and Technology Apparatus for age and gender estimation using region-SIFT and a discriminant SVM classifier, and method thereof
CN108363973A * 2018-02-07 2018-08-03 University of Electronic Science and Technology of China An unconstrained 3D expression transfer method
CN108805040A * 2018-05-24 2018-11-13 Fudan University A block-based occluded face recognition algorithm
CN108876879A * 2017-05-12 2018-11-23 Tencent Technology (Shenzhen) Co., Ltd. Method, apparatus, computer device and storage medium for realizing facial animation
CN109493403A * 2018-11-13 2019-03-19 Beijing Zhongke Jianing Technology Co., Ltd. A method for realizing facial animation based on action unit expression mapping
CN110415323A * 2019-07-30 2019-11-05 Chengdu Digital Sky Technology Co., Ltd. A method, device and storage medium for obtaining fusion deformation coefficients

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Wenru: "Introduction to Python Deep Learning from Scratch", Huazhong University of Science and Technology Press, pages 111-114 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101102A * 2020-08-07 2020-12-18 Yiyun Zhixing (Shenzhen) Technology Co., Ltd. Method for acquiring 3D limb movement in RGB video based on artificial intelligence
CN112101306A * 2020-11-10 2020-12-18 Chengdu Dishi Technology Co., Ltd. Fine facial expression capturing method and device based on RGB image
CN112101306B * 2020-11-10 2021-02-09 Chengdu Dishi Technology Co., Ltd. Fine facial expression capturing method and device based on RGB image

Similar Documents

Publication Publication Date Title
CN110569795B (en) Image identification method and device and related equipment
Zhang et al. Facial: Synthesizing dynamic talking face with implicit attribute learning
US9805255B2 (en) Temporal fusion of multimodal data from multiple data acquisition systems to automatically recognize and classify an action
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
CN111770299B (en) Method and system for real-time face abstract service of intelligent video conference terminal
KR101887637B1 (en) Robot system
CN112418095A (en) Facial expression recognition method and system combined with attention mechanism
CN107333071A (en) Video processing method and device, electronic equipment and storage medium
CN110555896B (en) Image generation method and device and storage medium
CN111563417A (en) Pyramid structure convolutional neural network-based facial expression recognition method
Meng et al. Weakly supervised semantic segmentation by a class-level multiple group cosegmentation and foreground fusion strategy
CN111108508B (en) Face emotion recognition method, intelligent device and computer readable storage medium
CN111460945A (en) Algorithm for acquiring 3D expression in RGB video based on artificial intelligence
Thuseethan et al. Complex emotion profiling: An incremental active learning based approach with sparse annotations
CN111680550A (en) Emotion information identification method and device, storage medium and computer equipment
CN110866962A (en) Virtual portrait and expression synchronization method based on convolutional neural network
CN112257513A (en) Training method, translation method and system for sign language video translation model
CN108399358B (en) Expression display method and system for video chat
CN113449564A (en) Behavior image classification method based on human body local semantic knowledge
Kumar et al. Facial emotion recognition and detection using cnn
CN112784631A (en) Method for recognizing face emotion based on deep neural network
Renjith et al. Indian sign language recognition: A comparative analysis using cnn and rnn models
Praneel et al. Malayalam Sign Language Character Recognition System
CN112101102A (en) Method for acquiring 3D limb movement in RGB video based on artificial intelligence
Ptucha et al. Fusion of static and temporal predictors for unconstrained facial expression recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 717, building r2-a, Gaoxin industrial village, No. 020, Gaoxin South seventh Road, Gaoxin community, Yuehai street, Nanshan District, Shenzhen, Guangdong

Applicant after: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.

Address before: 518000 1403a-1005, east block, Coast Building, No. 15, Haide Third Road, Haizhu community, Yuehai street, Nanshan District, Shenzhen, Guangdong

Applicant before: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.
