CN116665275A - Facial expression synthesis and interaction control method based on text-to-Chinese pinyin - Google Patents

Facial expression synthesis and interaction control method based on text-to-Chinese pinyin

Info

Publication number
CN116665275A
Authority
CN
China
Prior art keywords
key point
pinyin
mouth
emotion
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310658598.2A
Other languages
Chinese (zh)
Inventor
刘增科 (Liu Zengke)
殷继彬 (Yin Jibin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202310658598.2A
Publication of CN116665275A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 - Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 - Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 - Character input methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a facial expression synthesis and interaction control method based on text-to-Chinese-pinyin conversion. The method converts the input text into Chinese pinyin, uses facial key point tracking to record the key point movements and mouth-shape information of the face under different expressions, matches and fits the pinyin to the recorded key point mouth shapes through mouth-shape mapping and mouth-deformation algorithms, and maps the resulting key point time-frame sequence onto a three-dimensional model to produce dynamic mouth deformation. Emotion analysis of the input text then yields emotion key point displacement information, which a DFFD algorithm maps onto the model's mouth-deformation time-sequence frames to form a complete facial expression animation. The invention can be widely applied in human-computer interaction, for example to voice assistants, virtual characters, and game characters.

Description

Facial expression synthesis and interaction control method based on text-to-Chinese pinyin
Technical Field
The invention relates to the fields of computer graphics and image processing and human-computer interaction. It converts text into a Chinese pinyin representation and, by combining facial key point tracking with a mouth-deformation algorithm, generates accurate facial expression animation in real time, provides interactive control functions, and meets users' personalized requirements.
Background
In the fields of virtual characters and computer graphics, realistic expression simulation is important for improving user experience and emotional communication. Existing expression generation techniques typically rely on complex modeling and animation tools and are limited in their real-time interaction with users. Researching and developing a method that generates realistic, interactive three-dimensional virtual character expression animation from input text therefore has significant application value.
To achieve this objective, the following key directions need to be explored:
1. Text-to-expression conversion: converting the input text into corresponding emotion and expression information through algorithms and models. This may involve natural language processing and emotion analysis to capture and understand the emotional content expressed by the input text.
2. Expression generation and simulation: generating realistic three-dimensional virtual character expressions with computer graphics techniques, based on the converted emotion and expression information. This may include dynamic changes in mouth shape and facial features as well as simulation of other body gestures and actions, and involves techniques such as facial key point tracking, mouth deformation, and animation synthesis.
3. Interaction with the user: to enhance realism and user engagement, interactive functionality is needed, for example letting the user adjust and control the virtual character's expression in real time and communicate emotionally with the character. Interaction may be achieved through techniques such as sensors, speech recognition, and gesture tracking.
4. Real-time rendering and performance optimization: achieving real-time expression rendering and interaction requires optimization algorithms and techniques that increase computational efficiency and graphics rendering speed, including hardware acceleration, parallel computing, and algorithmic optimization, to deliver a smooth user experience.
In summary, researching and developing a method that generates realistic, interactive three-dimensional virtual character expression animation from input text can improve user experience and enable emotional communication, and has broad application prospects in virtual characters and computer graphics.
Disclosure of Invention
The invention aims to provide a method for generating realistic, interactive three-dimensional virtual character expression animation from input text. Its core idea is to achieve highly realistic expression simulation through text-to-Chinese-pinyin conversion, facial key point tracking, mouth-shape mapping, mouth deformation, and facial expression rendering. The content of the invention is as follows:
The invention provides a method capable of generating realistic, interactive three-dimensional virtual character expression animation from text input by a user, comprising the following steps:
s1, receiving text information input by a user.
S2, converting the input text into a corresponding Chinese pinyin representation via a jieba-pypinyin algorithm model, and using AFINN to calculate an overall emotion score from the word scores in the text and identify simple emotions such as happiness, anger, sadness, joy, and fear.
S3, tracking and recording the key point movement information of the face under different expressions with a facial key point tracking technique, and recording the mouth-shape information of the different pinyin letters as the Chinese pinyin is spoken, so that the key feature points and mouth-shape changes of the face are accurately captured.
S4, matching and fitting the Chinese pinyin from S2 to the key point mouth shapes from S3, and generating a time-frame sequence array of key point mouth-shape information in the letter order of the Chinese pinyin.
S5, mapping the key point time-frame sequence obtained in S4 onto the three-dimensional model with a DFFD algorithm to form three-dimensional model mouth-deformation time-sequence frames.
S6, performing facial expression rendering on the three-dimensional model mouth-deformation time-sequence frames obtained in S5. This comprises matching the emotion classification information from S2 with the key point movement information under different expressions acquired in S3 to obtain the emotion key point displacement information for the text segment, and then mapping that displacement information onto the mouth-deformation time-sequence frames of the three-dimensional model with the DFFD algorithm to form a complete expression animation.
S7, providing real-time display and interaction control functions, whereby the user can control the degree of deformation of the model.
In S2, the conversion of the input text into its corresponding Chinese pinyin combines a word segmentation algorithm with a pinyin conversion algorithm. The jieba-pypinyin pipeline is constructed in the following steps (an illustrative code sketch follows the step list):
S21, word segmentation: the Chinese word segmentation algorithm jieba is used to split the input text into individual words or single characters.
S22, pinyin conversion: each segmented word or character is converted into its corresponding pinyin representation using the pinyin conversion library pypinyin, which converts each Chinese character into pinyin according to its pronunciation rules, e.g. converting the character 你 ("you") into the pinyin "ni".
S23, combining pinyin: if the input text is a sentence or phrase, the pinyin representations of the individual words or characters are combined to form a complete pinyin representation.
S24, polyphone handling: during pinyin conversion, polyphones may be encountered, i.e. a single Chinese character with several pronunciations. Based on the context or the common pronunciation of the word, the polyphone handling function of the pypinyin library is used to select the appropriate pronunciation.
S25, emotion recognition: AFINN is used to calculate an overall emotion score from the word scores in the text and to identify emotions such as happiness, anger, sadness, joy, and fear.
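A minimal sketch of steps S21-S24 is given below. It assumes the publicly available jieba and pypinyin Python packages and is an illustrative pipeline under those assumptions, not the exact implementation of the invention; the emotion scoring of S25 is sketched separately in the detailed description.

```python
# Minimal sketch of S21-S24, assuming the jieba and pypinyin packages are installed.
import jieba
from pypinyin import lazy_pinyin, Style

def text_to_pinyin(text: str) -> list[str]:
    """Segment Chinese text with jieba, then convert each word to pinyin with pypinyin."""
    words = jieba.lcut(text)                      # S21: word segmentation
    pinyin_seq = []
    for word in words:
        # S22/S24: lazy_pinyin applies pypinyin's built-in polyphone handling
        # using the whole word as context; Style.NORMAL omits tone marks.
        pinyin_seq.extend(lazy_pinyin(word, style=Style.NORMAL))
    return pinyin_seq                             # S23: combined pinyin sequence

# Example: text_to_pinyin("你好世界") -> ['ni', 'hao', 'shi', 'jie']
```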
In S3, a facial key point tracking technique is combined with the mouth-shape information of the pinyin letters so that the key feature points and mouth-shape changes of the face can be accurately captured. The detailed procedure is as follows (an illustrative tracking sketch follows the step list):
S31, facial key point tracking: face detection and key point localization are performed on the face image with a facial key point tracking technique, which identifies the face region, locates key feature points such as the eyes, eyebrows, nose, and mouth, continuously tracks the positions of the facial key points, and records their movement information under different expressions such as happiness, anger, and surprise.
S32, recognizing the mouth shapes of the pinyin letters: when the user utters Chinese pinyin, the mouth-shape information of the pinyin letters is captured by analyzing the shape and movement of the mouth. The facial key point tracking technique, built on OpenCV, identifies the shape and movement of the mouth and correlates them with the pinyin letters; the mouth-shape information of each pinyin letter is deduced by analyzing changes in mouth shape and movement.
S33, recording key points and mouth-shape information: the tracked facial key points and the mouth-shape information of the pinyin letters are recorded to form a data set that includes the recorded key point coordinates and the mouth-shape states of the pinyin letters; these data can be used for subsequent analysis, model training, or application development.
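A minimal tracking sketch for S31-S33 is shown below. It assumes dlib's publicly available 68-point landmark predictor on top of OpenCV, which is one common way to realize the facial key point tracking described here and is not necessarily the component used by the invention; the model file is an external asset.

```python
# Illustrative sketch of S31-S33 using OpenCV + dlib. The mouth corresponds to
# dlib landmark indices 48-67 in the 68-point scheme.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # external asset

def track_mouth_keypoints(frame):
    """Return the (x, y) mouth key points detected in one video frame, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(48, 68)]

# S33: while the speaker pronounces a pinyin letter, store the tracked mouth key
# points labelled with that letter, e.g. {"a": [...], "o": [...], "b": [...], ...}.
```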
In S4, the Chinese pinyin is matched and fitted to the key point mouth-shape information extracted in S3, and a time-frame sequence array of key point mouth-shape information is generated. The detailed description is as follows (an illustrative fitting sketch follows the step list):
S41, data preparation: the data contain the Chinese pinyin letters and the key point displacement arrays of the corresponding facial key point mouth-shape information. These arrays can be derived from the complete pinyin sequence of S2 and the vowel and consonant mouth-shape key point information of the Chinese pinyin recorded in S3.
S42, model fitting: the prepared Chinese pinyin and key point mouth-shape information are fitted, in pinyin order, to the timeline of the normal spelling sequence, yielding a time-frame sequence array of mouth-shape key point information.
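The fitting in S41-S42 can be sketched as follows. The mouth_shapes lookup table, the per-letter frame budget, and the linear blend between successive letter mouth shapes are illustrative assumptions built on the S3 recordings; the invention's actual fitting may differ.

```python
# Sketch of S41-S42: fit pinyin letters to recorded mouth key point displacements
# on a timeline, blending linearly between successive letters.
import numpy as np

FRAMES_PER_LETTER = 5  # assumed frame budget per pinyin letter

def pinyin_to_keyframe_sequence(pinyin_seq, mouth_shapes):
    """pinyin_seq: e.g. ['n', 'i', 'h', 'a', 'o'];
    mouth_shapes: dict mapping a letter to a (K, 2) key point displacement array.
    Returns an array of shape (T, K, 2): one key point displacement set per frame."""
    frames = []
    prev = np.zeros_like(next(iter(mouth_shapes.values())))  # start from a neutral mouth
    for letter in pinyin_seq:
        target = mouth_shapes[letter]
        for t in np.linspace(0.0, 1.0, FRAMES_PER_LETTER):
            frames.append((1.0 - t) * prev + t * target)      # linear blend between shapes
        prev = target
    return np.stack(frames)
```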
In S5, the DFFD algorithm moves the key points on the model in time-axis order according to the key point displacement arrays, producing the deformation effect of the model. The specific flow is as follows (a simplified sketch follows the step list):
S51, data preparation: the key point displacement arrays of the facial feature points are prepared; they can be obtained from S4 in pinyin order. A three-dimensional face model containing the key point information of the mouth region is also required.
S52, DFFD algorithm principle: the DFFD algorithm deforms the mouth of the three-dimensional model by displacing key points on the model. It uses time-axis planning to map the key point displacement arrays in pinyin order.
S53, key point tracking and deformation: in the DFFD algorithm, the facial feature points are first tracked to ensure the continuity and accuracy of the key points. The key point displacement arrays are then applied to the key points on the model according to the pinyin order and the time-axis plan, and moving these key points deforms the mouth region of the three-dimensional model.
S54, time-sequence frame generation: by applying the key point positions to the key points on the model in time-axis order, time-sequence frames of the three-dimensional model's mouth deformation are generated step by step. Each frame represents the deformation state of the model's mouth at a specific point in time, and playing the frames in order produces a continuous deformation effect.
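A heavily simplified stand-in for S51-S54 is sketched below. A true DFFD (Dirichlet free-form deformation) weights mesh vertices with Sibson/natural-neighbor coordinates; this sketch substitutes inverse-distance weights purely to illustrate how per-frame key point displacements drive the mouth region of the mesh, and all names are illustrative.

```python
# Simplified stand-in for S51-S54: propagate key point displacements to mesh vertices
# with inverse-distance weights (a placeholder for DFFD's natural-neighbor weights).
import numpy as np

def deform_mesh(vertices, control_points, displacements, eps=1e-8):
    """vertices: (V, 3) mesh vertices; control_points: (K, 3) mouth key points on the mesh;
    displacements: (K, 3) key point displacement for one frame."""
    d = np.linalg.norm(vertices[:, None, :] - control_points[None, :, :], axis=-1)
    w = 1.0 / (d + eps)
    w /= w.sum(axis=1, keepdims=True)     # normalized weight of each key point per vertex
    return vertices + w @ displacements   # weighted sum of key point displacements

def generate_mouth_frames(vertices, control_points, frame_displacements):
    """S54: return one deformed copy of the mesh per time frame."""
    return [deform_mesh(vertices, control_points, disp) for disp in frame_displacements]
```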
The specific flow of step S6 is as follows (an illustrative blending sketch follows the step list):
S61, emotion classification information: in S2 the emotion analysis result for the text information is obtained, i.e. which emotion the text expresses, such as happiness, anger, sadness, joy, or fear.
S62, emotion key point displacement matching: the emotion classification information is matched with the key point movement information of the face under different expressions acquired in S3; by establishing a mapping between emotion classes and key point movements, the key point displacement information associated with a specific emotion can be determined.
S63, mapping with the DFFD algorithm: using the DFFD algorithm of S5, the emotion key point displacement information is mapped onto the mouth-deformation time-sequence frames of the three-dimensional model. The DFFD algorithm applies the emotion key point displacements to the model's key points in time-axis order, realizing the expression change of the model.
S64, generating the complete expression animation: applying the emotion key point displacement information to the mouth-deformation time-sequence frames of the three-dimensional model forms a complete expression animation; in each frame the emotion key points and mouth key points of the model move and deform according to the displacements, and playing these frames continuously presents a dynamic expression that varies with emotion.
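S62-S64 can be sketched as a simple superposition of an emotion displacement onto every mouth frame. The emotion_displacements table and the additive blend are illustrative assumptions, and the weight parameter anticipates the deformation-degree control of S7; the sketch reuses the deform_mesh/generate_mouth_frames helpers from the previous sketch.

```python
# Sketch of S62-S64: superimpose the displacement associated with the detected emotion
# on every per-frame mouth displacement produced in S5.
def add_emotion(frame_displacements, emotion, emotion_displacements, weight=1.0):
    """frame_displacements: per-frame (K, 3) mouth displacements;
    emotion: label from S2, e.g. "happiness";
    emotion_displacements: dict mapping an emotion label to a (K, 3) displacement;
    weight: 0.0 keeps a neutral face, 1.0 applies the full emotion displacement."""
    emo = emotion_displacements[emotion]
    return [mouth + weight * emo for mouth in frame_displacements]

# The blended displacements are then passed to generate_mouth_frames() to produce
# the complete expression animation of S64.
```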
Compared with the prior art, the invention has the following beneficial effects:
1. By combining Chinese pinyin mouth shapes, the DFFD algorithm, and emotion analysis, the invention realizes end-to-end generation of interactive expression animation from text to a three-dimensional virtual character, improving user experience and the realism of emotional communication.
2. Through mouth-shape collection and fitting, accurate mouth-animation time frames can be generated from the user's real mouth shapes, enhancing the realism and interactivity of the expressions.
3. By combining emotion analysis with emotion-label-to-expression mapping, the virtual character can express the emotion corresponding to the input text, increasing the richness and accuracy of emotional communication.
4. The DFFD algorithm and time-frame technique fit the mouth animation and facial expression on a single animation timeline, ensuring smooth and natural expressions.
5. The generated realistic, interactive character expression animation can be applied in many fields, such as virtual reality, games, and film and television production, improving user experience and visual effect.
Drawings
FIG. 1 is a flow chart of processing input text;
FIG. 2 is a diagram of the overall structure of a face key point tracking technique;
FIG. 3 is a structure diagram of fitting the mouth-shape key point information sequence to frames;
FIG. 4 is a diagram of three-dimensional model mouth animation generation;
FIG. 5 is a structure diagram of generating a three-dimensional emotion expression animation from text;
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
S1, receiving text information input by a user.
S2, converting the input text into a corresponding Chinese pinyin representation via a jieba-pypinyin algorithm model, and using AFINN to calculate an overall emotion score from the word scores in the text and identify simple emotions such as happiness, anger, sadness, joy, and fear.
S3, tracking and recording the key point movement information of the face under different expressions with a facial key point tracking technique, and recording the mouth-shape information of the different pinyin letters as the Chinese pinyin is spoken, so that the key feature points and mouth-shape changes of the face are accurately captured.
S4, matching and fitting the Chinese pinyin from S2 to the key point mouth shapes from S3, and generating a time-frame sequence array of key point mouth-shape information in the letter order of the Chinese pinyin.
S5, mapping the key point time-frame sequence obtained in S4 onto the three-dimensional model with a DFFD algorithm to form three-dimensional model mouth-deformation time-sequence frames.
S6, performing facial expression rendering on the three-dimensional model mouth-deformation time-sequence frames obtained in S5, comprising matching the emotion classification information from S2 with the key point movement information under different expressions acquired in S3 to obtain the emotion key point displacement information for the text segment, and then mapping that displacement information onto the mouth-deformation time-sequence frames of the three-dimensional model with the DFFD algorithm to form a complete expression animation.
S7, providing real-time display and interaction control functions, whereby the user can control the degree of deformation of the model (an illustrative sketch of this control follows).
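As an illustration of the S7 interaction control, the user-facing deformation-degree setting can be modelled as a single scaling factor applied to the per-frame key point displacements before they drive the model; this helper and its range check are assumptions rather than a prescribed interface.

```python
# Sketch of S7: a user-controlled factor in [0, 1] scales how strongly the
# key point displacements (mouth plus emotion) deform the model.
def apply_deformation_degree(frame_displacements, degree):
    """Scale every per-frame key point displacement by the user-chosen degree."""
    if not 0.0 <= degree <= 1.0:
        raise ValueError("deformation degree must be between 0 and 1")
    return [degree * disp for disp in frame_displacements]
```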
In S2, the conversion of the input text into its corresponding Chinese pinyin combines a word segmentation algorithm with a pinyin conversion algorithm. The jieba-pypinyin pipeline is constructed in the following steps:
S21, word segmentation: the Chinese word segmentation algorithm jieba is used to split the input text into individual words or single characters.
S22, pinyin conversion: each segmented word or character is converted into its corresponding pinyin representation using the pinyin conversion library pypinyin, which converts each Chinese character into pinyin according to its pronunciation rules, e.g. converting the character 你 ("you") into the pinyin "ni".
S23, combining pinyin: if the input text is a sentence or phrase, the pinyin representations of the individual words or characters are combined to form a complete pinyin representation.
S24, polyphone handling: during pinyin conversion, polyphones may be encountered, i.e. a single Chinese character with several pronunciations. Based on the context or the common pronunciation of the word, the polyphone handling function of the pypinyin library is used to select the appropriate pronunciation.
S25, emotion recognition: AFINN is used to calculate an overall emotion score from the word scores in the text and to identify the five emotions happiness, anger, sadness, joy, and fear (a toy scoring sketch follows).
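A toy sketch of the S25 scoring is given below. The lexicon entries and the keyword-based mapping from an AFINN-style score to the five emotion classes are illustrative assumptions, since AFINN itself only provides per-word valence scores.

```python
# Sketch of S25: score segmented words with an AFINN-style lexicon and pick an
# emotion label. Both tables below contain only illustrative entries.
AFINN_LEXICON = {"开心": 3, "高兴": 3, "生气": -3, "难过": -2, "害怕": -3}
EMOTION_KEYWORDS = {
    "happiness": {"开心", "高兴"},
    "anger": {"生气"},
    "sadness": {"难过"},
    "fear": {"害怕"},
}

def score_emotion(words):
    """Return (overall score, emotion label) for a list of segmented words."""
    total = sum(AFINN_LEXICON.get(w, 0) for w in words)
    for emotion, keywords in EMOTION_KEYWORDS.items():
        if any(w in keywords for w in words):
            return total, emotion
    return total, "joy" if total > 0 else "neutral"
```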
In S3, a facial key point tracking technique is combined with the mouth-shape information of the pinyin letters so that the key feature points and mouth-shape changes of the face can be accurately captured. The detailed procedure is as follows:
S31, facial key point tracking: face detection and key point localization are performed on the face image with a facial key point tracking technique, which identifies the face region, locates key feature points such as the eyes, eyebrows, nose, and mouth, continuously tracks the positions of the facial key points, and records their movement information under different expressions such as happiness, anger, and surprise.
S32, recognizing the mouth shapes of the pinyin letters: when the user utters Chinese pinyin, the mouth-shape information of the pinyin letters is captured by analyzing the shape and movement of the mouth. The facial key point tracking technique, built on OpenCV, identifies the shape and movement of the mouth and correlates them with the pinyin letters; the mouth-shape information of each pinyin letter is deduced by analyzing changes in mouth shape and movement.
S33, recording key points and mouth-shape information: the tracked facial key points and the mouth-shape information of the pinyin letters are recorded to form a data set that includes the recorded key point coordinates and the mouth-shape states of the pinyin letters; these data can be used for subsequent analysis, model training, or application development.
In S4, the Chinese pinyin is matched and fitted to the key point mouth-shape information extracted in S3, and a time-frame sequence array of key point mouth-shape information is generated. The detailed description is as follows:
S41, data preparation: the data contain the Chinese pinyin letters and the key point displacement arrays of the corresponding facial key point mouth-shape information. These arrays can be derived from the complete pinyin sequence of S2 and the vowel and consonant mouth-shape key point information of the Chinese pinyin recorded in S3.
S42, model fitting: the prepared Chinese pinyin and key point mouth-shape information are fitted, in pinyin order, to the timeline of the normal spelling sequence, yielding a time-frame sequence array of mouth-shape key point information.
In S5, the DFFD algorithm moves the key points on the model in time-axis order according to the key point displacement arrays, producing the deformation effect of the model. The specific flow is as follows:
S51, data preparation: the key point displacement arrays of the facial feature points are prepared; they can be obtained from step S4 in pinyin order. A three-dimensional face model containing the key point information of the mouth region is also required.
S52, DFFD algorithm principle: the DFFD algorithm deforms the mouth of the three-dimensional model by displacing key points on the model. It uses time-axis planning to map the key point displacement arrays in pinyin order.
S53, key point tracking and deformation: in the DFFD algorithm, the facial feature points are first tracked to ensure the continuity and accuracy of the key points. The key point displacement arrays are then applied to the key points on the model according to the pinyin order and the time-axis plan, and moving these key points deforms the mouth region of the three-dimensional model.
S54, time-sequence frame generation: by applying the key point positions to the key points on the model in time-axis order, time-sequence frames of the three-dimensional model's mouth deformation are generated step by step. Each frame represents the deformation state of the model's mouth at a specific point in time, and playing the frames in order produces a continuous deformation effect.
The specific flow of step S6 is as follows:
S61, emotion classification information: in step S2 the emotion analysis result for the text information is obtained, i.e. which emotion the text expresses, such as happiness, anger, sadness, joy, or fear.
S62, emotion key point displacement matching: the emotion classification information is matched with the key point movement information of the face under different expressions acquired in step S3; by establishing a mapping between emotion classes and key point movements, the key point displacement information associated with a specific emotion can be determined.
S63, mapping with the DFFD algorithm: using the DFFD algorithm of step S5, the emotion key point displacement information is mapped onto the mouth-deformation time-sequence frames of the three-dimensional model. The DFFD algorithm applies the emotion key point displacements to the model's key points in time-axis order, realizing the expression change of the model.
S64, generating the complete expression animation: applying the emotion key point displacement information to the mouth-deformation time-sequence frames of the three-dimensional model forms a complete expression animation; in each frame the emotion key points and mouth key points of the model move and deform according to the displacements, and playing these frames continuously presents a dynamic expression that varies with emotion.
Using Chinese pinyin conversion, facial key point tracking, mouth-shape mapping, mouth deformation, and facial expression rendering, the method generates realistic, interactive three-dimensional virtual character expression animation from the text input by the user. The principles and embodiments of the invention have been described with reference to specific examples; the description is intended only to facilitate understanding of the core concepts of the invention. It will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention, and such modifications and adaptations are intended to fall within the scope of the following claims.

Claims (6)

1. A facial expression synthesis and interaction control method based on text-to-Chinese-pinyin conversion, characterized in that:
S1, text information input by a user is received;
S2, the input text is converted into a corresponding Chinese pinyin representation via a jieba-pypinyin algorithm model, an overall emotion score is calculated from the word scores in the text using AFINN, and emotions such as happiness, anger, sadness, joy, and fear are identified;
S3, the key point movement information of the face under different expressions is tracked and recorded with a facial key point tracking technique, and the mouth-shape information of the different pinyin letters as the Chinese pinyin is spoken is recorded, so that the key feature points and mouth-shape changes of the face are accurately captured;
S4, the Chinese pinyin from S2 is matched and fitted to the key point mouth shapes from S3, and a time-frame sequence array of key point mouth-shape information is generated in the letter order of the Chinese pinyin;
S5, the key point time-frame sequence obtained in S4 is mapped onto the three-dimensional model with a DFFD algorithm to form three-dimensional model mouth-deformation time-sequence frames;
S6, facial expression rendering is performed on the three-dimensional model mouth-deformation time-sequence frames obtained in S5, comprising matching the emotion classification information from S2 with the key point movement information under different expressions acquired in S3 to obtain the emotion key point displacement information of the text segment, and then mapping that displacement information onto the mouth-deformation time-sequence frames of the three-dimensional model with the DFFD algorithm to form a complete expression animation;
S7, real-time display and interaction control functions are provided, whereby the user can control the degree of deformation of the model.
2. The jieba-pypinyin algorithm according to claim 1, which converts the input text into a corresponding pinyin representation, wherein:
(2-1) word segmentation: the Chinese word segmentation algorithm jieba segments the input text into individual words or single characters;
(2-2) pinyin conversion: each segmented word or character is converted into its corresponding pinyin representation using the pinyin conversion library pypinyin, which converts each Chinese character into pinyin according to its pronunciation rules, for example converting the character 你 ("you") into the pinyin "ni";
(2-3) combining pinyin: if the input text is a sentence or phrase, the pinyin representations of the individual words or characters are combined to form a complete pinyin representation;
(2-4) polyphone handling: during pinyin conversion a polyphone may be encountered, i.e. a single Chinese character with several pronunciations; based on the context or the common pronunciation of the word, the polyphone handling function of the pypinyin library is used to select the appropriate pronunciation.
3. The calculation of an overall emotion score from the word scores in the text as claimed in claim 1, wherein AFINN is used in S2 to calculate the overall emotion score and to identify simple emotions such as happiness, anger, sadness, joy, and fear.
4. The facial key point tracking technique according to claim 1, with which the key point movement information of the face under different expressions is tracked and recorded, wherein:
(4-1) facial key point tracking: face detection and key point localization are performed on the face image with a facial key point tracking technique, which identifies the face region, locates key feature points such as the eyes, eyebrows, nose, and mouth, continuously tracks the positions of the facial key points, and records their movement information under different expressions such as happiness, anger, and surprise;
(4-2) mouth-shape recognition of the pinyin letters: when the user utters Chinese pinyin, the mouth-shape information of the pinyin letters is captured by analyzing the shape and movement of the mouth; the facial key point tracking technique, built on OpenCV, identifies the shape and movement of the mouth and correlates them with the pinyin letters, and the mouth-shape information of the pinyin letters is deduced by analyzing changes in mouth shape and movement;
(4-3) recording key points and mouth-shape information: the tracked facial key points and the mouth-shape information of the pinyin letters are recorded to form a data set that includes the recorded key point coordinates and the mouth-shape states of the pinyin letters; these data can be used for subsequent analysis, model training, or application development.
5. The method of claim 1, wherein the key point time-frame sequence obtained in S4 is mapped onto the three-dimensional model using the DFFD-AFINN algorithm to form three-dimensional model mouth-deformation time-sequence frames, the facial feature points are tracked and deformed by the DFFD algorithm, the deformations are applied to the three-dimensional face model, and the time-sequence frames are generated as follows: by applying the key point positions to the key points on the model in time-axis order, time-sequence frames of the three-dimensional model's mouth deformation are generated step by step, each frame representing the deformation state of the model's mouth at a specific point in time, and playing the frames in order produces a continuous deformation effect.
6. The emotion classification information of claim 1, which is matched with the acquired key point movement information of the face under different expressions to obtain the emotion key point displacement information of the text segment, after which the emotion key point displacement information is mapped onto the mouth-deformation time-sequence frames of the three-dimensional model using the DFFD algorithm to form a complete expression animation, wherein:
(6-1) emotion classification information: in S2 the emotion analysis result for the text information is obtained, i.e. which emotion the text expresses, such as happiness, anger, sadness, joy, or fear;
(6-2) emotion key point displacement matching: the emotion classification information is matched with the key point movement information of the face under different expressions acquired in S3, and by establishing a mapping between emotion classes and key point movements, the key point displacement information associated with a specific emotion can be determined;
(6-3) mapping with the DFFD algorithm: using the DFFD algorithm obtained in S5, the emotion key point displacement information is mapped onto the mouth-deformation time-sequence frames of the three-dimensional model, and the DFFD algorithm applies the emotion key point displacements to the model's key points in time-axis order, realizing the expression change of the model;
(6-4) complete expression animation generation: applying the emotion key point displacement information to the mouth-deformation time-sequence frames of the three-dimensional model forms a complete expression animation; in each frame the emotion key points and mouth key points of the model move and deform according to the displacements, and continuously playing these frames presents a dynamic expression that varies with emotion.
CN202310658598.2A 2023-06-06 2023-06-06 Facial expression synthesis and interaction control method based on text-to-Chinese pinyin Pending CN116665275A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310658598.2A CN116665275A (en) 2023-06-06 2023-06-06 Facial expression synthesis and interaction control method based on text-to-Chinese pinyin

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310658598.2A CN116665275A (en) 2023-06-06 2023-06-06 Facial expression synthesis and interaction control method based on text-to-Chinese pinyin

Publications (1)

Publication Number Publication Date
CN116665275A true CN116665275A (en) 2023-08-29

Family

ID=87716819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310658598.2A Pending CN116665275A (en) 2023-06-06 2023-06-06 Facial expression synthesis and interaction control method based on text-to-Chinese pinyin

Country Status (1)

Country Link
CN (1) CN116665275A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292030A (en) * 2023-10-27 2023-12-26 海看网络科技(山东)股份有限公司 Method and system for generating three-dimensional digital human animation
CN118410813A (en) * 2024-02-27 2024-07-30 濮阳职业技术学院 Language learning method, system and storage medium
CN118410813B (en) * 2024-02-27 2024-09-27 濮阳职业技术学院 Language learning method, system and storage medium

Similar Documents

Publication Publication Date Title
JP7557055B2 (en) Method, device, equipment and computer program for driving the movement of a target object
CN106653052B (en) Virtual human face animation generation method and device
Hong et al. Real-time speech-driven face animation with expressions using neural networks
Pham et al. Speech-driven 3D facial animation with implicit emotional awareness: A deep learning approach
CN113378806B (en) Audio-driven face animation generation method and system integrating emotion coding
US8224652B2 (en) Speech and text driven HMM-based body animation synthesis
CN109065055A (en) Method, storage medium and the device of AR content are generated based on sound
CN112581569B (en) Adaptive emotion expression speaker facial animation generation method and electronic device
CN106919251A (en) A kind of collaborative virtual learning environment natural interactive method based on multi-modal emotion recognition
CN116665275A (en) Facial expression synthesis and interaction control method based on text-to-Chinese pinyin
CN101187990A (en) A session robotic system
KR20120130627A (en) Apparatus and method for generating animation using avatar
WO2022267380A1 (en) Face motion synthesis method based on voice driving, electronic device, and storage medium
WO2023284435A1 (en) Method and apparatus for generating animation
CN106446406A (en) Simulation system and simulation method for converting Chinese sentences into human mouth shapes
Verma et al. A comprehensive review on automation of Indian sign language
CN117251057A (en) AIGC-based method and system for constructing AI number wisdom
Ding et al. Speech-driven eyebrow motion synthesis with contextual markovian models
Li et al. AI-based visual speech recognition towards realistic avatars and lip-reading applications in the metaverse
Websdale et al. Speaker-independent speech animation using perceptual loss functions and synthetic data
Sagawa et al. Pattern recognition and synthesis for a sign language translation system
Wu et al. Speech synthesis with face embeddings
CN116883608B (en) Multi-mode digital person social attribute control method and related device
CN1952850A (en) Three-dimensional face cartoon method driven by voice based on dynamic elementary access
Li et al. A novel speech-driven lip-sync model with CNN and LSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination