CN116665275A - Facial expression synthesis and interaction control method based on text-to-Chinese pinyin - Google Patents

Facial expression synthesis and interaction control method based on text-to-Chinese pinyin

Info

Publication number
CN116665275A
Authority
CN
China
Prior art keywords
key point
pinyin
mouth
emotion
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310658598.2A
Other languages
Chinese (zh)
Inventor
刘增科 (Liu Zengke)
殷继彬 (Yin Jibin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202310658598.2A
Publication of CN116665275A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 - Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 - Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 - Character input methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a facial expression synthesis and interaction control method based on text-to-Chinese-pinyin conversion. The method converts the input text into Chinese pinyin, uses facial key point tracking to record the key point movements and mouth-shape information of the face under different expressions, matches and fits the pinyin to the recorded key point mouth shapes through mouth-shape mapping and mouth-deformation algorithms, and maps the resulting key point time-frame sequence onto a three-dimensional model to produce dynamic mouth deformation. Emotion analysis of the input text then yields emotion key point displacement information, which a DFFD algorithm maps onto the model's mouth-deformation time-sequence frames to form a complete facial expression animation. The invention can be widely applied in human-computer interaction, for example to voice assistants, virtual characters, and game characters.

Description

Facial expression synthesis and interaction control method based on text-to-Chinese pinyin
Technical Field
The invention relates to the fields of computer graphics and image processing and human-computer interaction. It converts text into a Chinese pinyin representation and, by combining facial key point tracking with a mouth-deformation algorithm, generates accurate facial expression animation in real time, provides interactive control functions, and meets users' personalized requirements.
Background
In the fields of virtual characters and computer graphics, realistic expression simulation is important for improving user experience and emotional communication. Existing expression generation techniques typically rely on complex modeling and animation tools and are limited in their real-time interaction with users. Researching and developing a method that generates realistic, interactive three-dimensional virtual character expression animation from input text therefore has significant application value.
To achieve this objective, the following key directions need to be explored:
1. Text-to-expression conversion: converting the input text into corresponding emotion and expression information through algorithms and models. This may involve natural language processing and emotion analysis to capture and understand the emotional content expressed by the input text.
2. Expression generation and simulation: generating realistic three-dimensional virtual character expressions with computer graphics techniques, based on the converted emotion and expression information. This may include dynamic changes in mouth shape and facial features as well as simulation of other body gestures and actions, and involves techniques such as facial key point tracking, mouth deformation, and animation synthesis.
3. Interaction with the user: to enhance realism and user engagement, interactive functionality is needed, for example letting the user adjust and control the virtual character's expression in real time and communicate emotionally with the character. Interaction may be achieved through techniques such as sensors, speech recognition, and gesture tracking.
4. Real-time rendering and performance optimization: achieving real-time expression rendering and interaction requires optimization algorithms and techniques that increase computational efficiency and graphics rendering speed, including hardware acceleration, parallel computing, and algorithmic optimization, to deliver a smooth user experience.
In summary, researching and developing a method that generates realistic, interactive three-dimensional virtual character expression animation from input text can improve user experience and enable emotional communication, and has broad application prospects in virtual characters and computer graphics.
Disclosure of Invention
The invention aims to provide a method for generating realistic, interactive three-dimensional virtual character expression animation from input text. Its core idea is to achieve highly realistic expression simulation through text-to-Chinese-pinyin conversion, facial key point tracking, mouth-shape mapping, mouth deformation, and facial expression rendering. The content of the invention is as follows:
The invention provides a method capable of generating realistic, interactive three-dimensional virtual character expression animation from text input by a user, comprising the following steps:
s1, receiving text information input by a user.
S2, converting the input text into a corresponding Chinese pinyin representation via a jieba-pypinyin algorithm model, and using AFINN to calculate an overall emotion score from the word scores in the text and identify simple emotions such as happiness, anger, sadness, joy, and fear.
S3, tracking and recording the key point movement information of the face under different expressions with a facial key point tracking technique, and recording the mouth-shape information of the different pinyin letters as the Chinese pinyin is spoken, so that the key feature points and mouth-shape changes of the face are accurately captured.
S4, matching and fitting the Chinese pinyin from S2 to the key point mouth shapes from S3, and generating a time-frame sequence array of key point mouth-shape information in the letter order of the Chinese pinyin.
S5, mapping the key point time-frame sequence obtained in S4 onto the three-dimensional model with a DFFD algorithm to form three-dimensional model mouth-deformation time-sequence frames.
S6, performing facial expression rendering on the three-dimensional model mouth-deformation time-sequence frames obtained in S5. This comprises matching the emotion classification information from S2 with the key point movement information under different expressions acquired in S3 to obtain the emotion key point displacement information for the text segment, and then mapping that displacement information onto the mouth-deformation time-sequence frames of the three-dimensional model with the DFFD algorithm to form a complete expression animation.
S7, providing real-time display and interaction control functions, whereby the user can control the degree of deformation of the model.
In S2, the conversion of the input text into its corresponding Chinese pinyin combines a word segmentation algorithm with a pinyin conversion algorithm. The jieba-pypinyin pipeline is constructed in the following steps (an illustrative code sketch follows the step list):
S21, word segmentation: the Chinese word segmentation algorithm jieba is used to split the input text into individual words or single characters.
S22, pinyin conversion: each segmented word or character is converted into its corresponding pinyin representation using the pinyin conversion library pypinyin, which converts each Chinese character into pinyin according to its pronunciation rules, e.g. converting the character 你 ("you") into the pinyin "ni".
S23, combining pinyin: if the input text is a sentence or phrase, the pinyin representations of the individual words or characters are combined to form a complete pinyin representation.
S24, polyphone handling: during pinyin conversion, polyphones may be encountered, i.e. a single Chinese character with several pronunciations. Based on the context or the common pronunciation of the word, the polyphone handling function of the pypinyin library is used to select the appropriate pronunciation.
S25, emotion recognition: AFINN is used to calculate an overall emotion score from the word scores in the text and to identify emotions such as happiness, anger, sadness, joy, and fear.
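A minimal sketch of steps S21-S24 is given below. It assumes the publicly available jieba and pypinyin Python packages and is an illustrative pipeline under those assumptions, not the exact implementation of the invention; the emotion scoring of S25 is sketched separately in the detailed description.

```python
# Minimal sketch of S21-S24, assuming the jieba and pypinyin packages are installed.
import jieba
from pypinyin import lazy_pinyin, Style

def text_to_pinyin(text: str) -> list[str]:
    """Segment Chinese text with jieba, then convert each word to pinyin with pypinyin."""
    words = jieba.lcut(text)                      # S21: word segmentation
    pinyin_seq = []
    for word in words:
        # S22/S24: lazy_pinyin applies pypinyin's built-in polyphone handling
        # using the whole word as context; Style.NORMAL omits tone marks.
        pinyin_seq.extend(lazy_pinyin(word, style=Style.NORMAL))
    return pinyin_seq                             # S23: combined pinyin sequence

# Example: text_to_pinyin("你好世界") -> ['ni', 'hao', 'shi', 'jie']
```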
In S3, a facial key point tracking technique is combined with the mouth-shape information of the pinyin letters so that the key feature points and mouth-shape changes of the face can be accurately captured. The detailed procedure is as follows (an illustrative tracking sketch follows the step list):
S31, facial key point tracking: face detection and key point localization are performed on the face image with a facial key point tracking technique, which identifies the face region, locates key feature points such as the eyes, eyebrows, nose, and mouth, continuously tracks the positions of the facial key points, and records their movement information under different expressions such as happiness, anger, and surprise.
S32, recognizing the mouth shapes of the pinyin letters: when the user utters Chinese pinyin, the mouth-shape information of the pinyin letters is captured by analyzing the shape and movement of the mouth. The facial key point tracking technique, built on OpenCV, identifies the shape and movement of the mouth and correlates them with the pinyin letters; the mouth-shape information of each pinyin letter is deduced by analyzing changes in mouth shape and movement.
S33, recording key points and mouth-shape information: the tracked facial key points and the mouth-shape information of the pinyin letters are recorded to form a data set that includes the recorded key point coordinates and the mouth-shape states of the pinyin letters; these data can be used for subsequent analysis, model training, or application development.
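A minimal tracking sketch for S31-S33 is shown below. It assumes dlib's publicly available 68-point landmark predictor on top of OpenCV, which is one common way to realize the facial key point tracking described here and is not necessarily the component used by the invention; the model file is an external asset.

```python
# Illustrative sketch of S31-S33 using OpenCV + dlib. The mouth corresponds to
# dlib landmark indices 48-67 in the 68-point scheme.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # external asset

def track_mouth_keypoints(frame):
    """Return the (x, y) mouth key points detected in one video frame, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(48, 68)]

# S33: while the speaker pronounces a pinyin letter, store the tracked mouth key
# points labelled with that letter, e.g. {"a": [...], "o": [...], "b": [...], ...}.
```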
In S4, the Chinese pinyin is matched and fitted to the key point mouth-shape information extracted in S3, and a time-frame sequence array of key point mouth-shape information is generated. The detailed description is as follows (an illustrative fitting sketch follows the step list):
S41, data preparation: the data contain the Chinese pinyin letters and the key point displacement arrays of the corresponding facial key point mouth-shape information. These arrays can be derived from the complete pinyin sequence of S2 and the vowel and consonant mouth-shape key point information of the Chinese pinyin recorded in S3.
S42, model fitting: the prepared Chinese pinyin and key point mouth-shape information are fitted, in pinyin order, to the timeline of the normal spelling sequence, yielding a time-frame sequence array of mouth-shape key point information.
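The fitting in S41-S42 can be sketched as follows. The mouth_shapes lookup table, the per-letter frame budget, and the linear blend between successive letter mouth shapes are illustrative assumptions built on the S3 recordings; the invention's actual fitting may differ.

```python
# Sketch of S41-S42: fit pinyin letters to recorded mouth key point displacements
# on a timeline, blending linearly between successive letters.
import numpy as np

FRAMES_PER_LETTER = 5  # assumed frame budget per pinyin letter

def pinyin_to_keyframe_sequence(pinyin_seq, mouth_shapes):
    """pinyin_seq: e.g. ['n', 'i', 'h', 'a', 'o'];
    mouth_shapes: dict mapping a letter to a (K, 2) key point displacement array.
    Returns an array of shape (T, K, 2): one key point displacement set per frame."""
    frames = []
    prev = np.zeros_like(next(iter(mouth_shapes.values())))  # start from a neutral mouth
    for letter in pinyin_seq:
        target = mouth_shapes[letter]
        for t in np.linspace(0.0, 1.0, FRAMES_PER_LETTER):
            frames.append((1.0 - t) * prev + t * target)      # linear blend between shapes
        prev = target
    return np.stack(frames)
```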
In S5, the DFFD algorithm moves the key points on the model in time-axis order according to the key point displacement arrays, producing the deformation effect of the model. The specific flow is as follows (a simplified sketch follows the step list):
S51, data preparation: the key point displacement arrays of the facial feature points are prepared; they can be obtained from S4 in pinyin order. A three-dimensional face model containing the key point information of the mouth region is also required.
S52, DFFD algorithm principle: the DFFD algorithm deforms the mouth of the three-dimensional model by displacing key points on the model. It uses time-axis planning to map the key point displacement arrays in pinyin order.
S53, key point tracking and deformation: in the DFFD algorithm, the facial feature points are first tracked to ensure the continuity and accuracy of the key points. The key point displacement arrays are then applied to the key points on the model according to the pinyin order and the time-axis plan, and moving these key points deforms the mouth region of the three-dimensional model.
S54, time-sequence frame generation: by applying the key point positions to the key points on the model in time-axis order, time-sequence frames of the three-dimensional model's mouth deformation are generated step by step. Each frame represents the deformation state of the model's mouth at a specific point in time, and playing the frames in order produces a continuous deformation effect.
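A heavily simplified stand-in for S51-S54 is sketched below. A true DFFD (Dirichlet free-form deformation) weights mesh vertices with Sibson/natural-neighbor coordinates; this sketch substitutes inverse-distance weights purely to illustrate how per-frame key point displacements drive the mouth region of the mesh, and all names are illustrative.

```python
# Simplified stand-in for S51-S54: propagate key point displacements to mesh vertices
# with inverse-distance weights (a placeholder for DFFD's natural-neighbor weights).
import numpy as np

def deform_mesh(vertices, control_points, displacements, eps=1e-8):
    """vertices: (V, 3) mesh vertices; control_points: (K, 3) mouth key points on the mesh;
    displacements: (K, 3) key point displacement for one frame."""
    d = np.linalg.norm(vertices[:, None, :] - control_points[None, :, :], axis=-1)
    w = 1.0 / (d + eps)
    w /= w.sum(axis=1, keepdims=True)     # normalized weight of each key point per vertex
    return vertices + w @ displacements   # weighted sum of key point displacements

def generate_mouth_frames(vertices, control_points, frame_displacements):
    """S54: return one deformed copy of the mesh per time frame."""
    return [deform_mesh(vertices, control_points, disp) for disp in frame_displacements]
```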
The specific flow of step S6 is as follows (an illustrative blending sketch follows the step list):
S61, emotion classification information: in S2 the emotion analysis result for the text information is obtained, i.e. which emotion the text expresses, such as happiness, anger, sadness, joy, or fear.
S62, emotion key point displacement matching: the emotion classification information is matched with the key point movement information of the face under different expressions acquired in S3; by establishing a mapping between emotion classes and key point movements, the key point displacement information associated with a specific emotion can be determined.
S63, mapping with the DFFD algorithm: using the DFFD algorithm of S5, the emotion key point displacement information is mapped onto the mouth-deformation time-sequence frames of the three-dimensional model. The DFFD algorithm applies the emotion key point displacements to the model's key points in time-axis order, realizing the expression change of the model.
S64, generating the complete expression animation: applying the emotion key point displacement information to the mouth-deformation time-sequence frames of the three-dimensional model forms a complete expression animation; in each frame the emotion key points and mouth key points of the model move and deform according to the displacements, and playing these frames continuously presents a dynamic expression that varies with emotion.
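S62-S64 can be sketched as a simple superposition of an emotion displacement onto every mouth frame. The emotion_displacements table and the additive blend are illustrative assumptions, and the weight parameter anticipates the deformation-degree control of S7; the sketch reuses the deform_mesh/generate_mouth_frames helpers from the previous sketch.

```python
# Sketch of S62-S64: superimpose the displacement associated with the detected emotion
# on every per-frame mouth displacement produced in S5.
def add_emotion(frame_displacements, emotion, emotion_displacements, weight=1.0):
    """frame_displacements: per-frame (K, 3) mouth displacements;
    emotion: label from S2, e.g. "happiness";
    emotion_displacements: dict mapping an emotion label to a (K, 3) displacement;
    weight: 0.0 keeps a neutral face, 1.0 applies the full emotion displacement."""
    emo = emotion_displacements[emotion]
    return [mouth + weight * emo for mouth in frame_displacements]

# The blended displacements are then passed to generate_mouth_frames() to produce
# the complete expression animation of S64.
```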
Compared with the prior art, the invention has the following beneficial effects:
1. By combining Chinese pinyin mouth shapes, the DFFD algorithm, and emotion analysis, the invention realizes end-to-end generation of interactive expression animation from text to a three-dimensional virtual character, improving user experience and the realism of emotional communication.
2. Through mouth-shape collection and fitting, accurate mouth-animation time frames can be generated from the user's real mouth shapes, enhancing the realism and interactivity of the expressions.
3. By combining emotion analysis with emotion-label-to-expression mapping, the virtual character can express the emotion corresponding to the input text, increasing the richness and accuracy of emotional communication.
4. The DFFD algorithm and time-frame technique fit the mouth animation and facial expression on a single animation timeline, ensuring smooth and natural expressions.
5. The generated realistic, interactive character expression animation can be applied in many fields, such as virtual reality, games, and film and television production, improving user experience and visual effect.
Drawings
FIG. 1 is a flow chart of processing input text;
FIG. 2 is a diagram of the overall structure of a face key point tracking technique;
FIG. 3 is a structure diagram of fitting the mouth-shape key point information sequence to frames;
FIG. 4 is a diagram of three-dimensional model mouth animation generation;
FIG. 5 is a structure diagram of generating a three-dimensional emotion expression animation from text;
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
S1, receiving text information input by a user.
S2, converting the input text into a corresponding Chinese pinyin representation via a jieba-pypinyin algorithm model, and using AFINN to calculate an overall emotion score from the word scores in the text and identify simple emotions such as happiness, anger, sadness, joy, and fear.
S3, tracking and recording the key point movement information of the face under different expressions with a facial key point tracking technique, and recording the mouth-shape information of the different pinyin letters as the Chinese pinyin is spoken, so that the key feature points and mouth-shape changes of the face are accurately captured.
S4, matching and fitting the Chinese pinyin from S2 to the key point mouth shapes from S3, and generating a time-frame sequence array of key point mouth-shape information in the letter order of the Chinese pinyin.
S5, mapping the key point time-frame sequence obtained in S4 onto the three-dimensional model with a DFFD algorithm to form three-dimensional model mouth-deformation time-sequence frames.
S6, performing facial expression rendering on the three-dimensional model mouth-deformation time-sequence frames obtained in S5, comprising matching the emotion classification information from S2 with the key point movement information under different expressions acquired in S3 to obtain the emotion key point displacement information for the text segment, and then mapping that displacement information onto the mouth-deformation time-sequence frames of the three-dimensional model with the DFFD algorithm to form a complete expression animation.
S7, providing real-time display and interaction control functions, whereby the user can control the degree of deformation of the model (an illustrative sketch of this control follows).
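As an illustration of the S7 interaction control, the user-facing deformation-degree setting can be modelled as a single scaling factor applied to the per-frame key point displacements before they drive the model; this helper and its range check are assumptions rather than a prescribed interface.

```python
# Sketch of S7: a user-controlled factor in [0, 1] scales how strongly the
# key point displacements (mouth plus emotion) deform the model.
def apply_deformation_degree(frame_displacements, degree):
    """Scale every per-frame key point displacement by the user-chosen degree."""
    if not 0.0 <= degree <= 1.0:
        raise ValueError("deformation degree must be between 0 and 1")
    return [degree * disp for disp in frame_displacements]
```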
In S2, the conversion of the input text into its corresponding Chinese pinyin combines a word segmentation algorithm with a pinyin conversion algorithm. The jieba-pypinyin pipeline is constructed in the following steps:
S21, word segmentation: the Chinese word segmentation algorithm jieba is used to split the input text into individual words or single characters.
S22, pinyin conversion: each segmented word or character is converted into its corresponding pinyin representation using the pinyin conversion library pypinyin, which converts each Chinese character into pinyin according to its pronunciation rules, e.g. converting the character 你 ("you") into the pinyin "ni".
S23, combining pinyin: if the input text is a sentence or phrase, the pinyin representations of the individual words or characters are combined to form a complete pinyin representation.
S24, polyphone handling: during pinyin conversion, polyphones may be encountered, i.e. a single Chinese character with several pronunciations. Based on the context or the common pronunciation of the word, the polyphone handling function of the pypinyin library is used to select the appropriate pronunciation.
S25, emotion recognition: AFINN is used to calculate an overall emotion score from the word scores in the text and to identify the five emotions happiness, anger, sadness, joy, and fear (a toy scoring sketch follows).
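A toy sketch of the S25 scoring is given below. The lexicon entries and the keyword-based mapping from an AFINN-style score to the five emotion classes are illustrative assumptions, since AFINN itself only provides per-word valence scores.

```python
# Sketch of S25: score segmented words with an AFINN-style lexicon and pick an
# emotion label. Both tables below contain only illustrative entries.
AFINN_LEXICON = {"开心": 3, "高兴": 3, "生气": -3, "难过": -2, "害怕": -3}
EMOTION_KEYWORDS = {
    "happiness": {"开心", "高兴"},
    "anger": {"生气"},
    "sadness": {"难过"},
    "fear": {"害怕"},
}

def score_emotion(words):
    """Return (overall score, emotion label) for a list of segmented words."""
    total = sum(AFINN_LEXICON.get(w, 0) for w in words)
    for emotion, keywords in EMOTION_KEYWORDS.items():
        if any(w in keywords for w in words):
            return total, emotion
    return total, "joy" if total > 0 else "neutral"
```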
In S3, a facial key point tracking technique is combined with the mouth-shape information of the pinyin letters so that the key feature points and mouth-shape changes of the face can be accurately captured. The detailed procedure is as follows:
S31, facial key point tracking: face detection and key point localization are performed on the face image with a facial key point tracking technique, which identifies the face region, locates key feature points such as the eyes, eyebrows, nose, and mouth, continuously tracks the positions of the facial key points, and records their movement information under different expressions such as happiness, anger, and surprise.
S32, recognizing the mouth shapes of the pinyin letters: when the user utters Chinese pinyin, the mouth-shape information of the pinyin letters is captured by analyzing the shape and movement of the mouth. The facial key point tracking technique, built on OpenCV, identifies the shape and movement of the mouth and correlates them with the pinyin letters; the mouth-shape information of each pinyin letter is deduced by analyzing changes in mouth shape and movement.
S33, recording key points and mouth-shape information: the tracked facial key points and the mouth-shape information of the pinyin letters are recorded to form a data set that includes the recorded key point coordinates and the mouth-shape states of the pinyin letters; these data can be used for subsequent analysis, model training, or application development.
In S4, the Chinese pinyin is matched and fitted to the key point mouth-shape information extracted in S3, and a time-frame sequence array of key point mouth-shape information is generated. The detailed description is as follows:
S41, data preparation: the data contain the Chinese pinyin letters and the key point displacement arrays of the corresponding facial key point mouth-shape information. These arrays can be derived from the complete pinyin sequence of S2 and the vowel and consonant mouth-shape key point information of the Chinese pinyin recorded in S3.
S42, model fitting: the prepared Chinese pinyin and key point mouth-shape information are fitted, in pinyin order, to the timeline of the normal spelling sequence, yielding a time-frame sequence array of mouth-shape key point information.
In S5, the DFFD algorithm moves the key points on the model in time-axis order according to the key point displacement arrays, producing the deformation effect of the model. The specific flow is as follows:
S51, data preparation: the key point displacement arrays of the facial feature points are prepared; they can be obtained from step S4 in pinyin order. A three-dimensional face model containing the key point information of the mouth region is also required.
S52, DFFD algorithm principle: the DFFD algorithm deforms the mouth of the three-dimensional model by displacing key points on the model. It uses time-axis planning to map the key point displacement arrays in pinyin order.
S53, key point tracking and deformation: in the DFFD algorithm, the facial feature points are first tracked to ensure the continuity and accuracy of the key points. The key point displacement arrays are then applied to the key points on the model according to the pinyin order and the time-axis plan, and moving these key points deforms the mouth region of the three-dimensional model.
S54, time-sequence frame generation: by applying the key point positions to the key points on the model in time-axis order, time-sequence frames of the three-dimensional model's mouth deformation are generated step by step. Each frame represents the deformation state of the model's mouth at a specific point in time, and playing the frames in order produces a continuous deformation effect.
The specific flow of step S6 is as follows:
S61, emotion classification information: in step S2 the emotion analysis result for the text information is obtained, i.e. which emotion the text expresses, such as happiness, anger, sadness, joy, or fear.
S62, emotion key point displacement matching: the emotion classification information is matched with the key point movement information of the face under different expressions acquired in step S3; by establishing a mapping between emotion classes and key point movements, the key point displacement information associated with a specific emotion can be determined.
S63, mapping with the DFFD algorithm: using the DFFD algorithm of step S5, the emotion key point displacement information is mapped onto the mouth-deformation time-sequence frames of the three-dimensional model. The DFFD algorithm applies the emotion key point displacements to the model's key points in time-axis order, realizing the expression change of the model.
S64, generating the complete expression animation: applying the emotion key point displacement information to the mouth-deformation time-sequence frames of the three-dimensional model forms a complete expression animation; in each frame the emotion key points and mouth key points of the model move and deform according to the displacements, and playing these frames continuously presents a dynamic expression that varies with emotion.
Using Chinese pinyin conversion, facial key point tracking, mouth-shape mapping, mouth deformation, and facial expression rendering, the method generates realistic, interactive three-dimensional virtual character expression animation from the text input by the user. The principles and embodiments of the invention have been described with reference to specific examples; the description is intended only to facilitate understanding of the core concepts of the invention. It will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention, and such modifications and adaptations are intended to fall within the scope of the following claims.

Claims (6)

1. A facial expression synthesis and interaction control method based on text-to-Chinese-pinyin conversion, characterized in that:
S1, text information input by a user is received;
S2, the input text is converted into a corresponding Chinese pinyin representation via a jieba-pypinyin algorithm model, an overall emotion score is calculated from the word scores in the text using AFINN, and emotions such as happiness, anger, sadness, joy, and fear are identified;
S3, the key point movement information of the face under different expressions is tracked and recorded with a facial key point tracking technique, and the mouth-shape information of the different pinyin letters as the Chinese pinyin is spoken is recorded, so that the key feature points and mouth-shape changes of the face are accurately captured;
S4, the Chinese pinyin from S2 is matched and fitted to the key point mouth shapes from S3, and a time-frame sequence array of key point mouth-shape information is generated in the letter order of the Chinese pinyin;
S5, the key point time-frame sequence obtained in S4 is mapped onto the three-dimensional model with a DFFD algorithm to form three-dimensional model mouth-deformation time-sequence frames;
S6, facial expression rendering is performed on the three-dimensional model mouth-deformation time-sequence frames obtained in S5, comprising matching the emotion classification information from S2 with the key point movement information under different expressions acquired in S3 to obtain the emotion key point displacement information of the text segment, and then mapping that displacement information onto the mouth-deformation time-sequence frames of the three-dimensional model with the DFFD algorithm to form a complete expression animation;
S7, real-time display and interaction control functions are provided, whereby the user can control the degree of deformation of the model.
2. The jieba-pypinyin algorithm according to claim 1, which converts the input text into a corresponding pinyin representation, wherein:
(2-1) word segmentation: the Chinese word segmentation algorithm jieba segments the input text into individual words or single characters;
(2-2) pinyin conversion: each segmented word or character is converted into its corresponding pinyin representation using the pinyin conversion library pypinyin, which converts each Chinese character into pinyin according to its pronunciation rules, for example converting the character 你 ("you") into the pinyin "ni";
(2-3) combining pinyin: if the input text is a sentence or phrase, the pinyin representations of the individual words or characters are combined to form a complete pinyin representation;
(2-4) polyphone handling: during pinyin conversion a polyphone may be encountered, i.e. a single Chinese character with several pronunciations; based on the context or the common pronunciation of the word, the polyphone handling function of the pypinyin library is used to select the appropriate pronunciation.
3. The calculation of an overall emotion score from the word scores in the text as claimed in claim 1, wherein AFINN is used in S2 to calculate the overall emotion score and to identify simple emotions such as happiness, anger, sadness, joy, and fear.
4. The facial key point tracking technique according to claim 1, with which the key point movement information of the face under different expressions is tracked and recorded, wherein:
(4-1) facial key point tracking: face detection and key point localization are performed on the face image with a facial key point tracking technique, which identifies the face region, locates key feature points such as the eyes, eyebrows, nose, and mouth, continuously tracks the positions of the facial key points, and records their movement information under different expressions such as happiness, anger, and surprise;
(4-2) mouth-shape recognition of the pinyin letters: when the user utters Chinese pinyin, the mouth-shape information of the pinyin letters is captured by analyzing the shape and movement of the mouth; the facial key point tracking technique, built on OpenCV, identifies the shape and movement of the mouth and correlates them with the pinyin letters, and the mouth-shape information of the pinyin letters is deduced by analyzing changes in mouth shape and movement;
(4-3) recording key points and mouth-shape information: the tracked facial key points and the mouth-shape information of the pinyin letters are recorded to form a data set that includes the recorded key point coordinates and the mouth-shape states of the pinyin letters; these data can be used for subsequent analysis, model training, or application development.
5. The method of claim 1, wherein the key point time-frame sequence obtained in S4 is mapped onto the three-dimensional model using the DFFD-AFINN algorithm to form three-dimensional model mouth-deformation time-sequence frames, the facial feature points are tracked and deformed by the DFFD algorithm, the deformations are applied to the three-dimensional face model, and the time-sequence frames are generated as follows: by applying the key point positions to the key points on the model in time-axis order, time-sequence frames of the three-dimensional model's mouth deformation are generated step by step, each frame representing the deformation state of the model's mouth at a specific point in time, and playing the frames in order produces a continuous deformation effect.
6. The emotion classification information of claim 1, which is matched with the acquired key point movement information of the face under different expressions to obtain the emotion key point displacement information of the text segment, after which the emotion key point displacement information is mapped onto the mouth-deformation time-sequence frames of the three-dimensional model using the DFFD algorithm to form a complete expression animation, wherein:
(6-1) emotion classification information: in S2 the emotion analysis result for the text information is obtained, i.e. which emotion the text expresses, such as happiness, anger, sadness, joy, or fear;
(6-2) emotion key point displacement matching: the emotion classification information is matched with the key point movement information of the face under different expressions acquired in S3, and by establishing a mapping between emotion classes and key point movements, the key point displacement information associated with a specific emotion can be determined;
(6-3) mapping with the DFFD algorithm: using the DFFD algorithm obtained in S5, the emotion key point displacement information is mapped onto the mouth-deformation time-sequence frames of the three-dimensional model, and the DFFD algorithm applies the emotion key point displacements to the model's key points in time-axis order, realizing the expression change of the model;
(6-4) complete expression animation generation: applying the emotion key point displacement information to the mouth-deformation time-sequence frames of the three-dimensional model forms a complete expression animation; in each frame the emotion key points and mouth key points of the model move and deform according to the displacements, and continuously playing these frames presents a dynamic expression that varies with emotion.
CN202310658598.2A 2023-06-06 2023-06-06 Facial expression synthesis and interaction control method based on text-to-Chinese pinyin Pending CN116665275A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310658598.2A CN116665275A (en) 2023-06-06 2023-06-06 Facial expression synthesis and interaction control method based on text-to-Chinese pinyin

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310658598.2A CN116665275A (en) 2023-06-06 2023-06-06 Facial expression synthesis and interaction control method based on text-to-Chinese pinyin

Publications (1)

Publication Number Publication Date
CN116665275A true CN116665275A (en) 2023-08-29

Family

ID=87716819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310658598.2A Pending CN116665275A (en) 2023-06-06 2023-06-06 Facial expression synthesis and interaction control method based on text-to-Chinese pinyin

Country Status (1)

Country Link
CN (1) CN116665275A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292030A (en) * 2023-10-27 2023-12-26 海看网络科技(山东)股份有限公司 Method and system for generating three-dimensional digital human animation
CN118410813A (en) * 2024-02-27 2024-07-30 濮阳职业技术学院 Language learning method, system and storage medium
CN118410813B (en) * 2024-02-27 2024-09-27 濮阳职业技术学院 Language learning method, system and storage medium

Similar Documents

Publication Publication Date Title
JP7557055B2 (en) Method, device, equipment and computer program for driving the movement of a target object
CN106653052B (en) Virtual human face animation generation method and device
Hong et al. Real-time speech-driven face animation with expressions using neural networks
Pham et al. Speech-driven 3D facial animation with implicit emotional awareness: A deep learning approach
CN113378806B (en) Audio-driven face animation generation method and system integrating emotion coding
US8224652B2 (en) Speech and text driven HMM-based body animation synthesis
CN109065055A (en) Method, storage medium and the device of AR content are generated based on sound
CN112581569B (en) Adaptive emotion expression speaker facial animation generation method and electronic device
CN106919251A (en) A kind of collaborative virtual learning environment natural interactive method based on multi-modal emotion recognition
CN116665275A (en) Facial expression synthesis and interaction control method based on text-to-Chinese pinyin
CN101187990A (en) A session robotic system
KR20120130627A (en) Apparatus and method for generating animation using avatar
WO2022267380A1 (en) Face motion synthesis method based on voice driving, electronic device, and storage medium
WO2023284435A1 (en) Method and apparatus for generating animation
CN106446406A (en) Simulation system and simulation method for converting Chinese sentences into human mouth shapes
Verma et al. A comprehensive review on automation of Indian sign language
CN117251057A (en) AIGC-based method and system for constructing AI number wisdom
Ding et al. Speech-driven eyebrow motion synthesis with contextual markovian models
Li et al. AI-based visual speech recognition towards realistic avatars and lip-reading applications in the metaverse
Websdale et al. Speaker-independent speech animation using perceptual loss functions and synthetic data
Sagawa et al. Pattern recognition and synthesis for a sign language translation system
Wu et al. Speech synthesis with face embeddings
CN116883608B (en) Multi-mode digital person social attribute control method and related device
CN1952850A (en) Three-dimensional face cartoon method driven by voice based on dynamic elementary access
Li et al. A novel speech-driven lip-sync model with CNN and LSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination