CN117827001A - Digital virtual person generation method based on cross-modal emotion analysis - Google Patents

Digital virtual person generation method based on cross-modal emotion analysis

Info

Publication number: CN117827001A
Application number: CN202410014719.4A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: cross, model, emotion, virtual person, modal
Inventors: 张东裕, 李思雨
Current Assignee: Jiangxi University of Finance and Economics (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Jiangxi University of Finance and Economics
Application filed by Jiangxi University of Finance and Economics
Priority and filing date: 2024-01-04 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Publication date: 2024-04-05
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method and a system for generating a digital virtual person based on cross-modal emotion analysis. Through cross-modal emotion analysis, the method accurately understands the user's emotion and generates matching virtual person expressions and language output. It has broad application prospects in virtual reality, games, advertising and other fields, and improves the emotional interaction between the user and the virtual person.

Description

Digital virtual person generation method based on cross-modal emotion analysis
Technical Field
The invention relates to a method and a system for generating a digital virtual person based on cross-modal emotion analysis, which are suitable for the fields of computer graphics and artificial intelligence.
Background
A digital virtual person is an artificial entity with a lifelike appearance and intelligent interaction capability, and is widely applied in virtual reality, games, advertising and other fields. However, conventional digital virtual person generation methods usually analyze user emotion from a single modality, so they often fail to accurately capture the user's emotional and expressive needs and lack emotional expressiveness when interacting with the user. In addition, when processing the user's language input, conventional methods generally rely on simple segmentation and lack an understanding of natural language semantics, which degrades the accuracy of emotion capture. A novel method is therefore needed that accurately understands user emotion through cross-modal emotion analysis and generates the corresponding virtual person expressions and language.
Disclosure of Invention
The invention provides a method and a system for generating a digital virtual person based on cross-modal emotion analysis. Cross-modal emotion analysis extracts emotion information from the text and images input by the user and generates the corresponding virtual person expressions and language output, enabling effective interaction with the user's emotion.
The invention specifically discloses a digital virtual person generation method based on cross-modal emotion analysis, which comprises the following steps:
S1, data collection and labeling: collect a multi-modal dataset containing text, images and the corresponding emotion labels as the basis for model training and evaluation;
S2, text preprocessing: perform word segmentation, stop-word removal and stemming on the text input by the user to facilitate subsequent emotion feature extraction and generation;
S3, image preprocessing;
S4, cross-modal feature extraction;
S5, cross-modal emotion representation learning;
S6, virtual person generation model processing;
S7, model training, model evaluation and tuning;
S8, digital virtual person generation application: deploy the trained model into practical applications such as virtual reality environments and game characters. Based on the text and image information input by the user, the model generates virtual person expressions and language output matching the expressed emotion. Application development is carried out with a virtual person modeling and rendering engine such as Unity.
In a preferred scheme, the image preprocessing performs resizing, cropping and normalization operations on the image input by the user to meet the requirements of subsequent feature extraction and generation.
In a preferred scheme, the cross-modal feature extraction uses deep learning methods to extract emotion-related features from the text and the images respectively. For example, emotion vocabulary and syntactic-structure features are extracted from the text, and facial-expression and color features are extracted from the images. Features are extracted with BERT and a convolutional neural network model.
In a preferred scheme, a shared cross-modal emotion representation space maps and fuses the emotion information of the text and the images. Emotion representation learning is performed with a Transformer and attention-mechanism model.
In a preferred scheme, the virtual person generation model takes the cross-modal emotion representation as input and generates the corresponding virtual person expressions and language output. Generation can be performed with a generative adversarial network, and the virtual person generation model can be optimized with conditional generation and multi-modal fusion methods.
In a preferred scheme, the model training uses the annotated multi-modal dataset to train the virtual person generation model. Model parameters are optimized by minimizing the generation error, and training is carried out with cross-validation.
In a preferred scheme, the model evaluation and tuning uses a test set to evaluate the trained model, and the model hyper-parameters and structure are adjusted to improve the accuracy and fidelity of virtual person generation. The generation quality is assessed with perceptual evaluation and user surveys.
The invention realizes a digital virtual person generation method based on cross-modal emotion analysis; through effective emotion representation learning and generation processing, the virtual person can interact emotionally with the user. The method has broad application prospects in virtual reality, games, advertising and other fields, and improves user experience and the emotional interaction effect.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the present invention;
FIG. 2 is a flow chart of the method.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
As shown in FIG. 1, the invention provides a digital virtual person generation method and system based on cross-modal emotion analysis: cross-modal emotion analysis extracts emotion information from the text and images input by the user and generates the corresponding virtual person expressions and language output, enabling effective interaction with the user's emotion.
The implementation steps are as follows:
data collection and labeling: a multimodal dataset containing text, images and corresponding emotion tags is collected as a basis for model training and assessment.
Text preprocessing: preprocessing operations such as word segmentation, stop-word removal and stemming are performed on the text input by the user to facilitate subsequent emotion feature extraction and generation.
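A minimal sketch of this step, assuming English input and the NLTK toolkit (the patent names no specific language or library):

```python
# Illustrative text preprocessing with NLTK: word segmentation, stop-word removal, stemming.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)   # required by newer NLTK releases
nltk.download("stopwords", quiet=True)

def preprocess_text(text: str) -> list[str]:
    tokens = word_tokenize(text.lower())                              # word segmentation
    stops = set(stopwords.words("english"))
    tokens = [t for t in tokens if t.isalpha() and t not in stops]    # stop-word removal
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens]                          # stemming

print(preprocess_text("I am really happy with this virtual assistant!"))
```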
Image preprocessing: preprocessing operations such as resizing, cropping and normalization are performed on the image input by the user to meet the requirements of subsequent feature extraction and generation.
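One way to realize this step with torchvision transforms; the target size and the ImageNet normalization statistics are assumptions, not values fixed by the patent:

```python
# Illustrative image preprocessing: resize, crop, normalize.
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),                       # resize the shorter side
    transforms.CenterCrop(224),                   # crop to a fixed input size
    transforms.ToTensor(),                        # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics (assumed)
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("user_photo.jpg").convert("RGB")
tensor = preprocess(image)          # shape: (3, 224, 224), ready for a CNN encoder
```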
Cross-modal feature extraction: emotion-related features are extracted from the text and the image respectively with deep learning methods. For example, emotion vocabulary and syntactic-structure features are extracted from the text, and facial-expression and color features are extracted from the image. Models such as BERT and convolutional neural networks are used for feature extraction.
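A sketch of one possible feature-extraction pair, using a BERT text encoder and a pretrained ResNet-50 image encoder; the checkpoint names are assumptions:

```python
# Illustrative feature extraction: BERT for text, ResNet-50 for images.
import torch
from transformers import AutoModel, AutoTokenizer
from torchvision.models import resnet50, ResNet50_Weights

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
bert = AutoModel.from_pretrained("bert-base-chinese")

cnn = resnet50(weights=ResNet50_Weights.DEFAULT)
cnn.fc = torch.nn.Identity()          # drop the classifier head, keep 2048-d features

@torch.no_grad()
def extract_features(text: str, image_tensor: torch.Tensor):
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    text_feat = bert(**enc).last_hidden_state[:, 0]      # [CLS] embedding, shape (1, 768)
    img_feat = cnn(image_tensor.unsqueeze(0))            # shape (1, 2048)
    return text_feat, img_feat
```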
Cross-modal emotion representation learning: the features of the text and the image are combined, a sharable cross-modal emotion representation space is learned with deep learning methods, and the emotion information of the text and the image is mapped and fused. Emotion representation learning is performed with models such as a Transformer and an attention mechanism.
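The patent does not fix an architecture for the shared space; the sketch below assumes a simple design in which both feature vectors are projected to a common dimension and fused with cross-attention:

```python
# Illustrative shared-representation module: project both modalities and fuse by attention.
import torch
import torch.nn as nn

class CrossModalEmotionEncoder(nn.Module):
    def __init__(self, text_dim=768, img_dim=2048, shared_dim=256, heads=4):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)   # map text features to shared space
        self.img_proj = nn.Linear(img_dim, shared_dim)     # map image features to shared space
        self.cross_attn = nn.MultiheadAttention(shared_dim, heads, batch_first=True)
        self.fuse = nn.Sequential(nn.Linear(2 * shared_dim, shared_dim), nn.ReLU())

    def forward(self, text_feat, img_feat):
        t = self.text_proj(text_feat).unsqueeze(1)               # (B, 1, D)
        v = self.img_proj(img_feat).unsqueeze(1)                 # (B, 1, D)
        attended, _ = self.cross_attn(query=t, key=v, value=v)   # text attends to image
        fused = self.fuse(torch.cat([t, attended], dim=-1))      # joint emotion representation
        return fused.squeeze(1)                                  # (B, D)
```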
Virtual person generation model processing: a neural network model takes the cross-modal emotion representation as input and generates the corresponding virtual person expressions and language output. Generation may be performed with a model such as a generative adversarial network (GAN). The virtual person generation model is optimized with methods such as conditional generation and multi-modal fusion.
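A sketch of a conditional generator in the spirit of this step; the output is assumed to be a vector of facial blend-shape (expression) parameters, and all dimensions are illustrative. A full GAN would add a discriminator and adversarial training, which are omitted here for brevity:

```python
# Illustrative conditional generator: noise + emotion representation -> expression parameters.
import torch
import torch.nn as nn

class ExpressionGenerator(nn.Module):
    def __init__(self, noise_dim=64, cond_dim=256, out_dim=52):   # 52 blendshapes assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh(),        # parameters scaled to [-1, 1]
        )

    def forward(self, noise, emotion_repr):
        return self.net(torch.cat([noise, emotion_repr], dim=-1))  # condition on emotion

gen = ExpressionGenerator()
z = torch.randn(1, 64)
emotion = torch.randn(1, 256)                 # output of CrossModalEmotionEncoder
expression = gen(z, emotion)                  # fed to the avatar's animation rig
```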
Model training: training the virtual man to generate a model by using the marked multi-mode data set. Model parameters are optimized by minimizing the generation error. And performing model training by adopting methods such as cross validation and the like.
Model evaluation and tuning: the trained model is evaluated on the test set, and the model hyper-parameters and structure are adjusted to improve the accuracy and fidelity of virtual person generation. Methods such as perceptual evaluation and user surveys are used to assess the generation quality.
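As a small illustration of the two assessment routes named here, the sketch below reports the mean generation error on a held-out test set and the mean opinion score of user-survey ratings; both metric choices are assumptions:

```python
# Illustrative evaluation helpers; the metrics (test MSE, mean opinion score) are assumptions.
import torch
from statistics import mean

@torch.no_grad()
def test_generation_error(model, test_loader):
    errors = [
        torch.nn.functional.mse_loss(model(torch.randn(len(c), 64), c), t).item()
        for c, t in test_loader
    ]
    return mean(errors)

def mean_opinion_score(survey_ratings):
    # e.g. 1-5 naturalness ratings collected from users watching the generated avatar
    return mean(survey_ratings)
```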
Digital virtual person generation application: the trained models are deployed into practical applications such as virtual reality environments and game characters. Based on the text and image information input by the user, the model generates virtual person expressions and language output matching the expressed emotion. Application development is carried out with virtual person modeling and rendering engines such as Unity.
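The patent does not specify how the Python models are connected to the rendering engine; one common integration, sketched below, wraps inference in a small HTTP service that a Unity client calls, reusing the components from the earlier sketches (the endpoint name and JSON payload are assumptions):

```python
# Illustrative inference service for a Unity client; assumes the earlier sketch objects
# (preprocess, extract_features, encoder, gen) exist in scope with trained weights loaded.
import base64, io
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from PIL import Image

app = FastAPI()

class UserInput(BaseModel):
    text: str
    image_base64: str        # user photo, base64-encoded by the Unity client

@app.post("/generate")
def generate(inp: UserInput):
    image = Image.open(io.BytesIO(base64.b64decode(inp.image_base64))).convert("RGB")
    text_feat, img_feat = extract_features(inp.text, preprocess(image))
    emotion = encoder(text_feat, img_feat)                 # fused cross-modal representation
    expression = gen(torch.randn(1, 64), emotion)          # expression parameters
    # a language reply could be generated and returned alongside the expression parameters
    return {"blendshapes": expression.squeeze(0).tolist()}
```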
The invention realizes a digital virtual person generation method based on cross-modal emotion analysis; through effective emotion representation learning and generation processing, the virtual person can interact emotionally with the user. The method has broad application prospects in virtual reality, games, advertising and other fields, and improves user experience and the emotional interaction effect.

Claims (7)

1. A digital virtual person generation method based on cross-modal emotion analysis, characterized by comprising the following steps:
S1, data collection and labeling: collect a multi-modal dataset containing text, images and the corresponding emotion labels as the basis for model training and evaluation;
S2, text preprocessing: perform word segmentation, stop-word removal and stemming on the text input by the user to facilitate subsequent emotion feature extraction and generation;
S3, image preprocessing;
S4, cross-modal feature extraction;
S5, cross-modal emotion representation learning;
S6, virtual person generation model processing;
S7, model training, model evaluation and tuning;
S8, digital virtual person generation application: deploy the trained model into practical applications such as virtual reality environments and game characters; based on the text and image information input by the user, the model generates virtual person expressions and language output matching the expressed emotion; application development is carried out with a virtual person modeling and rendering engine such as Unity.
2. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the image preprocessing performs resizing, cropping and normalization operations on the image input by the user to meet the requirements of subsequent feature extraction and generation.
3. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the cross-modal feature extraction uses deep learning methods to extract emotion-related features from the text and the images respectively; for example, emotion vocabulary and syntactic-structure features are extracted from the text, and facial-expression and color features are extracted from the images; features are extracted with BERT and a convolutional neural network model.
4. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the cross-modal emotion representation learning combines the features of the text and the images, learns a sharable cross-modal emotion representation space with deep learning methods, and maps and fuses the emotion information of the text and the images; emotion representation learning is performed with a Transformer and attention-mechanism model.
5. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the virtual person generation model takes the cross-modal emotion representation as input and generates the corresponding virtual person expressions and language output; generation can be performed with a generative adversarial network, and the virtual person generation model can be optimized with conditional generation and multi-modal fusion methods.
6. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the model training uses the annotated multi-modal dataset to train the virtual person generation model; model parameters are optimized by minimizing the generation error; model training is carried out with cross-validation.
7. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the model evaluation and tuning uses a test set to evaluate the trained model and adjusts the model hyper-parameters and structure to improve the accuracy and fidelity of virtual person generation; the generation quality is assessed with perceptual evaluation and user surveys.
CN202410014719.4A, priority/filing date 2024-01-04: Digital virtual person generation method based on cross-modal emotion analysis (Pending, CN117827001A (en))

Priority Applications (1)

  • CN202410014719.4A / CN117827001A (en), priority date 2024-01-04, filing date 2024-01-04: Digital virtual person generation method based on cross-modal emotion analysis

Publications (1)

  • CN117827001A (en), publication date 2024-04-05

Family

  • ID: 90513092

Family Applications (1)

  • CN202410014719.4A (Pending), CN117827001A (en), priority date 2024-01-04, filing date 2024-01-04: Digital virtual person generation method based on cross-modal emotion analysis

Country Status (1)

  • CN: CN117827001A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination