CN117827001A - Digital virtual person generation method based on cross-modal emotion analysis - Google Patents
Digital virtual person generation method based on cross-modal emotion analysis
- Publication number: CN117827001A
- Application number: CN202410014719.4A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a method and a system for generating a digital virtual person based on cross-modal emotion analysis. Through cross-modal emotion analysis, the method accurately understands the user's emotion and generates corresponding virtual-person expression and language output. It has broad application prospects in virtual reality, games, advertising, and related fields, and improves the emotional interaction between the user and the virtual person.
Description
Technical Field
The invention relates to a method and a system for generating a digital virtual person based on cross-modal emotion analysis, applicable to the fields of computer graphics and artificial intelligence.
Background
A digital virtual person is an artificial entity with a lifelike appearance and intelligent interaction capability, widely used in virtual reality, games, advertising, and other fields. However, conventional generation methods typically analyze user emotion from a single modality, so they often fail to capture the user's emotional and expressive needs accurately and lack emotional expressiveness when interacting with the user. In addition, when processing the user's language input, conventional methods usually rely on simple word segmentation without understanding natural-language semantics, which further degrades the accuracy of emotion capture. A novel method is therefore needed that accurately understands user emotion through cross-modal emotion analysis and generates the corresponding virtual-person expressions and language.
Disclosure of Invention
The invention provides a method and a system for generating a digital virtual person based on cross-modal emotion analysis. Using cross-modal emotion analysis, the method extracts emotion information from the text and images input by the user and generates corresponding virtual-person expression and language output, thereby interacting effectively with the user's emotion.
The invention specifically discloses a digital virtual person generation method based on cross-modal emotion analysis, which comprises the following steps:
S1, data collection and labeling: collect a multimodal dataset containing text, images, and corresponding emotion labels as the basis for model training and evaluation;
S2, text preprocessing: perform word segmentation, stop-word removal, and stemming on the user's text input to facilitate subsequent emotion feature extraction and generation;
S3, image preprocessing;
S4, cross-modal feature extraction;
S5, cross-modal emotion representation learning;
S6, virtual person generation model processing;
S7, model training, model evaluation, and tuning;
S8, digital virtual person generation application: deploy the trained model into practical applications such as virtual reality environments and game characters. According to the text and image information input by the user, the model generates virtual-person expression and language output matching the expressed emotion. Application development uses a virtual-person modeling and rendering engine such as Unity.
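The S1–S8 pipeline above can be sketched as a chain of stages. This is a minimal illustration only; every function name and return value is hypothetical and not taken from the patent:

```python
# Hypothetical skeleton of the S1-S8 pipeline; each stage is a stub.

def collect_and_label():              # S1: multimodal dataset with emotion labels
    return [{"text": "I love this!", "image": [[0.9]], "label": "positive"}]

def preprocess_text(text):            # S2: segmentation, stop-word removal, stemming
    return text.lower().split()

def preprocess_image(img):            # S3: resize / crop / normalize
    return img

def extract_features(tokens, img):    # S4: text and image emotion features
    return {"text_feat": len(tokens), "img_feat": img[0][0]}

def fuse(features):                   # S5: shared cross-modal emotion representation
    return (features["text_feat"], features["img_feat"])

def generate_avatar(representation):  # S6: expression and language output
    return {"expression": "smile", "utterance": "Glad to hear that!"}

def run_pipeline():                   # S7/S8: a trained, deployed model serves this call
    sample = collect_and_label()[0]
    tokens = preprocess_text(sample["text"])
    img = preprocess_image(sample["image"])
    rep = fuse(extract_features(tokens, img))
    return generate_avatar(rep)

print(run_pipeline()["expression"])
```

In a real deployment the stubs would be replaced by the trained models described below, with the last stage driving a rendering engine such as Unity.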
In a preferred scheme, the image preprocessing performs resizing, cropping, and normalization on the user's image input to suit the requirements of subsequent feature extraction and generation.
In a preferred scheme, the cross-modal feature extraction uses deep learning to extract emotion-related features from text and images separately; for example, emotion vocabulary and syntactic-structure features from text, and facial-expression and color features from images. Features are extracted with BERT and a convolutional neural network.
In a preferred scheme, a shared cross-modal emotion representation space maps and fuses the emotion information of text and images. Emotion representation learning uses a Transformer with an attention mechanism.
In a preferred embodiment, the virtual person generation model takes the cross-modal emotion representation as input and generates the corresponding virtual-person expression and language output. Generation may use a generative adversarial network, and the model may be optimized with conditional generation and multimodal fusion.
In a preferred embodiment, the model training uses the annotated multimodal dataset to train the virtual person generation model, optimizing model parameters by minimizing the generation error. Training uses cross-validation.
In a preferred scheme, the model evaluation and tuning evaluates the trained model on a test set and adjusts the model's hyperparameters and structure to improve the accuracy and fidelity of virtual person generation. Generation quality is assessed with perceptual evaluation and user surveys.
The invention realizes a digital virtual person generation method based on cross-modal emotion analysis; through effective emotion representation learning and generation processing, the virtual person can interact emotionally with the user. The method has broad application prospects in virtual reality, games, advertising, and related fields, and improves user experience and emotional interaction.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the present invention;
FIG. 2 is a flow chart of the method;
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
As shown in FIG. 1, the invention provides a method and a system for generating a digital virtual person based on cross-modal emotion analysis. Using cross-modal emotion analysis, the method extracts emotion information from the text and images input by the user and generates corresponding virtual-person expressions and language output, thereby interacting effectively with the user's emotion.
The implementation steps are as follows:
data collection and labeling: a multimodal dataset containing text, images and corresponding emotion tags is collected as a basis for model training and assessment.
Text preprocessing: perform word segmentation, stop-word removal, stemming, and similar preprocessing on the user's text input to facilitate subsequent emotion feature extraction and generation.
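For English text, this step can be sketched as below. The stop-word set and the suffix-stripping rules are toy illustrations (a real system would use a proper stemmer such as Porter's and a language-appropriate segmenter):

```python
# Illustrative text preprocessing: tokenization, stop-word removal,
# and crude suffix stemming. Word lists are toy examples.
import re

STOP_WORDS = {"the", "is", "a", "an", "this", "so"}

def stem(word):
    # Naive stemmer: strip one common English suffix, as a stand-in
    # for a real stemming algorithm.
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def preprocess_text(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess_text("This movie is so amazingly touching"))
# → ['movie', 'amazing', 'touch']
```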
Image preprocessing: perform resizing, cropping, normalization, and similar preprocessing on the user's image input to suit the requirements of subsequent feature extraction and generation.
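These three operations can be sketched on a toy grayscale image stored as a nested list (a real pipeline would use an image library; this only illustrates the transformations):

```python
# Illustrative image preprocessing: center crop, nearest-neighbor
# resize, and scaling of pixel values from [0, 255] to [0, 1].

def center_crop(img, size):
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

def resize(img, size):
    # Nearest-neighbor sampling to a size x size grid.
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def normalize(img):
    return [[px / 255.0 for px in row] for row in img]

image = [[100, 200, 50, 0],
         [25, 255, 75, 10],
         [0, 125, 225, 30],
         [60, 90, 110, 5]]
out = normalize(resize(center_crop(image, 4), 2))
print(out)
```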
Cross-modal feature extraction: use deep learning to extract emotion-related features from the text and the image separately. For example, extract emotion vocabulary and syntactic-structure features from text, and facial-expression and color features from images. Models such as BERT and convolutional neural networks are used for feature extraction.
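The patent names BERT and a CNN for this step. As a lightweight stand-in, the sketch below scores text against a tiny emotion lexicon and summarizes an image by its mean intensity, a crude proxy for color/brightness features; the lexicon entries are illustrative, not from the patent:

```python
# Illustrative per-modality emotion features: lexicon polarity for text,
# mean pixel intensity for a normalized grayscale image.

EMOTION_LEXICON = {"love": 1.0, "great": 0.8, "sad": -0.9, "hate": -1.0}

def text_features(tokens):
    # Average lexicon polarity over the tokens (0.0 if none match).
    scores = [EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def image_features(img):
    # Mean pixel intensity of a normalized grayscale image.
    pixels = [px for row in img for px in row]
    return sum(pixels) / len(pixels)

print(text_features(["i", "love", "it"]))        # → 1.0
print(image_features([[0.2, 0.4], [0.6, 0.8]]))  # → 0.5
```

In the patented method these scalars would be replaced by dense feature vectors produced by the pretrained encoders.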
Cross-modal emotion representation learning: combine the text and image features, learn a shared cross-modal emotion representation space with deep learning, and map and fuse the emotion information of text and images. Models such as Transformers with attention mechanisms are used for representation learning.
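The patent uses a Transformer for this fusion; the sketch below shows only the softmax-weighted combination at the heart of the attention mechanism, applied to two hypothetical modality feature vectors:

```python
# Illustrative attention-style fusion of a text feature vector and an
# image feature vector into one shared emotion representation.
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(text_vec, image_vec):
    # Score each modality by the magnitude of its features, then take a
    # convex combination of the two vectors with softmax weights.
    scores = [sum(abs(v) for v in text_vec), sum(abs(v) for v in image_vec)]
    w_text, w_img = softmax(scores)
    return [w_text * t + w_img * i for t, i in zip(text_vec, image_vec)]

fused = fuse([0.9, -0.1], [0.3, 0.2])
print(fused)
```

The design choice shown is that the fused vector stays in the same space as its inputs, so a downstream generator can consume either modality alone or the fusion unchanged.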
Virtual person generation model processing: use a neural network model that takes the cross-modal emotion representation as input and generates the corresponding virtual-person expression and language output. Generation may use a model such as a generative adversarial network (GAN), and the generation model may be optimized with methods such as conditional generation and multimodal fusion.
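A full conditional GAN is beyond a short sketch, but the conditioning pattern of its generator can be shown: an emotion label is embedded, concatenated with a noise vector, and mapped through a linear layer to a toy expression-parameter vector. Weights are random and untrained; the structure, not the output quality, is the point:

```python
# Illustrative conditional-generator structure (untrained toy weights).
import random

random.seed(0)
EMOTIONS = {"positive": [1.0, 0.0], "negative": [0.0, 1.0]}
NOISE_DIM, OUT_DIM = 3, 4
IN_DIM = NOISE_DIM + 2  # noise plus 2-d emotion embedding
W = [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(OUT_DIM)]

def generator(emotion, noise):
    # Condition by concatenating the emotion embedding onto the noise,
    # then apply one linear layer.
    x = noise + EMOTIONS[emotion]
    return [sum(w * v for w, v in zip(row, x)) for row in W]

params = generator("positive", [random.gauss(0, 1) for _ in range(NOISE_DIM)])
print(len(params))  # → 4
```

In the GAN setting, a discriminator receiving the same emotion condition would push these outputs toward realistic, emotion-consistent expression parameters.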
Model training: train the virtual person generation model on the annotated multimodal dataset, optimizing model parameters by minimizing the generation error. Methods such as cross-validation are used for training.
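The two ingredients named here, minimizing generation error and cross-validation, can be sketched with a toy one-parameter model standing in for the patent's multimodal generator:

```python
# Illustrative k-fold cross-validation around a toy model y = w * x
# trained by gradient descent on mean squared "generation error".

def k_fold(data, k):
    # Yield (train, validation) splits.
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, val

def train_model(pairs, lr=0.01, epochs=200):
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= lr * grad
    return w

data = [(x, 2.0 * x) for x in [1, 2, 3, 4, 5, 6]]  # true weight is 2.0
results = []
for train, val in k_fold(data, 3):
    w = train_model(train)
    err = sum((w * x - y) ** 2 for x, y in val) / len(val)
    results.append((w, err))
print(results)
```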
Model evaluation and tuning: evaluate the trained model on the test set and adjust its hyperparameters and structure to improve the accuracy and fidelity of virtual person generation. Methods such as perceptual evaluation and user surveys are used to assess generation quality.
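An objective score such as held-out label accuracy would complement the perceptual evaluation and user surveys mentioned here; the labels and predictions below are made-up examples:

```python
# Illustrative held-out evaluation: accuracy of predicted emotion
# labels on a test set.

def accuracy(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

test_labels = ["positive", "negative", "positive", "neutral"]
model_preds = ["positive", "negative", "negative", "neutral"]
print(accuracy(model_preds, test_labels))  # → 0.75
```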
Digital virtual person generation application: deploy the trained model into practical applications such as virtual reality environments and game characters. According to the text and image information input by the user, the model generates virtual-person expression and language output matching the expressed emotion. Application development uses a virtual-person modeling and rendering engine such as Unity.
The invention realizes a digital virtual person generation method based on cross-modal emotion analysis; through effective emotion representation learning and generation processing, the virtual person can interact emotionally with the user. The method has broad application prospects in virtual reality, games, advertising, and related fields, and improves user experience and emotional interaction.
Claims (7)
1. A digital virtual person generation method based on cross-modal emotion analysis, characterized by comprising the following steps:
S1, data collection and labeling: collecting a multimodal dataset containing text, images, and corresponding emotion labels as the basis for model training and evaluation;
S2, text preprocessing: performing word segmentation, stop-word removal, and stemming on the user's text input to facilitate subsequent emotion feature extraction and generation;
S3, image preprocessing;
S4, cross-modal feature extraction;
S5, cross-modal emotion representation learning;
S6, virtual person generation model processing;
S7, model training, model evaluation, and tuning;
S8, digital virtual person generation application: deploying the trained model into practical applications such as virtual reality environments and game characters, wherein, according to the text and image information input by the user, the model generates virtual-person expression and language output matching the expressed emotion, and application development uses a virtual-person modeling and rendering engine such as Unity.
2. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the image preprocessing performs resizing, cropping, and normalization on the user's image input to suit the requirements of subsequent feature extraction and generation.
3. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the cross-modal feature extraction uses deep learning to extract emotion-related features from text and images separately, for example emotion vocabulary and syntactic-structure features from text, and facial-expression and color features from images, the features being extracted with BERT and a convolutional neural network.
4. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the cross-modal emotion representation learning combines the text and image features, learns a shared cross-modal emotion representation space with deep learning, and maps and fuses the emotion information of text and images, the representation learning being performed with a Transformer and an attention mechanism.
5. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the virtual person generation model takes the cross-modal emotion representation as input and generates the corresponding virtual-person expression and language output, the generation optionally using a generative adversarial network and the model optionally being optimized with conditional generation and multimodal fusion.
6. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the model training uses the annotated multimodal dataset to train the virtual person generation model, optimizing model parameters by minimizing the generation error, with cross-validation used during training.
7. The digital virtual person generation method based on cross-modal emotion analysis according to claim 1, wherein the model evaluation and tuning evaluates the trained model on a test set and adjusts the model's hyperparameters and structure to improve the accuracy and fidelity of virtual person generation, generation quality being assessed with perceptual evaluation and user surveys.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410014719.4A (CN117827001A) | 2024-01-04 | 2024-01-04 | Digital virtual person generation method based on cross-modal emotion analysis |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117827001A | 2024-04-05 |
Family
- ID=90513092
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410014719.4A (pending) | Digital virtual person generation method based on cross-modal emotion analysis | 2024-01-04 | 2024-01-04 |
Legal Events
| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |