CN117369681A - Digital human image interaction system based on artificial intelligence technology - Google Patents
- Publication number: CN117369681A
- Application number: CN202311404134.5A
- Authority
- CN
- China
- Prior art keywords
- module
- image
- real
- dimensional modeling
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04815—Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
Abstract
The invention relates to a digital human image interaction system based on artificial intelligence technology. The system comprises a real-time three-dimensional modeling module; a facial expression database is arranged on one side of the real-time three-dimensional modeling module, a network connection module is arranged on one side of the facial expression database, and a processor is arranged between the three-dimensional modeling module and the network connection module. By analysing the audio information of the voice signal, the system generates the three-dimensional model in real time for display and can replace the model's facial expression in real time for different characters. The facial expression database stores a variety of facial expressions, enabling the digital human to form different mouth actions, and an audio synchronization module keeps the digital human's facial expression in step with the voice content. This greatly improves the fidelity and expressiveness of the digital human image, bringing it closer to a real human's expressions and mouth shapes and giving the user a better, more immersive experience.
Description
Technical Field
The invention relates to the technical field of digital human figure interaction systems, in particular to a digital human figure interaction system based on an artificial intelligence technology.
Background
A digital human interaction system uses computer technology to simulate and create virtual characters that interact with human users. These virtual characters are typically built on artificial intelligence and computer graphics technology, enabling them to converse with users and to produce actions and expressions.
Digital human interaction systems can be applied in many fields, including virtual assistants, virtual anchors and virtual tour guides.
However, although existing digital humans can interact with users, they have obvious defects in actual use, specifically:
most digital humans suffer from audio that is out of sync when they speak, which feels uncoordinated, and their faces tend to hold a fixed expression, so they cannot give the user an immersive feeling.
Accordingly, those skilled in the art have proposed a digital human image interaction system based on artificial intelligence techniques.
Disclosure of Invention
In view of the foregoing problems with the prior art, it is a primary object of the present invention to provide a digital human image interaction system based on artificial intelligence technology.
The technical scheme of the invention is as follows: the digital human image interaction system based on artificial intelligence technology comprises a real-time three-dimensional modeling module, wherein a facial expression database is arranged on one side of the real-time three-dimensional modeling module, a network connection module is arranged on one side of the facial expression database, and a processor is arranged between the three-dimensional modeling module and the network connection module; a dialect database is arranged on the side of the processor far from the real-time three-dimensional modeling module, a human voice simulation module is arranged on the side of the real-time three-dimensional modeling module far from the processor, an image transmission module is arranged on the side of the real-time three-dimensional modeling module far from the facial expression database, and a control module is arranged between the image transmission module and the processor;
the real-time three-dimensional modeling module may be implemented with software such as Blender or ZBrush;
the facial expression database stores the specific parameters of facial expressions for the corresponding characters;
the network connection module collects facial expressions over the network and stores the different facial expression parameters in the facial expression database;
the processor calls parameters in the facial expression database and controls the real-time three-dimensional modeling module;
the dialect database stores local languages that can be called, which suits users whose Mandarin is non-standard and improves the system's recognition;
the human voice simulation module cooperates with the facial expression database: when the facial expression database calls the specific facial data of a word, the human voice simulation module calls the corresponding word;
the image transmission module transmits the model constructed by the three-dimensional modeling module so that the model is displayed externally through a carrier;
the control module receives and executes the commands sent by the processor.
As a preferred implementation, a display screen is arranged on the side of the image transmission module far from the real-time three-dimensional modeling module, a microphone is arranged on the outer side of the display screen, a loudspeaker is arranged on the display screen beside the microphone, a communication module is arranged in the display screen, and an audio synchronization module is arranged in the display screen beside the communication module;
the display screen serves as the carrier, and the image transmission module displays the built model through it;
the microphone receives the user's voice commands;
the loudspeaker plays the sound generated by the human voice simulation module;
the communication module converts the information collected by the microphone into command information the system can recognize;
the audio synchronization module matches the information transmitted by the image transmission module and the human voice simulation module, so that image information and voice information stay matched.
As a preferred implementation, the network connection module is bidirectionally electrically connected with the facial expression database, the facial expression database is bidirectionally electrically connected with the real-time three-dimensional modeling module, the processor, the human voice simulation module and the image transmission module are each bidirectionally electrically connected with the real-time three-dimensional modeling module, the image transmission module is electrically connected with the display screen, the communication module and the audio synchronization module are each bidirectionally electrically connected with the control module, the microphone is bidirectionally electrically connected with the communication module, the image transmission module is bidirectionally electrically connected with the audio synchronization module, and the processor is bidirectionally electrically connected with the facial expression database.
As a preferred embodiment, the network connection module supports wired, Wi-Fi and hotspot connections.
As a preferred embodiment, the communication module has a built-in noise reduction function, with the noise-reduction code as follows:
import noisereduce as nr
import soundfile as sf
# read the audio file
audio_data, sample_rate = sf.read('input.wav')
# extract a noise sample
noisy_part = audio_data[5000:15000]
# noise reduction (noisereduce v1 API)
reduced_noise = nr.reduce_noise(audio_clip=audio_data, noise_clip=noisy_part, verbose=False)
# save the denoised audio file
sf.write('output.wav', reduced_noise, sample_rate)
As a preferred embodiment, the image transmission module has a built-in image enhancement function, with the code set as follows:
import cv2
def enhance_image(image_path):
    # read the image
    image = cv2.imread(image_path)
    # convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # sharpen by unsharp masking: blur, then subtract the weighted blur
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    sharpened = cv2.addWeighted(gray, 1.5, blurred, -0.5, 0)
    # save the enhanced image
    output_path = 'enhanced_image.jpg'
    cv2.imwrite(output_path, sharpened)
    return output_path
As a preferred embodiment, a voice synchronization function is integrated in the audio synchronization module; an illustrative running code (a sketch only — the use of Python's wave module and the 25 fps frame rate are assumptions) is:
import wave
def frame_timestamps(audio_path, fps=25):
    # read the audio file and compute its duration in seconds
    with wave.open(audio_path, 'rb') as wf:
        duration = wf.getnframes() / float(wf.getframerate())
    # one timestamp per video frame, used to keep mouth shapes
    # aligned with the audio timeline
    return [i / fps for i in range(int(duration * fps))]
Compared with the prior art, the invention has the following advantages and positive effects:
by analysing the audio information of the voice signal, the system generates the three-dimensional model in real time for display and can replace the model's facial expression in real time for different characters; the facial expression database stores a variety of facial expressions, so the digital human can form different mouth actions; and the audio synchronization module keeps the digital human's facial expression in step with the voice content. This greatly improves the fidelity and expressiveness of the digital human image, bringing it closer to a real human's expressions and mouth shapes, so the user can hold a vivid, interactive conversation with the digital human for a more natural and realistic human-computer interaction experience. The system is widely applicable to fields such as online education, virtual tour guiding and intelligent companion care, offering users a brand-new interaction mode and a more intelligent and convenient service experience.
Drawings
FIG. 1 is a schematic diagram of the overall structure of a digital human image interaction system based on artificial intelligence technology.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention will be further described with reference to the drawings and specific embodiments.
Examples
As shown in FIG. 1, the present invention provides the following technical solution: the system comprises a real-time three-dimensional modeling module, characterized in that: a facial expression database is arranged on one side of the real-time three-dimensional modeling module, a network connection module is arranged on one side of the facial expression database, a processor is arranged between the three-dimensional modeling module and the network connection module, a dialect database is arranged on the side of the processor far from the real-time three-dimensional modeling module, a human voice simulation module is arranged on the side of the real-time three-dimensional modeling module far from the processor, an image transmission module is arranged on the side far from the facial expression database, and a control module is arranged between the image transmission module and the processor;
the real-time three-dimensional modeling module may be implemented with software such as Blender or ZBrush; it is mainly used for modeling the model's face and can call different face models for filling;
Blender offers a wide range of functions, including modeling, animation, rendering, physical simulation and video editing, and serves fields from game development to film production and architectural visualization, so such software suits the requirements of this system.
The facial expression database stores the specific parameters of facial expressions for the corresponding characters, and the stored expressions can be called in real time;
the facial expression database may be set as a local database or a cloud database;
the local database refers to a database system and data stored on a local computer or server, and both the data and database management systems are located on their own physical devices. It is commonly used for storing and managing data in a local environment, with higher data access speed and control rights;
and cloud databases refer to databases that store database systems and data on a cloud platform. The data is connected to the cloud server through a network and managed and maintained by a cloud service provider. The user may access and manage the data through the cloud service provider's interface.
The user can select according to the actual use requirement.
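As an illustrative sketch of the local option (the table layout and function names here are assumptions for illustration, not part of the patent), per-character expression parameters could be kept in a local SQLite database:

```python
import json
import sqlite3

def make_expression_db(path=":memory:"):
    # create a small local store for per-character expression parameters
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS expressions ("
        "character TEXT, name TEXT, params TEXT, "
        "PRIMARY KEY (character, name))"
    )
    return conn

def save_expression(conn, character, name, params):
    # parameters are stored as a JSON blob so any dict shape fits
    conn.execute(
        "INSERT OR REPLACE INTO expressions VALUES (?, ?, ?)",
        (character, name, json.dumps(params)),
    )
    conn.commit()

def load_expression(conn, character, name):
    # return the stored parameter dict, or None when absent
    row = conn.execute(
        "SELECT params FROM expressions WHERE character = ? AND name = ?",
        (character, name),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Passing a file path instead of ":memory:" gives the persistent local store described above; the cloud option would replace the sqlite3 connection with a connection to the provider's service.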
The network connection module collects facial expressions over the network and stores the different facial expression parameters in the facial expression database; it can download the corresponding data directly from the network or from a database;
the processor calls parameters in the facial expression database and controls the real-time three-dimensional modeling module, combining various data to drive the three-dimensional modeling;
the dialect database stores local languages that can be called, which suits users whose Mandarin is non-standard and improves the system's language recognition;
the human voice simulation module cooperates with the facial expression database: when the facial expression database calls the specific facial data of a word, the human voice simulation module calls out the corresponding word, giving the system a voice playback function;
the image transmission module transmits the model constructed by the three-dimensional modeling module so that the model is displayed externally through the carrier;
the control module receives and executes the commands sent by the processor;
by analysing the audio information of the voice signal, the system generates the digital human's mouth actions in real time so that they stay synchronized with the voice content. This greatly improves the fidelity and expressiveness of the digital human image, bringing it closer to real human expressions and mouth shapes.
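A minimal sketch of the mouth-action generation just described, under the simplifying assumption that mouth openness is driven directly by per-frame signal energy (samples normalised to [-1, 1]; the frame size is also an assumption):

```python
import math

def mouth_openness(samples, frame_size=800):
    # split the audio samples into fixed-size frames and map each
    # frame's RMS amplitude to a mouth-openness value in [0, 1]
    openness = []
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        openness.append(min(1.0, rms))
    return openness
```

Each returned value would drive the jaw parameter of the three-dimensional model for one video frame, so louder speech opens the mouth wider.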
A display screen is arranged on the side of the image transmission module far from the real-time three-dimensional modeling module, a microphone is arranged on the outer side of the display screen, a loudspeaker is arranged on the display screen beside the microphone, a communication module is arranged in the display screen, and an audio synchronization module is arranged in the display screen beside the communication module;
the display screen serves as the carrier through which the image transmission module displays the built model; it may be set as a liquid crystal screen;
the microphone receives the user's voice commands and may be a built-in or an external microphone;
the loudspeaker plays the sound generated by the human voice simulation module;
the communication module converts the information collected by the microphone into command information the system can recognize, monitoring the information transmitted by the microphone;
the audio synchronization module matches the information transmitted by the image transmission module and the human voice simulation module, so that image information and voice information stay matched;
through this technology, the user's voice input can be accurately recognized and converted into text form, enabling the user's language content to be understood and analysed.
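The conversion of recognized text into system commands can be sketched as a keyword lookup (the phrase table and token names below are illustrative assumptions, not taken from the patent):

```python
def parse_command(text, commands=None):
    # map recognized speech text to a command token the system understands
    commands = commands or {
        "volume up": "VOLUME_UP",
        "hello": "GREET",
        "goodbye": "FAREWELL",
    }
    lowered = text.lower()
    for phrase, token in commands.items():
        if phrase in lowered:
            return token
    return "UNKNOWN"  # unrecognized input falls through
```

The returned token is the kind of command information the communication module hands to the control module.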
The network connection module is bidirectionally electrically connected with the facial expression database, the facial expression database is bidirectionally electrically connected with the real-time three-dimensional modeling module, the processor, the human voice simulation module and the image transmission module are each bidirectionally electrically connected with the real-time three-dimensional modeling module, the image transmission module is electrically connected with the display screen, the communication module and the audio synchronization module are each bidirectionally electrically connected with the control module, the microphone is bidirectionally electrically connected with the communication module, the image transmission module is bidirectionally electrically connected with the audio synchronization module, and the processor is bidirectionally electrically connected with the facial expression database;
the network connection module supports wired, Wi-Fi and hotspot connections;
the communication module has a built-in noise reduction function, with the noise-reduction code as follows:
import noisereduce as nr
import soundfile as sf
# read the audio file
audio_data, sample_rate = sf.read('input.wav')
# extract a noise sample
noisy_part = audio_data[5000:15000]
# noise reduction (noisereduce v1 API)
reduced_noise = nr.reduce_noise(audio_clip=audio_data, noise_clip=noisy_part, verbose=False)
# save the denoised audio file
sf.write('output.wav', reduced_noise, sample_rate)
The code first reads the audio file to be denoised, selects a noisy portion of it as the noise sample, performs noise reduction on the whole audio, and finally saves the denoised audio data as a new audio file.
The image transmission module has a built-in image enhancement function, with the code set as follows:
import cv2
def enhance_image(image_path):
    # read the image
    image = cv2.imread(image_path)
    # convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # sharpen by unsharp masking: blur, then subtract the weighted blur
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    sharpened = cv2.addWeighted(gray, 1.5, blurred, -0.5, 0)
    # save the enhanced image
    output_path = 'enhanced_image.jpg'
    cv2.imwrite(output_path, sharpened)
    return output_path
The code first reads an image file and converts it to grayscale, then sharpens it by unsharp masking: the grayscale image is Gaussian-blurred, and the blurred image is subtracted with a weighting to obtain the enhanced image. Finally the enhanced image is saved and the path of the output image is returned.
A voice synchronization function is integrated in the audio synchronization module; an illustrative running code (a sketch only — the use of Python's wave module and the 25 fps frame rate are assumptions) is:
import wave
def frame_timestamps(audio_path, fps=25):
    # read the audio file and compute its duration in seconds
    with wave.open(audio_path, 'rb') as wf:
        duration = wf.getnframes() / float(wf.getframerate())
    # one timestamp per video frame, used to keep mouth shapes
    # aligned with the audio timeline
    return [i / fps for i in range(int(duration * fps))]
The audio file is first read; audio timestamp information is then acquired, and the image frames are aligned with it using those timestamps or other audio processing techniques.
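The alignment step can be sketched as a mapping between the audio timeline and video frame indices (the 25 fps default is an assumption, not from the patent):

```python
def frame_for_time(t, fps=25):
    # index of the video frame to display at audio time t (seconds)
    return int(t * fps)

def align_chunks(chunk_times, fps=25):
    # map each audio-chunk timestamp to its matching video frame index
    return [frame_for_time(t, fps) for t in chunk_times]
```

With such a mapping, the mouth shape computed for a given audio chunk is shown on exactly the frame that plays at that moment.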
Working principle:
by analysing the audio information of the voice signal, the system generates the three-dimensional model in real time for display and can change the model's facial expression in real time for different characters; the facial expression database stores a variety of facial expressions, so the digital human can form different mouth actions; and the audio synchronization module keeps the digital human's facial expression in step with the voice content. This greatly improves the fidelity and expressiveness of the digital human image, bringing it closer to a real human's expressions and mouth shapes, so the user can hold a vivid, interactive conversation with the digital human for a more natural and realistic human-computer interaction experience. The system is widely applicable to fields such as online education, virtual tour guiding and intelligent companion care, offering users a brand-new interaction mode and a more intelligent and convenient service experience.
Finally, it should be noted that: the embodiments described above are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the invention.
Claims (8)
1. A digital human image interaction system based on artificial intelligence technology, comprising a real-time three-dimensional modeling module, characterized in that: a facial expression database is arranged on one side of the real-time three-dimensional modeling module, a network connection module is arranged on one side of the facial expression database, a processor is arranged between the three-dimensional modeling module and the network connection module, a dialect database is arranged on the side of the processor far from the real-time three-dimensional modeling module, a human voice simulation module is arranged on the side of the real-time three-dimensional modeling module far from the processor, an image transmission module is arranged on the side far from the facial expression database, and a control module is arranged between the image transmission module and the processor;
the real-time three-dimensional modeling module may be implemented with software such as Blender or ZBrush;
the facial expression database stores the specific parameters of facial expressions for the corresponding characters;
the network connection module collects facial expressions over the network and stores the different facial expression parameters in the facial expression database;
the processor calls parameters in the facial expression database and controls the real-time three-dimensional modeling module;
the dialect database stores local languages that can be called, which suits users whose Mandarin is non-standard and improves the system's recognition;
the human voice simulation module cooperates with the facial expression database: when the facial expression database calls the specific facial data of a word, the human voice simulation module calls out the corresponding word.
2. The artificial intelligence technology-based digital human image interaction system according to claim 1, wherein: the image transmission module transmits the model constructed by the three-dimensional modeling module so that the model is displayed externally through a carrier; the image transmission module compresses the acquired image data into smaller data packets and transmits the encoded image data to the target location over a network or another communication channel;
the control module receives external input signals or data as its input and sends control commands or parameter-adjusting interfaces to external devices; its outputs can be electrical signals, digital signals, control signals and the like for controlling an actuator or other controlled devices.
3. The artificial intelligence technology-based digital human image interaction system according to claim 1, wherein: a display screen is arranged on the side of the image transmission module far from the real-time three-dimensional modeling module, a microphone is arranged on the outer side of the display screen, a loudspeaker is arranged on the display screen beside the microphone, a communication module is arranged in the display screen, and an audio synchronization module is arranged in the display screen beside the communication module;
the display screen serves as the carrier, and the image transmission module displays the built model through it;
the microphone receives the user's voice commands;
the loudspeaker plays the sound generated by the human voice simulation module;
the communication module converts the information collected by the microphone into command information the system can recognize;
the audio synchronization module matches the information transmitted by the image transmission module and the human voice simulation module, so that image information and voice information stay matched.
4. The artificial intelligence technology-based digital human image interaction system according to claim 1, wherein: the network connection module is in bidirectional electrical connection with the facial expression database, the facial expression database is in bidirectional electrical connection with the real-time three-dimensional modeling module, the processor, the voice simulation module and the image transmission module are in bidirectional electrical connection with the real-time three-dimensional modeling module, the image transmission module is in electrical connection with the display screen, the communication module and the audio co-track module are in bidirectional electrical connection with the control module, the microphone is in bidirectional electrical connection with the communication module, the image transmission module is in bidirectional electrical connection with the audio co-track module, and the processor is in bidirectional electrical connection with the facial expression database.
5. The artificial intelligence technology-based digital human image interaction system according to claim 1, wherein: the network connection module supports wired, WIFI and hotspot networking modes.
6. The artificial intelligence technology-based digital human image interaction system according to claim 1, wherein: the communication module is internally provided with a noise reduction function, and the noise reduction code is as follows:
import noisereduce as nr
import soundfile as sf
# read the audio file
audio_data, sample_rate = sf.read('input.wav')
# extract a noise-only sample (here, samples 5000-15000)
noisy_part = audio_data[5000:15000]
# noise reduction processing (noisereduce v1 API)
reduced_noise = nr.reduce_noise(audio_clip=audio_data, noise_clip=noisy_part, verbose=False)
# save the denoised audio file
sf.write('output.wav', reduced_noise, sample_rate)
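The noisereduce call above performs spectral gating. The underlying idea — estimate a noise floor from a clip known to contain only noise, then suppress content below it — can be illustrated with a simplified time-domain gate. This is a toy stand-in for the library's actual spectral algorithm; the margin parameter and function name are assumptions:

```python
def noise_gate(samples, noise_clip, margin=2.0):
    """Zero out samples whose amplitude falls below a noise-floor threshold.

    The noise floor is estimated as the peak amplitude of a clip known to
    contain only noise, scaled up by a safety margin.
    """
    noise_floor = max(abs(s) for s in noise_clip)
    threshold = margin * noise_floor
    return [s if abs(s) >= threshold else 0.0 for s in samples]
```

Samples quieter than twice the observed noise peak are treated as noise and silenced; louder samples pass through unchanged.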
7. The artificial intelligence technology-based digital human image interaction system according to claim 1, wherein: the image transmission module is internally provided with an image enhancement function, and the code is set as follows:
import cv2
def enhance_image(image_path):
    # read the image
    image = cv2.imread(image_path)
    # convert the image to gray-scale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # sharpen via unsharp masking: subtract a Gaussian-blurred copy from the image
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    sharpened = cv2.addWeighted(gray, 1.5, blurred, -0.5, 0)
    # save the enhanced image
    output_path = 'enhanced_image.jpg'
    cv2.imwrite(output_path, sharpened)
    return output_path
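The addWeighted step above (image × 1.5 minus blur × 0.5) is unsharp masking. A pure-Python illustration on a 1-D signal with a simple box blur shows the edge overshoot this produces; the helper names are illustrative, not from the patent:

```python
def box_blur(signal, radius=1):
    """Average each sample with its neighbours (edges use a shrunken window)."""
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out

def unsharp_mask(signal, amount=0.5):
    """Sharpen by subtracting a blurred copy: (1 + amount) * s - amount * blur."""
    blurred = box_blur(signal)
    return [(1 + amount) * s - amount * b for s, b in zip(signal, blurred)]
```

On a step edge such as [0, 0, 0, 10, 10, 10], the output overshoots above 10 just after the edge and undershoots below 0 just before it — the exaggerated local contrast that makes edges look sharper.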
8. The artificial intelligence technology-based digital human image interaction system according to claim 1, wherein: the voice synchronization function is integrated in the audio synchronization module, and the running code is set as follows:
import cv2
def enhance_image(image_path):
    # read the image
    image = cv2.imread(image_path)
    # convert the image to gray-scale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # sharpen via unsharp masking: subtract a Gaussian-blurred copy from the image
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    sharpened = cv2.addWeighted(gray, 1.5, blurred, -0.5, 0)
    # save the enhanced image
    output_path = 'enhanced_image.jpg'
    cv2.imwrite(output_path, sharpened)
    return output_path
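The voice synchronization claim 8 describes amounts to keeping audio playback aligned with rendered video frames. A minimal sketch of one common approach — partitioning the audio stream into per-frame sample ranges — assuming a fixed sample rate and frame rate (the function name and parameters are hypothetical):

```python
def per_frame_sample_ranges(n_samples, sample_rate, fps):
    """Split an audio stream into (start, end) sample ranges, one per video
    frame, so each rendered frame plays exactly its share of the audio."""
    samples_per_frame = sample_rate / fps
    n_frames = int(n_samples // samples_per_frame)
    return [
        (round(i * samples_per_frame), round((i + 1) * samples_per_frame))
        for i in range(n_frames)
    ]
```

At 16 kHz audio and 25 fps, each frame owns 640 samples; comparing the playback position against the current frame's range gives a drift measure that keeps lip motion and speech in step.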
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311404134.5A CN117369681A (en) | 2023-10-27 | 2023-10-27 | Digital human image interaction system based on artificial intelligence technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117369681A true CN117369681A (en) | 2024-01-09 |
Family
ID=89394384
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117369681A (en) |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |