CN113992929A - Virtual digital human interaction method, system, equipment and computer program product - Google Patents

Info

Publication number
CN113992929A
Authority
CN
China
Prior art keywords
digital human
video
target
identifier
answer
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111259647.2A
Other languages
Chinese (zh)
Inventor
黄启亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Merchants Bank Co Ltd
Original Assignee
China Merchants Bank Co Ltd
Application filed by China Merchants Bank Co Ltd
Priority to CN202111259647.2A
Publication of CN113992929A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/232 Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/239 Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
    • H04N21/2393 Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests involving handling client requests
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/437 Interfacing the upstream path of the transmission network, e.g. for transmitting client requests to a VOD server
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/858 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
    • H04N21/8586 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot by using a URL

Abstract

The invention discloses a virtual digital human interaction method, system, device and computer program product. Virtual digital human videos that can meet users' interaction needs are synthesized in advance, and the association relation between each video identifier and the corresponding answer identifier is stored in advance. When an interaction instruction is received, the identifier of the answer that meets the user's need (the target answer identifier) is determined, the target video identifier associated with it is looked up in the pre-stored association relation, and the target video identifier is sent to the client, which obtains and displays the target digital human video according to that identifier. This avoids the bandwidth consumed by producing the digital human video after the user sends the instruction, and the system does not have to obtain the actual target video data. Even in a high-concurrency scenario, the ordinary bandwidth of a common background server is enough to support the simple operations of batch identifier lookup and distribution, so that one-to-one virtual digital human interaction with large concurrency is achieved.

Description

Virtual digital human interaction method, system, equipment and computer program product
Technical Field
The present invention relates to the field of virtual digital human technology, and in particular, to a virtual digital human interaction method, system, device, and computer program product.
Background
With the rapid development of new technologies such as artificial intelligence and virtual reality, virtual digital human technology has also improved dramatically, progressing from the digitalization of appearance to interactive behavior and intelligent thinking. Digital humans such as virtual anchors and virtual employees have entered the public eye and are active in many fields, including film and television, games, media, cultural tourism and finance. A virtual digital human is built by recording a segment of video of a real person and training models of the person's image, voice, movements and so on, based on technologies such as computer vision and speech synthesis; afterwards, a short video of that person giving an explanation can be generated by entering arbitrary text in the background.
However, existing virtual digital human systems in the industry are designed around the idea of live broadcasting. Such systems are suitable only for one-to-many scenarios; in one-to-one consultation and service scenarios they consume a large amount of computing resources and bandwidth and are difficult to extend to service-type scenarios.
Disclosure of Invention
The main object of the invention is to provide a virtual digital human interaction method, system, device and computer program product, so as to solve the technical problem that existing virtual digital human systems cannot support one-to-one interaction with large concurrency in use.
In order to achieve the above object, the present invention provides a virtual digital human interaction method, which comprises:
receiving a digital human interaction request sent by a client, and determining interaction request information according to the digital human interaction request;
acquiring a target answer identifier corresponding to the interactive request information, and searching a target video identifier associated with the target answer identifier from a preset association relation set;
and sending the target video identification to the client so that the client can obtain the corresponding target digital human video based on the target video identification and carry out interactive display, wherein the target digital human video is a pre-made virtual digital human video.
Optionally, before the step of receiving the digital human interaction request sent by the client, the method further includes:
acquiring a full-scale answer library for interactive response from a designated robot engine, wherein the full-scale answer library comprises a plurality of interactive response messages, and each interactive response message corresponds to an answer identifier;
generating a corresponding digital person video aiming at each interactive response information in the full-scale answer library, and acquiring a video identifier corresponding to each digital person video;
and establishing an association relation between each video identifier and the corresponding answer identifier to obtain the association relation set.
Optionally, the step of obtaining a video identifier corresponding to each of the digital human videos includes:
uploading the full amount of the digital human videos to a cloud server side;
and acquiring a Uniform Resource Locator (URL) of each digital human video at a cloud server end to serve as the video identifier.
Optionally, the step of obtaining a target answer identifier corresponding to the interactive request information, and finding a target video identifier associated with the target answer identifier from a preset association set includes:
and finding a target answer identifier corresponding to the interactive request information from a preset question-answer pair, and finding a target URL associated with the target answer identifier from the association relation set to be used as the target video identifier.
Optionally, there are a plurality of target video identifiers and a plurality of clients, and the step of sending the target video identifier to the client so that the client can obtain the corresponding target digital human video based on the target video identifier and perform interactive display includes:
and distributing each target video identifier to the corresponding client side, so that the corresponding client side can obtain the corresponding target digital human video based on the corresponding target video identifier, and the target digital human video is played and displayed at the front end.
Optionally, the step of distributing each target video identifier to the corresponding client, so that the corresponding client obtains a corresponding target digital human video based on the corresponding target video identifier, and displays the target digital human video at a front end includes:
and distributing each target video identifier to the corresponding client based on a specified communication protocol, so that the corresponding client can acquire the target digital human video corresponding to the target video identifier from the cloud server, accelerated by a content distribution network, and play and display the target digital human video at the front end.
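As an illustration only, distributing identifiers to several clients at once might be sketched as follows. The `outbox` dictionary stands in for real client connections, and the client ids and `cdn.example.com` URLs are hypothetical; the text names neither the communication protocol nor the content distribution network, so neither is modeled here.

```python
import asyncio


async def send_identifier(client_id: str, video_id: str, outbox: dict) -> None:
    """Deliver one video identifier to one client (stand-in for a real push)."""
    await asyncio.sleep(0)  # yield to the event loop, as a network send would
    outbox[client_id] = video_id


async def distribute(assignments: dict, outbox: dict) -> None:
    """Send every client its own identifier concurrently."""
    await asyncio.gather(
        *(send_identifier(cid, vid, outbox) for cid, vid in assignments.items())
    )


def run_distribution(assignments: dict) -> dict:
    """Run the distribution and return what each client received."""
    outbox: dict = {}
    asyncio.run(distribute(assignments, outbox))
    return outbox


# Hypothetical client ids and CDN URLs.
delivered = run_distribution({
    "client-1": "https://cdn.example.com/videos/A001.mp4",
    "client-2": "https://cdn.example.com/videos/A007.mp4",
})
```

Each message carries only a short string, so the per-client cost on the server is independent of the size of the video the client later downloads.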
Optionally, the step of receiving a digital human interaction request sent by a client and determining interaction request information according to the digital human interaction request includes:
receiving a digital human interaction request which is sent by a client and contains a user voice data stream;
and carrying out voice recognition on the user voice data stream to obtain voice text information as the interactive request information.
In addition, to achieve the above object, the present invention also provides a virtual digital human interaction system, including:
the interactive request determining module is used for receiving a digital human interactive request sent by a client side and determining interactive request information according to the digital human interactive request;
the video identification searching module is used for acquiring a target answer identification corresponding to the interactive request information and searching a target video identification associated with the target answer identification from a preset association relation set;
and the target video display module is used for sending the target video identification to the client so that the client can obtain the corresponding target digital human video based on the target video identification and carry out interactive display, wherein the target digital human video is a pre-manufactured virtual digital human video.
In addition, to achieve the above object, the present invention also provides a virtual digital human interaction device, including: the system comprises a memory, a processor and a virtual digital human interaction program stored on the memory and capable of running on the processor, wherein the virtual digital human interaction program realizes the steps of the virtual digital human interaction method when being executed by the processor.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium having a virtual digital human interaction program stored thereon, which when executed by a processor implements the steps of the virtual digital human interaction method as described above.
Furthermore, to achieve the above object, the present invention also provides a computer program product comprising a computer program which, when being executed by a processor, realizes the steps of the virtual digital human interaction method as described above.
According to the invention, virtual digital human videos that can meet users' interaction needs are synthesized in advance, and the association relation between each video identifier and the corresponding answer identifier is stored in advance. When a user's interaction instruction is received, the user's need (namely the interaction request information) is determined from the instruction, and the identifier of the answer that can meet that need (namely the target answer identifier) is determined. The target video identifier associated with the target answer identifier is then found according to the pre-stored association relation and sent to the client, and the client obtains and displays the actual target digital human video according to the target video identifier, so that the virtual digital human interacts with the user and responds to the user's need. This process avoids producing the digital human video after the user sends the instruction, and the system does not need to obtain the actual target video data. The time the client needs to obtain the virtual digital human video is therefore greatly shortened, and the system does not occupy extra bandwidth. Even in a high-concurrency scenario, the ordinary bandwidth of a common background server can support the simple operations of batch identifier lookup and distribution. One-to-one virtual digital human interaction with large concurrency is thereby achieved, which solves the technical problem that existing virtual digital human systems cannot support one-to-one large concurrency in use.
Drawings
FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a virtual digital human interaction method according to the present invention;
FIG. 3 is a schematic diagram of a flow chart of a virtual digital human interaction method according to a second embodiment of the present invention;
FIG. 4 is a schematic overall flowchart illustrating a third embodiment of a virtual digital human interaction method according to the present invention;
FIG. 5 is a functional block diagram of the virtual digital human interaction system of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to solve the above problems, the present invention provides a virtual digital human interaction method. Virtual digital human videos that can meet users' interaction needs are synthesized in advance, and the association relation between each video identifier and the corresponding answer identifier is stored in advance. When a user's interaction instruction is received, the user's need (namely the interaction request information) is determined from the instruction, and the identifier of the answer that can meet that need (namely the target answer identifier) is determined. The target video identifier associated with the target answer identifier is then found according to the pre-stored association relation and sent to the client, and the client obtains and displays the actual target digital human video according to the target video identifier, so that the virtual digital human interacts with the user and responds to the user's need. This process avoids producing the digital human video after the user sends the instruction, and the system does not need to obtain the actual target video data. The time the client needs to obtain the virtual digital human video is therefore greatly shortened, and the system does not occupy extra bandwidth. Even in a high-concurrency scenario, the ordinary bandwidth of a common background server can support the simple operations of batch identifier lookup and distribution. One-to-one virtual digital human interaction with large concurrency is thereby achieved, which solves the technical problem that existing virtual digital human systems cannot support one-to-one large concurrency in use.
As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the virtual digital human interaction system may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory), and may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not limit the apparatus, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a virtual digital human interaction program.
In the device shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (programmer's end) and performing data communication with the client; and the processor 1001 may be used to call the virtual digital human interaction program stored in the memory 1005 and perform the operations in the virtual digital human interaction method described below.
Based on the hardware structure, the embodiment of the virtual digital human interaction method is provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a virtual digital human interaction method according to a first embodiment of the present invention. The virtual digital human interaction method comprises the following steps:
step S10, receiving a digital human interaction request sent by a client, and determining interaction request information according to the digital human interaction request;
In this embodiment, the invention is applied to a background server. The client is the terminal from which the user sends a digital human interaction request to the background server. The digital human interaction request is an instruction for interacting with the virtual digital human; in a consultation or service scenario it is typically a question or a request for help that the user sends to the server. The background server can process a plurality of digital human interaction requests at the same time. The interaction request information is the specific question or instruction content sent by the user and is usually represented as text.
Specifically, in a service consultation scenario, the user logs in to the digital human interaction platform on the client and enters a query in voice or text form. The client sends the query to the background server as a digital human interaction request. On receiving the request, the background server parses it to obtain what the user actually wants to ask, namely the interaction request information. For a query in text form, the background server may use the text directly as the interaction request information, or may first apply processing such as text cleaning and use the processed text. For a query in voice form, the background server may convert the speech to text and use the converted text as the interaction request information.
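A minimal sketch of this request handling is given below. The `InteractionRequest` type, the whitespace-only `clean_text` rule and the injected `recognize` callable are all assumptions made for illustration; the text does not specify the cleaning rules or the speech recognizer.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InteractionRequest:
    """Hypothetical request shape: exactly one of text or audio is set."""
    text: Optional[str] = None
    audio: Optional[bytes] = None


def clean_text(raw: str) -> str:
    """Assumed cleaning rule: collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


def to_request_info(req: InteractionRequest,
                    recognize: Callable[[bytes], str]) -> str:
    """Turn a text or voice request into interaction request information."""
    if req.text is not None:
        return clean_text(req.text)
    if req.audio is not None:
        # recognize() is a placeholder for whatever speech-to-text is used.
        return clean_text(recognize(req.audio))
    raise ValueError("request carries neither text nor audio")
```

Passing the recognizer in as a parameter keeps the sketch independent of any particular speech recognition service.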
Step S20, obtaining a target answer identifier corresponding to the interactive request information, and finding a target video identifier associated with the target answer identifier from a preset association relation set;
In this embodiment, the target answer identifier is the identifier, obtainable by the background server, of the answer information that responds to the interaction request information. The target video identifier is the video identifier associated with the target answer identifier. The association relation set is the set of association relations between each answer identifier and the corresponding video identifier. It should be noted that the correspondence between answer identifiers and video identifiers may be one-to-one, one-to-many, or many-to-many.
Specifically, when the background server processes a plurality of pieces of interaction request information at the same time, it may look up the answer sentences that respond to the current pieces of interaction request information and then obtain the identifier of each answer sentence as a target answer identifier; alternatively, it may look up the target answer identifiers directly from the interaction request information. Because the association relation set between answer identifiers and video identifiers is pre-stored in the background server, the background server can find the corresponding target video identifiers in the set according to the current target answer identifiers.
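One possible in-memory shape for the association relation set is sketched below. The class name `AssociationSet` and its methods are hypothetical; the sketch only shows that a mapping from answer identifier to a list of video identifiers covers the one-to-one, one-to-many and many-to-many cases and supports batch lookup.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class AssociationSet:
    """Answer identifier -> video identifiers (one-to-many allowed)."""

    def __init__(self) -> None:
        self._videos_by_answer: Dict[str, List[str]] = defaultdict(list)

    def add(self, answer_id: str, video_id: str) -> None:
        """Record one association, ignoring exact duplicates."""
        if video_id not in self._videos_by_answer[answer_id]:
            self._videos_by_answer[answer_id].append(video_id)

    def lookup(self, answer_id: str) -> List[str]:
        """Return the video identifiers for one answer (empty if unknown)."""
        return list(self._videos_by_answer.get(answer_id, []))

    def lookup_batch(self, answer_ids: Iterable[str]) -> Dict[str, List[str]]:
        """Resolve several target answer identifiers in one call."""
        return {aid: self.lookup(aid) for aid in answer_ids}
```

The same video identifier may be added under several answer identifiers, which gives the many-to-many case without any extra structure.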
Step S30, sending the target video identification to the client side so that the client side can obtain the corresponding target digital human video based on the target video identification and carry out interactive display, wherein the target digital human video is a pre-made virtual digital human video.
In this embodiment, the target digital human video is the virtual digital human video corresponding to the target video identifier. In this video, a virtual digital human presents the answer content that responds to the interaction request information, and the video has already been produced before the user sends the instruction. The correspondence between video identifiers and virtual digital human videos is typically one-to-one.
Specifically, after the background server obtains the target video identifier, it returns the identifier to the client. That is, what the client receives after sending the digital human interaction request is not the data stream of the actual virtual digital human video but the target video identifier. After receiving it, the client obtains the actual target digital human video according to the identifier and plays it at the front end, presenting the virtual digital human to the user and achieving the interactive effect.
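The division of work between server and client can be sketched as follows. The JSON payload shape, the function names and the `FAKE_CLOUD` dictionary are assumptions for illustration; a real client would download the video over the network rather than from a dictionary.

```python
import json
from typing import Callable, Dict


def build_response(video_id: str) -> str:
    """Server side: the reply carries only the identifier, never video bytes."""
    return json.dumps({"video_id": video_id})


def client_obtain_video(response: str, fetch: Callable[[str], bytes]) -> bytes:
    """Client side: read the identifier, then fetch the video itself."""
    video_id = json.loads(response)["video_id"]
    return fetch(video_id)


# Stand-in for cloud storage, keyed by a hypothetical URL.
FAKE_CLOUD: Dict[str, bytes] = {
    "https://cdn.example.com/videos/A001.mp4": b"<pre-made video bytes>",
}

response = build_response("https://cdn.example.com/videos/A001.mp4")
video = client_obtain_video(response, FAKE_CLOUD.__getitem__)
```

The server's reply stays a few dozen bytes regardless of video length; the heavy transfer happens between the client and the storage that holds the pre-made video.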
This embodiment provides a virtual digital human interaction method. Virtual digital human videos that can meet users' interaction needs are synthesized in advance, and the association relation between each video identifier and the corresponding answer identifier is stored in advance. When a user's interaction instruction is received, the user's need (namely the interaction request information) is determined from the instruction, and the identifier of the answer that can meet that need (namely the target answer identifier) is determined. The target video identifier associated with the target answer identifier is then found according to the pre-stored association relation and sent to the client, and the client obtains and displays the actual target digital human video according to the target video identifier, so that the virtual digital human interacts with the user and responds to the user's need. This process avoids producing the digital human video after the user sends the instruction, and the system does not need to obtain the actual target video data. The time the client needs to obtain the virtual digital human video is therefore greatly shortened, and the system does not occupy extra bandwidth. Even in a high-concurrency scenario, the ordinary bandwidth of a common background server can support the simple operations of batch identifier lookup and distribution. One-to-one virtual digital human interaction with large concurrency is thereby achieved, which solves the technical problem that existing virtual digital human systems cannot support one-to-one large concurrency in use.
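To make the bandwidth argument concrete, the following back-of-the-envelope sketch compares server egress when the server streams video to every user with egress when it only returns identifiers. All three figures (10,000 concurrent users, a 2 Mbit/s stream, one 200-byte reply per user per second) are illustrative assumptions, not values from the text.

```python
def egress_bits_per_second(concurrent_users: int,
                           bytes_per_user_per_second: int) -> int:
    """Total server egress in bits per second for a uniform per-user load."""
    return concurrent_users * bytes_per_user_per_second * 8


# Illustrative assumptions only.
USERS = 10_000
STREAM_BYTES_PER_SECOND = 250_000   # a 2 Mbit/s video stream per user
IDENTIFIER_BYTES_PER_SECOND = 200   # one ~200-byte reply per user per second

streaming = egress_bits_per_second(USERS, STREAM_BYTES_PER_SECOND)
identifiers = egress_bits_per_second(USERS, IDENTIFIER_BYTES_PER_SECOND)
ratio = streaming / identifiers
```

Under these assumptions, streaming needs 20 Gbit/s of server egress while identifier replies need 16 Mbit/s, a factor of 1250; the video traffic itself is carried by the storage the client downloads from.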
Further, based on the first embodiment shown in fig. 2, a second embodiment of the virtual digital human interaction method of the present invention is proposed. In this embodiment, before step S10, the method further includes:
step S01, acquiring a full-scale answer library for interactive response from a designated robot engine, wherein the full-scale answer library comprises a plurality of interactive response information, and each interactive response information corresponds to an answer identifier;
step S02, generating corresponding digital human videos aiming at each interactive response information in the full-quantity answer library, and acquiring a video identifier corresponding to each digital human video;
step S03, establishing an association relationship between the video identifier and the answer identifier to obtain the association relationship set.
In this embodiment, the designated robot engine is a question-and-answer engine that conducts intelligent question answering with the user. The full-scale answer library is all the answer information of the application that can be acquired from the robot engine. The answer identifier, namely the answer ID, is the identifier corresponding to each piece of answer information in the full-scale answer library.
Specifically, the background server prepares the virtual digital human videos in advance, before step S10. The server first obtains the full-scale answer library of the application from the designated robot engine; this library covers at least the answer information for common questions that a user may ask. The server then produces a corresponding virtual digital human video for each piece of answer information; any conventional way of producing virtual digital human videos may be used. The video identifier may be a character identifier that the server assigns to the virtual digital human video, or identification information obtained from a cloud server after the server uploads the video to it. The server associates the video identifier of each virtual digital human video with the answer ID and puts the resulting association relations into one set to obtain the association relation set.
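The preparation stage might be sketched as below. `make_video` and `store_video` are hypothetical callables standing in for the video synthesis step and for whatever assigns the identifier (a server-side naming scheme or a cloud upload); the text does not define their interfaces.

```python
from typing import Callable, Dict


def build_association_set(
    answer_library: Dict[str, str],
    make_video: Callable[[str], bytes],
    store_video: Callable[[str, bytes], str],
) -> Dict[str, str]:
    """Pre-generate one video per answer and map answer ID -> video identifier."""
    associations: Dict[str, str] = {}
    for answer_id, answer_text in answer_library.items():
        video = make_video(answer_text)            # synthesis happens offline
        associations[answer_id] = store_video(answer_id, video)
    return associations


# Toy stand-ins so the sketch runs without a real synthesizer or storage.
stored: Dict[str, bytes] = {}


def fake_make_video(text: str) -> bytes:
    return text.encode("utf-8")


def fake_store_video(answer_id: str, video: bytes) -> str:
    video_id = f"video-{answer_id}"
    stored[video_id] = video
    return video_id


association_set = build_association_set(
    {"A001": "Our branches open at nine.", "A002": "Tap Reset password."},
    fake_make_video,
    fake_store_video,
)
```

Because this runs before any user request arrives, its cost is paid once per answer rather than once per interaction.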
Further, the step of obtaining the video identifier corresponding to each of the digital human videos includes:
step A1, uploading all of the digital human videos to a cloud server;
step A2, acquiring a Uniform Resource Locator (URL) of each digital human video at a cloud server end to serve as the video identifier.
In this embodiment, to speed up the client's acquisition of the digital human video and to reduce the bandwidth pressure on the server, the background server produces the corresponding virtual digital human videos in advance and then uploads them to the cloud for storage. The cloud server stores the virtual digital human videos and returns the URL of each video to the background server, which uses the URL as the video identifier of that video. That is, the background server stores only the video identifiers of the virtual digital human videos, not the video data itself.
As a specific embodiment, the preliminary processing performed by the background server is shown in FIG. 3.
First, the background server acquires the application's full answer library from the robot engine;
second, the background server produces a corresponding virtual digital human video for each piece of answer information in the answer library;
third, the background server pushes the produced virtual digital human videos to the cloud server one by one and obtains the corresponding URLs;
fourth, the background server stores the one-to-one correspondence between answer IDs and URLs in a local database.
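The third and fourth steps above can be sketched as follows. This is an illustration under stated assumptions: upload_to_cloud is a hypothetical placeholder for a real cloud storage SDK call, the example.com URL pattern is invented, and SQLite stands in for the unspecified local database.

```python
import sqlite3
from typing import Dict


def upload_to_cloud(video_name: str) -> str:
    """Hypothetical upload call; a real cloud SDK would return the stored object's URL."""
    return f"https://cdn.example.com/digital-human/{video_name}.mp4"


def store_associations(conn: sqlite3.Connection, videos: Dict[str, str]) -> None:
    """Upload each video one by one and persist the answer ID -> URL mapping."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS answer_video ("
        "answer_id TEXT PRIMARY KEY, url TEXT NOT NULL)"
    )
    for answer_id, video_name in videos.items():
        url = upload_to_cloud(video_name)
        conn.execute(
            "INSERT OR REPLACE INTO answer_video (answer_id, url) VALUES (?, ?)",
            (answer_id, url),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
store_associations(conn, {"A001": "video-A001", "A002": "video-A002"})
rows = conn.execute(
    "SELECT answer_id, url FROM answer_video ORDER BY answer_id"
).fetchall()
print(rows)
```

Only the URL is written to the database, which matches the statement that the background server keeps video identifiers rather than video data.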
Further, step S20 includes:
step S21, finding a target answer identifier corresponding to the interactive request information from a preset question-answer pair, and finding a target URL associated with the target answer identifier from the association set, as the target video identifier.
In this embodiment, the target answer identifier, i.e., the target answer ID, refers to the ID of the answer information that can be used to respond to the interaction request information. The preset question-answer pair may be a pairing between a question sentence and an answer sentence, or a pairing between a question sentence and an answer ID.
Specifically, the background server uses the interaction request information as a question sentence and retrieves the corresponding answer ID from the preset question-answer pairs as the target answer ID. The background server then uses the target answer ID as an index to look up the corresponding target URL in the association relationship set between answer IDs and video URLs, and uses that URL as the target video identifier.
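The two-stage lookup of step S21 can be sketched in Python as below. The text does not say how the question sentence is matched against the preset question-answer pairs, so the exact match on normalized text used here is an assumption, and the sample pairs, URLs, and the names normalize and find_target_video_url are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical preset question-answer pairs (question sentence -> answer ID).
QUESTION_ANSWER_PAIRS: Dict[str, str] = {
    "how do i reset my password": "A001",
    "what are your opening hours": "A002",
}

# Hypothetical association relationship set (answer ID -> video URL).
ASSOCIATION_SET: Dict[str, str] = {
    "A001": "https://cdn.example.com/digital-human/video-A001.mp4",
    "A002": "https://cdn.example.com/digital-human/video-A002.mp4",
}


def normalize(text: str) -> str:
    """Lowercase, trim surrounding punctuation, and collapse whitespace."""
    return " ".join(text.lower().strip(" ?!.").split())


def find_target_video_url(request_text: str) -> Optional[str]:
    """Return the target URL for a request, or None if no answer matches."""
    answer_id = QUESTION_ANSWER_PAIRS.get(normalize(request_text))
    if answer_id is None:
        return None
    return ASSOCIATION_SET.get(answer_id)


print(find_target_video_url("How do I reset my password?"))
```

Both stages are dictionary lookups, so the per-request work on the background server stays small compared with synthesizing a video on demand.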
This embodiment is further designed around the characteristics of the service scenario and is compatible with existing virtual digital human capabilities, including pre-synthesizing and storing digital human videos and looking up digital human videos through text answers. It mainly addresses the problem that digital humans cannot support a large number of concurrent one-to-one sessions in use. The method can readily support thousands of concurrent sessions and can be scaled horizontally.
Further, based on the second embodiment, a third embodiment of the virtual digital human interaction method of the present invention is provided. In this embodiment, there are a plurality of target video identifiers and clients, and step S30 includes:
step S31, distributing each target video identifier to the corresponding client, so that the corresponding client can obtain a corresponding target digital human video based on the corresponding target video identifier, and display the target digital human video at the front end.
In this embodiment, when processing a large number of concurrent one-to-one interaction requests, the background server finds the target video identifier corresponding to each request and sends it to the corresponding client, so that the client obtains the corresponding target digital human video according to the given target video identifier. After acquiring its target digital human video, each client can play and display it at the front end, thereby realizing interaction between the user and the virtual digital human.
Further, step S31 includes:
step S311, distributing each target digital human video to the corresponding client based on a specified communication protocol, so that the corresponding client obtains the target digital human video corresponding to the target video identifier from a cloud server based on a content distribution network in an accelerated manner, and displays the target digital human video at a front end.
In this embodiment, the designated communication protocol is WebSocket: the background server sends each target video identifier to the corresponding client over a WebSocket connection. After receiving the target video identifier returned by the background server, the client obtains the corresponding video stream from the Content Delivery Network (CDN) of the cloud server (ECS), which stores the virtual digital human videos uploaded in advance by the background server. The client can then output the obtained video stream data for the user to watch.
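The per-client distribution can be sketched with Python's asyncio. To keep the example self-contained, each WebSocket connection is simulated by an asyncio.Queue; a real deployment would use an actual WebSocket library instead. The JSON message shape, the client names, and the URLs are assumptions made for illustration.

```python
import asyncio
import json
from typing import Dict


async def dispatch(targets: Dict[str, str],
                   connections: Dict[str, asyncio.Queue]) -> None:
    """Send each client its own target video identifier as a JSON message."""

    async def send_one(client_id: str, url: str) -> None:
        message = json.dumps({"type": "digital_human_video", "url": url})
        # Stand-in for sending a text frame over the client's WebSocket.
        await connections[client_id].put(message)

    # Requests are independent, so the sends can run concurrently.
    await asyncio.gather(*(send_one(cid, url) for cid, url in targets.items()))


async def main() -> Dict[str, dict]:
    targets = {
        "client-1": "https://cdn.example.com/digital-human/video-A001.mp4",
        "client-2": "https://cdn.example.com/digital-human/video-A002.mp4",
    }
    connections = {cid: asyncio.Queue() for cid in targets}
    await dispatch(targets, connections)
    # Each simulated client reads the single message addressed to it.
    return {cid: json.loads(q.get_nowait()) for cid, q in connections.items()}


received = asyncio.run(main())
print(received)
```

Because only a short message carrying the URL crosses the server connection, the video bytes themselves are fetched by the client from the CDN and do not consume the background server's bandwidth.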
Further, step S10 includes:
step S11, receiving the digital human interactive request containing the user voice data stream sent by the client;
step S12, performing voice recognition on the user voice data stream to obtain voice text information as the interactive request information.
In this embodiment, the user inputs a piece of information by voice on the client, and the client generates a digital human interaction request based on the voice information. After receiving the request sent by the client, the background server recognizes the voice information using Automatic Speech Recognition (ASR) technology to obtain voice text information that reflects the actual content of the voice information.
As another specific implementation mode, the client performs voice recognition on the user voice data stream, and then generates a digital human interaction request based on the recognized voice text information and sends the request to the background server.
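The two variants described above (recognition on the background server, or recognition on the client before the request is sent) can be handled by one entry point, as in the following sketch. The InteractionRequest type and the fake_asr function are hypothetical; fake_asr merely decodes bytes and stands in for a real ASR service, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InteractionRequest:
    audio: Optional[bytes] = None  # user voice data stream (server-side ASR variant)
    text: Optional[str] = None     # already recognized text (client-side ASR variant)


def determine_request_info(request: InteractionRequest,
                           recognize: Callable[[bytes], str]) -> str:
    """Return the interaction request information for either variant."""
    if request.text is not None:
        return request.text               # client already ran speech recognition
    if request.audio is not None:
        return recognize(request.audio)   # steps S11 and S12 on the server
    raise ValueError("request carries neither audio nor text")


def fake_asr(audio: bytes) -> str:
    """Hypothetical recognizer used only to make the sketch runnable."""
    return audio.decode("utf-8")


print(determine_request_info(InteractionRequest(audio=b"opening hours"), fake_asr))
print(determine_request_info(InteractionRequest(text="reset password"), fake_asr))
```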
As an embodiment, the overall process flow is shown in FIG. 4.
First, the client collects the user's voice input stream;
second, the client feeds the user's voice stream into the ASR to recognize the voice text information;
third, the client sends the voice text information to the background server, and the background server retrieves the target answer ID corresponding to the voice text information;
fourth, the background server uses the target answer ID to obtain the URL of the pre-generated digital human video corresponding to that ID;
fifth, the background server sends the URL to the client over the WebSocket connection;
sixth, the client acquires the corresponding video stream from the ECS CDN for display.
This embodiment can readily support 1000 concurrent sessions on a single node. It requires only the same bandwidth as an ordinary background service, which saves a large amount of server bandwidth. It also shortens the time the client needs to obtain the virtual digital human video, which greatly improves the user experience.
As shown in fig. 5, the present invention also provides a virtual digital human interaction system, which includes:
the interactive request determining module 10 is configured to receive a digital human interactive request sent by a client, and determine interactive request information according to the digital human interactive request;
a video identifier searching module 20, configured to obtain a target answer identifier corresponding to the interaction request information, and search a target video identifier associated with the target answer identifier from a preset association relationship set;
and the target video display module 30 is configured to send the target video identifier to the client, so that the client obtains a corresponding target digital human video based on the target video identifier and performs interactive display, where the target digital human video is a pre-made virtual digital human video.
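One way to picture how modules 10, 20 and 30 cooperate is the Python sketch below. The class names mirror the module names in the text, but the method names, the dictionary-based lookups, and the sample data are assumptions, and network I/O is replaced by an in-memory list.

```python
from typing import Dict, List, Tuple


class InteractionRequestDeterminingModule:
    """Module 10: turns a client request into interaction request information."""

    def determine(self, request: Dict[str, str]) -> str:
        # Assumption: the request already carries recognized text.
        return request["text"].strip().lower()


class VideoIdentifierSearchingModule:
    """Module 20: answer ID lookup followed by video identifier lookup."""

    def __init__(self, qa_pairs: Dict[str, str], association_set: Dict[str, str]):
        self.qa_pairs = qa_pairs
        self.association_set = association_set

    def search(self, request_info: str) -> str:
        answer_id = self.qa_pairs[request_info]
        return self.association_set[answer_id]


class TargetVideoDisplayModule:
    """Module 30: sends the target video identifier to the client."""

    def __init__(self) -> None:
        # Records (client_id, video_identifier) instead of doing real network I/O.
        self.sent: List[Tuple[str, str]] = []

    def send(self, client_id: str, video_identifier: str) -> None:
        self.sent.append((client_id, video_identifier))


class VirtualDigitalHumanInteractionSystem:
    def __init__(self, qa_pairs: Dict[str, str], association_set: Dict[str, str]):
        self.module_10 = InteractionRequestDeterminingModule()
        self.module_20 = VideoIdentifierSearchingModule(qa_pairs, association_set)
        self.module_30 = TargetVideoDisplayModule()

    def handle(self, client_id: str, request: Dict[str, str]) -> None:
        info = self.module_10.determine(request)
        video_identifier = self.module_20.search(info)
        self.module_30.send(client_id, video_identifier)


system = VirtualDigitalHumanInteractionSystem(
    qa_pairs={"opening hours": "A002"},
    association_set={"A002": "https://cdn.example.com/digital-human/video-A002.mp4"},
)
system.handle("client-1", {"text": " Opening hours "})
print(system.module_30.sent)
```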
Optionally, the virtual digital human interaction system further comprises:
a full answer acquisition module, configured to acquire a full answer library for interactive responses from a designated robot engine, wherein the full answer library comprises a plurality of pieces of interactive response information, and each piece of interactive response information corresponds to an answer identifier;
a video identifier acquisition module, configured to generate a corresponding digital human video for each piece of interactive response information in the full answer library and to acquire a video identifier corresponding to each digital human video;
and an association relationship establishing module, configured to establish an association relationship between the video identifier and the answer identifier to obtain the association relationship set.
Optionally, the video identifier obtaining module includes:
the full-volume video uploading unit is used for uploading full-volume digital human videos to a cloud server side;
and the resource positioning acquisition unit is used for acquiring a Uniform Resource Locator (URL) of each digital human video at the cloud server end to serve as the video identifier.
Optionally, the video identifier lookup module 20 includes:
and the video identifier searching unit is used for searching a target answer identifier corresponding to the interactive request information from a preset question-answer pair, and searching a target URL (uniform resource locator) associated with the target answer identifier from the association relation set to be used as the target video identifier.
Optionally, there are a plurality of target video identifications and clients, and the target video presentation module 30 includes:
and the target video display unit is used for distributing each target video identifier to the corresponding client side so as to enable the corresponding client side to obtain the corresponding target digital human video based on the corresponding target video identifier and display the target digital human video at the front end.
Optionally, the target video presentation unit is further configured to:
and distributing each target digital human video to the corresponding client based on a specified communication protocol so that the corresponding client can acquire the target digital human video corresponding to the target video identification from a cloud server terminal in an accelerated manner based on a content distribution network, and playing and displaying the target digital human video at the front end.
Optionally, the interaction request determining module 10 includes:
the interactive request receiving unit is used for receiving a digital human interactive request which is sent by a client and contains a user voice data stream;
and the user voice recognition unit is used for carrying out voice recognition on the user voice data stream to obtain voice text information as the interactive request information.
The invention also provides virtual digital human interaction equipment.
The virtual digital human interaction device comprises a processor, a memory and a virtual digital human interaction program stored on the memory and capable of running on the processor, wherein the steps of the virtual digital human interaction method are realized when the virtual digital human interaction program is executed by the processor.
The method implemented when the virtual digital human interaction program is executed may refer to each embodiment of the virtual digital human interaction method of the present invention, and details are not repeated herein.
The invention also provides a computer readable storage medium.
The computer readable storage medium of the present invention has stored thereon a virtual digital human interaction program, which when executed by a processor implements the steps of the virtual digital human interaction method as described above.
The method implemented when the virtual digital human interaction program is executed may refer to each embodiment of the virtual digital human interaction method of the present invention, and details are not repeated herein.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, carries out the steps of the virtual digital human interaction method as described above.
The method implemented when the computer program is executed can refer to each embodiment of the virtual digital human interaction method of the present invention, and details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A virtual digital human interaction method, characterized in that the virtual digital human interaction method comprises:
receiving a digital human interaction request sent by a client, and determining interaction request information according to the digital human interaction request;
acquiring a target answer identifier corresponding to the interactive request information, and searching a target video identifier associated with the target answer identifier from a preset association relation set;
and sending the target video identification to the client so that the client can obtain the corresponding target digital human video based on the target video identification and carry out interactive display, wherein the target digital human video is a pre-made virtual digital human video.
2. The virtual digital human interaction method of claim 1, wherein the step of receiving a digital human interaction request sent by a client is preceded by the step of:
acquiring a full-scale answer library for interactive response from a designated robot engine, wherein the full-scale answer library comprises a plurality of interactive response messages, and each interactive response message corresponds to an answer identifier;
generating a corresponding digital person video aiming at each interactive response information in the full-scale answer library, and acquiring a video identifier corresponding to each digital person video;
and establishing an association relation between the video identifier and the answer identifier to obtain the association relation set.
3. The virtual digital human interaction method of claim 2, wherein the step of obtaining a video identifier corresponding to each of the digital human videos comprises:
uploading the full amount of the digital human videos to a cloud server side;
and acquiring a Uniform Resource Locator (URL) of each digital human video at a cloud server end to serve as the video identifier.
4. The virtual digital human interaction method according to claim 3, wherein the step of obtaining the target answer identifier corresponding to the interaction request information and finding the target video identifier associated with the target answer identifier from a preset association set comprises:
and finding a target answer identifier corresponding to the interactive request information from a preset question-answer pair, and finding a target URL associated with the target answer identifier from the association relation set to be used as the target video identifier.
5. The virtual digital human interaction method according to claim 1, wherein a plurality of target video identifiers and clients exist, and the step of sending the target video identifiers to the clients for the clients to obtain corresponding target digital human videos based on the target video identifiers and perform interaction comprises:
and distributing each target video identifier to the corresponding client side, so that the corresponding client side can obtain the corresponding target digital human video based on the corresponding target video identifier, and the target digital human video is played and displayed at the front end.
6. The virtual digital human interaction method according to claim 5, wherein the step of distributing each target video identifier to the corresponding client, so that the corresponding client can obtain the corresponding target digital human video based on the corresponding target video identifier, and display the target digital human video on the front end includes:
and distributing each target digital human video to the corresponding client based on a specified communication protocol so that the corresponding client can acquire the target digital human video corresponding to the target video identification from a cloud server terminal in an accelerated manner based on a content distribution network, and playing and displaying the target digital human video at the front end.
7. The virtual digital human interaction method according to any one of claims 1 to 6, wherein the step of receiving a digital human interaction request sent by a client and determining interaction request information according to the digital human interaction request comprises:
receiving a digital human interaction request which is sent by a client and contains a user voice data stream;
and carrying out voice recognition on the user voice data stream to obtain voice text information as the interactive request information.
8. A virtual digital human interaction system, characterized in that the virtual digital human interaction system comprises:
the interactive request determining module is used for receiving a digital human interactive request sent by a client side and determining interactive request information according to the digital human interactive request;
the video identification searching module is used for acquiring a target answer identification corresponding to the interactive request information and searching a target video identification associated with the target answer identification from a preset association relation set;
and the target video display module is used for sending the target video identification to the client so that the client can obtain the corresponding target digital human video based on the target video identification and carry out interactive display, wherein the target digital human video is a pre-manufactured virtual digital human video.
9. A virtual digital human interaction device, characterized in that it comprises: a memory, a processor and a virtual digital human interaction program stored on the memory and executable on the processor, the virtual digital human interaction program when executed by the processor implementing the steps of the virtual digital human interaction method of any one of claims 1 to 7.
10. A computer program product, characterized in that the computer program product comprises a computer program which, when being executed by a processor, carries out the steps of the virtual digital human interaction method according to any one of claims 1 to 7.
CN202111259647.2A 2021-10-26 2021-10-26 Virtual digital human interaction method, system, equipment and computer program product Pending CN113992929A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111259647.2A CN113992929A (en) 2021-10-26 2021-10-26 Virtual digital human interaction method, system, equipment and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111259647.2A CN113992929A (en) 2021-10-26 2021-10-26 Virtual digital human interaction method, system, equipment and computer program product

Publications (1)

Publication Number Publication Date
CN113992929A true CN113992929A (en) 2022-01-28

Family

ID=79743006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111259647.2A Pending CN113992929A (en) 2021-10-26 2021-10-26 Virtual digital human interaction method, system, equipment and computer program product

Country Status (1)

Country Link
CN (1) CN113992929A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114650265A (en) * 2022-02-16 2022-06-21 浙江毫微米科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN116248812A (en) * 2023-05-11 2023-06-09 广州佰锐网络科技有限公司 Business handling method, storage medium and system based on digital human interaction video
CN116430991A (en) * 2023-03-06 2023-07-14 北京黑油数字展览股份有限公司 Exhibition hall digital person explanation method and system based on mixed reality and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130288799A1 (en) * 2012-04-26 2013-10-31 Brett Harris Systems and methods that enable a spectator's experience for online active games
CN111741368A (en) * 2020-02-19 2020-10-02 北京沃东天骏信息技术有限公司 Interactive video display and generation method, device, equipment and storage medium
CN113392201A (en) * 2021-06-18 2021-09-14 中国工商银行股份有限公司 Information interaction method, information interaction device, electronic equipment, medium and program product
CN113505268A (en) * 2021-07-07 2021-10-15 中国工商银行股份有限公司 Interactive processing method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114650265A (en) * 2022-02-16 2022-06-21 浙江毫微米科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN114650265B (en) * 2022-02-16 2024-02-09 浙江毫微米科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN116430991A (en) * 2023-03-06 2023-07-14 北京黑油数字展览股份有限公司 Exhibition hall digital person explanation method and system based on mixed reality and electronic equipment
CN116248812A (en) * 2023-05-11 2023-06-09 广州佰锐网络科技有限公司 Business handling method, storage medium and system based on digital human interaction video
CN116248812B (en) * 2023-05-11 2023-08-08 广州佰锐网络科技有限公司 Business handling method, storage medium and system based on digital human interaction video

Similar Documents

Publication Publication Date Title
CN110570698B (en) Online teaching control method and device, storage medium and terminal
CN113992929A (en) Virtual digital human interaction method, system, equipment and computer program product
JP3172870U (en) System for providing and managing interactive services
CN111010586A (en) Live broadcast method, device, equipment and storage medium based on artificial intelligence
CN111711831B (en) Data processing method and device based on interactive behavior and storage medium
CN110488973B (en) Virtual interactive message leaving system and method
CN111755009A (en) Voice service method, system, electronic device and storage medium
CN114466216A (en) Live broadcast room display method, server and live broadcast client
CN114327205A (en) Picture display method, storage medium and electronic device
CN105187295B (en) A kind of method and client, server and system for realizing that bubble is shown in client
CN113850898A (en) Scene rendering method and device, storage medium and electronic equipment
CN110309470A (en) A kind of virtual news main broadcaster system and its implementation based on air imaging
CN116737883A (en) Man-machine interaction method, device, equipment and storage medium
CN114449301B (en) Item sending method, item sending device, electronic equipment and computer-readable storage medium
CN112565913B (en) Video call method and device and electronic equipment
CN112291497B (en) Intelligent video customer service access method and device
CN113947166A (en) Questionnaire statistics real-time processing method, system, electronic equipment and storage medium
CN113742473A (en) Digital virtual human interaction system and calculation transmission optimization method thereof
CN108881978B (en) Resource playing method and device for intelligent equipment
CN112820265A (en) Speech synthesis model training method and related device
CN110300324B (en) Associated information pushing method, system and storage medium
CN113630508B (en) Video color ring management method, device, equipment and medium
CN112837678B (en) Private cloud recognition training method and device
CN114501050B (en) Method and device for outputting information
CN117876170A (en) Online training method and device based on multi-mode large model, storage medium and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination