CN113050791A - Interaction method, interaction device, electronic equipment and storage medium - Google Patents
Interaction method, interaction device, electronic equipment and storage medium
- Publication number
- Publication number: CN113050791A (application number CN202110279584.0A)
- Authority
- CN
- China
- Prior art keywords
- information
- user
- intelligent terminal
- type
- interaction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services or time announcements
- H04M3/493—Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
- H04M3/4936—Speech interaction details
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
- H04M3/5166—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing in combination with interactive voice response systems or voice portals, e.g. as front-ends
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M7/00—Arrangements for interconnection between switching centres
- H04M7/006—Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer
Abstract
The application discloses an interaction method, an interaction apparatus, an electronic device, and a storage medium. The interaction method includes: acquiring user information sent by an intelligent terminal; determining, according to the user information, an interaction type for interaction with the user, the interaction type including a conversation type; if the interaction type is the conversation type, generating driving information according to the user information, the driving information being used to drive a digital person; driving the digital person based on the driving information, acquiring a digital person image that includes the digital person, and taking the digital person image as a feedback result, where the form of the digital person in the image corresponds to the driving information; and sending the feedback result to the intelligent terminal. With this interaction method, a digital person can be displayed on the intelligent terminal, and when the method is applied to customer service, users' consultation needs can be answered quickly even during peak consultation periods.
Description
Technical Field
The present application relates to the field of human-computer interaction technologies, and in particular, to an interaction method, an interaction apparatus, an electronic device, and a storage medium.
Background
In recent years, with the continuous development and application of network information technology, the traditional PSTN (Public Switched Telephone Network) communication mode has gradually become unable to meet users' requirements and is gradually being replaced by high-definition communication.
Disclosure of Invention
In view of the above problems, the present application provides an interaction method, an interaction apparatus, an electronic device, and a storage medium to solve the above problems.
In a first aspect, an embodiment of the present application provides an interaction method. The interaction method includes: acquiring user information sent by an intelligent terminal; determining, according to the user information, an interaction type for interaction with the user, the interaction type including a conversation type; if the interaction type is the conversation type, generating driving information according to the user information, the driving information being used for driving a digital person; driving the digital person based on the driving information, acquiring a digital person image including the digital person, and taking the digital person image as a feedback result, where the form of the digital person in the digital person image corresponds to the driving information; and sending the feedback result to the intelligent terminal. A minimal sketch of this flow is given below.
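To make the flow concrete, here is a minimal, hedged sketch in Python. All names (InteractionType, FeedbackResult, the stub functions) are illustrative stand-ins for the modules this application describes, not APIs it defines:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class InteractionType(Enum):
    CONVERSATION = auto()      # the "conversation type" above
    MANUAL_SERVICE = auto()
    INFO_PRESENTATION = auto()

@dataclass
class FeedbackResult:
    digital_person_images: list = field(default_factory=list)

def determine_interaction_type(user_info: str) -> InteractionType:
    # Stub for the intention-based decision (step S12 in the detailed description).
    return InteractionType.CONVERSATION

def generate_driving_info(user_info: str) -> str:
    # Stub: derive driving information from the user information (step S13).
    return f"reply to: {user_info}"

def render_digital_person(driving_info: str) -> list:
    # Stub: drive the digital person and capture image frames whose form
    # corresponds to the driving information (step S14).
    return [f"frame for '{driving_info}'"]

def handle_request(user_info: str) -> FeedbackResult:
    """Server-side flow of the first aspect."""
    interaction_type = determine_interaction_type(user_info)
    result = FeedbackResult()
    if interaction_type is InteractionType.CONVERSATION:
        driving_info = generate_driving_info(user_info)
        result.digital_person_images = render_digital_person(driving_info)
    return result  # step S15 would send this back to the intelligent terminal

print(handle_request("what is my remaining data allowance?"))
```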
Optionally, the interaction type further includes a manual service type, and before the feedback result is sent to the intelligent terminal, the interaction method further includes: if the interaction type is the manual service type, acquiring identification information of the intelligent terminal; acquiring a target customer service agent according to the identification information; and acquiring customer service information of the target customer service agent, and taking the customer service information as the feedback result.
Optionally, before sending the feedback result to the intelligent terminal, the interaction method further includes: acquiring prompt content for presenting at the intelligent terminal according to the user information; and generating an electronic card comprising the prompt content, and taking the electronic card as a feedback result.
Optionally, the number of digital person images is multiple, and sending the feedback result to the intelligent terminal includes: acquiring a card type of the electronic card, the card type including a dynamic type; if the card type of the electronic card is the dynamic type, acquiring the time sequence of the plurality of digital person images; acquiring card pictures of the electronic card at different times according to that time sequence, so that the card pictures correspond one-to-one, in time order, with the digital person images (see the sketch after this paragraph); and sending the card pictures and the plurality of digital person images to the intelligent terminal.
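As a rough illustration of this dynamic-card pairing, the sketch below matches card pictures to digital person images in time order; the data shapes and the render_card callback are assumptions for illustration only:

```python
from typing import List, Tuple

def pair_card_frames(digital_person_images: List[Tuple[float, bytes]],
                     render_card) -> List[Tuple[bytes, bytes]]:
    """digital_person_images: (timestamp, image) pairs; render_card(t) returns
    the card picture for time t. Returns (card_picture, digital_person_image)
    pairs aligned one-to-one by time sequence."""
    ordered = sorted(digital_person_images, key=lambda pair: pair[0])
    return [(render_card(t), img) for t, img in ordered]

# Example: a trivial card renderer that stamps the timestamp on the card.
frames = [(0.2, b"img2"), (0.0, b"img1"), (0.4, b"img3")]
pairs = pair_card_frames(frames, lambda t: f"card@{t:.1f}s".encode())
print(pairs)
```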
Optionally, the interaction type comprises an information presentation type; before sending the feedback result to the intelligent terminal, the interaction method further comprises the following steps: and if the interaction type is an information presentation type, determining service information for presentation at the intelligent terminal according to the user information, and taking the service information as a feedback result.
Optionally, before obtaining the user information sent by the intelligent terminal, the interaction method includes: receiving a call request sent by the intelligent terminal; and if the call request is determined to be allowed, establishing a call connection with the intelligent terminal.
Optionally, before obtaining the user information sent by the intelligent terminal, the interaction method includes: acquiring an identity of an intelligent terminal; sending a call request to the intelligent terminal based on the identity; and if the intelligent terminal is determined to be connected with the call request, controlling the server side to establish a data channel with the intelligent terminal.
Optionally, the interaction method further includes: acquiring a customer service type of a data channel established between a server and an intelligent terminal; acquiring an original digital human image and original voice information according to the type of the customer service; and sending the original digital human image and the original voice information to the intelligent terminal through a data channel.
Optionally, the driving information comprises text output information; generating the driving information according to the user information includes: determining user intention according to the user information; and acquiring text output information for feedback to the user based on the user intention, and using the text output information as driving information.
Optionally, the driving information comprises feedback voice information; generating the driving information according to the user information includes: determining user intention according to the user information; acquiring text output information for feedback to a user based on the user intention; and generating feedback voice information according to the text output information, and taking the feedback voice information as driving information.
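A minimal sketch of this variant follows; the synthesize function is a hypothetical stand-in for whatever text-to-speech engine the server actually uses, not an API named by this application:

```python
def synthesize(text: str) -> bytes:
    # Stub: a real deployment would call a speech-synthesis model here.
    return text.encode("utf-8")

def build_driving_info(user_intention: str) -> dict:
    text_output = f"Here is the information you asked about: {user_intention}"
    feedback_voice = synthesize(text_output)
    # Either the text output information or the synthesized feedback voice
    # can serve as the driving information.
    return {"text": text_output, "voice": feedback_voice}

print(build_driving_info("current package balance")["text"])
```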
Optionally, before sending the feedback result to the intelligent terminal, the interaction method further includes: and taking the feedback voice information as a feedback result.
Optionally, the feedback result includes a plurality of consecutive frames of digital person images and feedback voice information, and the consecutive frames correspond to the feedback voice information in time sequence.
Optionally, sending the feedback result to the intelligent terminal includes: obtaining privacy information in a feedback result; updating the feedback result according to the privacy information and a preset information protection mode, so that the privacy information in the updated feedback result is hidden; and outputting the updated feedback result.
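The application does not fix a concrete "preset information protection mode"; the sketch below assumes one plausible choice, regex-based masking of phone-number-like and ID-like digit runs in a textual feedback result:

```python
import re

PHONE = re.compile(r"\b1\d{10}\b")          # mainland-China mobile number shape
LONG_DIGITS = re.compile(r"\b\d{15,18}\b")  # ID-card-like digit runs

def hide_privacy(feedback_text: str) -> str:
    # Keep the leading/trailing digits of a phone number, star out the middle.
    masked = PHONE.sub(lambda m: m.group()[:3] + "****" + m.group()[-4:],
                       feedback_text)
    return LONG_DIGITS.sub("*" * 6, masked)

print(hide_privacy("Your number 13812345678 has 2 GB left."))
# -> "Your number 138****5678 has 2 GB left."
```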
Optionally, the user information comprises user voice information; determining the user intent from the user information includes: acquiring user voice information in user information; recognizing the voice information of the user to obtain text input information; and performing semantic recognition on the text input information to determine the user intention.
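The voice-to-intention pipeline above can be sketched as follows, with recognize_speech and classify_intent as stand-ins for a real ASR engine and semantic model:

```python
def recognize_speech(voice: bytes) -> str:
    # Stub: pretend the audio bytes are already their transcript.
    return voice.decode("utf-8")

def classify_intent(text_input: str) -> str:
    # Stub: a deployed system would run an RNN/CNN/BERT-style model here.
    return "query_package" if "package" in text_input else "unknown"

def determine_user_intention(user_info: dict) -> str:
    voice = user_info["user_voice"]        # 1. take the user voice information
    text_input = recognize_speech(voice)   # 2. speech recognition -> text input
    return classify_intent(text_input)     # 3. semantic recognition -> intention

print(determine_user_intention({"user_voice": b"what is in my package?"}))
```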
In a second aspect, an embodiment of the present application provides an interaction apparatus, where the interaction apparatus includes a user information obtaining module, an interaction type determining module, a digital human driver module, a feedback result obtaining module, and a feedback result sending module. The user information acquisition module is used for acquiring the user information sent by the intelligent terminal. The interaction type determining module is used for determining the interaction type interacted with the user according to the user information, and the interaction type comprises a conversation type. And the digital person driving module is used for generating driving information according to the user information if the interaction type is the conversation type, and the driving information is used for driving the digital person. The feedback result acquisition module is used for driving the digital person based on the driving information, acquiring a digital person image comprising the digital person, and taking the digital person image as a feedback result, wherein the form of the digital person in the digital person image corresponds to the driving information. And the feedback result sending module is used for sending the feedback result to the intelligent terminal.
Optionally, the interaction type comprises a manual service type; the interaction device also comprises an identification information acquisition module, a target customer service acquisition module and a customer service information acquisition module. The identification information acquisition module is used for acquiring the identification information of the intelligent terminal if the interaction type is the manual service type. And the target customer service acquisition module is used for acquiring the target customer service according to the identification information. The customer service information acquisition module is used for acquiring the customer service information of the target customer service and taking the customer service information as a feedback result.
Optionally, the interactive device further includes a prompt content obtaining module and an electronic card generating module. The prompt content acquisition module is used for acquiring prompt contents for being presented at the intelligent terminal according to the user information. The electronic card generating module is used for generating an electronic card comprising prompt contents and using the electronic card as a feedback result.
Optionally, the number of digital person images is multiple; the feedback result sending module includes a card type obtaining unit, a time sequence obtaining unit, a card picture obtaining unit, and a card picture sending unit. The card type obtaining unit is used for obtaining the card type of the electronic card, the card type including a dynamic type. The time sequence obtaining unit is used for obtaining the time sequence of the plurality of digital person images if the card type of the electronic card is the dynamic type. The card picture obtaining unit is used for obtaining card pictures of the electronic card at different times according to the time sequence of the digital person images, so that the card pictures correspond one-to-one with the digital person images in time order. The card picture sending unit is used for sending the card pictures and the digital person images to the intelligent terminal.
Optionally, the interaction type comprises an information presentation type; the interaction device also comprises a service information acquisition module. The service information acquisition module is used for determining service information for presenting at the intelligent terminal according to the user information if the interaction type is an information presentation type, and taking the service information as a feedback result.
Optionally, the interaction device includes a call request receiving module and a call connection establishing module. The call request receiving module is used for receiving a call request sent by the intelligent terminal. And the call connection establishing module is used for establishing call connection with the intelligent terminal if the call request is determined to be allowed.
Optionally, the interaction device further includes an identity obtaining module, a call request sending module, and a data channel establishing module. The identity acquisition module is used for acquiring the identity of the intelligent terminal. And the call request sending module is used for sending a call request to the intelligent terminal based on the identity. And the data channel establishing module is used for controlling the server side to establish a data channel with the intelligent terminal if the intelligent terminal is determined to be connected with the call request.
Optionally, the interactive device further includes a customer service type module, an original information obtaining module, and an original information sending module. The customer service type module is used for acquiring the customer service type of a data channel established between the server and the intelligent terminal. The original information acquisition module is used for acquiring original digital human images and original voice information according to the customer service type. And the original information sending module is used for sending the original digital human image and the original voice information to the intelligent terminal through the data channel.
Optionally, the driving information comprises text output information; the digital human driver module includes a user intention determining unit and a text output information acquiring unit. The user intention determining unit is used for determining the user intention according to the user information. The text output information acquisition unit is used for acquiring text output information used for feeding back to a user based on the user intention and taking the text output information as driving information.
Optionally, the driving information comprises feedback voice information; the interactive device further comprises a user intention determining unit, a text output information acquiring unit and a feedback voice information generating unit. The user intention determining unit is used for determining the user intention according to the user information. The text output information acquisition unit is used for acquiring text output information used for feeding back to a user based on the user intention. The feedback voice information generating unit is used for generating feedback voice information according to the text output information and taking the feedback voice information as driving information.
Optionally, the interaction device further comprises a feedback module. The feedback module is used for taking the feedback voice information as a feedback result.
Optionally, the feedback result includes a plurality of consecutive frame digital human images and feedback voice information, and the plurality of consecutive frame digital human images correspond to the feedback voice information according to a time sequence.
Optionally, the feedback result sending module includes a privacy information obtaining unit, a privacy information hiding unit, and a feedback result output unit. The privacy information acquisition unit is used for acquiring the privacy information in the feedback result. The privacy information hiding unit is used for updating the feedback result according to the privacy information and a preset information protection mode, so that the privacy information in the updated feedback result is hidden. And the feedback result output unit is used for outputting the updated feedback result.
Optionally, the user information comprises user voice information; the user intention determining unit comprises a user voice information acquiring subunit, a text input information acquiring subunit and a semantic identification subunit. The user voice information acquisition subunit is used for acquiring the user voice information in the user information. The text input information acquisition subunit is used for identifying the user voice information to obtain text input information. And the semantic recognition subunit is used for performing semantic recognition on the text input information and determining the user intention.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the steps of the interaction method provided in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the interaction method provided in the first aspect.
Compared with the prior art, the scheme provided by the application can display a digital person at the intelligent terminal, so that when the interaction method is applied to customer service, users' consultation needs can be answered quickly even during peak consultation periods. At the same time, the interaction type can be determined based on the user information; when the interaction type is the conversation type, driving information is generated from the user information and the digital person is driven by that information, so that the digital person presented at the intelligent terminal embodies the information fed back to the user. This simulates a scene of face-to-face communication between the user and the simulated digital person, makes the user feel more at ease, and improves the user experience.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic diagram illustrating an application environment of an interaction method according to an embodiment of the present application.
Fig. 2 shows a flowchart of an interaction method provided in an embodiment of the present application.
Fig. 3 is a flow chart illustrating a process of using the customer service information as a feedback result in the method shown in fig. 2.
Fig. 4 shows a flow chart of the method shown in fig. 2 for generating the feedback result based on the electronic card.
Fig. 5 is a flow chart illustrating a process of generating a card screen based on a card type in the method shown in fig. 4.
Fig. 6 is a flow chart illustrating the process of establishing a data channel in the method shown in fig. 2.
Fig. 7 is another flow chart illustrating the establishment of a data channel in the method shown in fig. 2.
Fig. 8 is a flow chart illustrating a process of transmitting information based on a data channel in the method shown in fig. 2.
Fig. 9 is a flow chart illustrating the process of obtaining text output information based on user information in the method shown in fig. 2.
Fig. 10 is a flow chart illustrating a process of generating feedback voice information based on user information in the method shown in fig. 2.
Fig. 11 is a flow chart illustrating the determination of user intention based on user voice information in the method shown in fig. 10.
Fig. 12 is a flow chart illustrating the process of outputting the feedback result with privacy protection in the method shown in fig. 2.
Fig. 13 shows a functional block diagram of an interaction apparatus according to an embodiment of the present application.
Fig. 14 shows a functional block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present application.
With the continuous development and application of network information technology, the traditional PSTN (Public Switched Telephone Network) communication mode has gradually become unable to meet users' requirements and is being replaced by high-definition communication. Video interaction services have therefore appeared on the market, in which customer service personnel serve users over video. This mode feels more personal to users, but during peak consultation periods the number of customer service personnel is often insufficient, users have to wait a long time, and the user experience suffers.
In recent years, with the continuous development of communication technology, the traditional PSTN call mode has gradually been replaced by high-definition call modes such as VoLTE (Voice over Long-Term Evolution). Compared with PSTN calls, the voice and video of high-definition calls are clearer and the call drop rate is lower; mobile data services remain usable during the call, and video calls can be made without any program or software installed on the intelligent terminal, so both parties have a better call experience. On this basis, customer service staff and users have begun to communicate in high-definition call mode in the prior art: the intelligent terminal held by the customer service staff sends the staff member's audio and video information to the intelligent terminal held by the user, presenting a live picture of the staff member to the user. This feels more personal and improves the user's consultation experience. Although this mode effectively improves the user experience, the number of customer service staff is often insufficient during peak consultation periods, so users have to wait a long time and the experience degrades.
To solve the above problems and improve the user experience, the inventor of the present application carried out research and development and found that videos can be recorded in advance; during a peak consultation period, once the user's need has been identified, the pre-recorded video matching that need can be sent to the intelligent terminal held by the user.
On this basis, the inventor continued to invest in research and development. To further improve the user experience and let the user see smooth video pictures of customer service, the inventor proposes the interaction method, apparatus, electronic device, and storage medium of the embodiments of the present application. The interaction method includes: acquiring user information sent by an intelligent terminal; determining, according to the user information, an interaction type for interaction with the user, the interaction type including a conversation type; if the interaction type is the conversation type, generating driving information according to the user information, the driving information being used for driving a digital person; driving the digital person based on the driving information, acquiring a digital person image including the digital person, and taking the digital person image as a feedback result, where the form of the digital person in the image corresponds to the driving information; and sending the feedback result to the intelligent terminal. A digital person can thus be displayed at the intelligent terminal, and when the method is applied to customer service, users' consultation needs can be answered quickly even during peak consultation periods. Meanwhile, the interaction type can be determined based on the user information; when the interaction type is the conversation type, driving information is generated from the user information and drives the digital person, so that the digital person presented at the intelligent terminal embodies the information fed back to the user. This simulates face-to-face communication between the user and the simulated digital person, makes the user feel more at ease, and improves the user experience. In addition, the interaction method is particularly suitable for VoLTE calls: the effect of user-digital-person interaction can be achieved without installing any APP on the intelligent terminal.
In order to better understand an interaction method, an interaction device, an electronic device, and a storage medium provided in the embodiments of the present application, an application environment suitable for the embodiments of the present application is described below.
Referring to fig. 1, fig. 1 is a schematic diagram of an application environment suitable for the interactive system 200 provided in an embodiment of the present application. The interaction method, apparatus, electronic device, and storage medium provided in the embodiments of the present application may be applied to the interactive system 200 shown in fig. 1. The interactive system 200 includes an intelligent terminal 201 and a server 202, and the server 202 is communicatively connected to the intelligent terminal 201. The server 202 may be implemented as an independent server or as a server cluster composed of multiple servers; it may be a cloud server or a traditional machine-room server, which is not specifically limited here.
In some embodiments, the intelligent terminal 201 may be any of various electronic devices having a display screen and supporting data input, including but not limited to smartphones, tablets, laptop computers, desktop computers, and wearable electronic devices. Specifically, data may be input as voice through a voice module configured on the intelligent terminal 201, as characters through a character input module, as images through an image input module, or as videos through a video input module; a gesture recognition module configured on the intelligent terminal 201 can also enable interactive modes such as gesture input.
A client application program may be installed on the intelligent terminal 201, and the user can communicate with the server 202 through the client application (for example, an APP such as WeChat, or a WeChat applet). Specifically, a corresponding server-side application program is installed on the server 202; the user may register a user account with the server 202 through the client application and communicate with the server 202 based on that account. For example, the user logs into the account in the client application and can then input text information, voice information, image information, video information, and so on. After receiving the input, the client application sends the information to the server 202 so that the server 202 can receive, process, and store it; the server 202 may also return corresponding output information to the intelligent terminal 201 according to the received information.
In some embodiments, the server 202 may be configured to receive information input by the user, generate a picture of a simulated digital person according to the information, and send the picture to the intelligent terminal 201, thereby providing customer service at the intelligent terminal 201 and carrying out customer service communication with the user. Specifically, the intelligent terminal 201 may receive the information input by the user and present the picture of the simulated digital person sent by the server 202. The simulated digital person is a software program based on visual graphics; when executed, the program presents to the user a robot form that simulates biological behaviors or thoughts. The simulated digital person may imitate a real person, such as a simulated figure built according to the appearance of the user or of another natural person, or it may have an animated appearance, such as the form of an animal or a cartoon character.
In some embodiments, as shown in fig. 1, after acquiring reply information corresponding to the information input by the user, the intelligent terminal 201 may display a simulated digital person image corresponding to the reply information on its display screen or on another image output device connected to it. As one mode, while the simulated digital person image is played, the audio corresponding to the image may be played through the speaker of the intelligent terminal 201 or another connected audio output device, and the text or graphics corresponding to the reply information may also be shown on the display screen, thereby realizing multi-modal interaction with the user across image, voice, text, and other channels.
The above application environments are only examples for facilitating understanding, and it is to be understood that the embodiments of the present application are not limited to the above application environments.
The interaction method, the interaction apparatus, the electronic device, and the storage medium provided by the embodiments of the present application are described in detail below with specific embodiments.
Referring to fig. 2, an embodiment of the present application provides an interaction method applicable to the interactive system 200 described above. The flow below is described from the server side, and the method may include the following steps S11 to S15.
Step S11: and acquiring the user information sent by the intelligent terminal.
In this embodiment, the user information may include information related to the user. For example, the user information may be information entered by the user at the intelligent terminal 201, identity information of the user, voice of the user, an image of the user, and the like, and the type of the user information is not particularly limited herein.
Illustratively, when the user holds the intelligent terminal 201 and the intelligent terminal 201 has established a communication connection with the server 202, the user may type information into the intelligent terminal 201; for example, the user may enter numbers, characters, letters, and the like on a virtual or physical keyboard of the intelligent terminal 201, and the intelligent terminal 201 sends this information to the server 202 as user information. The server 202 may also obtain the SIM card number it is communicating with and query the package associated with that number, the remaining balance of the package, the services available to the number, the number's owner, and similar information, using it as user information. A microphone on the intelligent terminal 201 may collect the voice of the user and send it to the server 202 as user information, and a camera on the intelligent terminal 201 may capture a picture of the user and send it as user information. The manner of obtaining the user information is not specifically limited here.
Step S12: and determining an interaction type of interaction with the user according to the user information, wherein the interaction type comprises a conversation type.
In this embodiment, the interaction type may represent the interaction manner required by the current customer service session. Specifically, the conversation type represents an interaction manner in which the user and the digital person converse and communicate with each other. For example, if the user information includes a query about the current package and the corresponding package information needs to be presented at the intelligent terminal 201, the interaction type may be determined to be the conversation type.
In this embodiment, intention recognition may be performed on the user information to obtain the user intention it expresses, and the current customer service may be determined based on that intention so as to obtain the corresponding interaction manner. The mapping relationship between customer services and interaction manners can be preset; for example, when the customer service is a, b, or c, the interaction type is X, and when the customer service is d, e, or f, the interaction type is Y.
For example, an intention recognition model may be used to recognize the intention in the user information; the intention recognition model may be a machine learning model such as an RNN (Recurrent Neural Network), a CNN (Convolutional Neural Network), a VAE (Variational Auto-Encoder), BERT (Bidirectional Encoder Representations from Transformers), or a Support Vector Machine (SVM), which is not limited here. The intention recognition model may also be a variant or combination of the above machine learning models.
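The preset mapping between customer services and interaction types might look like the following sketch; the service names and the keyword-based intent stub are assumptions for illustration, not part of the application:

```python
SERVICE_TO_TYPE = {
    "package_inquiry": "conversation",     # services a, b, c -> type X
    "balance_inquiry": "conversation",
    "plan_change": "conversation",
    "complaint": "manual_service",         # services d, e, f -> type Y
    "account_dispute": "manual_service",
    "cancellation": "manual_service",
}

def interaction_type_for(user_info: str) -> str:
    # Stub intent recognition; a real system would use one of the
    # RNN/CNN/VAE/BERT/SVM models (or a variant) mentioned above.
    intent = "complaint" if "complain" in user_info else "package_inquiry"
    return SERVICE_TO_TYPE[intent]

print(interaction_type_for("I want to check my current package"))
```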
Step S13: and if the interaction type is a conversation type, generating driving information according to the user information, wherein the driving information is used for driving the digital person.
In this embodiment, the user intention can be obtained from the user information, information for feedback to the user can be determined according to that intention, and that information can be taken as the driving information; the specific manner of obtaining the driving information depends on the type of the user information.
When the driving information is acquired, the user intention is first determined based on the user information, and the driving information is then generated according to the user intention. The user information may be at least one of voice, text, and image, and the driving information may be text, voice, semantics, and so on.
For example, if the user information is speech, speech recognition may be performed on it to obtain the corresponding text, an intention recognition model may be used to recognize the text and obtain the user intention, the information to be fed back to the user may be determined based on that intention, and the driving information may be generated from that information. For instance, speech recognition yields the text "Please report the weather in Beijing five hours from now"; intention recognition is performed on the text to obtain the user intention, the feedback information "At 15:20 the weather in Beijing will be heavy rain" is determined, and driving information for driving the digital person is generated from it. It should be noted that if the voice is acquired through the intelligent terminal 201, a microphone for collecting voice may be arranged on the intelligent terminal 201; the intelligent terminal 201 may also receive information sent by other devices to acquire the voice, and the manner of acquiring the voice is not particularly limited.
For example, if the user information is text, an intention recognition model may be used to recognize the text, the user intention may be obtained, the information to be fed back to the user may be determined based on that intention, and the driving information may be generated from that information. For instance, for the text "Which province of China does Xi'an belong to?", intention recognition is performed to obtain the user intention, the feedback information "Shaanxi Province" is determined, and driving information for driving the digital person is generated from it. It should be noted that if the text is acquired through the intelligent terminal 201, a keyboard for typing may be arranged on the intelligent terminal 201; the intelligent terminal 201 may also receive information sent by other devices to acquire the text, and the manner of acquiring the text is not particularly limited.
For example, if the user information is an image, image recognition may be performed on it to obtain the user intention it expresses, the information to be fed back to the user may be determined based on that intention, and the driving information may be generated from that information. For instance, if an image contains an "OK" gesture, image recognition yields the user intention "confirm", the information to be fed back to the user is determined, and control parameters for driving the digital person are generated from it. It should be noted that if the image is acquired through the intelligent terminal 201, an image acquisition device may be arranged on the intelligent terminal 201; the intelligent terminal 201 may also receive information sent by other devices to acquire the image, and the manner of acquiring the image is not particularly limited.
For example, the intention recognition model may be used to recognize the intention of the user information, and the intention recognition model may be a machine learning model such as an RNN model, a CNN model, a VAE model, a BERT, a support vector machine, and the like, which is not limited herein. For example, the intent recognition model may also be a variant or combination of the machine learning models described above, and the like.
In addition, in the present embodiment, the driving information may be text, voice, semantic, and the like, and the type of the driving information is not particularly limited.
Step S14: driving the digital person based on the driving information, acquiring a digital person image including the digital person, and taking the digital person image as a feedback result, the form of the digital person in the digital person image corresponding to the driving information.
In this embodiment, the form of the digital person may include the action state of the digital person, for example the state of the digital person's eyeballs, mouth shape, expression, or arms. It should be noted that the driving information may be a continuity parameter with a time attribute that controls the motion of the digital person in time sequence, so the driving information can be associated with the motion state of the digital person. For example, the driving information may include controlling the 3D digital person to lower its head 15 degrees at time t1, to swing its head left and right at time t2, to smile at time t3, and so on; a sketch follows this paragraph.
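Treating the driving information as a time-ordered continuity parameter could look like this sketch; the command vocabulary and timings are illustrative only:

```python
from typing import List, NamedTuple

class DriveCommand(NamedTuple):
    t: float        # seconds from the start of the clip
    part: str       # body part being driven
    action: str     # target state at time t

driving_info: List[DriveCommand] = [
    DriveCommand(0.0, "head", "lower 15 degrees"),    # time t1
    DriveCommand(0.5, "head", "swing left and right"),# time t2
    DriveCommand(1.2, "mouth", "smile"),              # time t3
]

def apply(commands: List[DriveCommand]) -> None:
    # A renderer would interpolate the digital person's form between commands.
    for cmd in sorted(commands, key=lambda c: c.t):
        print(f"t={cmd.t:.1f}s: drive {cmd.part} -> {cmd.action}")

apply(driving_info)
```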
It should be noted that the digital person image can be understood as an image obtained by photographing the digital person, from some angle, while it is in the motion state corresponding to the driving information.
Further, the digital person in this embodiment may be obtained through 3D modeling, or each frame may be generated by a deep learning model as a realistic image whose quality resembles camera footage, giving the digital person the effect of a real person shot on camera. A video digital person can then be produced from the coherent sequence of realistic images.
Specifically, obtaining a digital person through 3D modeling may include: acquiring a plurality of sample images that include a target model; acquiring the form of the target model from the sample images; acquiring an original digital person and its modeling information, the modeling information including the original key points of the original digital person; and acquiring target key points of the target model according to the form, and putting the original key points in correspondence with the target key points to generate the digital person.
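The key-point correspondence step can be pictured with the toy sketch below, which pairs original and target key points by shared part names and computes per-point displacements; the point names and 2D coordinates are invented for illustration:

```python
original_keypoints = {"left_eye_corner": (30, 40), "mouth_corner": (50, 80)}
target_keypoints   = {"left_eye_corner": (32, 41), "mouth_corner": (49, 83)}

def correspond(original: dict, target: dict) -> dict:
    """Return, for each shared key point, the displacement that moves the
    original digital person's point onto the target model's point."""
    return {
        name: (target[name][0] - original[name][0],
               target[name][1] - original[name][1])
        for name in original.keys() & target.keys()
    }

print(correspond(original_keypoints, target_keypoints))
# A modeling pipeline would apply these offsets to deform the original
# digital person toward the target model's form.
```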
In this embodiment, the target model may be a model associated with the digital person. For example, when the digital person is a certain broadcaster C, the target model may be broadcaster C, a person whose face, skeleton, and stature resemble broadcaster C's, or a dummy (e.g., a wax figure) with a similar face, skeleton, and stature.
In this embodiment, the sample image may include images of multiple target models at different angles. Specifically, the sample image may be an image of the target model at various angles in the case of various motions, sounds, expressions, and the like.
In some examples, cameras for capturing images of the target model may be arranged annularly around the periphery of the target model, wherein cameras of different focal sizes may be provided relative to the same orientation of the target model. When the target model makes a sound, changes an action, changes a facial expression, and the like, images including the target model may be simultaneously acquired by using the respective cameras, thereby obtaining a sample image.
In this embodiment, the form of the target model may include information about how the target model's body changes. For example, the form may be a drooping of the mouth corners, the eyes turning to the right, the head lifting, the right hand rising, and so on.
In some examples, each part of the target model may be extracted from the sample images by a target detection algorithm, and the form of each part may be determined from the change in state of the same part across multiple consecutive sample images. The target detection algorithm may be, for example, sliding-window detection, a two-stage detector, or a one-stage detector.
In this embodiment, the original digital person may comprise a model of an already constructed digital person. For example, the original digital person may be an average human face model of a certain region, or may be a 3D animation model in industrial animation, where the type of the original digital person is not particularly limited. In addition, the modeling information may include parameter information for constructing an original digital person, by which the original digital person may be restored so that the original digital person can be presented.
In this embodiment, the morphology of the target model may be combined with the modeling information such that the morphological features of the target model are added to the original digital person, resulting in a digital person that is substantially the same as the morphology of the target model.
In this embodiment, the original key points of the original digital person may include positions used to identify, locate, and control the parts of the original digital person. For example, the original key points may be the original digital person's left eye corner, right eye corner, mouth corners, face contour, eyebrows, nose wings, thumbs, shoulders, and so on. It should be noted that the denser the original key points are over each part of the original digital person in the modeling information, the more accurate the finally constructed digital person.
In this embodiment, the target key points may include positions used to identify and locate the parts of the target model, for example the target model's left eye corner, right eye corner, mouth corners, face contour, eyebrows, nose wings, thumbs, shoulders, and so on. Likewise, the denser the target key points are over each part of the target model, the more accurate the finally constructed digital person.
In this embodiment, the original key points of each part of the original digital person may be placed in one-to-one correspondence with the target key points of the same part of the target model. For example, if the original key points include the positions of the original digital person's face contour and the target key points include the positions of the target model's face contour, then the positions on the upper part of the original digital person's face correspond one-to-one with those on the upper part of the target model's face, the positions on the lower part correspond one-to-one with those on the lower part, and the remaining original key points of the original digital person's face correspond one-to-one with the remaining target key points of the target model's face, which will not be repeated here.
In some examples, target key points in a dynamic state may be put in correspondence with the original key points. Specifically, the target key points of the target model can be obtained, the face of the target model in consecutive sample images can be marked, and the same target key point can be associated across the time sequence of the consecutive sample images, yielding a dynamic change trajectory for each target key point of the target model's face. Each trajectory is then mapped to the corresponding original key point of the original digital person's face, yielding dynamic change trajectories for the original key points. The amplitude difference between an original key point's trajectory and the target key point's trajectory at the same moment is computed, and if the difference exceeds a preset amplitude threshold, it is determined that the original key point of the original digital person needs to be corrected.
Illustratively, the target key points of the target model can be obtained and the face of the target model marked in consecutive sample images, with the same target key point associated across the time sequence to obtain the dynamic change trajectories of all target key points of the face. These trajectories are mapped onto the original key points of the original digital person's face to obtain the corresponding trajectories of the original key points. The face change amplitude of the original digital person at each moment is derived from the original key points' trajectories, and the face change amplitude of the target model at each moment from the target key points' trajectories; the two are then compared moment by moment. If, at the same moment, the difference between the original digital person's face change amplitude and the target model's exceeds a preset amplitude threshold, it is determined that the original key points of the original digital person need to be corrected; if the difference is less than or equal to the threshold, the current construction of the digital person's face is judged to meet expectations.
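The amplitude comparison above reduces to a simple per-moment check, sketched here under the assumption that both trajectories have already been sampled at the same moments:

```python
from typing import List

def needs_correction(original_track: List[float],
                     target_track: List[float],
                     amplitude_threshold: float) -> bool:
    """Tracks are per-frame change amplitudes sampled at the same moments.
    Flag the key point if any per-moment difference exceeds the threshold."""
    return any(abs(o - t) > amplitude_threshold
               for o, t in zip(original_track, target_track))

# Illustrative trajectories (e.g., mouth-corner displacement per frame).
orig = [0.0, 0.4, 0.9, 1.1]
tgt  = [0.0, 0.5, 0.8, 1.9]
print(needs_correction(orig, tgt, amplitude_threshold=0.5))  # True: last frame drifts
```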
Specifically, a simulation digital human model may be created in advance so that the simulation digital person can produce realistic output in which each frame has a quality close to that of a camera-captured image. Creating the simulation digital human generation model may include: acquiring a plurality of sample images including the target model and the camera parameters corresponding to each sample image; acquiring the sample image configuration parameters corresponding to the camera parameters; acquiring angle information of the target model from the sample images and associating the angle information with the sample image configuration parameters; and constructing a simulation digital human model from the sample image configuration parameters and the angle information to obtain the preset simulation digital human model.
In the present embodiment, the camera parameters may include the parameters employed by the photographing device when it captures the target model in a sample image, for example focal length, aperture size, and the like. The sample image configuration parameters may include parameters of the sample image produced by that photographing device, for example pixel size, image exposure, the proportion of the image the target model occupies, the location where the target model contacts the ground, and the like. The angle information may include the angle at which the target model is presented in the sample image. For example, when the angle between the face orientation of the target model in the sample image and a preset axis direction is 15 degrees, 15 degrees may be taken as the angle information.
In some examples, the sample images may be recognized to obtain the angles of the target model. Specifically, each part of the target model can be extracted from the sample images by a target detection algorithm, the angle of each part determined from the change state of that part across multiple consecutive sample images, and the resulting per-part angles used as the angle information (see the sketch below).
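As a simplified, single-image variant of the idea (the application itself derives angles from changes across consecutive images), a part's angle can be read off two detected keypoints relative to a preset axis; the coordinates and function name below are illustrative assumptions:

```python
import math

def part_angle(p_left: tuple, p_right: tuple) -> float:
    # angle (degrees) of the line through two detected part keypoints
    # relative to the horizontal image axis, used as that part's angle info
    dx = p_right[0] - p_left[0]
    dy = p_right[1] - p_left[1]
    return math.degrees(math.atan2(dy, dx))

# e.g. a face tilted so that its eye line rises roughly 15 degrees
angle_info = part_angle((120.0, 200.0), (220.0, 226.8))  # ~= 15.0
```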
In this embodiment, the sample image may be regarded as composed of a plurality of regions and a plurality of points. The states of these regions and point locations of the target model at each angle are obtained from the sample image configuration parameters and the angle information, and the regions and point locations at each angle are combined to construct the simulation digital human model, so that the model can output images of the target model at different angles.
It should be noted that, compared with acquiring a digital person through 3D modeling, generating the model by constructing a simulation digital person requires no 3D modeling, yields a simulation digital person closer to the real-person model with a more lifelike effect, and suits practical applications in which different real-person models must be captured to obtain simulation digital persons.
Step S15: and sending the feedback result to the intelligent terminal.
In this embodiment, the server 202 may establish a communication connection with the intelligent terminal 201, either directly or indirectly; the manner in which the connection is established is not particularly limited here.
In some examples, when the server 202 establishes a communication connection with the intelligent terminal 201 directly, the server 202 may have a calling function and/or an answering function. Specifically, the server 202 may send a call request to the intelligent terminal 201 through its SIM card number; once the intelligent terminal 201 side answers the call request, a high-definition call connection is established, and the server 202 can obtain user information from the intelligent terminal 201 and send the feedback result to it. Conversely, the intelligent terminal 201 may send a call request to the server 202; after the server 202 answers, the same high-definition call connection is established, with the same exchange of user information and feedback result.
In some examples, when the server 202 establishes a communication connection with the intelligent terminal 201 indirectly, a call platform and a SIP (Session Initiation Protocol) server sit on the communication link between the server 202 and the intelligent terminal 201. Specifically, the call platform may send a call request to the intelligent terminal 201 through the SIM card number; when the intelligent terminal 201 side answers, the call platform establishes a high-definition call connection with the intelligent terminal 201. Meanwhile, the call platform may connect to the SIP server over RTP (Real-time Transport Protocol), and the SIP server establishes a data connection with the server 202. The call platform acquires the user information of the intelligent terminal 201 and forwards it to the SIP server, which passes it to the server 202. In the reverse direction, the server 202 sends the digital person image to the SIP server, which packages it into H264 data packets over RTP and transmits them to the call platform, and the call platform sends them on to the intelligent terminal 201. In addition, if the feedback result also includes voice, the server 202 may send the voice and the digital person image to the SIP server separately; the SIP server packages the voice into PCMU and/or PCMA data packets and the digital person image into H264 data packets over RTP, and the call platform sends these packets to the intelligent terminal 201 (a packetization sketch follows).
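The packaging step can be illustrated with a sketch of the standard RTP fixed header (RFC 3550); this is not code from the application, and the payload-type values (96 for H.264 as a common dynamic assignment, 0 for PCMU, 8 for PCMA) plus the sample NAL bytes are assumptions:

```python
import struct

def rtp_packet(payload: bytes, seq: int, timestamp: int,
               ssrc: int, payload_type: int = 96, marker: int = 0) -> bytes:
    version = 2
    byte0 = version << 6                  # no padding, no extension, zero CSRCs
    byte1 = (marker << 7) | payload_type  # marker bit plus 7-bit payload type
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

h264_nal = bytes([0x65, 0x88, 0x84])      # hypothetical coded-slice bytes
packet = rtp_packet(h264_nal, seq=1, timestamp=90000, ssrc=0x1234ABCD)
```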
In this embodiment, through the implementation of the above steps S11 to S15, a digital person can be displayed at the intelligent terminal 201. When the interaction method is applied to customer service, a user's consultation request can be answered quickly even during peak consultation periods. Meanwhile, the interaction type can be determined from the user information; when the interaction type is the dialogue type, driving information is generated from the user information and used to drive the digital person, so that the digital person presented at the intelligent terminal 201 embodies the information fed back to the user. This simulates a scene of face-to-face communication between the user and a simulated digital person, puts the user more at ease, and improves the user experience.
Further, an interaction method is also provided in the embodiments of the present application, and the interaction method may include the following steps S21 to S25. The interaction method provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S21: and acquiring the user information sent by the intelligent terminal.
Step S22: and determining an interaction type of interaction with the user according to the user information, wherein the interaction type comprises a conversation type.
Step S23: and if the interaction type is a conversation type, generating driving information according to the user information, wherein the driving information is used for driving the digital person.
Step S24: driving the digital person based on the driving information, acquiring a digital person image including the digital person, and taking the digital person image as a feedback result, the form of the digital person in the digital person image corresponding to the driving information.
Step S25: and sending the feedback result to the intelligent terminal.
In order to better meet the diversified requirements of users, manual service can be provided for the users. The interaction type comprises a manual service type; as shown in fig. 3, before the step S25, the interaction method provided by the present embodiment further includes the following steps S26 to S28.
Step S26: and if the interaction type is the manual service type, acquiring the identification information of the intelligent terminal.
In this embodiment, when the interaction type is the manual service type, a customer service person may conduct a high-definition video call with the user. The manual service type may characterize the type of interaction in which customer service personnel interact with the user.
In this embodiment, the identification information may be an identification code used for the communication connection with the server 202 in the intelligent terminal 201. For example, the identification information may be a SIM card number set in the smart terminal 201.
Step S27: and acquiring the target customer service according to the identification information.
In this embodiment, a mapping relationship between the identification information and the target customer service may be set in advance.
In some examples, the home location of the SIM card represented by the identification information may be obtained from the identification information, and a target customer service corresponding to that home location obtained. For example, if the identification information shows that the SIM card's home location is Beijing, a customer service person in charge of Beijing SIM card services is obtained and used as the target customer service.
In some examples, the location of the intelligent terminal 201 holding the SIM card represented by the identification information may be obtained from the identification information, and a target customer service corresponding to that location obtained. For example, if the intelligent terminal 201 holding the SIM card is determined to be in Beijing, a customer service person in charge of Beijing SIM card services may be obtained and used as the target customer service.
In some examples, to increase customer stickiness, a user may be bound to a specific customer service person; when the interaction type is the manual service type, the customer service person corresponding to the identification information is obtained and used as the target customer service. The correspondence between identification information and customer service personnel may be preset (a lookup sketch follows).
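A minimal sketch of the lookup order implied above: a pre-bound agent takes priority, with the SIM card's home region as the fallback; the table contents and names are illustrative assumptions:

```python
REGION_AGENTS = {"Beijing": "agent_bj_01", "Shanghai": "agent_sh_01"}
BOUND_AGENTS = {"13800000000": "agent_vip_07"}  # preset per-user binding

def target_customer_service(sim_number: str, home_region: str) -> str:
    # prefer the customer service person bound to this user, otherwise
    # fall back to the agent responsible for the SIM card's home region
    return BOUND_AGENTS.get(sim_number,
                            REGION_AGENTS.get(home_region, "agent_default"))
```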
Step S28: and acquiring the customer service information of the target customer service, and taking the customer service information as a feedback result.
In this embodiment, the customer service information may include information of the target customer service, and the corresponding customer service information may be obtained according to the type of communication connection established between the intelligent terminal 201 and the server 202.
In some examples, when the intelligent terminal 201 establishes a high-definition call connection with the server 202, the voice of the target customer service and continuous images including the target customer service may be acquired and used as the feedback result. The voice may be packaged into PCMU and/or PCMA data packets over RTP, and the continuous images into H264 data packets over RTP, before transmission to the intelligent terminal 201.
In some examples, when the intelligent terminal 201 establishes a wireless cellular phone communication connection with the server 202, the voice of the target customer service may be acquired and used as the feedback result. The voice may be packaged into PCMU and/or PCMA data packets over RTP before transmission to the intelligent terminal 201.
In this embodiment, through the implementation of the steps S26 to S28, manual service can be provided for the user, and diversified requirements of the user can be better satisfied.
Furthermore, in order to enable the user to better understand the service information, the service information can be used as a feedback result; the interaction type comprises an information presentation type; before the step S25, the interaction method provided in this embodiment further includes: if the interaction type is an information presentation type, determining service information for presentation at the intelligent terminal 201 according to the user information, and taking the service information as a feedback result.
In this embodiment, intent recognition may be performed on the user information to obtain the user intention it represents, and the service information to be presented at the intelligent terminal 201 determined from that intention. The service information may include services related to the user, for example packages currently transactable with the SIM card, queryable services, and the like. The service information may be presented as images, text, and so on; its presentation form is not particularly limited here.
In this embodiment, the service information for presentation at the intelligent terminal 201 may be determined according to the user information, and the service information is used as a feedback result to present the service information to the intelligent terminal 201, so as to help the user to better know the service information.
It should be noted that, in the present embodiment, the interaction types may include the dialogue type, the manual service type, and the information presentation type, and the user information may map to more than one of them. That is, when the user information satisfies the judgments for both the dialogue type and the information presentation type, both types are determined, and the digital person image and the service information may be sent together to the intelligent terminal 201 as the feedback result; when the user information satisfies the judgments for both the manual service type and the information presentation type, both types are determined, and the customer service information and the service information may be sent together to the intelligent terminal 201 as the feedback result.
Further, in order to better prompt the user, the electronic card may be generated at the same time as the digital human image is generated, and for this reason, the embodiment of the present application further provides an interactive method, as shown in fig. 4, the interactive method may include the following steps S31 to S37. The interaction method provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S31: and acquiring the user information sent by the intelligent terminal.
Step S32: and acquiring prompt content for presentation at the intelligent terminal according to the user information.
In this embodiment, the driving information may be obtained from the user information, and the prompt content the user needs may be extracted from the driving information. For example, when the driving information indicates that the packages the user can transact include package A, package B, and package C, the prompt content may be those three packages arranged in a preset format. For the method of obtaining the driving information from the user information, refer to the description of step S13 above, which is not repeated here.
Step S33: and generating an electronic card comprising the prompt content, and taking the electronic card as a feedback result.
In this embodiment, the electronic card may be combined with the digital human image and then sent to the intelligent terminal 201 as an interactive feedback result, or may be independently sent to the intelligent terminal 201 as a feedback result.
Exemplarily, when the electronic card is sent to the intelligent terminal 201 as a feedback result, the electronic card and the digital person image may each be transmitted to the intelligent terminal 201 as H264 data packets. On receiving them, the intelligent terminal 201 may treat the electronic card as a first layer and the digital person image as a second layer; when both must be displayed simultaneously, the first layer is placed on top and the second layer at the bottom (a compositing sketch follows).
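A minimal compositing sketch of the two-layer display, assuming the electronic card is an RGBA image with a transparent background (the file names are placeholders):

```python
from PIL import Image

person = Image.open("digital_person_frame.png").convert("RGBA")  # bottom layer
card = Image.open("electronic_card.png").convert("RGBA")         # top layer
card = card.resize(person.size)          # alpha_composite needs equal sizes

frame = Image.alpha_composite(person, card)  # card drawn over the person
frame.save("display_frame.png")
```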
In this embodiment, an electronic card may be generated and sent to the intelligent terminal 201 together with the digital person image, so that during the high-definition call with the digital person customer service the user can simultaneously view the prompt content on the card, helping the user better understand and transact the service.
Step S34: and determining an interaction type of interaction with the user according to the user information, wherein the interaction type comprises a conversation type.
Step S35: and if the interaction type is a conversation type, generating driving information according to the user information, wherein the driving information is used for driving the digital person.
Step S36: driving the digital person based on the driving information, acquiring a digital person image including the digital person, and taking the digital person image as a feedback result, the form of the digital person in the digital person image corresponding to the driving information.
Step S37: and sending the feedback result to the intelligent terminal.
Further, card pictures and a plurality of digital person images may be output based on the correspondence between the electronic card and the digital person video. As shown in fig. 5, when there are multiple digital person images, the above step S37 may include the following steps S371 to S374.
Step S371: acquiring a card type of the electronic card, wherein the card type comprises a dynamic type.
In the present embodiment, the card types of the electronic card may include a static type and a dynamic type.
In this embodiment, when the electronic card is of the dynamic type, a plurality of consecutive electronic cards may be generated; when the intelligent terminal 201 presents them in their time sequence, the effect of a changing, video-like card is produced. When the electronic card is of the static type, a single electronic card may be generated for the intelligent terminal 201 to present, giving a static presentation effect.
Step S372: if the card type of the electronic card is a dynamic type, acquiring the time sequence of the plurality of digital person images, and enabling the plurality of card pictures and the plurality of digital person images to correspond one to one according to the time sequence.
In this embodiment, the timing of the digital human image may be the timing of the digital human image generation.
Step S373: and acquiring card pictures of the electronic card at different times according to the time sequence of the images of the plurality of digital persons.
In this embodiment, the card pictures of the electronic card at different times may be the electronic cards at those times acquired based on the driving information.
Step S374: and sending the card picture and the plurality of digital human images to the intelligent terminal.
In this embodiment, since both the electronic card and the digital person image are obtained from the driving information, and the driving information carries a time attribute, the electronic cards and digital person images can be placed in one-to-one correspondence through their respective time relationships with the driving information (as sketched below).
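A minimal sketch of that pairing, assuming both sequences carry the time attribute inherited from the driving information as (timestamp, payload) tuples:

```python
def pair_by_time(card_pictures, person_images):
    # timestamp -> card picture, then match each digital person image
    # whose timestamp has a card picture at the same moment
    cards = dict(card_pictures)
    return [(t, cards[t], image) for t, image in person_images if t in cards]
```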
In the present embodiment, through the implementation of the above steps S371 to S374, the card pictures and digital person images can be output so that the intelligent terminal 201 presents the digital person video and the card pictures simultaneously.
Further, a call connection may be established with the intelligent terminal 201, and for this reason, an interaction method may further be provided in the embodiment of the present application, as shown in fig. 6, where the interaction method may include the following steps S41 to S47. The interaction method provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S41: and receiving a call request sent by the intelligent terminal.
In this embodiment, the user may dial a customer service identifier through the intelligent terminal, which sends a call request to the server. The customer service identifier can be set according to service requirements.
Step S42: and if the call request is determined to be allowed, establishing call connection with the intelligent terminal.
In this embodiment, the server may send a call request to the intelligent terminal to establish a call connection with the intelligent terminal, or may receive a call request sent by the intelligent terminal to establish a call connection with the intelligent terminal.
Step S43: and acquiring the user information sent by the intelligent terminal.
Step S44: and determining an interaction type of interaction with the user according to the user information, wherein the interaction type comprises a conversation type.
Step S45: and if the interaction type is a conversation type, generating driving information according to the user information, wherein the driving information is used for driving the digital person.
Step S46: driving the digital person based on the driving information, acquiring a digital person image including the digital person, and taking the digital person image as a feedback result, the form of the digital person in the digital person image corresponding to the driving information.
Step S47: and sending the feedback result to the intelligent terminal.
Further, a call connection may be established with the intelligent terminal 201, and for this reason, an interaction method may further be provided in the embodiment of the present application, as shown in fig. 7, where the interaction method may include the following steps S51 to S61. The interaction method provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S51: and acquiring the identity of the intelligent terminal.
In this embodiment, the terminal identifier may be an identifier for establishing a high-definition call connection. For example, the terminal identification may be a standard SIM card number, a mini SIM card number, a Micro SIM card number, or the like.
Step S52: and sending a call request to the intelligent terminal based on the identity.
Step S53: and if the intelligent terminal is determined to be connected with the call request, controlling the server side to establish a data channel with the intelligent terminal.
In this embodiment, when the intelligent terminal 201 connects the call request, the server 202 may be controlled to establish a data channel with the intelligent terminal 201. The data channel may be a high definition telephony data channel.
Further, after the intelligent terminal 201 establishes the data channel, initial information may be sent to the user. As shown in fig. 8, the interaction method provided by the present embodiment may further include the following steps S54 to S56.
Step S54: and acquiring the type of the customer service business of the data channel established between the server and the intelligent terminal.
In this embodiment, the customer service type may be obtained based on, for example, the service types the user can currently transact and the service types the user has already transacted.
Step S55: and acquiring an original digital human image and original voice information according to the customer service type.
In this embodiment, the original digital person image and the original voice information may be preset to correspond to the customer service type. For example, customer service type F may correspond to original digital person image F and original voice information X, type G to image G and voice information Y, and type H to image H and voice information Z; when customer service type F is obtained, original digital person image F and original voice information X are retrieved (a lookup sketch follows).
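A minimal sketch of this preset correspondence as a lookup table; the keys and file names are placeholders:

```python
INITIAL_INFO = {
    "F": ("original_digital_person_f.png", "original_voice_x.wav"),
    "G": ("original_digital_person_g.png", "original_voice_y.wav"),
    "H": ("original_digital_person_h.png", "original_voice_z.wav"),
}

def initial_info(customer_service_type: str):
    # returns the preset (original digital person image, original voice)
    return INITIAL_INFO[customer_service_type]
```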
Step S56: and sending the original digital human image and the original voice information to the intelligent terminal through a data channel.
In this embodiment, information can be pushed to the user by sending the original digital person image and the original voice information to the intelligent terminal 201.
Step S57: and acquiring the user information sent by the intelligent terminal.
Step S58: and determining an interaction type of interaction with the user according to the user information, wherein the interaction type comprises a conversation type.
Step S59: and if the interaction type is a conversation type, generating driving information according to the user information, wherein the driving information is used for driving the digital person.
Step S60: driving the digital person based on the driving information, acquiring a digital person image including the digital person, and taking the digital person image as a feedback result, the form of the digital person in the digital person image corresponding to the driving information.
Step S61: and sending the feedback result to the intelligent terminal.
The embodiment of the present application further provides an interaction method, which may include the following steps S71 to S78. The interaction method provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S71: and acquiring the user information sent by the intelligent terminal.
Step S72: and determining an interaction type of interaction with the user according to the user information, wherein the interaction type comprises a conversation type.
Further, as an implementation of this embodiment, as shown in fig. 9, the digital person may be driven by text; the driving information includes text output information; the above step S72 may include the following steps S721 to S722.
Step S721: the user intent is determined from the user information.
In the present embodiment, the user intention may be determined in a manner matched to the type of the user information. Illustratively, when the user information is speech, the speech may be converted into text input information by ASR (Automatic Speech Recognition) techniques and the user intention determined from the text input information. The user voice information may be recognized by methods based on linguistics and acoustics, stochastic models, artificial neural networks, probabilistic grammars, and the like; the recognition method is not particularly limited here.
Further, the user intention may be obtained through an intention recognition model used to recognize the intent of the user information. The intention recognition model may be a machine learning model such as an RNN model, a CNN model, a VAE model, BERT, or a support vector machine, or a variant or combination of such models; it is not particularly limited here (a support-vector-machine sketch follows).
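A minimal sketch using one of the model families named above (a linear support vector machine over TF-IDF features); the training utterances and intent labels are invented placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

train_texts = ["check my balance", "change my package", "let me talk to a person"]
train_intents = ["query_balance", "change_package", "manual_service"]

intent_model = make_pipeline(TfidfVectorizer(), LinearSVC())
intent_model.fit(train_texts, train_intents)

user_intent = intent_model.predict(["what is my current balance"])[0]
```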
Step S722: and acquiring text output information for feedback to the user based on the user intention, and using the text output information as driving information.
In this embodiment, a feedback platform may be preset, and the feedback platform may determine text output information fed back to the user according to the user intention. The feedback platform can be a question and answer library, a customer service call library and the like which are constructed based on customer service type requirements, and the construction of the feedback platform is not particularly limited.
It should be noted that the forms of the digital person may correspond one-to-one with texts; when the text output information is acquired, the digital person can be driven by the text in it, so that the form of the digital person in the output digital person image corresponds to the text output information.
In this embodiment, text input information can be obtained, the user intention derived from it, text output information obtained from the intention, and the digital person driven by the text output information, so that the content fed back to the user is presented through the digital person. This simulates a scene of face-to-face communication between the user and customer service staff and improves the user experience.
Further, as an implementation of this embodiment, as shown in fig. 10, the digital person may be driven by voice; the driving information comprises feedback voice information; the above step S72 may further include the following steps S723 to S725.
Step S723: the user intent is determined from the user information.
Further, when the user information includes speech, the speech may be recognized to determine the user intention. As an implementation of this embodiment, as shown in fig. 11, the user information may include user voice information, and the above step S723 may include the following steps S7231 to S7233.
Step S7231: and acquiring the user voice information in the user information.
Step S7232: and recognizing the voice information of the user to obtain the text input information.
In this embodiment, speech may be converted to text input information by ASR techniques, and user intent may be determined based on the text input information. Further, the user speech information may be recognized by a method based on linguistics and acoustics, a stochastic model, an artificial neural network, a probabilistic grammar, and the like, where the method of recognizing the user speech information is not particularly limited.
Step S7233: and performing semantic recognition on the text input information to determine the user intention.
In the present embodiment, the user intention may be derived through an intention recognition model used to recognize the intent of the text input information. The model may be a machine learning model such as an RNN model, a CNN model, a VAE model, BERT, or a support vector machine, or a variant or combination of such models; it is not particularly limited here.
Step S724: text output information for feedback to a user is obtained based on the user's intent.
It should be noted that the operation principle of the step S724 is substantially the same as that of the step S722, and the step S724 can be referred to the step S722, which is not described herein again.
Step S725: and generating feedback voice information according to the text output information, and taking the feedback voice information as driving information.
In this embodiment, the feedback voice information may be generated from the text output information by TTS (Text To Speech) technology. The text output information may be synthesized by a parameter synthesis method, a waveform synthesis method, a rule synthesis method, or the like to obtain the feedback voice information (a library-based sketch follows).
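A minimal sketch with an off-the-shelf TTS engine; the application does not name a library, so pyttsx3 and the output file name are assumptions:

```python
import pyttsx3

engine = pyttsx3.init()
text_output_information = "Package A has been activated on your account."
engine.save_to_file(text_output_information, "feedback_voice.wav")
engine.runAndWait()  # the saved audio serves as the feedback voice information
```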
It should be noted that the forms of the digital person may correspond one-to-one with voices; when the feedback voice information is acquired, the digital person can be driven by the voice in it, so that the form of the digital person in the output digital person image corresponds to the feedback voice information.
In addition, in the process of driving the digital person through voice, the mouth shape parameters of the digital person can be controlled by the voice. For example, the changes of the digital person's mouth key points when uttering the sound corresponding to the feedback voice information may be obtained, yielding mouth shape parameters that characterize those changes. The mouth key points may include locations used to identify, position, and control the parts of the digital person's mouth, for example the left mouth corner, the right mouth corner, the mentolabial sulcus, the base of the nose, and so on. Since the voice in the feedback voice information is continuous and carries a time attribute, the digital person's mouth shape can be controlled at every time node of the voice, so that the change process of the mouth shape is presented accurately at the intelligent terminal 201 (a frame-sampling sketch follows).
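A minimal sketch of the time-node control, assuming the synthesis step can supply per-phoneme (start, end, openness) spans; the data layout and the 25 fps rate are illustrative assumptions:

```python
FPS = 25  # assumed video frame rate of the digital person image stream

def mouth_openness_per_frame(phoneme_spans, duration_s):
    # sample one mouth-openness value per video frame so the mouth key
    # points can be driven at every time node of the feedback voice
    frames = []
    for i in range(int(duration_s * FPS)):
        t = i / FPS
        openness = 0.0  # mouth closed between phonemes
        for start, end, value in phoneme_spans:
            if start <= t < end:
                openness = value
                break
        frames.append(openness)
    return frames
```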
In this embodiment, through the implementation of the above steps S723 to S725, the text output information can be synthesized into feedback voice information and the digital person driven by it, simulating a scene of face-to-face communication between the user and customer service staff and improving the user experience.
Step S73: and if the interaction type is a conversation type, generating driving information according to the user information, wherein the driving information is used for driving the digital person.
Step S74: driving the digital person based on the driving information, acquiring a digital person image including the digital person, and taking the digital person image as a feedback result, the form of the digital person in the digital person image corresponding to the driving information.
Further, as an implementation manner of this embodiment, the feedback voice information may be used as an interactive feedback result, and the interactive method provided by this embodiment may further include step S75.
Step S75: and taking the feedback voice information as an interactive feedback result.
Further, the interactive feedback result may include a plurality of continuous frame digital human images and feedback voice information, and the plurality of continuous frame digital human images correspond to the feedback voice information according to a time sequence.
In this embodiment, the time sequence information of each digital person image may be acquired, and the plurality of digital person images are sequentially ordered according to the time sequence information, thereby synthesizing the simulated digital person video.
In this embodiment, the feedback voice information and the digital person images may be placed in one-to-one correspondence by time sequence, so that when the intelligent terminal 201 presents a digital person image, the corresponding voice in the feedback voice information is played synchronously.
Step S76: and sending the feedback result to the intelligent terminal.
Further, as an implementation of this embodiment, in order to prevent some information in the interactive feedback result from leaking, as shown in fig. 12, the above step S76 may include the following steps S801 to S803.
Step S801: and obtaining the privacy information in the interactive feedback result.
In this embodiment, the privacy information may include information relating to the user's privacy, for example an identification number, the user's balance, dialing records, and the like.
In this embodiment, privacy information in the interaction feedback result may be obtained in different manners based on the content type in the interaction feedback result.
Illustratively, when the interactive feedback result includes an electronic card, the text output information contained in the card may be acquired and the portions related to user privacy taken as privacy information. When the interactive feedback result includes feedback voice information, the text output information corresponding to that voice may be acquired and its privacy-related portions taken as privacy information.
Step S802: and updating the interactive feedback result according to the privacy information and a preset information protection mode, so that the privacy information in the updated interactive feedback result is hidden.
In this embodiment, the preset information protection manner may be set based on an actual scene.
Illustratively, when the privacy information is text in an electronic card, one or more characters of the text may be hidden. For example, when the text in the electronic card is the identification number 110101200108150612, it may be updated so that the card shows 1101012001XXXXXXXX.
When the privacy information is feedback voice information, the text output information corresponding to it may be acquired; if that text is the user's balance, the digits in the balance may be mapped to letters and the resulting letters used as the updated interactive feedback result. For example, with the digits 1 through 6 mapped to the letters "A" through "F" respectively, a balance of 12345 is updated to "ABCDE", and the voice corresponding to "ABCDE" is used as the updated interactive feedback result. The digit-to-letter mapping may be preset by the user (a masking sketch follows).
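A minimal sketch of the two protection modes, using the masking position and digit-to-letter table from the examples above; the function names are assumptions:

```python
DIGIT_TO_LETTER = {"1": "A", "2": "B", "3": "C", "4": "D", "5": "E", "6": "F"}

def mask_id_number(id_number: str, keep: int = 10) -> str:
    # hide the tail of an identification number shown on an electronic card
    return id_number[:keep] + "X" * (len(id_number) - keep)

def encode_balance(balance: str) -> str:
    # replace digits with the user's preset letters before voice playback
    return "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in balance)

assert mask_id_number("110101200108150612") == "1101012001XXXXXXXX"
assert encode_balance("12345") == "ABCDE"
```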
Step S803: and outputting the updated interactive feedback result.
In this embodiment, after sending the updated interaction feedback result to the intelligent terminal 201, the intelligent terminal 201 can play the updated interaction feedback result, so as to protect the privacy information in the interaction feedback result.
Referring to fig. 13, a block diagram of an interaction apparatus provided in an embodiment of the present application is shown, where the interaction apparatus may include a user information obtaining module 71, an interaction type determining module 72, a digital human driver module 73, a feedback result obtaining module 74, and a feedback result sending module 75. The user information obtaining module 71 is configured to obtain user information sent by the intelligent terminal 201. The interaction type determination module 72 is configured to determine an interaction type for interacting with the user according to the user information, and the interaction type may include a dialog type. The digital person driving module 73 is configured to generate driving information according to the user information if the interaction type is a dialogue type, where the driving information is used to drive the digital person. The feedback result acquiring module 74 is configured to drive the digital person based on the driving information, acquire a digital person image that may include the digital person, and take the digital person image as a feedback result, the form of the digital person in the digital person image corresponding to the driving information. The feedback result sending module 75 is configured to send a feedback result to the intelligent terminal 201.
Optionally, the interaction type may include a manual service type; the interaction device can also comprise an identification information acquisition module, a target customer service acquisition module and a customer service information acquisition module. The identification information obtaining module is configured to obtain the identification information of the intelligent terminal 201 if the interaction type is an artificial service type. And the target customer service acquisition module is used for acquiring the target customer service according to the identification information. The customer service information acquisition module is used for acquiring the customer service information of the target customer service and taking the customer service information as a feedback result.
Optionally, the interactive device may further include a prompt content acquisition module and an electronic card generation module. The prompt content obtaining module is configured to obtain prompt content for presentation at the intelligent terminal 201 according to the user information. The electronic card generating module is used for generating an electronic card which can comprise prompt contents and using the electronic card as a feedback result.
Optionally, the number of digital person images is multiple; the feedback result sending module 75 may include a card type acquisition unit, a timing acquisition unit, a card picture acquisition unit, and a card picture sending unit. The card type acquisition unit is used for acquiring the card type of the electronic card, and the card type may include a dynamic type. The timing acquisition unit is used for acquiring the time sequence of the plurality of digital person images if the card type of the electronic card is a dynamic type. The card picture acquisition unit is used for acquiring card pictures of the electronic card at different times according to the time sequence of the digital person images, so that the card pictures correspond one-to-one with the digital person images by time sequence. The card picture sending unit is used for sending the card pictures and the plurality of digital person images to the intelligent terminal 201.
Optionally, the interaction type may include an information presentation type; the interaction device may further include a service information acquisition module. The service information obtaining module is configured to determine, if the interaction type is an information presentation type, service information to be presented at the intelligent terminal 201 according to the user information, and use the service information as a feedback result.
Optionally, the interactive device may include a call request receiving module and a call connection establishing module. The call request receiving module is configured to receive a call request sent by the intelligent terminal 201. The call connection establishing module is configured to establish a call connection with the intelligent terminal 201 if it is determined that the call request is allowed.
Optionally, the interaction device may further include an identity obtaining module, a call request sending module, and a data channel establishing module. The identity acquiring module is configured to acquire an identity of the intelligent terminal 201. The call request sending module is configured to send a call request to the intelligent terminal 201 based on the identity. The data channel establishing module is configured to control the server 202 to establish a data channel with the intelligent terminal 201 if it is determined that the intelligent terminal 201 connects the call request.
Optionally, the interactive device may further include a customer service type module, an original information obtaining module, and an original information sending module. The customer service type module is used for acquiring the customer service type of a data channel established between the server 202 and the intelligent terminal 201. The original information acquisition module is used for acquiring original digital human images and original voice information according to the customer service type. The original information sending module is used for sending the original digital human image and the original voice information to the intelligent terminal 201 through a data channel.
Alternatively, the drive information may include text output information; the digital human driver module 73 may include a user intention determining unit and a text output information acquiring unit. The user intention determining unit is used for determining the user intention according to the user information. The text output information acquisition unit is used for acquiring text output information used for feeding back to a user based on the user intention and taking the text output information as driving information.
Alternatively, the driving information may include feedback voice information; the digital human driver module 73 may further include a user intention determining unit, a text output information acquiring unit, and a feedback voice information generating unit. The user intention determining unit is used for determining the user intention according to the user information. The text output information acquisition unit is used for acquiring text output information used for feeding back to a user based on the user intention. The feedback voice information generating unit is used for generating feedback voice information according to the text output information and taking the feedback voice information as driving information.
Optionally, the interaction device may further comprise a feedback module. The feedback module is used for taking the feedback voice information as a feedback result.
Optionally, the feedback result may include a plurality of consecutive frames of digital human images and feedback voice information, the plurality of consecutive frames of digital human images corresponding to the feedback voice information in time sequence.
Alternatively, the feedback result transmitting module 75 may include a privacy information acquiring unit, a privacy information hiding unit, and a feedback result outputting unit. The privacy information acquisition unit is used for acquiring the privacy information in the feedback result. The privacy information hiding unit is used for updating the feedback result according to the privacy information and a preset information protection mode, so that the privacy information in the updated feedback result is hidden. And the feedback result output unit is used for outputting the updated feedback result.
Optionally, the user information may include user voice information; the user intention determining unit may include a user voice information acquiring sub-unit, a text input information acquiring sub-unit, and a semantic recognition sub-unit. The user voice information acquisition subunit is used for acquiring the user voice information in the user information. The text input information acquisition subunit is used for identifying the user voice information to obtain text input information. And the semantic recognition subunit is used for performing semantic recognition on the text input information and determining the user intention.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the modules/units/sub-units/components in the above-described apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the coupling or direct coupling or communication connection between the modules shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or modules may be in an electrical, mechanical or other form.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Referring to fig. 14, an electronic device provided in an embodiment of the present application is shown, and the electronic device may include a processor 810, a communication module 820, a memory 830, and a bus. The bus may be an ISA bus, PCI bus, EISA bus, CAN bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. Wherein:
and a memory 830 for storing programs. In particular, the memory 830 may be used to store software programs as well as various data. The memory 830 may mainly include a program storage area and a data storage area, wherein the program storage area may store a program required to operate at least one function and may include a program code including computer operation instructions. In addition to storing programs, the memory 830 may temporarily store messages or the like that the communication module 820 needs to send. The memory 830 may include a high-speed RAM memory, and may further include a non-volatile memory (non-volatile memory), such as at least one Solid State Disk (SSD).
The processor 810 is configured to execute programs stored in the memory 830. The program when executed by a processor implements the steps of the interaction method of the embodiments described above.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the processes of the interaction methods in the embodiments, and can achieve the same technical effects, and in order to avoid repetition, the details are not repeated here. The computer-readable storage medium includes, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), an SSD, a charged Erasable Programmable Read-Only Memory (EEPROM), or a Flash Memory (Flash).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, SSD, Flash), and includes several instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods of the embodiments of the present application.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (17)
1. An interaction method, comprising:
acquiring user information sent by an intelligent terminal;
determining an interaction type interacted with a user according to the user information, wherein the interaction type comprises a conversation type;
if the interaction type is a conversation type, generating driving information according to the user information, wherein the driving information is used for driving the digital person;
driving the digital person based on the driving information, acquiring a digital person image including the digital person, and taking the digital person image as a feedback result, wherein the form of the digital person in the digital person image corresponds to the driving information; and
and sending the feedback result to the intelligent terminal.
2. The interaction method according to claim 1, wherein before said obtaining the user information sent by the intelligent terminal, the method comprises:
receiving a call request sent by the intelligent terminal; and
and if the call request is determined to be allowed, establishing communication connection with the intelligent terminal.
3. The interaction method according to claim 1, wherein before said obtaining the user information sent by the intelligent terminal, the method comprises:
acquiring an identity of the intelligent terminal;
sending a call request to the intelligent terminal based on the identity; and
and if the intelligent terminal is determined to be connected with the call request, controlling the server side to establish a data channel with the intelligent terminal.
4. The interaction method of claim 3, wherein the method further comprises:
acquiring the type of the customer service business of the data channel established between the server and the intelligent terminal;
acquiring an original digital human image and original voice information according to the customer service type; and
and sending the original digital human image and the original voice information to the intelligent terminal through the data channel.
5. The interaction method of claim 1, wherein the interaction type comprises a manual service type; before sending the feedback result to the intelligent terminal, the method further includes:
if the interaction type is the manual service type, acquiring identification information of the intelligent terminal;
acquiring a target customer service agent according to the identification information; and
acquiring customer service information of the target customer service agent, and taking the customer service information as the feedback result.
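For the manual-service branch in claim 5, a toy routing table keyed by the terminal's identification information; the region names and agent IDs are made up:

```python
AGENT_BY_REGION = {"south": "agent_007", "north": "agent_042"}

def route_to_agent(identification_info: dict) -> dict:
    region = identification_info.get("region", "south")
    agent_id = AGENT_BY_REGION.get(region, "agent_default")
    # The target agent's customer service information becomes the
    # feedback result sent back to the terminal.
    return {"agent": agent_id, "status": "connecting"}

print(route_to_agent({"region": "north"}))  # {'agent': 'agent_042', 'status': 'connecting'}
```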
6. The interaction method according to claim 1 or 5, wherein before the feedback result is sent to the intelligent terminal, the method comprises:
acquiring prompt content for presentation at the intelligent terminal according to the user information; and
generating an electronic card comprising the prompt content, and taking the electronic card as the feedback result.
7. The interaction method of claim 6, wherein there are a plurality of the digital person images; the sending of the feedback result to the intelligent terminal comprises:
obtaining a card type of the electronic card, wherein the card type comprises a dynamic type;
if the card type of the electronic card is the dynamic type, acquiring a time sequence of the plurality of digital person images;
obtaining card pictures of the electronic card at different times according to the time sequence of the plurality of digital person images, wherein the card pictures correspond one-to-one to the digital person images in the time sequence; and
sending the card pictures and the plurality of digital person images to the intelligent terminal.
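One way to read claim 7's one-to-one correspondence is a zip over time-ordered sequences. A minimal sketch, assuming both lists are already sorted by timestamp:

```python
def pair_card_with_frames(card_pictures: list, person_images: list) -> list:
    # Dynamic card: one card picture per digital person image, matched
    # in time sequence (both lists assumed pre-sorted by timestamp).
    if len(card_pictures) != len(person_images):
        raise ValueError("dynamic card requires a one-to-one pairing")
    return list(zip(card_pictures, person_images))

pairs = pair_card_with_frames(["card_t0.png", "card_t1.png"],
                              ["person_t0.png", "person_t1.png"])
print(pairs[0])  # ('card_t0.png', 'person_t0.png')
```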
8. The interaction method according to claim 1, wherein the interaction type comprises an information presentation type; before the sending of the feedback result to the intelligent terminal, the method further comprises:
if the interaction type is the information presentation type, determining service information for presentation at the intelligent terminal according to the user information, and taking the service information as the feedback result.
9. The interaction method of claim 1, wherein the driving information comprises text output information; the generating of the driving information according to the user information comprises:
determining a user intention according to the user information; and
acquiring text output information for feedback to the user based on the user intention, and taking the text output information as the driving information.
10. The interaction method of claim 1, wherein the driving information comprises feedback voice information; the generating of the driving information according to the user information comprises:
determining a user intention according to the user information;
obtaining text output information for feedback to the user based on the user intention; and
generating the feedback voice information according to the text output information, and taking the feedback voice information as the driving information.
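A sketch of the claims 9 and 10 pipeline, user information to intention to reply text to feedback voice. The intent rule, the reply table, and the TTS stub are all assumptions standing in for real models:

```python
def synthesize_speech_stub(text: str) -> bytes:
    # Stand-in for a real text-to-speech engine.
    return ("<waveform of %r>" % text).encode()

def infer_intention(user_info: dict) -> str:
    # Toy keyword rule in place of a semantic model.
    return "balance_query" if "balance" in user_info.get("text", "").lower() else "chitchat"

REPLIES = {"balance_query": "Your balance is shown in the app.",
           "chitchat": "How can I help you today?"}

def build_driving_info(user_info: dict, with_voice: bool) -> dict:
    reply_text = REPLIES[infer_intention(user_info)]  # claim 9: text output
    driving_info = {"text": reply_text}
    if with_voice:                                    # claim 10: feedback voice
        driving_info["voice"] = synthesize_speech_stub(reply_text)
    return driving_info

print(build_driving_info({"text": "check my balance"}, with_voice=True)["text"])
```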
11. The interaction method according to claim 10, wherein before the sending of the feedback result to the intelligent terminal, the method further comprises:
taking the feedback voice information as the feedback result.
12. The interaction method according to claim 10 or 11, wherein the feedback result comprises a plurality of consecutive frames of the digital person image and the feedback voice information, and the plurality of consecutive frames of the digital person image correspond to the feedback voice information in time sequence.
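Claim 12 keeps the consecutive digital person frames and the feedback voice aligned in time. One common approach, assumed here rather than stated in the patent, derives the frame count from the audio duration and a fixed frame rate:

```python
def frames_needed(audio_duration_s: float, fps: int = 25) -> int:
    # Number of consecutive digital person frames that span the
    # feedback voice at the given frame rate.
    return max(1, round(audio_duration_s * fps))

print(frames_needed(2.4))  # 60 frames for 2.4 s of speech at 25 fps
```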
13. The interaction method according to any one of claims 9 to 11, wherein the sending of the feedback result to the intelligent terminal comprises:
obtaining privacy information in the feedback result;
updating the feedback result according to the privacy information and a preset information protection mode, so that the privacy information in the updated feedback result is hidden; and
outputting the updated feedback result.
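A sketch of one possible "preset information protection mode" for claim 13: masking digit runs that look like phone or account numbers before the feedback result leaves the server. The regular expressions are assumptions, not the patent's method:

```python
import re

PHONE = re.compile(r"\b1\d{10}\b")      # mainland-style 11-digit mobile number
ACCOUNT = re.compile(r"\b\d{12,19}\b")  # card/account-length digit runs

def hide_privacy(feedback_text: str) -> str:
    # Keep the first 3 and last 4 digits of phone numbers, mask the rest.
    masked = PHONE.sub(lambda m: m.group()[:3] + "****" + m.group()[-4:],
                       feedback_text)
    # Mask long digit runs entirely.
    return ACCOUNT.sub("****", masked)

print(hide_privacy("Call 13812345678 about card 6222021234567890123."))
# Call 138****5678 about card ****.
```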
14. The interaction method according to any one of claims 9 to 11, wherein the user information comprises user voice information; the determining of the user intention according to the user information comprises:
acquiring user voice information in the user information;
recognizing the user voice information to obtain text input information; and
performing semantic recognition on the text input information to determine the user intention.
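The two stages of claim 14, speech recognition and then semantic recognition, shown as stubs; a real system would call ASR and NLU models where these placeholders sit:

```python
def recognize_speech_stub(voice_bytes: bytes) -> str:
    # Stand-in for a real automatic-speech-recognition model.
    return "I want to check my account balance"

def semantic_intent(text_input: str) -> str:
    # Toy keyword rules in place of a semantic-recognition model.
    if "balance" in text_input.lower():
        return "balance_query"
    return "unknown"

text_input = recognize_speech_stub(b"<pcm audio>")
print(semantic_intent(text_input))  # balance_query
```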
15. An interaction device, comprising:
a user information acquisition module, configured to acquire user information sent by an intelligent terminal;
an interaction type determining module, configured to determine an interaction type for interacting with a user according to the user information, wherein the interaction type comprises a conversation type;
a digital person driving module, configured to generate driving information according to the user information if the interaction type is a conversation type, wherein the driving information is used for driving a digital person;
a feedback result obtaining module, configured to drive the digital person based on the driving information, obtain a digital person image including the digital person, and take the digital person image as a feedback result, wherein a form of the digital person in the digital person image corresponds to the driving information; and
a feedback result sending module, configured to send the feedback result to the intelligent terminal.
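A module-per-step skeleton mirroring the device of claim 15; the class and method names are illustrative, not the patent's:

```python
class InteractionDevice:
    def acquire_user_info(self, raw):            # user information acquisition module
        return {"text": raw}

    def determine_interaction_type(self, info):  # interaction type determining module
        return "conversation" if info.get("text") else "other"

    def generate_driving_info(self, info):       # digital person driving module
        return {"reply_text": info["text"]}

    def obtain_feedback_result(self, driving):   # feedback result obtaining module
        return "<frame driven by %r>" % driving["reply_text"]

    def send_feedback_result(self, result):      # feedback result sending module
        print("->", result)

dev = InteractionDevice()
info = dev.acquire_user_info("hello")
if dev.determine_interaction_type(info) == "conversation":
    dev.send_feedback_result(dev.obtain_feedback_result(dev.generate_driving_info(info)))
```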
16. An electronic device, comprising:
one or more processors;
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors to perform the interaction method of any one of claims 1 to 14.
17. A computer-readable storage medium having program code stored thereon, wherein the program code can be invoked by a processor to perform the interaction method according to any one of claims 1 to 14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110279584.0A CN113050791A (en) | 2021-03-16 | 2021-03-16 | Interaction method, interaction device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113050791A (en) | 2021-06-29 |
Family
ID=76512804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110279584.0A Pending CN113050791A (en) | 2021-03-16 | 2021-03-16 | Interaction method, interaction device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113050791A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114153516A (en) * | 2021-10-18 | 2022-03-08 | 深圳追一科技有限公司 | Digital human display panel configuration method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103383638A (en) * | 2012-05-03 | 2013-11-06 | 国际商业机器公司 | Voice entry of sensitive information |
CN110648672A (en) * | 2019-09-05 | 2020-01-03 | 深圳追一科技有限公司 | Character image generation method, interaction method, device and terminal equipment |
CN111933134A (en) * | 2020-07-23 | 2020-11-13 | 珠海大横琴科技发展有限公司 | Man-machine interaction method and device, electronic equipment and storage medium |
CN112181127A (en) * | 2019-07-02 | 2021-01-05 | 上海浦东发展银行股份有限公司 | Method and device for man-machine interaction |
CN112379812A (en) * | 2021-01-07 | 2021-02-19 | 深圳追一科技有限公司 | Simulation 3D digital human interaction method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020204000A1 (en) | Communication assistance system, communication assistance method, communication assistance program, and image control program | |
US20190320144A1 (en) | Communication using interactive avatars | |
CN110609620B (en) | Human-computer interaction method and device based on virtual image and electronic equipment | |
CN110555507B (en) | Interaction method and device for virtual robot, electronic equipment and storage medium | |
US11948241B2 (en) | Robot and method for operating same | |
CN112669846A (en) | Interactive system, method, device, electronic equipment and storage medium | |
WO2021212733A1 (en) | Video adjustment method and apparatus, electronic device, and storage medium | |
CN110400251A (en) | Method for processing video frequency, device, terminal device and storage medium | |
CN108874114B (en) | Method and device for realizing emotion expression of virtual object, computer equipment and storage medium | |
JP2016521929A (en) | Method, user terminal, and server for information exchange in communication | |
KR20050102079A (en) | Avatar database for mobile video communications | |
WO2024169314A1 (en) | Method and apparatus for constructing deformable neural radiance field network | |
BRPI0904540B1 (en) | method for animating faces / heads / virtual characters via voice processing | |
CN113067953A (en) | Customer service method, system, device, server and storage medium | |
CN110794964A (en) | Interaction method and device for virtual robot, electronic equipment and storage medium | |
CN110536095A (en) | Call method, device, terminal and storage medium | |
CN112669422B (en) | Simulated 3D digital person generation method and device, electronic equipment and storage medium | |
CN112669416B (en) | Customer service system, method, device, electronic equipment and storage medium | |
CN110674398A (en) | Virtual character interaction method and device, terminal equipment and storage medium | |
CN112839196B (en) | Method, device and storage medium for realizing online conference | |
CN107623830B (en) | A kind of video call method and electronic equipment | |
JP2021086415A (en) | Virtual person interaction system, video generation method, and video generation program | |
CN112911192A (en) | Video processing method and device and electronic equipment | |
CN117173295A (en) | Method for realizing interaction between digital person and user based on interaction platform | |
CN102364965A (en) | Refined display method of mobile phone communication information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||