CN111538456A - Human-computer interaction method, device, terminal and storage medium based on virtual image - Google Patents


Info

Publication number
CN111538456A
Authority
CN
China
Prior art keywords
interface
interactive
instruction
user
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010663796.4A
Other languages
Chinese (zh)
Inventor
李罡
黄展鸿
刘云峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhuiyi Technology Co Ltd
Original Assignee
Shenzhen Zhuiyi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhuiyi Technology Co Ltd
Priority to CN202010663796.4A
Publication of CN111538456A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04847Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser

Abstract

The embodiment of the application discloses a human-computer interaction method, device, terminal and storage medium based on an avatar. The method comprises the following steps: displaying an interactive interface, wherein the interactive interface comprises an interface to be awakened and an interface to be interacted, which are used for displaying the avatar; if the interactive interface is the interface to be awakened, acquiring user input information; if the user input information meets a preset awakening condition, switching the interface to be awakened to the interface to be interacted; acquiring an interaction instruction input by the user based on the interface to be interacted; and executing the operation corresponding to the interaction instruction based on the target interaction interface corresponding to the interaction instruction. In this way, human-computer interaction with the user is carried out based on the avatar, diverse interaction requirements of the user can be met on the terminal, a more intelligent human-computer interaction mode is realized, the distance between the user and the device is shortened, and the user experience is improved.

Description

Human-computer interaction method, device, terminal and storage medium based on virtual image
Technical Field
The present application relates to the field of human-computer interaction technologies, and in particular, to a human-computer interaction method, apparatus, terminal and storage medium based on an avatar.
Background
In current human-computer interaction technology, a user interacts with a terminal by triggering preset operation instructions to meet requirements such as interface switching. However, most existing human-computer interaction technologies require the user to frequently touch the screen with a finger, and this interaction mode is limited and not natural enough.
Disclosure of Invention
In view of the above problems, the present application provides a human-computer interaction method, apparatus, terminal and storage medium based on an avatar to address them.
In a first aspect, an embodiment of the present application provides a human-computer interaction method based on an avatar, where the method includes: displaying an interactive interface, wherein the interactive interface comprises an interface to be awakened and an interface to be interacted, which are used for displaying the virtual image; if the interactive interface is the interface to be awakened, acquiring user input information; if the user input information meets a preset awakening condition, switching the interface to be awakened into the interface to be interacted; acquiring an interaction instruction input by a user based on the interface to be interacted; and executing the operation corresponding to the interactive instruction based on the target interactive interface corresponding to the interactive instruction.
Optionally, the executing, based on the target interactive interface corresponding to the interactive instruction, an operation corresponding to the interactive instruction includes: acquiring reply audio information corresponding to the interactive instruction and visual model driving parameters of the virtual image; switching the interface to be interacted into a target interaction interface corresponding to the interaction instruction; and driving the behavior of the virtual image according to the visual model driving parameter based on the target interactive interface, and correspondingly playing the reply audio information aiming at the driven behavior.
Optionally, the obtaining of the reply audio information corresponding to the interactive instruction and the visual model driving parameter of the avatar includes: acquiring reply audio information corresponding to the interactive instruction; generating visual model driving parameters for the avatar based on the reply audio information.
Optionally, the obtaining of the reply audio information corresponding to the interactive instruction and the visual model driving parameter of the avatar includes: searching a first keyword matched with the interactive instruction in a preset database; if the first keyword matched with the interactive instruction cannot be found, acquiring reply audio information corresponding to the interactive instruction; generating visual model driving parameters for the avatar based on the reply audio information.
Optionally, the obtaining of the reply audio information corresponding to the interactive instruction includes: identifying the interactive instruction to acquire corresponding interactive text information; inquiring and acquiring reply text information corresponding to the interactive text information in a question-answer library; and acquiring reply audio information corresponding to the reply text information.
Optionally, the obtaining of the reply audio information corresponding to the interactive instruction further includes: establishing a neural network model based on the question-answer library; the querying and obtaining reply text information corresponding to the interactive text information in the question-answer library includes: and inputting the interactive text information into the neural network model to obtain reply text information corresponding to the interactive text information.
Optionally, the method further comprises: if the first keyword matched with the interactive instruction is found, acquiring reply audio information corresponding to the first keyword and the visual model driving parameter of the virtual image; switching the interface to be interacted into a target interaction interface corresponding to the first keyword; and driving the behavior of the virtual image according to the visual model driving parameter based on the target interactive interface, and correspondingly playing the reply audio information aiming at the driven behavior.
Optionally, if the first keyword matched with the interactive instruction is found, acquiring the reply audio information corresponding to the first keyword and the visual model driving parameter of the avatar, including: and if the first keyword matched with the interactive instruction is found, searching reply audio information corresponding to the first keyword and the visual model driving parameter of the virtual image from the preset database.
Optionally, the method further comprises: acquiring a return instruction based on a target interactive interface corresponding to the first keyword; and switching the target interactive interface corresponding to the first keyword into the interface to be interacted.
Optionally, the executing, based on the target interactive interface corresponding to the interactive instruction, an operation corresponding to the interactive instruction includes: searching a second keyword matched with the interactive instruction in a preset database; if the second keyword matched with the interactive instruction is found, determining a target interactive interface and a video broadcast picture corresponding to the second keyword; and displaying the video broadcast picture at the appointed position of the target interactive interface.
Optionally, the method further comprises: if the interaction instruction input by the user is not acquired based on the interface to be interacted within the preset time period, switching the interface to be interacted into the interface to be awakened.
Optionally, if the user input information meets a preset wake-up condition, switching the interface to be woken up to the interface to be interacted includes: if the user input information contains a preset awakening word, judging that the user input information meets a preset awakening condition; and if the user input information meets a preset awakening condition, switching the interface to be awakened into the interface to be interacted.
In a second aspect, an embodiment of the present application provides an avatar-based human-computer interaction device, where the device includes: the display module is used for displaying an interactive interface, and the interactive interface comprises an interface to be awakened and an interface to be interacted, which are used for displaying an avatar; the awakening module is used for acquiring user input information if the interactive interface is the interface to be awakened; the switching module is used for switching the interface to be awakened into the interface to be interacted if the user input information meets a preset awakening condition; the acquisition module is used for acquiring an interaction instruction input by a user based on the interface to be interacted; and the execution module is used for executing the operation corresponding to the interactive instruction based on the target interactive interface corresponding to the interactive instruction.
Optionally, the execution module includes: parameter module, mutual interface switch module and drive module, wherein: the parameter module is used for acquiring reply audio information corresponding to the interactive instruction and the visual model driving parameters of the virtual image; the interactive interface switching module is used for switching the interface to be interacted into a target interactive interface corresponding to the interactive instruction; and the driving module is used for driving the behavior of the virtual image according to the visual model driving parameter based on the target interactive interface and correspondingly playing the reply audio information aiming at the driven behavior.
Optionally, the parameter module comprises: audio module, visual model parameter module, wherein: the audio module is used for acquiring reply audio information corresponding to the interactive instruction; a visual model parameter module that generates visual model driving parameters of the avatar based on the reply audio information.
Optionally, the parameter module comprises: the device comprises a searching module, an audio acquiring module and a parameter generating module, wherein: the searching module is used for searching a first keyword matched with the interactive instruction in a preset database; the audio acquisition module is used for acquiring reply audio information corresponding to the interactive instruction if the first keyword matched with the interactive instruction cannot be found; a parameter generation module that generates visual model driving parameters of the avatar based on the reply audio information.
Optionally, the parameter module comprises: an instruction identification module, a reply text module and a reply text audio module, wherein: the instruction identification module is used for identifying the interactive instruction and acquiring corresponding interactive text information; the reply text module is used for querying and acquiring, in a question-answer library, reply text information corresponding to the interactive text information; and the reply text audio module is used for acquiring reply audio information corresponding to the reply text information.
Optionally, the parameter module further includes a network module, configured to establish a neural network model based on the question-answering library; the reply text module also comprises an input module which is used for inputting the interactive text information into the neural network model and acquiring the reply text information corresponding to the interactive text information.
Optionally, the parameter module further includes a first keyword parameter module, a first keyword interface switching module, and a first keyword driving module, where the first keyword parameter module is configured to, if a first keyword matching the interactive instruction is found, obtain reply audio information corresponding to the first keyword and a visual model driving parameter of the avatar; the first keyword interface switching module is used for switching the interface to be interacted into a target interaction interface corresponding to the first keyword; and the first keyword driving module is used for driving the behavior of the virtual image according to the visual model driving parameter based on the target interactive interface and correspondingly playing the reply audio information aiming at the driven behavior.
Optionally, the first keyword parameter module further includes a database searching module, configured to search, if a first keyword matching the interactive instruction is found, reply audio information corresponding to the first keyword and the visual model driving parameter of the avatar from the preset database.
Optionally, the obtaining module further includes a returning module and a returning switching module, wherein the returning module obtains a returning instruction based on the target interactive interface corresponding to the first keyword; and the return switching module is used for switching the target interactive interface corresponding to the first keyword into the interface to be interacted.
Optionally, the execution module further comprises a keyword search module, a second keyword matching module and a video module, wherein the keyword search module is used for searching a second keyword matched with the interactive instruction in a preset database; the second keyword matching module is used for determining a target interactive interface and a video broadcast picture corresponding to the second keyword if the second keyword matched with the interactive instruction is found; and the video module is used for displaying the video broadcast picture at the appointed position of the target interactive interface.
Optionally, the switching module further includes: and the waiting module is used for switching the interface to be interacted into the interface to be awakened if the interaction instruction input by the user is not acquired based on the interface to be interacted within a preset time period.
Optionally, the wake-up module further includes a wake-up determination module and a wake-up switching module, where the wake-up determination module is configured to determine that the user input information meets a preset wake-up condition if the user input information includes a preset wake-up word; and the awakening switching module is used for switching the interface to be awakened into the interface to be interacted if the user input information meets a preset awakening condition.
In a third aspect, an embodiment of the present application provides a terminal, including a memory and a processor, where the memory is coupled to the processor, and the memory stores instructions, and when the instructions are executed by the processor, the processor performs the method of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which program code is stored, and the program code can be called by a processor to execute the method according to the first aspect.
According to the human-computer interaction method, device, terminal and storage medium based on the avatar in the embodiments of the application, an interaction interface is displayed, wherein the interaction interface comprises an interface to be awakened and an interface to be interacted, which are used for displaying the avatar; if the interaction interface is the interface to be awakened, user input information is acquired; if the user input information meets a preset awakening condition, the interface to be awakened is switched to the interface to be interacted; an interaction instruction input by the user is acquired based on the interface to be interacted; and an operation corresponding to the interaction instruction is then executed based on the target interaction interface corresponding to the interaction instruction. Therefore, the embodiments of the application can interact with the user based on the avatar, so that human-computer interaction is more natural, diverse interaction requirements of the user are met on the terminal, human-computer interaction modes are enriched, and the user's human-computer interaction experience is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic structural diagram illustrating a network environment provided by an embodiment of the present application;
FIG. 2 is a flowchart illustrating an avatar-based human-machine interaction method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating an avatar-based human-machine interaction method according to another embodiment of the present application;
FIG. 4 is a flowchart illustrating an avatar-based human-machine interaction method according to another embodiment of the present application;
FIG. 5 is a flowchart illustrating step S350 of FIG. 4 according to an exemplary embodiment of the present application;
FIG. 6 is a flowchart illustrating an avatar-based human-machine interaction method according to yet another embodiment of the present application;
FIG. 7 is a flowchart illustrating an avatar-based human-machine interaction method according to yet another embodiment of the present application;
fig. 8 is a flowchart illustrating step S508 in fig. 7 according to an exemplary embodiment of the present application;
FIG. 9 is a flowchart illustrating an avatar-based human-machine interaction method according to yet another embodiment of the present application;
FIG. 10 is a flowchart illustrating an avatar-based human-machine interaction method according to yet another embodiment of the present application;
FIG. 11 is a diagram illustrating an interface to be woken up of an avatar-based human-computer interaction method according to an exemplary embodiment of the present application;
FIG. 12 is a schematic diagram illustrating an interface to be interacted with in an avatar-based human-computer interaction method according to an exemplary embodiment of the present application;
FIG. 13 is a schematic diagram of an interface to be interacted with by a human-computer interaction method based on an avatar according to another exemplary embodiment of the present application;
FIG. 14 is a diagram illustrating a target interactive interface based on first keywords of an avatar-based human-computer interaction method according to an exemplary embodiment of the present application;
FIG. 15 is a diagram illustrating a target interactive interface based on second keywords of the avatar-based human-computer interaction method according to an exemplary embodiment of the present application;
FIG. 16 is a diagram illustrating a target interactive interface based on first keywords of an avatar-based human-machine interaction method according to another exemplary embodiment of the present application;
FIG. 17 is a block diagram illustrating an avatar-based human-computer interaction apparatus according to an embodiment of the present disclosure;
fig. 18 is a block diagram illustrating a structure of a terminal for performing an avatar-based human-machine interaction method according to an embodiment of the present application;
fig. 19 illustrates a storage unit for storing or carrying program codes for implementing an avatar-based human-machine interaction method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Based on the interactive interface of a terminal, a user can interact with the terminal by triggering preset operation instructions, thereby controlling the terminal to execute corresponding operations. However, the current human-computer interaction mode is limited: the user usually needs to trigger instructions such as "select" and "return" by tapping with a finger, which is inconvenient. In addition, the preset operation instructions are often limited and cannot cover every user requirement, so natural interaction between the terminal and the user is difficult to achieve with existing human-computer interaction methods; they lack intelligence and provide a poor user experience.
Based on the above problems, through long-term research the inventors provide a human-computer interaction method, device, terminal and storage medium based on an avatar: an interaction interface is displayed, where the interaction interface includes an interface to be awakened and an interface to be interacted, which are used for displaying the avatar; if the interaction interface is the interface to be awakened, user input information is obtained; if the user input information meets a preset awakening condition, the interface to be awakened is switched to the interface to be interacted; an interaction instruction input by the user is obtained based on the interface to be interacted; and an operation corresponding to the interaction instruction is then executed based on the target interaction interface corresponding to the interaction instruction. Therefore, the embodiments of the application can interact with the user based on the avatar, so that human-computer interaction is more natural, the interaction modes between the user and the terminal are enriched, and the diversified interaction requirements of the user can be met. Compared with an ordinary interface without an avatar, an interface displaying an avatar with human-like actions and language abilities shortens the distance between the device and the user during interaction, so that the user feels a sense of warmth and the user's human-computer interaction experience is improved.
In order to better understand the method, the apparatus, the terminal, and the storage medium for human-computer interaction based on an avatar provided in the embodiments of the present application, an application environment suitable for the embodiments of the present application is described below.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment suitable for the embodiment of the present application. The human-computer interaction method based on the avatar provided by the embodiment of the application can be applied to the interaction system 10 shown in fig. 1. The interactive system 10 includes a terminal 100 and a server 200.
The terminal 100 may be a smart phone, a tablet computer, a laptop portable computer, a desktop computer, a wearable electronic device, a multimedia display screen, or other electronic devices that are deployed with an avatar-based human-computer interaction apparatus, and the device type of the terminal 100 is not limited in this embodiment of the application.
The server 200 and the terminal 100 are connected through a wireless or wired network, so as to realize data transmission between the terminal 100 and the server 200 based on the network connection, wherein the transmitted data includes but is not limited to audio, video, text, images and the like.
The server 200 may be a conventional server, a cloud server, a server cluster including a plurality of servers, or even a server center including a plurality of servers. The server 200 may be used to provide a background service for the user, and the background service may include, but is not limited to, an avatar-based human-computer interaction service, and the like.
In some embodiments, a client application may be installed on the terminal 100, and a user may communicate with the server 200 through the client application (e.g., an APP, a WeChat applet, etc.). Specifically, the terminal 100 may obtain the user's input information and send it to the server 200 via the client application; the server 200 processes the received input information, returns corresponding output information to the terminal 100, and the terminal 100 performs the operation corresponding to that output information. The user's input information may be voice information, screen-based touch operation information, gesture information, motion information, and the like; the output information may be images, video, text, audio, visual model driving parameters of the avatar, and the like, which is not limited herein.
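To make this exchange concrete, the following sketch shows one way the terminal-side client application could package user input, forward it to the server, and receive output information. The message fields, transport callback and function names are illustrative assumptions, not part of this application.

```python
import json
from dataclasses import dataclass

@dataclass
class UserInput:
    modality: str    # e.g. "voice", "touch", "gesture", "image"
    payload: bytes   # raw audio samples, touch coordinates, image frame, ...

def handle_user_input(user_input: UserInput, send_to_server) -> dict:
    """Forward the user's input to the server and return its output information.

    `send_to_server` stands in for whatever transport the client application
    uses (HTTP request, WebSocket message, ...).
    """
    request = {
        "modality": user_input.modality,
        "payload": user_input.payload.hex(),  # serialize raw bytes for transport
    }
    response = send_to_server(json.dumps(request))
    # The returned output information may contain text, audio, video references
    # and visual model driving parameters for the avatar; the terminal then
    # performs the corresponding operation.
    return json.loads(response)
```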
The client application provides an avatar-based human-computer interaction service, and the interaction service may differ according to scene requirements. In some embodiments, the client application may be used to provide product presentation information or service guidance to the user; for example, it may be used for customer service in public areas such as malls, banks and exhibition halls. In particular, the client application may receive interaction information input by the user and respond to it based on the avatar.
In some embodiments, the device for processing the user input information may also be disposed on the terminal 100, so that the terminal 100 can realize the interaction with the user without relying on establishing communication with the server 200, and realize the human-computer interaction based on the avatar, and in this case, the interactive system 10 may only include the terminal 100.
The above application environments are only examples for facilitating understanding, and it is to be understood that the embodiments of the present application are not limited to the above application environments.
The following describes in detail a human-computer interaction method, an apparatus, a terminal and a storage medium based on an avatar provided by the embodiments of the present application with specific embodiments.
Referring to fig. 2, fig. 2 is a flowchart illustrating a human-computer interaction method based on an avatar according to an embodiment of the present application, which can be applied to the terminal, specifically, in the embodiment, the method may include:
step S110: and displaying the interactive interface.
The interactive interface is an interface which is displayed on the terminal and can interact with a user, a virtual image can be displayed on the interactive interface, and the interface can be an interface to be awakened or an interface to be interacted. The virtual image is displayed on an interactive interface by taking a User Interface (UI) as a carrier, the interface to be awakened can be an interactive interface waiting for information input by a user, and the interface to be interacted can be an interactive interface which is interacted with the user in real time to realize corresponding functions.
In some embodiments, the avatar may be constructed to simulate a real human figure. Specifically, as one way, feature points corresponding to a real person may be input into a neural network model to obtain an avatar corresponding to that person. The feature points may comprise a combination of one or more features of the real person; the more feature points are adopted, the higher the similarity with the real person.
Alternatively, the avatar may be constructed based on cartoon characters, animation images, etc. of non-real characters, or may be an image that is personalized based on user preferences. In addition, the avatar may be a two-dimensional avatar or may be a three-dimensional avatar. By constructing various virtual images, the interactive experience of the user can be enriched.
Step S120: and if the interactive interface is the interface to be awakened, acquiring user input information.
If the interactive interface displayed by the terminal is the interface to be awakened, user input information is acquired. The interface to be awakened can display the avatar, or can be a preset interface that does not contain the avatar. Optionally, the interface to be awakened may display prompts about human-computer interaction; for example, in a bank use scenario, the interface to be awakened may display "May I ask what help you need? You can try asking me how to handle a deposit." In this way, the user may be guided through the interaction.
In one embodiment, the terminal may display the interface to be woken up when the terminal is in a standby state.
In another embodiment, when the terminal does not acquire an interactive instruction input by the user on the interface to be interacted within a preset waiting time, the interface to be interacted can be switched back to the interface to be awakened, so that excessive power consumption caused by waiting for a response over a long period is avoided and the power consumption of the terminal is reduced.
In another embodiment, the terminal may further obtain an instruction used by the user to end the interaction based on the specified interactive interface, and then switch the interface displayed in the terminal to the interface to be woken up. The specified interactive interface may be an interface to be interacted, or may be another interactive interface, which is not limited in this embodiment.
In this embodiment, based on the interface to be woken up, the terminal may obtain user input information input by the user through multiple interaction modes. Optionally, the terminal may acquire voice information input by the user through an audio acquisition device, such as a microphone array, or may acquire motion information, gesture information, a user image, and the like input by the user through a camera and an infrared detection camera, or may acquire touch operation of the user through a screen-based sensor, and the like, so as to acquire corresponding user input information. Therefore, man-machine interaction can be carried out through multiple modes, so that the interaction mode is more flexible and natural, and the interaction requirements of users are met.
Step S130: and if the user input information meets the preset awakening condition, switching the interface to be awakened into the interface to be interacted.
Wherein, the preset wake-up condition may include: voice wake-up conditions, motion wake-up conditions, gesture wake-up conditions, facial recognition wake-up conditions, touch wake-up conditions, and the like. In this embodiment, the user may input information in a plurality of interactive modes, and accordingly, the mode satisfying the preset wake-up condition may also be a plurality of modes. In addition, different awakening conditions can be preset according to different use scenes, and different awakening conditions can be set according to different virtual images, so that human-computer interaction modes are enriched.
As one way, the preset wake-up condition may be a voice wake-up condition. In some embodiments, the terminal may acquire the voice information input by the user through an audio acquisition device, such as a microphone array, and determine that the voice information input by the user satisfies a preset wake-up condition if the voice information input by the user includes a preset wake-up word. In other embodiments, if the user input information does not include the preset wake-up word but is related to the wake-up word after semantic analysis, it is also determined that the user input information satisfies the preset wake-up condition. For example, the preset wake-up word is "hello", if the user says "hello" when interacting, the voice information input by the user contains the preset wake-up word, and it is determined that the information input by the user meets the preset wake-up condition; if the user input information is a voice command 'good morning', the meaning of 'good morning' and 'hello' is found to be similar after semantic analysis, and then the user input information is judged to meet the preset awakening condition.
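As an illustration only, the voice wake-up check just described (the input contains the preset wake-up word, or is semantically related to it) might look like the sketch below; `semantic_similarity` is a hypothetical placeholder for the semantic analysis step, and the threshold is an assumed value.

```python
WAKE_WORDS = ["hello"]       # the preset wake-up word used in the example above
SIMILARITY_THRESHOLD = 0.8   # assumed cut-off for "related to the wake-up word"

def semantic_similarity(text: str, wake_word: str) -> float:
    """Placeholder for a semantic analysis model; returns a score in [0, 1]."""
    a, b = set(text.lower().split()), set(wake_word.lower().split())
    return len(a & b) / max(len(a | b), 1)

def satisfies_wake_condition(user_text: str) -> bool:
    lowered = user_text.lower()
    # Case 1: the input contains a preset wake-up word (e.g. "hello" in "hello there").
    if any(word in lowered for word in WAKE_WORDS):
        return True
    # Case 2: no wake-up word, but the input is semantically related to one
    # (e.g. "good morning" judged similar to "hello" by a real semantic model).
    return any(semantic_similarity(user_text, word) >= SIMILARITY_THRESHOLD
               for word in WAKE_WORDS)
```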
Alternatively, the preset wake-up condition may be a face recognition wake-up condition, for example, the preset wake-up condition may be image data that recognizes that the user is within a preset distance. Specifically, if the terminal collects an image of a user through the camera and the infrared detection camera, the image is judged to be a complete facial image within a preset distance through image analysis, and the eyeball of the user looks directly at the screen, it can be judged that the input information of the user meets a preset awakening condition, and the interface to be awakened is switched to the interface to be interacted. By the method, the terminal can actively interact with the user when the user does not input voice, touch and the like, so that the interactive operation of the user is simplified, and the interactive experience of the user is improved.
As another way, the preset wake-up condition may be a touch wake-up condition. For example, when the terminal acquires the touch operation of the user through the screen-based sensor, it is determined that the user input information meets a preset wake-up condition, and the interface to be woken up is switched to the interface to be interacted. Of course, the wake-up condition is not limited to the above manner, and this embodiment does not limit this.
Step S140: and acquiring an interaction instruction input by a user based on the interface to be interacted.
The interface to be interacted can be an interaction interface displayed by the terminal when the information input by the user meets the preset awakening condition, and is used for waiting for the user to input an interaction instruction, and at the moment, the virtual image on the interface to be interacted is in an awakening state.
In some embodiments, based on the interface to be interacted, the terminal may detect an interaction instruction input by the user based on different input modes, where the interaction instruction input by the user may be information of multiple modalities such as voice, vision, touch, and the like, for example, the terminal may collect voice, image, and the like input by the user, detect a touch operation of the user, and the like. The receiving device for obtaining the multi-modal interaction instruction is installed or configured on the terminal, and may include an audio collecting device such as a microphone for collecting voice input by the user, an image collecting device such as a camera for collecting an image, a touch screen for detecting a touch operation, and the like, which is not limited in this embodiment.
In some embodiments, different wake-up conditions can correspond to different wake-up states, that is, different contents can be displayed on the interface to be interacted. For example, if the user input information meets a preset voice wake-up condition or a touch wake-up condition, the interface to be interacted can display the user input information. For another example, if the user input information satisfies the face recognition wake-up condition, a preset prompt can be displayed in the interface to be interacted; the prompt explains to the user why the avatar woke up even though no voice or touch instruction was given, so that the user does not find the switch from the interface to be awakened abrupt, and the interaction experience is improved. In one example, when a camera of the terminal captures a facial image of the user and a preset facial recognition wake-up condition is met, the interface to be interacted may display "The camera has detected that you are looking at the screen. May I ask what I can help you with?", thereby guiding the user into human-computer interaction.
In some embodiments, the interface to be interacted may further display preset prompt words or buttons for interaction instructions, to show the user which interaction instructions can be input and which interaction modes can be selected. In one example, where the terminal is applied in a bank scene, buttons such as "bank introduction", "deposit and withdrawal service" and "bank card handling" can be displayed on the interface to be interacted, together with a prompt such as "You can click a button on the screen or speak your request aloud." In this way, the learning cost for the user to perform human-computer interaction with the terminal can be reduced.
In other embodiments, the terminal may further detect a time length for waiting for the user to input the interactive instruction based on the interface to be interacted, and switch the interface to be interacted to the interface to be woken up if the interactive instruction input by the user cannot be acquired within a preset waiting time length. Optionally, the preset waiting time is 5 seconds. Through the method, on one hand, the computing resources and the terminal power consumption required by staying at the interface to be interacted to wait for the user to input the interactive instruction can be reduced, and on the other hand, the situation that the interface stays at the interface to be interacted for a long time after the previous user leaves and is inconvenient for the next user to use can be avoided, so that the user experience is improved.
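A minimal sketch of this waiting-time check is shown below, using the 5-second preset mentioned above; the polling and interface-switching callbacks are hypothetical placeholders for the terminal's actual logic.

```python
import time

PRESET_WAIT_SECONDS = 5  # optional preset waiting time mentioned above

def wait_for_instruction(poll_instruction, switch_to_wake_interface):
    """Wait for a user instruction; fall back to the interface to be woken up on timeout.

    `poll_instruction` returns an instruction or None; both arguments stand in
    for the terminal's real input and interface-switching routines.
    """
    deadline = time.monotonic() + PRESET_WAIT_SECONDS
    while time.monotonic() < deadline:
        instruction = poll_instruction()
        if instruction is not None:
            return instruction
        time.sleep(0.1)  # poll periodically instead of busy-waiting
    switch_to_wake_interface()  # no instruction within the preset waiting time
    return None
```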
Step S150: and executing the operation corresponding to the interactive instruction based on the target interactive interface corresponding to the interactive instruction.
The target interactive interface can be a plurality of preset interactive interfaces, the terminal can pre-store the mapping relation between the interactive interface and the interactive instruction, so that the corresponding interactive interface can be determined as the target interactive interface based on the interactive instruction, and the terminal can switch the interface to be interacted to the target interactive interface corresponding to the interactive instruction based on the interactive instruction input by the user, execute the operation corresponding to the interactive instruction, interact with the user in real time and realize the operation corresponding to the interactive instruction. For example, if the interactive instruction input by the user is "self-introduction", the terminal may switch the interface to be interacted into the interactive interface corresponding to the "self-introduction", and perform self-introduction based on the interactive interface and the avatar therein.
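The pre-stored mapping between interactive instructions and target interactive interfaces could be kept as a simple lookup table, roughly as in the sketch below; the instruction strings and interface identifiers are illustrative assumptions.

```python
# Illustrative mapping from recognized instructions to target interactive interfaces.
INSTRUCTION_TO_INTERFACE = {
    "self-introduction": "self_introduction_interface",
    "deposit and withdrawal service": "deposit_interface",
    "bank introduction": "bank_introduction_interface",
}

def execute_instruction(instruction: str, switch_interface, run_operation) -> bool:
    """Switch to the target interactive interface for the instruction and run its operation."""
    target = INSTRUCTION_TO_INTERFACE.get(instruction)
    if target is None:
        return False                    # no mapped interface; handled elsewhere
    switch_interface(target)            # interface to be interacted -> target interface
    run_operation(instruction, target)  # e.g. drive the avatar and play the reply audio
    return True
```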
The human-computer interaction method based on the virtual image can perform human-computer interaction with a user based on the virtual image. The user can switch the interface to be awakened into the interface to be interacted through various awakening modes, and the operation corresponding to the interaction instruction is executed based on the target interaction interface. Therefore, the interaction between people is simulated, natural human-computer interaction is realized, diversified interaction modes of users can be supported, and the experience of human-computer interaction is greatly improved.
Referring to fig. 3, fig. 3 illustrates an avatar-based human-computer interaction method according to another embodiment of the present application, which may be applied to the terminal, and the method may include:
s210: and displaying the interactive interface.
S220: and if the interactive interface is the interface to be awakened, acquiring user input information.
S230: and if the user input information meets the preset awakening condition, switching the interface to be awakened into the interface to be interacted.
S240: and acquiring an interaction instruction input by a user based on the interface to be interacted.
S250: and acquiring the reply audio information corresponding to the interactive instruction and the visual model driving parameters of the virtual image.
The reply audio information is audio information of a response obtained according to an interactive instruction input by a user, and the visual model driving parameter is parameter data for driving the avatar to make an expression or an action. The reply audio information may be generated in advance and stored in a database of the terminal or the server, or may be generated in real time according to an interactive instruction input by the user. Similarly, the visual model parameters of the avatar may be generated in advance and stored in a database of the terminal or the server, or may be generated in real time according to the interactive instruction input by the user. In some embodiments, the frequently used reply audio information and visual model driving parameters may be stored in a database of the terminal, and the terminal may directly obtain the reply audio information and visual model driving parameters locally without depending on a network environment, without considering time consumed by communication, thereby improving the real-time performance of interaction and further optimizing the use experience of the user.
The visual model driving parameters of the virtual image can comprise expression driving parameters, posture driving parameters and the like. Taking the expression driving parameters as an example, the expression of the avatar may be driven by the expression driving parameters, including but not limited to mouth shape and other facial movements. By simulating the real person speaking, the virtual image can realize the correspondence of various facial actions including the mouth shape and the voice, and can realize the effect of the virtual image that the facial actions and the voice are as natural as the real person. The gesture driving mode is approximately similar to the expression driving principle, and the virtual image can be driven to make abundant limb actions. Through the visual model driving parameters of the virtual image, when a user carries out human-computer interaction with the virtual image, the virtual image has expressions and actions like conversation with a real person, so that more natural human-computer interaction is realized.
In addition, if the avatar is a two-dimensional image, the visual model driving parameters of the avatar are the driving parameters corresponding to the two-dimensional image, and if the avatar is a three-dimensional image, the visual model driving parameters are the driving parameters corresponding to the three-dimensional image. The specific content and form of the visual model driving parameters of the avatar are not strictly limited.
As an embodiment, the visual model driving parameters of the avatar may include BlendShape model variation data, which drives the facial expression motions of the avatar, and skeleton variation data, which drives its limb motions. The avatar can be a three-dimensional image or a two-dimensional planar image, and its visual model driving parameters may include only BlendShape model variation data or only skeleton variation data.
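One possible in-memory representation of such driving parameters (per-frame BlendShape weights plus per-frame skeleton data) is sketched below; the field names, value ranges and frame rate are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class VisualModelDrivingParams:
    # Per-frame BlendShape weights, e.g. {"jaw_open": 0.4, "smile": 0.7};
    # drives facial expression motions, including the mouth shape.
    blendshape_frames: List[Dict[str, float]] = field(default_factory=list)
    # Per-frame skeleton data, e.g. joint name -> (rx, ry, rz) rotation;
    # drives limb motions.
    skeleton_frames: List[Dict[str, Tuple[float, float, float]]] = field(default_factory=list)
    # Either list may be empty, e.g. for a parameter set that only drives
    # facial expressions or only drives limb motions.
    fps: int = 25  # assumed frame rate used to align the frames with audio
```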
In some embodiments, the visual model driving parameters of the avatar may be preset, and the preset multiple expression driving parameters, posture driving parameters, and the like of the avatar may be directly obtained from a database of the terminal or the server. In other embodiments, the visual model driving parameters of the avatar may be generated by a machine learning model.
S260: and switching the interface to be interacted into a target interaction interface corresponding to the interaction instruction.
S270: and driving the behavior of the virtual image according to the visual model driving parameters based on the target interactive interface, and correspondingly playing the reply audio information aiming at the driven behavior.
And driving the behavior of the virtual image according to the driving parameters of the visual model to generate an image or a video of the virtual image, wherein the driving parameters of the visual model can be used for driving the virtual image to perform facial expression actions, limb actions and the like. In one embodiment, the visual model driving parameters may include driving parameters for driving facial expression motions of the avatar, and the terminal may drive only the facial expression motions of the avatar according to the visual model driving parameters; in another embodiment, the visual model driving parameters may only include driving parameters for driving the limb motions of the avatar, and the terminal may only drive the limb motions of the avatar according to the visual model driving parameters; in still another embodiment, the visual model driving parameters may include driving parameters for driving a facial expression motion and a limb motion of the avatar, and the terminal may drive the facial expression motion and the limb motion of the avatar according to the visual model driving parameters.
Based on the target interactive interface, the image or video of the avatar produced by driving its behavior with the visual model driving parameters can be shown on the display screen of the terminal or another image display device connected to it. The corresponding reply audio information may also be played through the terminal's speaker or another connected audio output device. As one mode, while the corresponding reply audio information is played, the dialog content can also be displayed in a dialog box on the target interactive interface for the user to read, so that the interaction mode is more flexible and the user experience is further improved.
The reply audio information is correspondingly played according to the driven behavior, so that the action of the virtual image and the played reply audio information can be accurately matched, the virtual image is closer to the behavior of human beings during conversation, and the user experience is effectively improved.
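The sketch below illustrates one way to keep the driven behavior and the reply audio aligned, reusing the parameter structure from the earlier sketch; the rendering and audio callbacks are hypothetical, and audio playback is assumed to start without blocking.

```python
import time

def play_reply(params, reply_audio, render_frame, play_audio):
    """Drive the avatar frame by frame while the reply audio is playing.

    `render_frame(blendshape, skeleton)` and `play_audio(audio)` stand in for
    the terminal's display and audio output devices; `params` follows the
    VisualModelDrivingParams layout sketched earlier.
    """
    play_audio(reply_audio)  # assumed to return immediately while audio plays
    frame_interval = 1.0 / params.fps
    total = max(len(params.blendshape_frames), len(params.skeleton_frames))
    for i in range(total):
        blendshape = params.blendshape_frames[i] if i < len(params.blendshape_frames) else {}
        skeleton = params.skeleton_frames[i] if i < len(params.skeleton_frames) else {}
        render_frame(blendshape, skeleton)  # update the avatar's face and limbs
        time.sleep(frame_interval)          # keep the frames on the audio timeline
```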
It should be noted that, for parts not described in detail in this embodiment, reference may be made to the foregoing embodiments, and details are not described herein again.
The human-computer interaction method based on the virtual image, provided by the embodiment, can acquire the interaction instruction of the user, generate the reply audio information and the visual model driving parameters of the virtual image based on the interaction instruction, drive the expression action or the limb action of the virtual image, and correspondingly play the reply audio information according to the driven action, so that the human-computer interaction mode based on the virtual image is more vivid and natural, and the human-computer interaction experience of the user is improved.
Referring to fig. 4, fig. 4 illustrates an avatar-based human-computer interaction method according to another embodiment of the present application, which may be applied to the terminal, and the method may include:
s310: and displaying the interactive interface.
S320: and if the interactive interface is the interface to be awakened, acquiring user input information.
S330: and if the user input information meets the preset awakening condition, switching the interface to be awakened into the interface to be interacted.
S340: and acquiring an interaction instruction input by a user based on the interface to be interacted.
S350: acquiring reply audio information corresponding to the interactive instruction;
the corresponding reply audio information may have different intonation and speech rate according to different interactive instructions. For example, if the response text corresponding to some interactive instructions is positive, the rhythm of the corresponding response audio information will be fast, and the intonation will be high, so as to express the cheerful emotion; if the response text corresponding to some interactive instructions is relatively negative, the rhythm of the corresponding response audio information is relatively slow, and the tone is relatively low, so that the low emotion is expressed.
In some embodiments, different avatars may generate different reply audio information. For example, when the user inputs the interactive instruction "Who are you?", if the avatar is a cartoon character, the corresponding reply audio information may use a lively, lovely tone of voice, such as "I won't tell you that my name is Little One"; if the avatar is an anchor character, the corresponding reply audio information may use a calm tone of voice, such as "Hello, my name is Little One". In this way, the acquired reply audio information better fits the current avatar and makes the human-computer interaction experience more enjoyable.
In some embodiments, the reply audio information corresponding to the interactive instruction may be obtained through a question-answer library, so that the user's interactive instruction can be answered in richer ways. In this case, step S350 may include steps S351 to S353. Referring to fig. 5, which shows a flowchart of step S350 in fig. 4 provided in an exemplary embodiment of the present application, step S350 may include:
s351: and identifying the interactive instruction to acquire corresponding interactive text information.
And aiming at different types of the interactive instructions, corresponding interactive text information can be acquired in different modes. As a mode, if the interactive instruction input by the user is a voice instruction, the voice instruction can be recognized based on a voice recognition model to obtain corresponding interactive text information; if the interactive instruction input by the user is a touch instruction, for example, the user directly clicks an instruction button on an interactive interface, the instruction input by the user can be acquired based on the screen sensor and is recognized as interactive text information; if the interactive instruction input by the user is a gesture instruction, the interactive text information corresponding to the gesture instruction can be acquired based on the gesture recognition model.
S352: and inquiring and acquiring reply text information corresponding to the interactive text information in a question-answer library.
In one embodiment, the terminal or the server may be provided with a question-answer library storing a mapping relationship between the interactive text information and the reply text information, and the reply text information corresponding to the interactive text information may be queried and obtained in the question-answer library. As an embodiment, the interactive text information and the reply text information in the question-and-answer library may be in a one-to-one correspondence. As another mode, the interactive text information and the reply text information in the question-answer library are not in one-to-one correspondence, and possible reply text information can be found in the question-answer library by performing semantic analysis on the interactive text information, and the optimal reply text information is found by a sorting mechanism.
As an implementation manner, different interactive text messages may be sorted according to the frequency of occurrence in the process of interacting with the user, and N interactive text messages with the highest frequency of occurrence and corresponding reply text messages are stored in a question-answering library of the terminal, where N is an integer greater than 0. In this way, the reply text information can be directly obtained from the terminal to interact with the user under the condition of no network connection.
As another embodiment, if the reply text information corresponding to the interactive text information cannot be found in the question-answer library, that is, in the case of a failed reply, a sentence such as "Sorry, I do not quite understand what you mean" or "Sorry, I do not know how to answer" may be used as the reply text information. In some embodiments, the interactive text information corresponding to failed replies may be recorded, and the question-answer library may be updated accordingly.
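A simplified sketch of such a question-answer library lookup is given below: an exact match is used when available, otherwise candidate questions are ranked by a similarity score, and a fallback reply is returned when nothing scores high enough. The word-overlap score is a crude stand-in for the semantic analysis and sorting mechanism described above, and the stored entries are illustrative.

```python
QA_LIBRARY = {
    "how to handle deposit": "You can handle a deposit at the counter; please take a number first.",
    "who are you": "Hello, my name is Little One.",
}
FALLBACK_REPLY = "Sorry, I do not quite understand what you mean."

def similarity(a: str, b: str) -> float:
    """Crude word-overlap score standing in for real semantic analysis."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def query_reply_text(interactive_text: str, threshold: float = 0.5) -> str:
    # One-to-one case: the interactive text is stored directly in the library.
    if interactive_text in QA_LIBRARY:
        return QA_LIBRARY[interactive_text]
    # Otherwise rank the stored questions and take the best-scoring one.
    best_question = max(QA_LIBRARY, key=lambda q: similarity(interactive_text, q))
    if similarity(interactive_text, best_question) >= threshold:
        return QA_LIBRARY[best_question]
    # Failed reply: the text could also be logged here so the library can be updated later.
    return FALLBACK_REPLY
```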
In another embodiment, a neural network model may also be established based on the question-answer library. Specifically, the neural network model may be obtained by training with the interactive text information and the corresponding reply text information as training samples, using the interactive text information as input and the reply text information corresponding to the interactive text information as the expected output. Reply text information corresponding to the interactive text information is then acquired by inputting the interactive text information into the neural network model. In this way, the reply text information is not limited to the answers pre-stored in the question-answer library, and real-time free chat with the user can be carried out based on the avatar, which greatly improves the human-computer interaction experience.
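The following is an illustrative sketch, assuming PyTorch is available, of training such a model on the question-answer pairs; reply selection is simplified here to classification over the stored replies with a bag-of-words encoding, which is only one possible realization of the idea, and all data are placeholders.

```python
# Illustrative sketch: train a small neural network on the question-answer library,
# taking interactive text as input and the index of the corresponding reply text as
# the expected output.
import torch
import torch.nn as nn

pairs = [("what is your name", "My name is Little One."),
         ("what can you do", "I can introduce myself and play videos.")]
vocab = sorted({w for q, _ in pairs for w in q.split()})
replies = [r for _, r in pairs]

def encode(text):
    # Bag-of-words encoding of the interactive text information.
    vec = torch.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            vec[vocab.index(w)] = 1.0
    return vec

model = nn.Linear(len(vocab), len(replies))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.stack([encode(q) for q, _ in pairs])
y = torch.arange(len(pairs))
for _ in range(200):                      # tiny training loop over the QA pairs
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

def reply(interactive_text):
    with torch.no_grad():
        return replies[model(encode(interactive_text)).argmax().item()]

print(reply("your name"))   # expected: "My name is Little One."
```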
S353: reply audio information corresponding to the reply text information is acquired.
The terminal may convert the reply text information into the reply audio information through a speech synthesis technique, and as one way, may set some characteristics of the reply audio information, such as tone, pitch, and speech speed, based on different avatars. Therefore, the reply audio information can be closer to the selected virtual image, and the use experience of the user is further improved.
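An illustrative sketch of selecting speech-synthesis characteristics per avatar is shown below; the `synthesize` callable and the profile values are hypothetical stand-ins for whatever text-to-speech engine is used.

```python
# Illustrative sketch: choose tone, pitch and speaking rate according to the
# currently selected avatar before converting reply text into reply audio.
AVATAR_VOICE_PROFILES = {
    "cartoon": {"voice": "child_female", "pitch": 4, "rate": 1.15},  # lively, cute
    "anchor":  {"voice": "adult_female", "pitch": 0, "rate": 1.0},   # calm, formal
}

def reply_text_to_audio(reply_text, avatar, synthesize):
    profile = AVATAR_VOICE_PROFILES.get(avatar, AVATAR_VOICE_PROFILES["anchor"])
    return synthesize(reply_text, **profile)

# Usage with a dummy synthesizer that just records the chosen parameters.
dummy = lambda text, **kw: (text, kw)
print(reply_text_to_audio("Hello, my name is Little One.", "cartoon", dummy))
```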
In some embodiments, a database storing mapping relationships between reply text information and reply audio information of different avatars may be provided in the terminal or the server, and the reply audio information corresponding to the reply text information may be directly obtained through the database. In other embodiments, the terminal may also generate reply audio information corresponding to the reply text information in real time.
S360: visual model driving parameters of the avatar are generated based on the reply audio information.
Generating the visual model driving parameters of the avatar based on the reply audio information may include generating expression driving parameters, pose driving parameters, and the like of the avatar based on the reply audio information. Specifically, the action corresponding to the visual model driving parameters may correspond to the reply audio information; for example, when the reply audio information is "the estimated waiting time is half an hour, we apologize for keeping you waiting", visual model driving parameters corresponding to the avatar may be generated that drive the avatar to make an expression similar to that of a real person apologizing.
In some embodiments, some characteristics of the visual model driving parameters may be set based on different avatars, for example, if the current avatar is a cartoon character, the output visual model driving parameters will fit the current cartoon character, and the magnitude of the motion or expression corresponding to the visual model driving parameters is larger, so that the motion or expression of the cartoon character is more exaggerated than that of other types of avatars.
In some embodiments, the visual model driving parameters of the avatar may be generated by inputting the reply audio information into a visual prediction model, where the visual prediction model is trained with a machine learning algorithm using training-sample reply audio information as input and the visual model driving parameters corresponding to that audio as the expected output.
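As an illustrative sketch, assuming PyTorch, such a visual prediction model could be a small regression network from per-frame audio features to driving parameters; the feature and parameter dimensions and the random training data below are placeholders only.

```python
# Illustrative sketch: a regression network mapping a frame of reply-audio features
# (e.g. a 13-dimensional MFCC-like vector) to visual model driving parameters such
# as blend-shape weights for expression and pose.
import torch
import torch.nn as nn

AUDIO_DIM, PARAM_DIM = 13, 32   # audio features per frame -> driving parameters

model = nn.Sequential(
    nn.Linear(AUDIO_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, PARAM_DIM),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Training samples: reply audio features as input, and the driving parameters
# associated with that audio (e.g. from motion capture) as the expected output.
audio_features = torch.randn(256, AUDIO_DIM)
target_params = torch.randn(256, PARAM_DIM)
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(audio_features), target_params)
    loss.backward()
    optimizer.step()

# At run time, each frame of the reply audio is converted to features and fed to
# the trained model to obtain driving parameters for the avatar.
frame = torch.randn(1, AUDIO_DIM)
print(model(frame).shape)   # torch.Size([1, 32])
```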
In other embodiments, the visual model driving parameters of the avatar corresponding to the response audio information may also be preset. For example, when the reply audio message is "thank you," a visual model driving parameter corresponding to the reply audio message "thank you," which is pre-stored in the terminal or the server, is obtained, and the parameter can drive the avatar to perform a bow action.
S370: and switching the interface to be interacted into a target interaction interface corresponding to the interaction instruction.
S380: and driving the behavior of the virtual image according to the visual model driving parameters based on the target interactive interface, and correspondingly playing the reply audio information aiming at the driven behavior.
It should be noted that, for parts not described in detail in this embodiment, reference may be made to the foregoing embodiments, and details are not described herein again.
By the human-computer interaction method based on the virtual image, the visual model driving parameters of the virtual image can be generated based on the reply audio information, so that the reply audio of the virtual image and the visual effect of the virtual image have higher consistency, the virtual image is more agile and natural in the process of interacting with the user, and the user experience is effectively improved.
Referring to fig. 6, fig. 6 illustrates an avatar-based human-computer interaction method according to still another embodiment of the present application, which may be applied to the terminal, and the method may include:
S401: And displaying the interactive interface.
S402: and if the interactive interface is the interface to be awakened, acquiring user input information.
S403: and if the user input information meets the preset awakening condition, switching the interface to be awakened into the interface to be interacted.
S404: and acquiring an interaction instruction input by a user based on the interface to be interacted.
S405: and searching a first keyword matched with the interactive instruction in a preset database.
The preset database may be a database stored in the terminal or the server, and the preset first keyword may be stored in the database. In some embodiments, different first keywords may be set according to different usage scenarios. For example, the first keyword may be "self-introduction", "who you are", etc., and is not limited herein.
S406: and if the first keyword matched with the interactive instruction cannot be found, acquiring the reply audio information corresponding to the interactive instruction.
In one embodiment, the interactive instruction may contain the first keyword, that is, the interactive instruction is exactly matched with the first keyword. For example, if the first keyword is "self-introduction", when the user inputs an instruction containing the first keyword, such as speaking the voice instruction "please give a self-introduction" or clicking the "self-introduction" button on the interface to be interacted, the first keyword can be exactly matched in the preset database.
In another embodiment, the interactive instruction may not contain the first keyword. In this case, the terminal may perform semantic recognition on the interactive instruction; if the semantic recognition result is related to the first keyword, that is, the interactive instruction fuzzily matches the first keyword, the terminal can still find the first keyword that fuzzily matches the interactive instruction. For example, if the first keyword is "self-introduction" and the user inputs the interactive instruction "who are you", the instruction does not contain the first keyword "self-introduction", but after semantic recognition it is judged to be related to the first keyword, so the terminal can fuzzily match the first keyword "self-introduction" in the preset database.
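A minimal illustrative sketch of this exact-plus-fuzzy keyword lookup is given below; the table of related phrases is a hypothetical stand-in for the semantic recognition result.

```python
# Illustrative sketch of looking up the first keyword: exact matching when the
# instruction literally contains the keyword, and a simple fuzzy fallback.
FIRST_KEYWORDS = ["self-introduction"]
RELATED_PHRASES = {                      # hypothetical semantic-recognition output
    "self-introduction": ["who are you", "introduce yourself", "tell me about you"],
}

def find_first_keyword(interactive_text: str):
    text = interactive_text.lower()
    for kw in FIRST_KEYWORDS:
        if kw in text:                                              # exact match
            return kw
        if any(p in text for p in RELATED_PHRASES.get(kw, ())):    # fuzzy match
            return kw
    return None                                                     # not found

print(find_first_keyword("please give a self-introduction"))   # exact match
print(find_first_keyword("who are you"))                        # fuzzy match
print(find_first_keyword("hello"))                              # None -> free chat
```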
If no first keyword matching the interactive instruction can be found, the reply audio information corresponding to the interactive instruction is acquired. In some embodiments, the reply audio information may be a preset reply for the case in which no first keyword matching the interactive instruction can be found, such as "Sorry, I do not quite understand what you mean".
In other embodiments, the reply audio information corresponding to the interactive instruction may be obtained through the question-answer library, so that richer replies are available for the interactive instruction of the user. In one example, the question-answer library stores the reply audio information "Hello, I am Little One. What can I do for you?" corresponding to the interactive instruction "hello".
When the user inputs the "hello" interactive instruction and the first keyword "self-introduction" matching the interactive instruction cannot be found, the reply audio information "Hello, I am Little One. What can I do for you?" corresponding to the interactive instruction can be obtained.
In still other embodiments, the reply audio information may also be obtained through a neural network model established based on the question-answer library; for the specific implementation, reference may be made to the foregoing embodiments, which will not be repeated here. In this way, the avatar-based human-computer interaction method is no longer limited to preset keywords, richer replies are provided for the interactive instructions of the user, free chat becomes possible, and the human-computer interaction experience of the user is improved.
S407: generating visual model driving parameters for the avatar based on the reply audio information.
S408: and switching the interface to be interacted into a target interaction interface corresponding to the interaction instruction.
S409: and driving the behavior of the virtual image according to the visual model driving parameters based on the target interactive interface, and correspondingly playing the reply audio information aiming at the driven behavior.
It should be noted that, for parts not described in detail in this embodiment, reference may be made to the foregoing embodiments, and details are not described herein again.
By the avatar-based human-computer interaction method described above, when no first keyword matching the interactive instruction can be found in the preset database, the reply audio information corresponding to the interactive instruction and the visual model driving parameters of the avatar can still be obtained.
In some embodiments, if a first keyword matching the interactive instruction is found in the preset database, the reply audio information corresponding to the first keyword and the visual model driving parameters of the avatar are obtained, the behavior of the avatar is driven according to the visual model driving parameters on the target interactive interface corresponding to the first keyword, and the reply audio information is played corresponding to the driven behavior. In this way, when the user inputs an interactive instruction matching the first keyword, the avatar can be driven to reply to the first keyword, thereby realizing the function corresponding to the first keyword. Referring to fig. 7, fig. 7 is a flowchart illustrating an avatar-based human-computer interaction method according to still another embodiment of the present application, and the method includes steps S501 to S512.
S501: and displaying the interactive interface.
S502: and if the interactive interface is the interface to be awakened, acquiring user input information.
S503: and if the user input information meets the preset awakening condition, switching the interface to be awakened into the interface to be interacted.
S504: and acquiring an interaction instruction input by a user based on the interface to be interacted.
S505: and searching a first keyword matched with the interactive instruction in a preset database.
S506: and judging whether the first keyword matched with the interactive instruction is found.
In this embodiment, after determining whether the first keyword matched with the interactive instruction is found, the method may further include: if the first keyword matched with the interactive instruction is found, step S507 can be executed; if the first keyword matched with the interactive instruction cannot be found, step S509 may be executed.
S507: and acquiring the reply audio information corresponding to the first keyword and the visual model driving parameter of the virtual image.
And if the first keyword matched with the interactive instruction is found, acquiring reply audio information corresponding to the first keyword and the visual model driving parameter of the virtual image.
As one way, the reply audio information corresponding to the first keyword and the visual model driving parameter of the avatar may be stored in advance in a database of the terminal or a database of the server.
Alternatively, the reply audio information corresponding to the first keyword and the visual model driving parameters of the avatar may be generated in real time based on the interactive instructions input by the user.
In some embodiments, the first keyword may be a "self-introduction", and the preset database may store the reply audio information corresponding to the first keyword "self-introduction" and the visual model driving parameters of the avatar. Alternatively, the visual model driving parameters of the avatar may be generated based on the reply audio information.
S508: and switching the interface to be interacted into a target interaction interface corresponding to the first keyword.
According to different terminal usage scenarios, the same first keyword may correspond to different target interactive interfaces; that is, the target interactive interface may be determined jointly by the terminal usage scenario and the first keyword. In an example, when the first keyword is "self-introduction", the target interactive interface corresponding to the first keyword "self-introduction" may be determined according to the terminal usage scenario. For example, when the terminal usage scenario is a bank, the target interactive interface corresponding to "self-introduction" may be a concise interface carrying the bank's identity design; when the terminal usage scenario is a kindergarten, the target interactive interface corresponding to "self-introduction" may be an interface in which the characters are annotated with pinyin, so that children who cannot yet read the characters can understand the text on the interface through the pinyin.
In some embodiments, the terminal may obtain a return instruction based on the target interactive interface corresponding to the first keyword, and at this time, step S508 may include step S5081 and step S5082. Specifically, referring to fig. 8, fig. 8 is a schematic flowchart illustrating step S508 in fig. 7 according to an exemplary embodiment of the present application.
S5081: and acquiring a return instruction based on the target interactive interface corresponding to the first keyword.
The return instruction may be an instruction instructing the terminal to switch the target interactive interface corresponding to the first keyword to the interface to be interacted. Based on the target interactive interface corresponding to the first keyword, a return instruction input by the user can be obtained. The terminal can acquire the return instruction by collecting voice input by the user, collecting user images, or detecting a touch operation. For example, when acquiring the return instruction through user images, the terminal can match the collected user image with a preset image and judge whether the user image contains a preset action, a preset gesture, or the like; if so, the return instruction is acquired. The preset image may be an image containing the preset action or the preset gesture, may be stored in the terminal in advance, and is associated with the return instruction, so that the return instruction is obtained when a user image matching the preset image is collected.
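As an illustrative sketch, the image-based branch of this judgment could be expressed as follows, where the gesture classifier is a hypothetical placeholder for the matching against preset images.

```python
# Illustrative sketch: decide whether a captured user image triggers the return
# instruction by matching it against preset gesture labels.
RETURN_GESTURES = {"wave_hand", "palm_push"}   # preset actions bound to "return"

def is_return_instruction(user_image, gesture_classifier) -> bool:
    label = gesture_classifier(user_image)     # e.g. "wave_hand", "thumbs_up", ...
    return label in RETURN_GESTURES

# Usage with a dummy classifier that always reports a waving hand.
print(is_return_instruction(object(), lambda img: "wave_hand"))   # True
```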
In an implementation manner, a text prompt or a button corresponding to the return instruction may be set on the target interactive interface corresponding to the first keyword, so as to prompt a user to trigger the return instruction by operating or clicking the button according to the text prompt, and switch the current interactive interface into the interface to be interacted.
S5082: and switching the target interactive interface corresponding to the first keyword into an interface to be interacted.
And if a return instruction input by the user is obtained based on the target interactive interface corresponding to the first keyword, switching the target interactive interface corresponding to the first keyword into a to-be-interacted interface, and waiting for the user to input an interactive instruction based on the to-be-interacted interface.
S509: and acquiring the reply audio information corresponding to the interactive instruction.
S510: visual model driving parameters of the avatar are generated based on the reply audio information.
S511: and switching the interface to be interacted into a target interaction interface corresponding to the interaction instruction.
S512: and driving the behavior of the virtual image according to the visual model driving parameters based on the target interactive interface, and correspondingly playing the reply audio information aiming at the driven behavior.
It should be noted that, for parts not described in detail in this embodiment, reference may be made to the foregoing embodiments, and details are not described herein again.
By the avatar-based human-computer interaction method described above, when a first keyword matching the interactive instruction is found, the corresponding function can be realized based on the target interactive interface corresponding to the first keyword; if no first keyword matching the interactive instruction can be found, the user can chat freely with the avatar. In this way, even if the interactive instruction input by the user does not match the first keyword, the corresponding reply audio information can still be obtained according to the interactive instruction, so as to drive the avatar and play the corresponding reply audio information. The interactive instruction input by the user is thus responded to accurately, natural human-computer interaction is realized, the wording of the user's input is not restricted, and the human-computer interaction experience is further improved.
In some embodiments, the terminal may further search the preset database for a keyword matching the interactive instruction input by the user, and if a second keyword matching the interactive instruction is found, the terminal may perform video broadcast based on the target interactive interface corresponding to the second keyword, so that a video broadcast function is implemented on that interface. Specifically, referring to fig. 9, fig. 9 is a schematic flowchart illustrating an avatar-based human-computer interaction method according to yet another embodiment of the present application, and the method includes the following steps:
S601: And displaying the interactive interface.
S602: and if the interactive interface is the interface to be awakened, acquiring user input information.
S603: and if the user input information meets the preset awakening condition, switching the interface to be awakened into the interface to be interacted.
S604: and acquiring an interaction instruction input by a user based on the interface to be interacted.
S605: and searching a second keyword matched with the interactive instruction in a preset database.
In this embodiment, the second keyword in the preset database is used to trigger the video broadcast function, for example, "play video", "video broadcast", and the like. However, in different usage scenarios, different second keywords may be preset. For example, in a usage scenario of a kindergarten, the second keyword in the preset database may be "animation" or "video".
S606: and if the second keyword matched with the interactive instruction is found, determining a target interactive interface and a video broadcast picture corresponding to the second keyword.
In some embodiments, the video broadcast picture corresponding to the second keyword may be a picture of a video to be broadcast, and the video may be stored locally in the terminal in advance, or may be transmitted to the terminal from the server in real time. As an implementation manner, the video broadcast pictures may be multiple, and each video broadcast picture corresponds to one video to be broadcast. According to different terminal use scenes, the video broadcast picture can contain or not contain an avatar, for example, the terminal use scene is an exhibition, and the video broadcast picture can be a picture of a company introduction, a product advertisement and other videos which do not contain the avatar.
In some embodiments, if the video broadcast picture includes an avatar, the video broadcast picture corresponding to the second keyword may be generated in real time by using a visual model driving parameter of the avatar, and the visual model driving parameter may be pre-stored in the terminal or may be stored in the server. Further, the broadcast video corresponding to the second keyword may also only include an audio file, but not include a video frame, and the video broadcast interface of the broadcast video is empty, for example, an all-white image or an all-black image may be provided, and the audio file may be stored in the terminal or the server. In this way, the storage space required for storing the video data can be reduced.
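An illustrative sketch of resolving the second keyword to broadcast content, including the audio-only case with an empty picture, is given below; all titles, paths, and URLs are hypothetical.

```python
# Illustrative sketch: resolve the second keyword to the content to broadcast. An
# entry may reference locally stored video, a server stream, or only an audio file
# with an empty (all-black) picture to save storage space.
VIDEO_LIBRARY = {
    "play video": [
        {"title": "company introduction", "frames": "local://videos/intro.mp4"},
        {"title": "product advert", "frames": "server://ads/product.mp4"},
        {"title": "audio-only notice", "frames": None, "audio": "local://audio/notice.wav"},
    ],
}

def resolve_broadcast(second_keyword: str):
    for entry in VIDEO_LIBRARY.get(second_keyword, []):
        picture = entry["frames"] or "all-black placeholder frame"
        yield entry["title"], picture

for title, picture in resolve_broadcast("play video"):
    print(title, "->", picture)
```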
S607: and displaying the video broadcast picture at the appointed position of the target interactive interface.
In some embodiments, the video presentation screen may be displayed full-screen on the target interactive interface; in other embodiments, the video presentation screen may be displayed at a designated location of the target interactive interface, for example, a window of the video presentation screen is displayed at a central location of the interactive interface.
In some embodiments, when the video broadcast picture is displayed on the display screen of the terminal or other image display devices connected to the display screen, the audio information corresponding to the video broadcast picture may also be played through a speaker of the terminal or other audio output devices connected to the display screen.
In other embodiments, the target interactive interface may also display the video broadcast picture with subtitles without playing the audio data, thereby reducing the impact on other people.
It should be noted that, for parts not described in detail in this embodiment, reference may be made to the foregoing embodiments, and details are not described herein again.
By the avatar-based human-computer interaction method described above, when a second keyword matching the interactive instruction is found, the video broadcast function can be realized based on the target interactive interface corresponding to the second keyword. The terminal can be applied to scenarios such as shopping malls, banks, and hospitals, where videos such as advertisements are played based on the target interactive interface corresponding to the second keyword; it can also be applied to scenarios such as kindergartens and schools, where videos related to teaching and entertainment are played based on the target interactive interface corresponding to the second keyword.
In addition, in some embodiments, the avatar-based human-computer interaction method may search the preset database for a keyword matching the interactive instruction. If the first keyword is matched, the self-introduction function is realized based on the target interactive interface corresponding to the first keyword; if the second keyword is matched, the video broadcast function is realized based on the target interactive interface corresponding to the second keyword; and if no keyword can be matched, a target interactive interface corresponding to a free-chat function may be entered. Specifically, referring to fig. 10, fig. 10 is a flowchart illustrating an avatar-based human-computer interaction method according to yet another embodiment of the present application, and the method may include:
S701: And displaying the interactive interface.
In some embodiments, an avatar in a silent state may be displayed on the interface to be woken up; specifically, the silent-state avatar may be an avatar presenting a preset waiting action, and in another area of the interface to be woken up, a prompt and a button for the wake-up instruction may be displayed. As shown in FIG. 11, an exemplary embodiment provides a diagram of the interface to be woken up, in which a prompt such as "Hello! Saying the voice command 'Little One, Little One' can also wake me up", a text button "Wake up Little One", and the like are displayed to guide the user to interact with the avatar.
S702: and if the interactive interface is the interface to be awakened, acquiring user input information.
S703: and if the user input information meets the preset awakening condition, switching the interface to be awakened into the interface to be interacted.
It should be noted that the situation that the user input information satisfies the preset wake-up condition is substantially similar to the foregoing situation, and is not described herein again.
In an example, referring to fig. 12, fig. 12 shows an interface to be interacted provided by an exemplary embodiment. An avatar is displayed on the interface to be interacted, a welcome message such as "Hello, come and experience Little One in the function bar" is displayed to the right of the avatar, and buttons for interactive functions such as "self-introduction" and "play video case" are displayed to prompt the user about the available interactive functions.
In another example, referring to fig. 13, fig. 13 shows an interface to be interacted provided by another exemplary embodiment; the interface to be interacted comprises an avatar 1310, a button list 1320, and a text bubble 1330. When the user inputs an instruction meeting the preset voice wake-up condition or touch wake-up condition, such as the voice instruction "Little One, Little One", buttons for interactive functions such as "self-introduction" and "play video case" are displayed in the button list 1320 of the interface to be interacted as shown in fig. 13, and a welcome message "Little One is happy to serve you" is displayed above the button list 1320. In addition, on the interface to be interacted shown in fig. 13, the wake-up instruction "Little One, Little One" input by the user is also displayed in the text bubble 1330 as a response to the wake-up instruction input by the user, thereby enhancing the sense of interaction with the user.
S704: and acquiring an interaction instruction input by a user based on the interface to be interacted.
S705: and judging whether the interactive instruction input by the user is acquired within a preset time period.
In this embodiment, after determining whether the interaction instruction input by the user is acquired within the preset time period, the method may further include: if the interaction instruction input by the user is not acquired based on the interface to be interacted within the preset time period, the step S701 may be executed; if the interaction instruction input by the user is acquired based on the interface to be interacted within the preset time period, step S706 may be executed.
S706: and searching keywords matched with the interactive instruction in a preset database.
Under different use scenes, different keywords can be preset. In this embodiment, the first keyword in the preset database may be "self-introduction", and the second keyword may be "playing video".
S707: and judging whether the first keyword matched with the interactive instruction is found.
After judging whether the first keyword matched with the interactive instruction is found, the method may further include: if the first keyword matched with the interactive instruction is found, step S712 may be executed; if the first keyword matching the interactive instruction is not found, step S708 may be executed.
S708: and judging whether the second keyword matched with the interactive instruction is found.
After judging whether the second keyword matched with the interactive instruction is found, the method may further include: if the second keyword matched with the interactive instruction is found, step S715 may be executed; if the second keyword matched with the interactive instruction is not found, step S709 may be executed.
S709: and acquiring the reply audio information corresponding to the interactive instruction.
S710: visual model driving parameters of the avatar are generated based on the reply audio information.
S711: and switching the interface to be interacted into a target interaction interface corresponding to the interaction instruction.
After the interface to be interacted is switched to the target interaction interface corresponding to the interaction instruction, step S714 may be performed.
S712: and acquiring the reply audio information corresponding to the first keyword and the visual model driving parameter of the virtual image.
S713: and switching the interface to be interacted into a target interaction interface corresponding to the first keyword.
Fig. 14 shows a target interactive interface based on the first keyword in one embodiment. A display frame 1430 for displaying text and images or videos corresponding to "self-introduction" is shown in a prompt box 1420 on the right side of an avatar 1410; a "value" button 1441, a "technique" button 1442, and an "application" button 1443 are provided in a button list 1440 above the prompt box 1420, and an "end introduction" button 1450 is provided below the prompt box 1420. After the interface to be interacted is switched to the interactive interface corresponding to the first keyword, step S714 may be executed.
S714: and driving the behavior of the virtual image according to the visual model driving parameters based on the target interactive interface, and correspondingly playing the reply audio information aiming at the driven behavior.
If the first keyword matching the interactive instruction is found, the target interactive interface is the interactive interface corresponding to the first keyword, that is, the interactive interface corresponding to "self-introduction". In the target interactive interface based on the first keyword shown in fig. 14, the self-introduction contents corresponding to the buttons are played in the prompt box 1420 in sequence according to the order of the buttons in the button list 1440 above the prompt box 1420; after the self-introduction content corresponding to the last "application" button 1443 has been played, the content corresponding to the "value" button 1441 is played again. Based on an interactive instruction of the user obtained by the terminal, the self-introduction content played in the prompt box 1420 may be switched to the self-introduction content corresponding to that instruction, where the input instruction may be a voice instruction of the user or a touch instruction of clicking a button in the button list 1440. Optionally, while the self-introduction content is played in the prompt box 1420, the avatar 1410 on the interactive interface may behave accordingly based on the visual model driving parameters; for example, the avatar's facial expression and mouth shape may be matched with the audio of the currently played self-introduction content.
In some embodiments, an "end introduction" button 1450 may be further disposed below the prompt box, and when the user clicks the "end introduction" button 1450 or inputs an instruction related to the "end introduction", the terminal may switch the target interactive interface corresponding to the first keyword to the interface to be interacted.
If neither the first keyword nor the second keyword matching the interactive instruction is found, the target interactive interface is the interactive interface corresponding to free chat. Based on this interactive interface, behaviors such as the expressions or action postures of the avatar can be driven according to the visual model parameters of the avatar generated based on the reply audio information, and the reply audio information is played corresponding to the driven behaviors, thereby realizing the function of free chat with the user.
S715: and determining a target interactive interface and a video broadcast picture corresponding to the second keyword.
S716: and displaying a video broadcast picture at the designated position of the target interactive interface.
In one embodiment, a target interactive interface corresponding to the second keyword is shown in fig. 15, and a currently played video broadcast screen 1510, a progress bar 1520, a video list 1530 and a return button 1540 are displayed in the interactive interface. The user can adjust the current playing progress by sliding the progress bar 1520 in the currently playing video broadcast picture 1510, or can switch the currently playing video broadcast picture 1510 to the picture of the selected video by clicking windows of other videos in the video list 1530, or can switch the interactive interface to the interface to be interacted by clicking the return button 1540. Besides the click command, the user can also input voice commands such as "pause", "play", "switch video", "return", and the like to realize corresponding functions.
S717: and judging whether the user inputs an ending instruction or not.
The ending instruction may be an instruction instructing the terminal to switch the currently displayed target interactive interface to the interface to be interacted. Based on the target interactive interface, the terminal can obtain an ending instruction input by the user. The terminal can acquire the ending instruction by collecting voice input by the user, collecting user images, or detecting a touch operation. For example, when acquiring the ending instruction through user images, the collected user image can be matched with a preset image to judge whether the user image contains a preset action, a preset gesture, or the like; if so, the ending instruction is acquired. The preset image may be an image containing the preset action or the preset gesture, may be stored in the terminal in advance, and is associated with the ending instruction, so that the ending instruction is obtained when a user image matching the preset image is collected.
In this embodiment, after judging whether the user inputs the ending instruction, the method may further include: if the user inputs the ending instruction, switching the target interactive interface to the interface to be interacted, executing step S704, and acquiring an interactive instruction input by the user based on the interface to be interacted; and if the user does not input an ending instruction, keeping the current target interactive interface. Optionally, in other embodiments, the terminal may also start timing from the moment the current target interactive interface is displayed, and if no interactive instruction input by the user is acquired on the target interactive interface within a preset waiting time, step S701 may be executed to switch the current interface to the interface to be woken up for displaying the avatar.
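As an illustrative sketch of the waiting-time judgment, a simple polling loop with a monotonic clock could be used; the timing values and the poller are hypothetical placeholders.

```python
# Illustrative sketch: if no interactive instruction arrives within the preset
# waiting time on the current target interactive interface, the caller switches
# back to the interface to be woken up.
import time

PRESET_WAIT_SECONDS = 30.0

def wait_for_instruction(poll_instruction, deadline=PRESET_WAIT_SECONDS):
    start = time.monotonic()
    while time.monotonic() - start < deadline:
        instruction = poll_instruction()        # returns None when nothing arrived
        if instruction is not None:
            return instruction                  # stay on the target interface
        time.sleep(0.1)
    return None                                 # switch to the interface to be woken up

# Usage with a dummy poller that never produces an instruction (times out quickly).
print(wait_for_instruction(lambda: None, deadline=0.3))   # None
```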
In some embodiments, shortcut buttons for switching the interactive interface and for the ending instruction may be further disposed on the target interactive interface. In another embodiment, the target interactive interface corresponding to the first keyword is shown in FIG. 16; FIG. 16 is substantially the same as the target interactive interface corresponding to the first keyword shown in FIG. 14, except that the interface shown in FIG. 16 further includes a shortcut button list 1660. The shortcut button list 1660 is provided with a "main interface" button 1661, a "video play" button 1662, and a "self-introduction" button 1663 from top to bottom, and the user can click a button or input an instruction by voice to trigger the corresponding function. For example, if the user clicks the "main interface" button 1661 in the interactive interface shown in fig. 16, the current target interactive interface is switched to the interface to be interacted; similarly, if the user inputs the voice instruction "video playing", the current target interactive interface is switched to the target interactive interface corresponding to the second keyword "video playing". By providing shortcut buttons on the interactive interface in this way, the interactive interface can be switched more quickly, tedious operations are avoided for the user, and the user experience is improved.
It should be noted that, for parts not described in detail in this embodiment, reference may be made to the foregoing embodiments, and details are not described herein again. The human-computer interaction method based on the virtual image can realize different functions according to different results of the keywords matched with the interaction instruction searched in the preset database, and can effectively shorten the use path when the system menu hierarchy is complex through the design of the shortcut button, so that the user can conveniently carry out human-computer interaction, and the human-computer interaction experience of the user is improved.
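A minimal illustrative sketch of this overall dispatch is given below; the handler functions are hypothetical placeholders for the keyword lookups and target interactive interfaces described above.

```python
# Illustrative sketch of the overall dispatch: match the interactive instruction
# against the first keyword (self-introduction), then the second keyword (video
# broadcast), and fall back to free chat when neither is found.
def dispatch(interactive_text,
             find_first_keyword,     # e.g. the keyword lookups sketched earlier
             find_second_keyword,
             show_self_introduction,
             show_video_broadcast,
             free_chat):
    kw = find_first_keyword(interactive_text)
    if kw is not None:
        return show_self_introduction(kw)      # target interface for first keyword
    kw = find_second_keyword(interactive_text)
    if kw is not None:
        return show_video_broadcast(kw)        # target interface for second keyword
    return free_chat(interactive_text)         # free-chat target interface

print(dispatch("play video",
               lambda t: None,
               lambda t: "play video" if "video" in t else None,
               lambda kw: f"self-introduction interface for {kw}",
               lambda kw: f"video broadcast interface for {kw}",
               lambda t: f"free chat reply to {t!r}"))
```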
It should be understood that the foregoing examples are merely illustrative of the application of the method provided in the embodiments of the present application in a specific scenario, and do not limit the embodiments of the present application. The method provided by the embodiment of the application can also be used for realizing more different applications.
Referring to fig. 17, fig. 17 is a block diagram illustrating a structure of an avatar-based human-computer interaction device 1700 according to an embodiment of the present application. As will be explained below with respect to the block diagram shown in fig. 17, the avatar-based human-machine interaction apparatus 1700 includes: display module 1710, awaken module 1720, switch module 1730, obtain module 1740, execute module 1750, wherein:
a display module 1710, configured to display an interactive interface, where the interactive interface includes an interface to be awakened and an interface to be interacted, and the interface is used to display an avatar;
a wake-up module 1720, configured to obtain user input information if the interactive interface is the interface to be woken up;
a switching module 1730, configured to switch the interface to be woken to the interface to be interacted if the user input information meets a preset wake-up condition;
an obtaining module 1740, configured to obtain an interaction instruction input by a user based on the interface to be interacted;
and the execution module 1750 is configured to execute an operation corresponding to the interactive instruction based on the target interactive interface corresponding to the interactive instruction.
Further, the execution module 1750 comprises: parameter module, mutual interface switch module and drive module, wherein:
the parameter module is used for acquiring reply audio information corresponding to the interactive instruction and the visual model driving parameters of the virtual image;
the interactive interface switching module is used for switching the interface to be interacted into a target interactive interface corresponding to the interactive instruction;
and the driving module is used for driving the behavior of the virtual image according to the visual model driving parameter based on the target interactive interface and correspondingly playing the reply audio information aiming at the driven behavior.
Further, the parameter module includes: audio module, visual model parameter module, wherein:
the audio module is used for acquiring reply audio information corresponding to the interactive instruction;
the visual model parameter module is used for generating visual model driving parameters of the avatar based on the reply audio information.
Further, the parameter module includes: the device comprises a searching module, an audio acquiring module and a parameter generating module, wherein:
the searching module is used for searching a first keyword matched with the interactive instruction in a preset database;
the audio acquisition module is used for acquiring reply audio information corresponding to the interactive instruction if the first keyword matched with the interactive instruction cannot be found;
the parameter generation module is used for generating visual model driving parameters of the avatar based on the reply audio information.
Further, the parameter module includes: instruction identification module, reply text audio frequency module, wherein:
the instruction identification module is used for identifying the interactive instruction and acquiring corresponding interactive text information;
the answer text module is used for inquiring and acquiring answer text information corresponding to the interactive text information in a question-answer library;
and the reply text audio module is used for acquiring reply audio information corresponding to the reply text information.
Further, the parameter module further comprises a network module, wherein:
the network module is used for establishing a neural network model based on the question-answer library; the reply text module also comprises an input module which is used for inputting the interactive text information into the neural network model and acquiring the reply text information corresponding to the interactive text information.
Further, the parameter module further comprises a first keyword parameter module, a first keyword interface switching module and a first keyword driving module, wherein:
the first keyword parameter module is used for acquiring reply audio information corresponding to the first keyword and the visual model driving parameter of the virtual image if the first keyword matched with the interactive instruction is found;
the first keyword interface switching module is used for switching the interface to be interacted into a target interaction interface corresponding to the first keyword;
and the first keyword driving module is used for driving the behavior of the virtual image according to the visual model driving parameter based on the target interactive interface and correspondingly playing the reply audio information aiming at the driven behavior.
Further, the first keyword parameter module further comprises a database searching module, wherein:
and the database searching module is used for searching the reply audio information corresponding to the first keyword and the visual model driving parameters of the virtual image from the preset database if the first keyword matched with the interactive instruction is searched.
Further, the obtaining module 1740 further includes a return module and a return switching module, where:
the return module is used for acquiring a return instruction based on the target interactive interface corresponding to the first keyword;
and the return switching module is used for switching the target interactive interface corresponding to the first keyword into the interface to be interacted.
Further, the execution module 1750 further includes a keyword search module, a second keyword matching module, and a video module, wherein:
the keyword searching module is used for searching a second keyword matched with the interactive instruction in a preset database;
the second keyword matching module is used for determining a target interactive interface and a video broadcast picture corresponding to the second keyword if the second keyword matched with the interactive instruction is found;
and the video module is used for displaying the video broadcast picture at the appointed position of the target interactive interface.
Further, the switching module 1730 further includes: a wait module, wherein:
and the waiting module is used for switching the interface to be interacted into the interface to be awakened if the interaction instruction input by the user is not acquired based on the interface to be interacted within a preset time period.
Further, the wake-up module 1720 further includes a wake-up determination module and a wake-up switching module, wherein:
the awakening judging module is used for judging that the user input information meets a preset awakening condition if the user input information contains a preset awakening word;
and the awakening switching module is used for switching the interface to be awakened into the interface to be interacted if the user input information meets a preset awakening condition.
The human-computer interaction device based on the avatar provided by the embodiment of the application is used for realizing the corresponding human-computer interaction method based on the avatar in the embodiment of the method, has the beneficial effects of the corresponding method embodiment, and is not repeated herein.
It can be clearly understood by those skilled in the art that the human-computer interaction device based on the avatar provided in the embodiment of the present application can implement each process in the foregoing method embodiment, and for convenience and brevity of description, the specific working processes of the foregoing description device and module may refer to the corresponding processes in the foregoing method embodiment, which are not described herein again.
In the embodiments provided in the present application, the coupling or direct coupling or communication connection between the modules shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or modules may be in an electrical, mechanical or other form.
In addition, each functional module in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Referring to fig. 18, a block diagram of a terminal according to an embodiment of the present application is shown. The terminal 1800 may be a terminal capable of running an application, such as a smart phone, a tablet computer, an electronic book, or the like. The terminal 1800 in the present application may include one or more of the following components: a processor 1810, memory 1820, and one or more applications, wherein the one or more applications may be stored in the memory 1820 and configured to be executed by the one or more processors 1810, the one or more programs configured to perform a method as described in the aforementioned method embodiments.
Processor 1810 may include one or more processing cores. The processor 1810 uses various interfaces and lines to connect the various parts of the terminal 1800, and performs the various functions of the terminal 1800 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 1820 and invoking data stored in the memory 1820. Alternatively, the processor 1810 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 1810 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is used for rendering and drawing display content; and the modem is used to handle wireless communications. It is to be understood that the modem may not be integrated into the processor 1810, but may be implemented by a separate communication chip.
The Memory 1820 may include a Random Access Memory (RAM) or a Read-Only Memory (Read-Only Memory). The memory 1820 may be used to store instructions, programs, code sets, or instruction sets. The memory 1820 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (e.g., a touch function, a sound playing function, an image playing function, etc.), instructions for implementing various method embodiments described below, and the like. The stored data area may also store data created during use by the terminal 1800 (e.g., phonebook, audiovisual data, chat log data), and the like.
Referring to fig. 19, a block diagram of a computer-readable storage medium according to an embodiment of the present disclosure is shown. The computer-readable storage medium 1900 stores program code that can be called by a processor to perform the methods described in the above-described method embodiments.
The computer-readable storage medium 1900 may be an electronic memory such as a flash memory, an electrically-erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a hard disk, or a ROM. Optionally, the computer-readable storage medium 1900 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 1900 has storage space for program code 1910 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 1910 may be compressed, for example, in a suitable form.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (15)

1. A human-computer interaction method based on an avatar is characterized by comprising the following steps:
displaying an interactive interface, wherein the interactive interface comprises an interface to be awakened and an interface to be interacted, which are used for displaying the virtual image;
if the interactive interface is the interface to be awakened, acquiring user input information;
if the user input information meets a preset awakening condition, switching the interface to be awakened into the interface to be interacted;
acquiring an interaction instruction input by a user based on the interface to be interacted;
and executing the operation corresponding to the interactive instruction based on the target interactive interface corresponding to the interactive instruction.
2. The method according to claim 1, wherein the executing the operation corresponding to the interactive instruction based on the target interactive interface corresponding to the interactive instruction comprises:
acquiring reply audio information corresponding to the interactive instruction and visual model driving parameters of the virtual image;
switching the interface to be interacted into a target interaction interface corresponding to the interaction instruction;
and driving the behavior of the virtual image according to the visual model driving parameter based on the target interactive interface, and correspondingly playing the reply audio information aiming at the driven behavior.
3. The method according to claim 2, wherein said obtaining the reply audio information corresponding to the interactive instruction and the visual model driving parameters of the avatar comprises:
acquiring reply audio information corresponding to the interactive instruction;
generating visual model driving parameters for the avatar based on the reply audio information.
4. The method according to claim 2 or 3, wherein the obtaining of the reply audio information corresponding to the interactive instruction and the visual model driving parameters of the avatar comprises:
searching a first keyword matched with the interactive instruction in a preset database;
if the first keyword matched with the interactive instruction cannot be found, acquiring reply audio information corresponding to the interactive instruction;
generating visual model driving parameters for the avatar based on the reply audio information.
5. The method according to claim 4, wherein the obtaining of the reply audio information corresponding to the interactive instruction comprises:
identifying the interactive instruction to acquire corresponding interactive text information;
inquiring and acquiring reply text information corresponding to the interactive text information in a question-answer library;
and acquiring reply audio information corresponding to the reply text information.
6. The method according to claim 5, wherein said obtaining reply audio information corresponding to the interactive instruction further comprises:
establishing a neural network model based on the question-answer library;
the querying and obtaining reply text information corresponding to the interactive text information in the question-answer library includes:
and inputting the interactive text information into the neural network model to obtain reply text information corresponding to the interactive text information.
7. The method according to claim 4, wherein after searching the preset database for the first keyword matching the interactive instruction, the method further comprises:
if the first keyword matched with the interactive instruction is found, acquiring reply audio information corresponding to the first keyword and the visual model driving parameter of the virtual image;
switching the interface to be interacted into a target interaction interface corresponding to the first keyword;
and driving the behavior of the virtual image according to the visual model driving parameter based on the target interactive interface, and correspondingly playing the reply audio information aiming at the driven behavior.
8. The method according to claim 7, wherein if a first keyword matching the interactive instruction is found, acquiring the reply audio information corresponding to the first keyword and the visual model driving parameters of the avatar, comprises:
and if the first keyword matched with the interactive instruction is found, searching reply audio information corresponding to the first keyword and the visual model driving parameter of the virtual image from the preset database.
9. The method of claim 7, further comprising:
acquiring a return instruction based on a target interactive interface corresponding to the first keyword;
and switching the target interactive interface corresponding to the first keyword into the interface to be interacted.
10. The method according to claim 1, wherein the executing the operation corresponding to the interactive instruction based on the target interactive interface corresponding to the interactive instruction comprises:
searching a second keyword matched with the interactive instruction in a preset database;
if the second keyword matched with the interactive instruction is found, determining a target interactive interface and a video broadcast picture corresponding to the second keyword;
and displaying the video broadcast picture at the appointed position of the target interactive interface.
11. The method of claim 1, further comprising:
if the interaction instruction input by the user is not acquired based on the interface to be interacted within the preset time period, switching the interface to be interacted into the interface to be awakened.
12. The method according to claim 1, wherein the switching of the interface to be awakened into the interface to be interacted if the user input information meets a preset awakening condition comprises:
if the user input information contains a preset awakening word, determining that the user input information meets the preset awakening condition;
and if the user input information meets the preset awakening condition, switching the interface to be awakened into the interface to be interacted.
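
The claim-12 wake-up condition reduces to a check for a preset awakening word in the recognized input; the wake word below is an assumed example.

```python
# Illustrative wake-word check for claim 12; "hello assistant" is an assumed wake word.

WAKE_WORDS = ("hello assistant",)

def meets_wakeup_condition(user_input_text: str) -> bool:
    text = user_input_text.lower()
    return any(word in text for word in WAKE_WORDS)

def on_user_input(user_input_text: str, interface: str) -> str:
    if interface == "to_be_awakened" and meets_wakeup_condition(user_input_text):
        return "to_be_interacted"  # switch the wake-up interface to the interactive one
    return interface
```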
13. An avatar-based human-computer interaction device, said device comprising:
the display module is used for displaying an interactive interface, and the interactive interface comprises an interface to be awakened and an interface to be interacted, which are used for displaying an avatar;
the awakening module is used for acquiring user input information if the interactive interface is the interface to be awakened;
the switching module is used for switching the interface to be awakened into the interface to be interacted if the user input information meets a preset awakening condition;
the acquisition module is used for acquiring an interaction instruction input by a user based on the interface to be interacted;
and the execution module is used for executing the operation corresponding to the interactive instruction based on the target interactive interface corresponding to the interactive instruction.
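
Structurally, the claim-13 device maps onto one component per module; the class and method names below are illustrative only, not names used by the patent.

```python
# Rough structural sketch of the claim-13 device; names are illustrative.

class AvatarInteractionDevice:
    def display(self) -> None:
        """Display module: show the wake-up interface or the interactive interface with the avatar."""

    def collect_wakeup_input(self) -> str:
        """Wake-up module: acquire user input while the wake-up interface is shown."""
        return ""

    def switch_if_awakened(self, user_input: str) -> None:
        """Switching module: switch to the interactive interface once the wake-up condition is met."""

    def acquire_instruction(self) -> str:
        """Acquisition module: acquire the interaction instruction entered on the interactive interface."""
        return ""

    def execute(self, instruction: str) -> None:
        """Execution module: perform the operation on the target interactive interface for the instruction."""
```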
14. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method of any one of claims 1-12.
15. A computer-readable storage medium having program code stored therein, the program code being invoked by a processor to perform the method of any of claims 1-12.
CN202010663796.4A 2020-07-10 2020-07-10 Human-computer interaction method, device, terminal and storage medium based on virtual image Pending CN111538456A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010663796.4A CN111538456A (en) 2020-07-10 2020-07-10 Human-computer interaction method, device, terminal and storage medium based on virtual image

Publications (1)

Publication Number Publication Date
CN111538456A true CN111538456A (en) 2020-08-14

Family

ID=71976531

Country Status (1)

Country Link
CN (1) CN111538456A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017083001A1 (en) * 2015-11-09 2017-05-18 Apple Inc. Unconventional virtual assistant interactions
CN107766482A (en) * 2017-10-13 2018-03-06 北京猎户星空科技有限公司 Information pushes and sending method, device, electronic equipment, storage medium
CN107894833A (en) * 2017-10-26 2018-04-10 北京光年无限科技有限公司 Multi-modal interaction processing method and system based on visual human
CN110609620A (en) * 2019-09-05 2019-12-24 深圳追一科技有限公司 Human-computer interaction method and device based on virtual image and electronic equipment
CN110727410A (en) * 2019-09-04 2020-01-24 上海博泰悦臻电子设备制造有限公司 Man-machine interaction method, terminal and computer readable storage medium
CN111273833A (en) * 2020-03-25 2020-06-12 北京百度网讯科技有限公司 Man-machine interaction control method, device and system and electronic equipment
CN111290682A (en) * 2018-12-06 2020-06-16 阿里巴巴集团控股有限公司 Interaction method and device and computer equipment

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214667A (en) * 2020-09-18 2021-01-12 建信金融科技有限责任公司 Information pushing method, device and equipment based on three-dimensional model and storage medium
CN112214667B (en) * 2020-09-18 2023-06-30 建信金融科技有限责任公司 Information pushing method, device, equipment and storage medium based on three-dimensional model
CN112446938A (en) * 2020-11-30 2021-03-05 重庆空间视创科技有限公司 Multi-mode-based virtual anchor system and method
CN112446938B (en) * 2020-11-30 2023-08-18 重庆空间视创科技有限公司 Multi-mode-based virtual anchor system and method
CN113297551A (en) * 2021-06-22 2021-08-24 上海和数软件有限公司 Man-machine interaction system based on block chain
CN114238594A (en) * 2021-11-30 2022-03-25 北京百度网讯科技有限公司 Service processing method and device, electronic equipment and storage medium
CN114780892A (en) * 2022-03-31 2022-07-22 武汉古宝斋文化艺术品有限公司 Online exhibition and display intelligent interaction management system based on artificial intelligence
CN115114537A (en) * 2022-08-29 2022-09-27 成都航空职业技术学院 Interactive virtual teaching aid implementation method based on file content identification
CN115114537B (en) * 2022-08-29 2022-11-22 成都航空职业技术学院 Interactive virtual teaching aid implementation method based on file content identification

Similar Documents

Publication Publication Date Title
TWI778477B (en) Interaction methods, apparatuses thereof, electronic devices and computer readable storage media
CN110647636B (en) Interaction method, interaction device, terminal equipment and storage medium
CN110807388B (en) Interaction method, interaction device, terminal equipment and storage medium
US11605193B2 (en) Artificial intelligence-based animation character drive method and related apparatus
CN111538456A (en) Human-computer interaction method, device, terminal and storage medium based on virtual image
JP7312853B2 (en) AI-BASED VOICE-DRIVEN ANIMATION METHOD AND APPARATUS, DEVICE AND COMPUTER PROGRAM
CN112379812B (en) Simulation 3D digital human interaction method and device, electronic equipment and storage medium
CN110609620B (en) Human-computer interaction method and device based on virtual image and electronic equipment
CN110286756A (en) Method for processing video frequency, device, system, terminal device and storage medium
CN107632706B (en) Application data processing method and system of multi-modal virtual human
CN110400251A (en) Method for processing video frequency, device, terminal device and storage medium
CN108877336A (en) Teaching method, cloud service platform and tutoring system based on augmented reality
CN110826441B (en) Interaction method, interaction device, terminal equipment and storage medium
CN107977928B (en) Expression generation method and device, terminal and storage medium
CN107294837A (en) Engaged in the dialogue interactive method and system using virtual robot
CN110598576A (en) Sign language interaction method and device and computer medium
CN101983396A (en) Method for modifying a representation based upon a user instruction
CN110599359B (en) Social contact method, device, system, terminal equipment and storage medium
CN110969682B (en) Virtual image switching method and device, electronic equipment and storage medium
CN110737335B (en) Interaction method and device of robot, electronic equipment and storage medium
CN111324409B (en) Artificial intelligence-based interaction method and related device
CN114173188B (en) Video generation method, electronic device, storage medium and digital person server
CN113205569A (en) Image drawing method and device, computer readable medium and electronic device
CN114630135A (en) Live broadcast interaction method and device
CN109087644B (en) Electronic equipment, voice assistant interaction method thereof and device with storage function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200814