CN114461775A - Man-machine interaction method and device, electronic equipment and storage medium - Google Patents

Man-machine interaction method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN114461775A
Authority
CN
China
Prior art keywords
intention
user
determining
input information
intelligent virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210122168.4A
Other languages
Chinese (zh)
Inventor
张林箭
邹北琪
王佳安
蔡泽锐
张聪
汪硕芃
宋有伟
范长杰
胡志鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN202210122168.4A priority Critical patent/CN114461775A/en
Publication of CN114461775A publication Critical patent/CN114461775A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/55Controlling game characters or game objects based on the game progress
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Acoustics & Sound (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application relates to the field of natural language processing technologies, and in particular to a human-computer interaction method and apparatus, an electronic device, and a storage medium, which are used to improve the human-computer interaction effect. The main technical scheme comprises: in response to input information from a user, determining the user's intention and emotion category according to the input information; determining a friendliness between the user and an intelligent virtual character; determining a target event to be executed by the intelligent virtual character according to the user's intention and the friendliness; determining, according to the target event to be executed by the intelligent virtual character, language information to be replied by the intelligent virtual character for the input information; and controlling the intelligent virtual character to execute the target event and outputting the language information.

Description

Man-machine interaction method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a human-computer interaction method and apparatus, an electronic device, and a storage medium.
Background
With the development of artificial intelligence and Natural Language Processing (NLP), more and more related technologies are applied to human-computer interaction, making the dialogue between a user and an intelligent virtual character particularly important. For example, in a game including a human-machine interaction system, the intelligent virtual character outputs different language information for the input information of the user.
At present, a user can select the words to be spoken to an intelligent virtual character from fixed options, and the intelligent virtual character gives certain language feedback; alternatively, the user can freely input information to interrupt the communication with the intelligent virtual character at any time and insert his or her own viewpoint into the conversation, and the intelligent virtual character gives corresponding language feedback. However, the above human-computer interaction adopts a rule-based processing mode; the feedback the user obtains from the intelligent virtual character is often not what the user requires, and the interaction lacks diversity and freedom, so the human-computer interaction effect is poor.
Disclosure of Invention
In view of this, the present application provides a human-computer interaction method, an apparatus, an electronic device and a storage medium, which are used to improve human-computer interaction effect.
In a first aspect, an embodiment of the present application provides a human-computer interaction method, where the method includes:
in response to input information from a user, determining the user's intention and emotion category according to the input information;
determining a friendliness between the user and an intelligent virtual character;
determining a target event to be executed by the intelligent virtual character according to the user's intention and the friendliness;
determining, according to the target event to be executed by the intelligent virtual character, language information to be replied by the intelligent virtual character for the input information;
and controlling the intelligent virtual character to execute the target event and outputting the language information.
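The five steps above can be sketched as a single pipeline function. This is an illustrative sketch only: the `nlu`, `state`, and `npc` collaborators and all of their method names are hypothetical stand-ins, not the application's implementation.

```python
# Hypothetical sketch of the five claimed steps. The nlu (intention and
# emotion recognition), state (friendliness store), and npc (character
# controller) objects are assumed interfaces, not part of the application.

def interact(user_input, nlu, state, npc):
    # Step 1: determine the user's intention and emotion category.
    intention = nlu.recognize_intention(user_input)
    emotion = nlu.classify_emotion(user_input)
    # Step 2: determine the friendliness between user and character.
    friendliness = state.get_friendliness()
    # Step 3: choose the target event from intention and friendliness.
    event = npc.select_target_event(intention, friendliness)
    # Step 4: generate the language information conditioned on the event.
    reply = npc.generate_reply(event, user_input)
    # Step 5: execute the target event and output the language information.
    npc.execute(event)
    return reply
```

The emotion category from step 1 feeds the friendliness update described in a later optional embodiment rather than the event selection itself.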
In an optional embodiment, the method further comprises:
in response to the input information from the user, acquiring historical dialogue information between the user and the intelligent virtual character;
the determining, according to the target event to be executed by the intelligent virtual character, the language information to be replied by the intelligent virtual character for the input information includes:
determining the language information to be replied by the intelligent virtual character for the input information according to the target event to be executed by the intelligent virtual character and the historical dialogue information.
In an optional embodiment, the target event includes a first target event and a second target event, where the first target event is an event to be executed by the intelligent virtual character for the user, and the second target event is an event composed of an action sequence to be presented by the intelligent virtual character.
In an optional embodiment, the determining, according to the user's intention and the friendliness, a target event to be executed by the intelligent virtual character includes:
determining a target virtual article and a target action according to the user's intention and the friendliness;
and determining, according to the target virtual article and the target action, a first target event to be executed by the intelligent virtual character for the user.
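As an illustration of this optional embodiment, selecting the first target event from the intention and the friendliness might be sketched as a gated lookup: the friendliness must clear a preset threshold before an (article, action) pair is chosen. The article names, action names, and threshold value below are invented for the example.

```python
# Illustrative sketch only: a friendliness gate followed by a table lookup
# mapping a recognized intention to a (target virtual article, target action)
# pair. All names and the threshold are assumptions, not the application's.

FRIENDLINESS_THRESHOLD = 30  # the "preset threshold" of the embodiment

EVENT_TABLE = {
    # intention: (target virtual article, target action)
    "ask_for_gift": ("apple", "hand_over"),
    "invite_to_dance": (None, "dance"),
}

def select_first_target_event(intention, friendliness):
    """Return the first target event as (article, action), or None when the
    friendliness gate or the intention lookup fails."""
    if friendliness <= FRIENDLINESS_THRESHOLD:
        return None  # friendliness not above the preset threshold
    return EVENT_TABLE.get(intention)
```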
In an optional embodiment, before determining the target virtual article and the target action according to the user's intention and the friendliness, the method further comprises:
determining that the friendliness is greater than a preset threshold.
In an optional embodiment, the method further comprises:
obtaining a target friendliness according to the emotion category and the friendliness;
and updating the friendliness according to the target friendliness to obtain an updated friendliness.
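A minimal sketch of this friendliness update, assuming each emotion category maps to a signed adjustment and the result is clamped to a fixed range. The delta values and the [0, 100] range are assumptions for illustration; the application does not specify them.

```python
# Illustrative update rule: the emotion category detected in the user's
# input moves the stored friendliness by a signed delta, clamped to [0, 100].
# The delta table and range are invented for this sketch.

EMOTION_DELTA = {"happy": +5, "neutral": 0, "angry": -8}

def update_friendliness(friendliness, emotion_category):
    delta = EMOTION_DELTA.get(emotion_category, 0)  # unknown emotion: no change
    target = max(0, min(100, friendliness + delta))  # target friendliness
    return target  # stored as the updated friendliness
```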
In an optional embodiment, the determining the user's intention from the input information includes:
determining the user's intention for the input information through an unsupervised intention recognition model and/or a small-sample intention recognition model.
In an optional embodiment, determining the user's intention for the input information through the unsupervised intention recognition model includes:
splicing the input information with a plurality of preset intentions through a preset template to obtain a plurality of intention sentences;
determining, through the unsupervised intention recognition model, a relevance score or a perplexity score corresponding to each intention sentence;
determining the preset intention corresponding to an intention sentence whose relevance score exceeds a first value as the user's intention; or determining the preset intention corresponding to an intention sentence whose perplexity is lower than a second value as the user's intention.
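The unsupervised route above might be sketched as follows for the perplexity variant. The template text, preset intention list, and scoring function are placeholders: in practice the perplexity of each spliced sentence would come from a pretrained language model, and the sketch returns the single best intention whose perplexity is below the second value.

```python
# Illustrative sketch: splice the input with each preset intention through a
# template, score every candidate sentence, and keep the intention whose
# sentence scores a perplexity below the threshold. The template and intents
# are invented; `perplexity` is an injected stand-in for a language model.

TEMPLATE = "{input} This sentence expresses the intention to {intention}."
PRESET_INTENTIONS = ["greet", "ask_for_gift", "say_goodbye"]

def pick_intention(user_input, perplexity, second_value):
    candidates = {
        intent: TEMPLATE.format(input=user_input, intention=intent)
        for intent in PRESET_INTENTIONS
    }
    scores = {intent: perplexity(sent) for intent, sent in candidates.items()}
    best_intent, best_score = min(scores.items(), key=lambda kv: kv[1])
    # Lower perplexity means the spliced sentence reads more naturally.
    return best_intent if best_score < second_value else None
```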
In an optional embodiment, the determining the user's intention for the input information through the small-sample intention recognition model includes:
inputting the input information into the small-sample intention recognition model to obtain a plurality of intentions and an intention probability value corresponding to each intention, the small-sample intention recognition model being trained on information samples and corresponding preset intention labels;
and determining an intention whose intention probability value is higher than a third value as the user's intention.
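The small-sample route then reduces to thresholding the per-intention probabilities returned by the trained classifier. The sketch below assumes those probabilities are already available as a dictionary; the classifier itself (a model fine-tuned on a few labeled information samples) is outside the sketch.

```python
# Illustrative sketch: keep every intention whose probability value from the
# small-sample intention recognition model exceeds the third value. The
# probabilities dict stands in for the model's output.

def intentions_above_threshold(probabilities, third_value):
    """probabilities: dict mapping intention -> intention probability value."""
    return [intent for intent, p in probabilities.items() if p > third_value]
```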
In an optional embodiment, the determining the user's intention for the input information through the unsupervised intention recognition model and the small-sample intention recognition model comprises:
obtaining, for the input information, a plurality of intentions and the relevance score or perplexity score corresponding to each intention through the unsupervised intention recognition model;
obtaining, for the input information, a plurality of intentions and the intention probability value corresponding to each intention through the small-sample intention recognition model;
for any target intention shared between the intentions obtained by the unsupervised intention recognition model and those obtained by the small-sample intention recognition model, performing a weighted calculation on the relevance score or perplexity score corresponding to the target intention and the intention probability value corresponding to the target intention to obtain a weighted score;
and determining an intention whose weighted score exceeds a fourth value as the user's intention.
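The combination of the two recognizers might be sketched as a weighted average over the intentions both models agree on. The weight and threshold below are illustrative assumptions, and both inputs are assumed already normalized to higher-is-better scores in [0, 1]; a raw perplexity (lower is better) would first have to be converted.

```python
# Illustrative sketch of the weighted calculation: only intentions produced
# by both models are considered, their two scores are mixed with a fixed
# weight, and intentions whose weighted score exceeds the fourth value
# survive. Weight and threshold values are assumptions.

def fuse_intentions(unsup_scores, prob_scores, weight=0.5, fourth_value=0.5):
    """unsup_scores / prob_scores: dicts mapping intention -> score in [0, 1]
    (a perplexity would need converting to higher-is-better beforehand)."""
    shared = unsup_scores.keys() & prob_scores.keys()
    fused = {
        intent: weight * unsup_scores[intent] + (1 - weight) * prob_scores[intent]
        for intent in shared
    }
    return {intent: s for intent, s in fused.items() if s > fourth_value}
```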
In an optional embodiment, the method further comprises:
acquiring a character relationship between the user and the intelligent virtual character;
the determining, according to the target event to be executed by the intelligent virtual character, the language information to be replied by the intelligent virtual character for the input information includes:
determining the language information to be replied by the intelligent virtual character for the input information according to the target event to be executed by the intelligent virtual character and the character relationship.
In a second aspect, an embodiment of the present application further provides a human-computer interaction device, where the device includes:
the determining module is configured to, in response to input information from a user, determine the user's intention and emotion category according to the input information;
the determining module is further configured to determine the friendliness between the user and the intelligent virtual character;
the determining module is further configured to determine a target event to be executed by the intelligent virtual character according to the user's intention and the friendliness;
the determining module is further configured to determine, according to the target event to be executed by the intelligent virtual character, the language information to be replied by the intelligent virtual character for the input information;
and the output module is configured to control the intelligent virtual character to execute the target event and output the language information.
In an optional embodiment, the apparatus further comprises: an acquisition module;
the acquisition module is configured to, in response to the input information from the user, acquire historical dialogue information between the user and the intelligent virtual character;
the determining module is specifically configured to determine, according to the target event to be executed by the intelligent virtual character and the historical dialogue information, the language information to be replied by the intelligent virtual character for the input information.
In an optional embodiment, the target event includes a first target event and a second target event, where the first target event is an event to be executed by the intelligent virtual character for the user, and the second target event is an event composed of an action sequence to be presented by the intelligent virtual character.
In an optional embodiment, the determining module is specifically configured to:
determine a target virtual article and a target action according to the user's intention and the friendliness;
and determine, according to the target virtual article and the target action, a first target event to be executed by the intelligent virtual character for the user.
In an optional embodiment, the determining module is further configured to determine that the friendliness is greater than a preset threshold.
In an optional embodiment, the acquisition module is further configured to:
obtain a target friendliness according to the emotion category and the friendliness;
and update the friendliness according to the target friendliness to obtain an updated friendliness.
In an optional embodiment, the determining module is specifically configured to determine the user's intention for the input information through an unsupervised intention recognition model and/or a small-sample intention recognition model.
In an optional embodiment, the determining module is specifically configured to:
splice the input information with a plurality of preset intentions through a preset template to obtain a plurality of intention sentences;
determine, through the unsupervised intention recognition model, a relevance score or a perplexity score corresponding to each intention sentence;
and determine the preset intention corresponding to an intention sentence whose relevance score exceeds a first value as the user's intention, or determine the preset intention corresponding to an intention sentence whose perplexity is lower than a second value as the user's intention.
In an optional embodiment, the determining module is specifically configured to:
input the input information into the small-sample intention recognition model to obtain a plurality of intentions and an intention probability value corresponding to each intention, the small-sample intention recognition model being trained on information samples and corresponding preset intention labels;
and determine an intention whose intention probability value is higher than a third value as the user's intention.
In an optional embodiment, the determining module is specifically configured to:
obtain, for the input information, a plurality of intentions and the relevance score or perplexity score corresponding to each intention through the unsupervised intention recognition model;
obtain, for the input information, a plurality of intentions and the intention probability value corresponding to each intention through the small-sample intention recognition model;
for any target intention shared between the intentions obtained by the unsupervised intention recognition model and those obtained by the small-sample intention recognition model, perform a weighted calculation on the relevance score or perplexity score corresponding to the target intention and the intention probability value corresponding to the target intention to obtain a weighted score;
and determine an intention whose weighted score exceeds a fourth value as the user's intention.
In an optional embodiment, the acquisition module is further configured to acquire a character relationship between the user and the intelligent virtual character;
the determining module is specifically configured to determine, according to the target event to be executed by the intelligent virtual character and the character relationship, the language information to be replied by the intelligent virtual character for the input information.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate through the bus; and the machine-readable instructions, when executed by the processor, perform the steps of the human-computer interaction method in the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the human-computer interaction method in the first aspect are performed.
According to the human-computer interaction method and apparatus, the electronic device, and the storage medium provided by the present application, in response to input information from a user, the user's intention and emotion category are determined according to the input information; the friendliness between the user and the intelligent virtual character is then determined; a target event to be executed by the intelligent virtual character is determined according to the user's intention and the friendliness; the language information to be replied by the intelligent virtual character for the input information is determined according to the target event to be executed by the intelligent virtual character; and finally the intelligent virtual character is controlled to execute the target event and output the language information. Compared with the current purely rule-based conversation process, or conversations between the user and the intelligent virtual character built around semantic similarity, the present application recognizes the user's intention from freely entered input information, controls the intelligent virtual character to execute the target event based on the user's intention and the friendliness, and outputs the corresponding language information, thereby improving the diversity and rationality of the feedback information of the intelligent virtual character in the human-computer interaction process and further improving the human-computer interaction effect.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 shows a schematic structural diagram of a server provided in an embodiment of the present application;
fig. 2 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
FIG. 3 is a flowchart illustrating a human-computer interaction method according to an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating a method for determining a target event according to an embodiment of the present disclosure;
fig. 5 shows a block diagram of a human-computer interaction device according to an embodiment of the present application.
Detailed Description
The terms "first," "second," and "third," etc. in the description and claims of this application and the above-described drawings are used for distinguishing between different objects and not for limiting a particular order.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present relevant concepts in a concrete fashion for ease of understanding.
In the description of the present application, "/" indicates an "or" relationship between the linked objects; for example, A/B may represent A or B. In the present application, "and/or" merely describes an association between objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. Also, in the description of the present application, "a plurality" means two or more unless otherwise specified. "At least one of the following" or similar expressions refers to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
In the embodiments of the present application, at least one may also be described as one or more, and a plurality may be two, three, four or more, which is not limited in the present application.
With the development of artificial intelligence and natural language processing, more and more related technologies are applied to human-computer interaction, and the conversation between a user and an intelligent virtual character is particularly important. At present, a user can select the words to be spoken to an intelligent virtual character from fixed options, and the intelligent virtual character gives certain language feedback; alternatively, the user can freely input information to interrupt the communication with the intelligent virtual character at any time and insert his or her own viewpoint into the conversation, and the intelligent virtual character then gives corresponding language feedback. However, the feedback the user obtains from the intelligent virtual character is often not what the user requires, so the human-computer interaction lacks diversity and freedom, and the human-computer interaction effect is poor.
For example, in a game program including a human-machine interaction system, the intelligent virtual character outputs different language information for the input information of the user. In a man-machine conversation implemented around semantic similarity, the user's input information is analyzed against the preset actions and interactive objects in the scene and matched to the corresponding action, so that the intelligent virtual character outputs the corresponding language information. In a purely rule-based conversation process, by contrast, the user must manually select the input information from several options, and the intelligent virtual character makes a corresponding preset reply according to the input information selected by the user. A complete dialogue tree and flow must be made in advance, with a corresponding reply set for each branch. For example, if the user's input information is "Are you not happy?", the intelligent virtual character outputs the corresponding preset language information "You're saying I'm... depressed?". Although there is a certain possibility that the same sentence is not returned twice, the replies are all preset.
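For contrast with the claimed method, the purely rule-based approach described above reduces to a hand-authored dialogue tree in which every reply is preset. The tree contents below are invented for illustration, not taken from any existing game.

```python
# Toy version of the purely rule-based prior art: the user picks among fixed
# options, and each option maps to one preset reply plus the next options.
# All node names and texts are invented for this sketch.

DIALOG_TREE = {
    "greet": {"reply": "Hello, traveler!", "next": ["ask_quest", "farewell"]},
    "farewell": {"reply": "Safe travels.", "next": []},
}

def preset_reply(selected_option):
    """Return the preset reply for the option the user selected, or None."""
    node = DIALOG_TREE.get(selected_option)
    return node["reply"] if node else None
```

Every path through such a tree is authored in advance, which is exactly the lack of diversity and freedom the application sets out to address.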
In summary, in the existing game-based human-computer interaction methods, the feedback the user obtains from the intelligent virtual character is often not what the user requires, the game can only progress in a preset, specific direction, and the human-computer interaction effect is poor.
In view of this, the present application provides a human-computer interaction method whose basic principle is as follows: in response to input information from a user, the user's intention and emotion category are determined according to the input information; the friendliness between the user and the intelligent virtual character is then determined; a target event to be executed by the intelligent virtual character is determined according to the user's intention and the friendliness; the language information to be replied by the intelligent virtual character for the input information is determined according to the target event to be executed by the intelligent virtual character; and finally the intelligent virtual character is controlled to execute the target event and output the language information. Compared with the current purely rule-based conversation process, or conversations between the user and the intelligent virtual character built around semantic similarity, the present application recognizes the user's intention from the user's input information, controls the intelligent virtual character to execute the target event based on the user's intention and the friendliness, and outputs the corresponding language information, thereby improving the diversity and rationality of the feedback information of the intelligent virtual character in the human-computer interaction process and further improving the human-computer interaction effect.
The scheme provided by the embodiment of the application can be applied to the server shown in fig. 1. As shown in fig. 1, the server may include at least one processor 11, memory 12, communication interface 13, and communication bus 14.
The processor 11 is configured to, in response to input information sent by a user through the electronic device, determine the user's intention and emotion category according to the input information;
the processor 11 is configured to establish communication with the memory 12 through the communication bus 14, acquire the historical friendliness between the user and the intelligent virtual character stored in the memory 12, and determine the friendliness between the user and the intelligent virtual character at the current moment according to the historical friendliness and the emotion category; determine a target event to be executed by the intelligent virtual character according to the user's intention and the friendliness; and determine, according to the target event to be executed by the intelligent virtual character, the language information to be replied by the intelligent virtual character for the input information;
and the processor 11 is configured to control the intelligent virtual character in the electronic device to execute the target event and output the language information.
The following describes each component of the server in detail with reference to fig. 1:
the processor 11 is a control center of the server, and may be a single processor or a collective term for a plurality of processing elements. For example, the processor 11 is a Central Processing Unit (CPU), and may also be an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, such as: one or more microprocessors (digital signal processors, DSPs), or one or more Field Programmable Gate Arrays (FPGAs).
The processor 11 may perform various functions of the server as a network controller or a network device by running or executing software programs stored in the memory 12 and calling data stored in the memory 12.
In particular implementations, processor 11 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 1, for example, as one embodiment.
In one embodiment, the server may include a plurality of processors, and each of these processors may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
The memory 12 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device capable of storing static information and instructions, a random access memory (RAM) or other type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 12 may be self-contained and coupled to the processor 11 via the communication bus 14. The memory 12 may also be integrated with the processor 11.
The memory 12 is used for storing software programs for executing the scheme of the application, and is controlled by the processor 11 to execute.
The communication interface 13 is any device, such as a transceiver, used to communicate with other devices or communication networks, such as Ethernet, a Radio Access Network (RAN), or a Wireless Local Area Network (WLAN). The communication interface 13 may include a receiving unit implementing a receiving function and a transmitting unit implementing a transmitting function.
The communication bus 14 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 1, but it is not intended that there be only one bus or one type of bus.
The device architecture shown in fig. 1 does not constitute a limitation on the server, which may include more or fewer components than those shown, may combine some components, or may arrange the components differently.
The solution provided by the embodiment of the present application can also be applied to the electronic device shown in fig. 2, and the electronic device may include at least one processor 21, a memory 22, a display 23, and a transceiver 24.
The electronic device may be a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), an all-in-one machine, an intelligent robot, a netbook, a personal digital assistant (PDA), or another electronic device. As one embodiment, the electronic device of the present application may be a smartphone. The following describes each component of the electronic device in detail with reference to fig. 2:
the processor 21 is a control center of the electronic device, and may be a single processor or a collective term for multiple processing elements. For example, the processor 21 is a CPU, and may also be an ASIC, or one or more integrated circuits configured to implement embodiments of the present application, such as: one or more DSPs, or one or more FPGAs. The processor 21 may perform, among other things, various functions of the electronic device by running or executing software programs stored in the memory 22 and invoking data stored in the memory 22.
In particular implementations, processor 21 may include one or more CPUs such as CPU0 and CPU1 shown in fig. 2 as one example.
In one embodiment, the electronic device may include a plurality of processors, and each of the processors may be a single-CPU processor or a multi-CPU processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
Memory 22 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 22 may be a separate component connected to the processor 21 by a communication bus, or may be integrated with the processor 21. The memory 22 is used for storing the software programs that execute the scheme of the application, under the control of the processor 21.
The display 23 may be used to display the target event executed by the intelligent virtual character and to output language information. The display 23 may include a display panel, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The transceiver 24 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet, a RAN, or a WLAN. The transceiver 24 may include a receiving unit implementing the receiving function and a transmitting unit implementing the transmitting function.
The electronic device structure shown in fig. 2 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some of the components may be combined, or a different arrangement of components. Although not shown, the electronic device may further include a battery, a camera, a bluetooth module, a Global Positioning System (GPS) module, and the like, which is not described herein again.
As shown in fig. 3, an embodiment of the present application provides a human-computer interaction method, which may be applied to the above server or electronic device, and to fields such as games and services (e.g., bank teller-machine dialogue services, flight inquiry dialogue services, and online transactions). The man-machine interaction method provided by the application may include the following steps:
S301, in response to input information of the user, determining the intention and emotion category of the user according to the input information.
In this embodiment, the input information of the user may be text information, voice information, or the like, which is not specifically limited. The user may freely enter whatever he or she wants to express during the human-computer interaction, i.e., the input information need not be constrained by preset rules; alternatively, information formulated according to preset rules may also be entered during the human-computer interaction dialog, which is likewise not specifically limited in this embodiment.
For example, the game player may enter information through a text box in the game interface, or may select from fixed options in the game interface, and the input information then drives a conversation between the game player and a non-player character (NPC). It should be noted that the game in this embodiment is one requiring interaction between a game player and an NPC, such as a role-playing game or a pet-raising game, which is not specifically limited here.
The intention indicates an event that the user needs the intelligent virtual character to perform. For example, in a human-computer interaction game program, the intention of the user determined from the input information, such as eating or drinking, indicates an event that the game player needs the NPC to complete. The emotion category indicates the emotion the user expresses toward the intelligent virtual character, or the user's emotion at the current moment, and may specifically be: sadness, happiness, liking, fear, disgust, anger, surprise, depression, confusion, or other.
For example, if the input information of the game player is "I am hungry and thirsty", it may be determined from this input that the game player intends to "eat and drink" and that the corresponding emotion category is "other", i.e., the input information carries no emotional color.
In the embodiment of the present invention, the intention and emotion category of the user may be determined by a trained data model, by preset rules, or by text similarity.
In this embodiment, the intention and emotion category of the user may be determined by a single data model trained on information samples and their corresponding intention and emotion labels: after the user's input information is obtained, it is fed into the trained data model, which outputs the intention and emotion category corresponding to the input. Alternatively, the intention and the emotion category may be determined by two separate recognition models, one determining the emotion corresponding to the input information and the other determining the intention category; the emotion recognition model is trained on information samples with emotion labels, and the intention recognition model is trained on information samples with intention labels.
For example, if the game player's input is "If you can buy me a cup of milk tea, I will be happy!", this input may be fed into a single data model, which yields the intention "drink milk tea" and the emotion label "happy". The input may also be fed separately into the emotion recognition model and the intention recognition model, with the former yielding the emotion "happy" and the latter the intention category "drink milk tea".
In this embodiment, identifying the intention and emotion label of the information according to preset rules may proceed as follows: first extract the keywords and verbs in the input information, then combine them, match the combined result against a preset correspondence, and determine the intention and emotion category of the input information. For example, for the input "If you can buy me a cup of milk tea, I will be happy!", first extract the keyword "milk tea" and the verbs "buy" and "happy"; the sentence combining the verbs and the keyword may be "buy milk tea, happy"; then match this combined sentence against the preset correspondence and determine that the intention is "drink milk tea" and the emotion category is "happy". It should be noted that the preset correspondence maps sentences (combinations of keywords and/or verbs extracted from input information) to intentions and emotion categories; for example, it may include: "buy milk tea" corresponding to "want to drink milk tea".
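The rule-based route above can be sketched in a few lines of Python; the keyword/verb lists and the correspondence table here are illustrative assumptions, not part of the patent:

```python
# Minimal sketch of rule-based intention/emotion recognition:
# extract a keyword and a verb, combine them, and look up a preset mapping.
KEYWORDS = {"milk tea", "water", "potato chips"}
VERBS = {"buy", "drink", "eat"}
EMOTION_WORDS = {"happy": "happy", "hate": "dislike"}

# Preset correspondence: "verb keyword" phrase -> intention (assumed entries)
RULES = {"buy milk tea": "drink milk tea"}

def recognize_by_rules(text: str):
    """Extract keywords/verbs from the input, combine them, match preset rules."""
    words = text.lower()
    keyword = next((k for k in KEYWORDS if k in words), None)
    verb = next((v for v in VERBS if v in words), None)
    emotion = next((e for w, e in EMOTION_WORDS.items() if w in words), "other")
    intention = RULES.get(f"{verb} {keyword}") if verb and keyword else None
    return intention, emotion

print(recognize_by_rules("If you can buy me a cup of milk tea, I will be happy!"))
```

With the example sentence from the text, this returns the intention "drink milk tea" and the emotion "happy".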
In addition, determining the emotion and intention category of the input information according to text similarity may proceed as follows: determine the contextual model corresponding to the input information, match the input information against the information base corresponding to that contextual model, obtain the emotion and intention category of the best-matching entry, and take these as the emotion and intention category of the user. The contextual model may include the character relationship between the user and the intelligent virtual character, and may also include the scene in which they are located. For example, in a human-computer interaction game program, the relationship between the game player and the NPC may be a lover relationship, an owner-pet relationship, a work-colleague relationship, and so on; the scene may be a work scene, a family scene, a party scene, and so on, which is not specifically limited in this embodiment.
In this embodiment, different contextual models correspond to different databases, each storing information frequently used under that contextual model together with the corresponding emotion categories and intentions. Matching against the information base of the corresponding contextual model therefore allows the user's emotion category and intention to be identified accurately.
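As a rough sketch of the text-similarity route, a profile-specific information base can be matched with a generic string-similarity measure (standing in for whatever matcher an implementation actually uses); the database entries below are invented for illustration:

```python
# Sketch: pick the stored utterance most similar to the input, then reuse
# its (intention, emotion) labels. SequenceMatcher is only a stand-in scorer.
from difflib import SequenceMatcher

# One information base per contextual model (profile); entries are assumptions.
PROFILE_BASES = {
    "home": {
        "i am hungry": ("eat", "other"),
        "i hate you": ("no intention", "dislike"),
    },
}

def match_by_similarity(text: str, profile: str):
    """Return the (intention, emotion) of the most similar stored utterance."""
    base = PROFILE_BASES[profile]
    best = max(base, key=lambda s: SequenceMatcher(None, text.lower(), s).ratio())
    return base[best]

print(match_by_similarity("I'm so hungry", "home"))
```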
S302, determining the friendliness between the user and the intelligent virtual character.
In this embodiment, the friendliness represents the degree of affinity between the user and the intelligent virtual character. In an optional embodiment provided by the invention, a target friendliness is obtained according to the emotion category and the friendliness, and the friendliness is updated according to the target friendliness to obtain the updated friendliness. The target friendliness is the friendliness determined from the user's input information at the current moment: the emotion category of the user is determined from the current input, the target friendliness is obtained from the score corresponding to that emotion category, and the friendliness is updated accordingly.
Specifically, in this embodiment, the friendliness between the user and the intelligent virtual character can be represented by a discrete value. If the initial friendliness is 60 and the friendliness range is 0 to 100, then when the user's input information is relatively friendly (positive emotion), the friendliness increases; conversely, when the input information is relatively hostile (negative emotion), the friendliness decreases.
In this embodiment, corresponding scores may be set for different emotion categories, that is, by setting a correspondence between an emotion category and a score, a friendship between a user and an intelligent virtual character is determined. For example, if the emotion category obtained through the input information of the user at the current moment is "like", 2 may be added on the basis of the historical friendliness of the user and the intelligent virtual character; the emotion obtained through the input information of the user at the current moment is 'aversion', and 2 can be reduced on the basis of the historical friendliness degree of the user and the intelligent virtual role, so that the logic of friendship degree updating between the user and the intelligent virtual role is realized.
For example, if the game player's input is "I hate you", the corresponding emotion category identified from it is "dislike". If the friendliness before this round of conversation was 60, then, since the emotion classification of the current input sentence is negative, the friendliness is decreased by 2 according to the rule; that is, the friendliness between the game player and the NPC is updated to 58 based on the emotion of the currently input sentence.
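The friendliness-update logic can be sketched as follows; the per-emotion score table (±2) and the 0–100 clamp follow the examples in the text, while the exact table entries are assumptions:

```python
# Sketch of the emotion-category -> score correspondence and the update rule.
EMOTION_SCORES = {
    "like": +2, "happy": +2,      # positive emotions raise friendliness
    "dislike": -2, "anger": -2,   # negative emotions lower it
    "other": 0,                   # neutral emotions leave it unchanged
}

def update_friendliness(current: int, emotion: str) -> int:
    """Add the emotion's score to the current friendliness, clamped to [0, 100]."""
    target = current + EMOTION_SCORES.get(emotion, 0)
    return max(0, min(100, target))

# "I hate you" -> emotion "dislike": 60 -> 58, as in the example above.
print(update_friendliness(60, "dislike"))
```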
S303, determining a target event to be executed by the intelligent virtual character according to the intention of the user and the friendliness.
The target event may comprise a target virtual item and/or a target action corresponding to the user. The target virtual item is a virtual item given to the user by the intelligent virtual character according to the user's intention and the friendliness, and there may be zero or more such items; the target action is an action the intelligent virtual character executes for the user according to the user's intention and the friendliness, and there may likewise be zero or more target actions. Specifically, the target event includes a first target event and a second target event: the first target event is an event the intelligent virtual character executes for the user, and the second target event is an event composed of an action sequence presented by the intelligent virtual character.
As shown in fig. 4, in an optional embodiment provided by the present invention, the determining, according to the intention of the user and the friendliness, a target event to be executed by the intelligent virtual character includes:
S3031, determining whether the friendliness is greater than a preset threshold.
The preset threshold may be set according to actual requirements. If the preset threshold is set to 60, it is determined whether the friendliness between the user and the intelligent virtual character is greater than 60; if so, the process proceeds to step S3032; if the friendliness is less than or equal to 60, the process proceeds to step S3034.
S3032, if the friendliness is greater than the preset threshold, determining the target virtual item and the target action according to the intention of the user.
For example, the game player may enter the message "The window is too cold, I have caught a cold!"; the intention obtained from this input is then {close the window, take medicine}. If the friendliness between the NPC and the game player after the previous round is 70, and the information input in the current round carries no emotion, the friendliness remains 70; the target virtual item the NPC gives the game player is determined to be {cold medicine}, and the target action is {close the window, fetch the medicine}.
S3033, determining a first target event to be executed by the intelligent virtual character for the player according to the target virtual article and the target action.
For example, if the target virtual item is empty and the target action is {close the window}, the first target event to be executed by the intelligent virtual character for the player is closing the window; if the target virtual item is {milk tea} and the target action is {buy a beverage for the game player}, the first target event is {buy milk tea for the game player}; and if both the target virtual item and the target action are empty, the first target event to be executed for the player is likewise empty.
In an embodiment provided by the present invention, if there are multiple target actions, they may be spliced according to a preset template sentence, which can be defined freely. For example, the preset template sentence may be constructed as "<info>, so I first <initial action>, then I <action 1>". Here <info> is the input information of the player in the game, and <initial action> and <action 1> are determined according to the intentions identified from the player's input. In this embodiment, the greater the probability of an intention identified from the input information, the earlier its action appears in the spliced template sentence; that is, in this example the probability of the initial action is greater than that of action 1.
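The action-splicing template can be sketched as plain string formatting, with actions ordered by descending intention probability as described above; the exact English wording of the template is an assumption:

```python
# Sketch: order target actions by descending intention probability, then fill
# a "<info>, so I first <initial action>, then I <action 1>" style template.
def splice_actions(info: str, actions_with_probs):
    """actions_with_probs: list of (action, probability) pairs."""
    ordered = [a for a, _ in sorted(actions_with_probs, key=lambda x: -x[1])]
    if not ordered:
        return info
    sentence = f"{info}, so I first {ordered[0]}"
    for act in ordered[1:]:
        sentence += f", then I {act}"
    return sentence

print(splice_actions("The window is too cold",
                     [("fetch the medicine", 0.3), ("close the window", 0.9)]))
```

Because "close the window" has the higher probability, it fills the initial-action slot.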
S3034, if the friendliness is less than or equal to the preset threshold, determining that the contents of the target virtual item and the target action are null.
In this embodiment, the contents of the target virtual item and the target action being null means that the content of the first target event the virtual character is to execute for the player is also null. For example, if the input information of the game player is "I am so hungry", and the friendliness between the game player and the NPC after the previous session is 50, which is below the preset value, then even though the game player's intention is to eat or drink, the NPC does not give any corresponding feedback; that is, the contents of the target virtual item and the target action are empty.
For example, if the intention determined from the game player's input is "eat" and the friendliness after the last round is 70, the target virtual item the NPC gives may be "braised meat". If the friendliness after the last round is below a certain threshold (e.g., 50), the NPC will not give the target virtual item to the game player.
Further, in an optional embodiment provided by the present invention, the number of target virtual items and target actions may be determined according to the level of friendliness between the user and the intelligent virtual character: the higher the friendliness, the greater the number of target virtual items and target actions. For example, suppose the intentions corresponding to the game player's input are {eat, close the window}. If the friendliness between the game player and the NPC is 70, the determined target virtual item set may be {potato chips} and the action set {get something to eat for the game player}; if the friendliness at the current moment is 90, the determined virtual item set may be {potato chips} and the action set {get something to eat for the game player, close the window}.
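Steps S3031–S3034, together with the friendliness-scaled quantities just described, can be sketched as one decision function; the intention-to-item/action tables and the 90-point tier are illustrative assumptions:

```python
# Sketch of target-event determination: below the threshold everything is
# null (S3034); above it, items/actions come from the intentions (S3032),
# and higher friendliness serves more of the recognized intentions.
THRESHOLD = 60

ITEMS = {"eat": ["potato chips"], "drink": ["cola"]}          # assumed tables
ACTIONS = {"eat": "get something to eat for the player",
           "close window": "close the window"}

def determine_target_event(intentions, friendliness):
    """Return (target_items, target_actions); both empty below the threshold."""
    if friendliness <= THRESHOLD:
        return [], []                       # S3034: contents are null
    items, actions = [], []
    for intent in intentions:
        items += ITEMS.get(intent, [])
        if intent in ACTIONS:
            actions.append(ACTIONS[intent])
    if friendliness < 90:                   # assumed tier: serve fewer actions
        actions = actions[:1]
    return items, actions

print(determine_target_event(["eat", "close window"], 70))
```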
S304, determining, according to the target event to be executed by the intelligent virtual character, the language information to be replied by the intelligent virtual character for the input information.
For example, if the input information of the game player is "I am hungry" and the determined target event to be executed by the NPC is {search the environment for potato chips and give them to the game player}, the language information returned by the NPC may be determined as "Eat this and you will not feel hungry"; if the determined target event is empty, the language information to be replied may be determined as "I am busy; order yourself a takeaway".
In one embodiment provided by the invention, the character relationship between the user and the intelligent virtual character is obtained; the determining, according to the target event to be executed by the intelligent virtual character, language information to be replied by the intelligent virtual character for the input information includes: and determining language information to be replied by the intelligent virtual character aiming at the input information according to the target event to be executed by the intelligent virtual character and the character relation.
If this embodiment is applied to a human-computer interaction game program, the character relationship may be a lover relationship, a mother-child relationship, a colleague relationship, a friend relationship, and the like; if applied to a human-computer interaction service program, the character relationship may be that between a customer and a service provider, e.g., in banking transactions or boarding services, which is not specifically limited in this embodiment.
In an optional embodiment provided by the invention, responding to input information of a user, and acquiring historical dialogue information between the user and the intelligent virtual role; the determining, according to the target event to be executed by the intelligent virtual character, language information to be replied by the intelligent virtual character for the input information includes: and determining language information to be replied by the intelligent virtual character aiming at the input information according to the target event to be executed by the intelligent virtual character and the historical dialogue information.
Specifically, the language information to be replied by the intelligent virtual character for the input information can be determined as follows: obtain the character relationship between the user and the intelligent virtual character and the historical dialogue information; determine the feature vectors of the input information, the character relationship, and the historical dialogue information; and feed the feature vectors into a language reply model to determine the language information to be replied by the intelligent virtual character.
The language reply model may be trained on sample data consisting of feature vectors of input information, character relationships, and historical dialogue information, with corresponding labels. It may also be based on a GPT-2 language model trained on a novel corpus. In this embodiment, the zero-shot capability of the GPT-2 language model is used: the feature vectors of the obtained input information, character relationship, and historical dialogue information are spliced and fed into the GPT-2 model so that the model generates the next language information. By bringing factors such as the target event to be executed by the intelligent virtual character into the controlled generation range of the language reply model, the target event and the language information can be kept aligned, preventing situations where the language information says "I will close the window" while the target event to be executed is "open the window".
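The splicing step that precedes the language reply model can be sketched as plain prompt construction; the bracketed field names and separators are assumptions, and a real system would feed the resulting string to the trained GPT-2-style model rather than print it:

```python
# Sketch: concatenate relationship, dialogue history, current input, and the
# target event into one prompt, so generation stays aligned with the event.
def build_reply_prompt(relationship, history, user_input, target_event):
    lines = [f"[relationship] {relationship}"]
    lines += [f"[history] {turn}" for turn in history]
    lines.append(f"[user] {user_input}")
    lines.append(f"[event] {target_event}")  # conditions the reply on the event
    lines.append("[reply]")
    return "\n".join(lines)

prompt = build_reply_prompt(
    "friends",
    ["user: hello", "npc: hi there"],
    "I'm hungry",
    "fetch potato chips for the player",
)
print(prompt)
```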
S305, controlling the intelligent virtual character to execute the target event and outputting the language information.
According to the man-machine interaction method provided by the embodiment of the application, in response to input information of a user, the intention and emotion category of the user are determined from the input information; the friendliness between the user and the intelligent virtual character is then determined; a target event to be executed by the intelligent virtual character is determined according to the user's intention and the friendliness; the language information to be replied for the input information is determined according to that target event; and finally the intelligent virtual character is controlled to execute the target event and output the language information. Compared with current dialogue flows based on pure rules, or dialogues between the user and the intelligent virtual character built around semantic similarity, this method identifies the user's intention and the friendliness from freely entered information, controls the intelligent virtual character to execute the target event based on them, and outputs the corresponding language information, thereby improving the diversity and reasonableness of the intelligent virtual character's feedback during human-computer interaction and hence the interaction effect.
In one embodiment provided by the present invention, determining the user's intention from the input information includes: determining the user's intention from the input information through an unsupervised intention recognition model and/or a small-sample intention recognition model.
Determining the user's intention from the input information through an unsupervised intention recognition model includes: splicing the input information with a plurality of preset intentions through a preset template to obtain a plurality of intention sentences; determining, through the unsupervised intention recognition model, the relevance score or perplexity score of each intention sentence; and determining the preset intention corresponding to the intention sentence whose relevance score exceeds a first value, or whose perplexity is below a second value, as the intention of the user.
Unsupervised intention recognition scores each candidate sentence with an existing text evaluation method (such as perplexity or query-answer relevance) and then selects the intention with the highest (or lowest) score. Concretely, the user's input information is spliced with each preset intention through a constructed template to obtain the corresponding sentences, and the spliced sentences are then scored. Different evaluation methods use different templates: with query-answer relevance, a template of the form "<input information>, then I will help you <intention>" is scored; with perplexity, a template of the form "<input information>, then I go to <intention>" is scored. The relevance or perplexity score of each intention sentence is determined through unsupervised intention recognition, and the preset intention whose sentence's relevance score exceeds the first value, or whose perplexity is below the second value, is determined as the intention of the user.
The preset intention may be [ < intention 1>, < intention 2> …, < intention 10> ], and the first value and the second value may be set according to actual requirements, for example, the first value is set to 80%, and the second value is set to 20%, which is not specifically limited in this embodiment.
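The unsupervised route can be sketched as template splicing plus a pluggable scorer; the toy scoring lambda below merely stands in for a real perplexity or query-answer relevance model:

```python
# Sketch: splice the input with every preset intention via a template, score
# each candidate sentence, and return the best-scoring intention.
def recognize_unsupervised(info, intents, score_fn,
                           template="{info}, then I go to {intent}"):
    """Higher score_fn value = better candidate (use -perplexity if needed)."""
    sentences = {i: template.format(info=info, intent=i) for i in intents}
    scores = {i: score_fn(s) for i, s in sentences.items()}
    return max(scores, key=scores.get)

# Toy relevance scorer standing in for a trained evaluation model:
toy_score = lambda s: 1.0 if "hungry" in s and "eat" in s else 0.1

print(recognize_unsupervised("I am hungry", ["eat", "sleep", "drink"], toy_score))
```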
The accuracy of unsupervised intention recognition is not high enough, while supervised intention recognition requires a large number of samples. In this embodiment, under a small-sample scenario, an intention recognition model based on small samples is used to improve recognition accuracy. Although the small-sample intention recognition model alleviates the two problems above, its inference is slow and unsuited to online processing, so this embodiment improves it to some extent. Specifically, determining the user's intention from the input information through the small-sample intention recognition model includes: feeding the input information into the small-sample intention recognition model to obtain a plurality of intentions and their respective intention probability values, the model having been trained on information samples and their corresponding preset intention labels; and determining the intention whose probability value exceeds a third value as the intention of the user.
In this embodiment, the training process for the small sample intent recognition model is as follows:
1. Construct 4-8 samples for each intention category as the annotation data set. The sample data has the form <information sample> - <preset intention label>; for example, with the preset intention labels "eat, drink, sleep", one constructed sample is: "I am hungry" - "eat", where "I am hungry" is the information sample and "eat" is the corresponding preset intention label.
2. Construct a PET template, comprising a prompt template and a description text template. Under the intention recognition task, the constructed template takes a form such as "<input information>, the intention below is [MASK]".
3. A pre-trained language model (such as BERT) is trained in the form of a masked language model.
4. Use the language model trained in the previous step to generate features for the training set and for artificially constructed negative samples. Here a negative sample belongs to an "other" intention: if 10 intentions are preset, samples must be collected for an 11th "other" intention. Both negative and positive samples (i.e., the labeled sample data) pass through BERT feature extraction, and the resulting features serve as the feature identifier of each sample.
5. And training a small sample intention recognition model by using the characteristics generated in the last step, and finally performing intention recognition on the input information.
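Steps 4-5 can be sketched with toy features standing in for the BERT-extracted ones; the nearest-centroid classifier below is one simple choice of small-sample intention recognition model, not necessarily the one the patent intends:

```python
# Sketch: fit a light-weight classifier on language-model features.
# Toy 2-d vectors stand in for real BERT features.
def fit_centroids(features, labels):
    """Average the feature vectors of each intention label."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for k, x in enumerate(vec):
            acc[k] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}

def predict(centroids, vec):
    """Return the label of the nearest centroid (squared Euclidean distance)."""
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

cents = fit_centroids([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]],
                      ["eat", "eat", "other", "other"])
print(predict(cents, [0.8, 0.2]))
```

Here "other" plays the role of the artificially constructed negative class from step 4.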
Although the small-sample intention recognition model can correctly recognize the intention of most information, partial misclassification still occurs. Therefore, to further improve the accuracy of intention recognition, this embodiment provides an optional scheme in which the user's intention is determined from the input information through both the unsupervised intention recognition model and the small-sample intention recognition model: for the input information, obtain a plurality of intentions and their relevance or perplexity scores through the unsupervised intention recognition model; obtain a plurality of intentions and their intention probability values through the small-sample intention recognition model; for any target intention shared by the two result sets, compute a weighted score from the relevance (or perplexity) score and the intention probability value of that target intention; and determine the intentions whose weighted score exceeds a fourth value as the intention of the user.
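The weighted fusion of the two recognizers can be sketched as follows; the 0.5/0.5 weights and the 0.6 cutoff stand in for the unspecified weighting and "fourth value":

```python
# Sketch: for intentions proposed by BOTH models, combine the unsupervised
# relevance score and the few-shot probability with fixed weights, then keep
# those whose weighted score exceeds the cutoff.
def fuse_intents(unsup_scores, fewshot_probs, w=0.5, cutoff=0.6):
    shared = set(unsup_scores) & set(fewshot_probs)
    weighted = {i: w * unsup_scores[i] + (1 - w) * fewshot_probs[i]
                for i in shared}
    return [i for i, s in sorted(weighted.items(), key=lambda x: -x[1])
            if s > cutoff]

print(fuse_intents({"eat": 0.9, "drink": 0.3}, {"eat": 0.8, "sleep": 0.7}))
```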
The following is a game scene example provided by the present invention to explain the human-computer interaction method of this embodiment. The characters in the scene are: a robot (NPC) and a user (game player). The relationship between the characters can be changed accordingly, for example, to aliens or friends.
The objects in the scene are: a television, an air conditioner, potato chips, cola, a window, a curtain, and a sofa. The preset intentions are: eating, drinking, lying down, adjusting the window, adjusting the curtain, adjusting the television, adjusting the air conditioner, and no intention. The emotion categories are: sadness, happiness, like, fear, disgust, anger, surprise, depression, confusion, and other. A reaction is preset for each emotion, for example: "like" corresponds to "hug", "eyes light up, kiss, make a heart gesture", and so on. The virtual articles include: books, movies, animations, animals, plants, recipes, commodities, and the like.
Friendliness: the NPC's friendliness toward the game player. Positive input information from the game player increases the friendliness; otherwise it decreases. If the game player commands the NPC to do something (e.g., "help me turn on the air conditioner", "hurry up and turn on the TV"), the friendliness also decreases. The default initial friendliness is 60.
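The friendliness update rules above can be sketched as follows; the step size of 5, the clamping range 0–100, and the grouping of emotion categories into positive/negative/neutral are assumptions for illustration:

```python
POSITIVE = {"happiness", "like"}
NEGATIVE = {"sadness", "fear", "disgust", "anger", "depression"}
# Neutral categories ("other", "surprise", "confusion") leave
# friendliness unchanged.

def update_friendliness(friendliness, emotion, is_command=False, step=5):
    # Positive emotions raise friendliness, negative emotions lower it,
    # and a direct command to the NPC lowers it regardless of emotion.
    if emotion in POSITIVE:
        friendliness += step
    elif emotion in NEGATIVE:
        friendliness -= step
    if is_command:
        friendliness -= step
    return max(0, min(100, friendliness))
```

Starting from the default of 60, a "like" input raises friendliness to 65, while a neutral command drops it to 55.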
After the game player first enters the game, the input information is: "I am too hungry and thirsty." The intention with the highest score obtained from this input is "eat", and the corresponding emotion category is "other"; because "other" is a neutral emotion that does not affect friendliness, the friendliness remains 60 after the input. After the intention and friendliness corresponding to the input information are obtained, the target virtual articles that the NPC gives to the game player are determined to be "potato chips and cola", the target action to be executed by the NPC is "fetching food and drink for the user", the target event determined from the target virtual articles and the target action is "fetching cola and potato chips for the user", and the language output based on the target event is "I have brought you potato chips and cola". Finally, the target event executed by the NPC in the game scene is: fetching potato chips and cola for the game player; and the language reply is: "I have brought you potato chips and cola".
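The decision step of this example can be sketched as follows; the intent-to-article mapping, the threshold of 50, and the reply wording are hypothetical values for illustration, not taken from the patent:

```python
INTENT_TO_ITEMS = {   # hypothetical mapping from preset intents to scene objects
    "eat": "potato chips",
    "drink": "cola",
}

def decide_event(intents, friendliness, threshold=50):
    # If friendliness exceeds the preset threshold, collect the target
    # virtual article for each recognized intention and compose the
    # target event and reply; otherwise the NPC does nothing.
    if friendliness <= threshold:
        return None, None
    items = [INTENT_TO_ITEMS[i] for i in intents if i in INTENT_TO_ITEMS]
    joined = " and ".join(items)
    event = f"take {joined} for the game player"
    reply = f"I have brought you {joined}"
    return event, reply
```

With recognized intentions `["eat", "drink"]` and friendliness 60, this yields the event and reply from the example above.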
In the case where functional modules are divided according to their respective functions, Fig. 5 shows a schematic diagram of a possible composition of the human-computer interaction device described in the above embodiments. As shown in Fig. 5, the human-computer interaction device may include:
a determining module 51, configured to respond to input information of a user, and determine an intention and an emotion category of the user according to the input information;
the determining module 51 is further configured to determine the friendliness between the user and the intelligent virtual character;
the determining module 51 is further configured to determine, according to the intention of the user and the friendliness, a target event to be executed by the intelligent virtual character;
the determining module 51 is further configured to determine, according to a target event to be executed by the intelligent virtual character, language information to be replied by the intelligent virtual character for the input information;
and the output module 52 is configured to control the intelligent virtual character to execute the target event, and output the language information.
In an optional embodiment, the apparatus further comprises: an acquisition module 53;
the obtaining module 53 is configured to respond to input information of a user, and obtain historical dialogue information between the user and the intelligent virtual character;
the determining module 51 is specifically configured to determine, according to the target event to be executed by the intelligent virtual character and the historical dialogue information, language information to be replied by the intelligent virtual character for the input information.
In an optional embodiment, the target event includes a first target event and a second target event, where the first target event is an event to be executed by the intelligent virtual character for the user, and the second target event is an event composed of an action sequence to be presented by the intelligent virtual character.
In an alternative embodiment, the determining module 51 is specifically configured to:
determining a target virtual article and a target action according to the intention and the friendliness of the user;
and determining a first target event to be executed by the intelligent virtual role for the user according to the target virtual article and the target action.
In an optional embodiment, the determining module 51 is further configured to determine that the friendliness is greater than a preset threshold.
In an optional embodiment, the obtaining module 53 is further configured to:
obtaining target friendliness according to the emotion category and the friendliness;
and updating the friendliness degree according to the target friendliness degree to obtain the updated friendliness degree.
In an alternative embodiment, the determining module 51 is specifically configured to determine the user's intention by an unsupervised intention recognition model and/or a small sample intention recognition model with respect to the input information.
In an alternative embodiment, the determining module 51 is specifically configured to:
splicing the input information and a plurality of preset intentions through a preset template to obtain a plurality of intention sentences;
determining relevance scores or confusion scores corresponding to the intention sentences respectively through an unsupervised intention recognition model;
determining the preset intention corresponding to the intention sentence whose relevance score exceeds a first numerical value as the intention of the user; or determining the preset intention corresponding to the intention sentence whose confusion score is lower than a second numerical value as the intention of the user.
In an alternative embodiment, the determining module 51 is specifically configured to:
inputting the input information into a small sample intention recognition model to obtain a plurality of intentions and intention probability values respectively corresponding to the plurality of intentions; the small sample intention recognition model is trained according to information samples and corresponding preset intention labels;
and determining the intention corresponding to the intention probability value higher than the third numerical value as the intention of the user.
In an alternative embodiment, the determining module 51 is specifically configured to:
aiming at the input information, obtaining a plurality of intentions and correlation scores or confusion scores corresponding to the intentions through an unsupervised intention recognition model;
aiming at the input information, obtaining a plurality of intentions and intention probability values corresponding to the intentions through a small sample intention recognition model;
for any target intention which is the same between a plurality of intentions obtained by the unsupervised intention recognition model and a plurality of intentions obtained by a small sample intention recognition model, carrying out weighted calculation on a relevance score or a confusion score corresponding to the target intention and the intention probability value corresponding to the target intention to obtain a weighted score;
and determining the intention whose weighted score exceeds a fourth numerical value as the intention of the user.
In an optional embodiment, the obtaining module 53 is further configured to obtain a character relationship between the user and the intelligent virtual character;
the determining module 51 is specifically configured to determine, according to the target event to be executed by the intelligent virtual character and the character relationship, language information to be replied by the intelligent virtual character with respect to the input information.
Based on the same application concept, embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the human-computer interaction method provided in the foregoing embodiments are performed.
Specifically, the storage medium can be a general storage medium, such as a removable disk or a hard disk; when the computer program on the storage medium is run, the above-described human-computer interaction method can be executed.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative; for example, the division of the units is only one logical division, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through communication interfaces, and may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A human-computer interaction method, characterized in that the method comprises:
responding to input information of a user, and determining the intention and emotion category of the user according to the input information;
determining the friendliness between the user and the intelligent virtual role;
determining a target event to be executed by the intelligent virtual role according to the intention and the friendliness of the user;
determining language information to be replied by the intelligent virtual character aiming at the input information according to a target event to be executed by the intelligent virtual character;
and controlling the intelligent virtual role to execute the target event and outputting the language information.
2. The human-computer interaction method of claim 1, further comprising:
responding to input information of a user, and acquiring historical dialogue information between the user and the intelligent virtual role;
the determining, according to the target event to be executed by the intelligent virtual character, language information to be replied by the intelligent virtual character for the input information includes:
and determining language information to be replied by the intelligent virtual character aiming at the input information according to the target event to be executed by the intelligent virtual character and the historical dialogue information.
3. The human-computer interaction method according to claim 1, wherein the target event comprises a first target event and a second target event, the first target event is an event to be executed by the intelligent virtual character for the user, and the second target event is an event composed of an action sequence to be presented by the intelligent virtual character.
4. The human-computer interaction method according to claim 3, wherein the determining a target event to be executed by the intelligent virtual character according to the user's intention and the friendliness degree comprises:
determining a target virtual article and a target action according to the intention and the friendliness of the user;
and determining a first target event to be executed by the intelligent virtual role for the user according to the target virtual article and the target action.
5. The method of claim 4, wherein prior to determining a target virtual item and a target action based on the user's intent and the friendliness, the method further comprises:
and determining that the friendliness degree is greater than a preset threshold value.
6. The human-computer interaction method of claim 1, further comprising:
obtaining target friendliness according to the emotion category and the friendliness;
and updating the friendliness degree according to the target friendliness degree to obtain the updated friendliness degree.
7. The method of claim 1, wherein determining the user's intent from the input information comprises:
determining the user's intent by an unsupervised intent recognition model and/or a small sample intent recognition model for the input information.
8. The method of claim 7, wherein determining the user's intent via an unsupervised intent recognition model for the input information comprises:
splicing the input information and a plurality of preset intents through a preset template to obtain a plurality of intention sentences;
determining relevance scores or confusion scores corresponding to the intention sentences respectively through an unsupervised intention recognition model;
determining the preset intention corresponding to the intention sentence whose relevance score exceeds a first numerical value as the intention of the user; or determining the preset intention corresponding to the intention sentence whose confusion score is lower than a second numerical value as the intention of the user.
9. The method of claim 7, wherein determining the user's intent via a small sample intent recognition model for the input information comprises:
inputting the input information into a small sample intention recognition model to obtain a plurality of intentions and intention probability values respectively corresponding to the plurality of intentions; wherein the small sample intention recognition model is trained according to information samples and corresponding preset intention labels;
and determining the intention corresponding to the intention probability value higher than the third numerical value as the intention of the user.
10. The method of claim 7, wherein determining the user's intent via an unsupervised intent recognition model and a small sample intent recognition model for the input information comprises:
aiming at the input information, obtaining a plurality of intentions and correlation scores or confusion scores corresponding to the intentions through an unsupervised intention recognition model;
aiming at the input information, obtaining a plurality of intentions and intention probability values corresponding to the intentions through a small sample intention recognition model;
for any target intention which is the same between a plurality of intentions obtained by the unsupervised intention recognition model and a plurality of intentions obtained by a small sample intention recognition model, carrying out weighted calculation on a relevance score or a confusion score corresponding to the target intention and the intention probability value corresponding to the target intention to obtain a weighted score;
and determining the intention whose weighted score exceeds a fourth numerical value as the intention of the user.
11. The method of claim 1, further comprising: acquiring a character relationship between the user and the intelligent virtual character;
the determining, according to the target event to be executed by the intelligent virtual character, language information to be replied by the intelligent virtual character for the input information includes:
and determining language information to be replied by the intelligent virtual character aiming at the input information according to the target event to be executed by the intelligent virtual character and the character relation.
12. A human-computer interaction device, characterized in that the device comprises:
the determining module is used for responding to input information of a user and determining the intention and emotion category of the user according to the input information;
the determining module is further configured to determine the friendliness between the user and the intelligent virtual role;
the determining module is further configured to determine a target event to be executed by the intelligent virtual character according to the intention of the user and the friendliness;
the determining module is further configured to determine, according to a target event to be executed by the intelligent virtual character, language information to be replied by the intelligent virtual character for the input information;
and the output module is used for controlling the intelligent virtual role to execute the target event and outputting the language information.
13. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operated, the machine-readable instructions being executable by the processor to perform the steps of the human-computer interaction method according to any one of claims 1 to 11.
14. A computer-readable storage medium, having stored thereon a computer program for performing, when executed by a processor, the steps of the human-computer interaction method according to any one of claims 1 to 11.
CN202210122168.4A 2022-02-09 2022-02-09 Man-machine interaction method and device, electronic equipment and storage medium Pending CN114461775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210122168.4A CN114461775A (en) 2022-02-09 2022-02-09 Man-machine interaction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210122168.4A CN114461775A (en) 2022-02-09 2022-02-09 Man-machine interaction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114461775A true CN114461775A (en) 2022-05-10

Family

ID=81413388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210122168.4A Pending CN114461775A (en) 2022-02-09 2022-02-09 Man-machine interaction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114461775A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115920402A (en) * 2023-01-04 2023-04-07 赤子城网络技术(北京)有限公司 Action control method and device for virtual character, electronic equipment and storage medium
CN116340494A (en) * 2023-04-12 2023-06-27 百度在线网络技术(北京)有限公司 Virtual pet interaction method, device and equipment


Similar Documents

Publication Publication Date Title
JP7005694B2 (en) Computer-based selection of synthetic speech for agents
US10854188B2 (en) Synthesized voice selection for computational agents
CN111897964B (en) Text classification model training method, device, equipment and storage medium
CN109716334A (en) Select next user&#39;s notification type
US20180096283A1 (en) Selection of computational agent for task performance
WO2022022421A1 (en) Language representation model system, pre-training method and apparatus, device and medium
US20130143185A1 (en) Determining user emotional state
CN109564592A (en) Next user&#39;s prompt is generated in more wheel dialogues
CN114461775A (en) Man-machine interaction method and device, electronic equipment and storage medium
US20230128422A1 (en) Voice Command Integration into Augmented Reality Systems and Virtual Reality Systems
CN111672098A (en) Virtual object marking method and device, electronic equipment and storage medium
US11663535B2 (en) Multi computational agent performance of tasks
US8280995B2 (en) System and method for supporting dynamic selection of communication means among users
US20200051559A1 (en) Electronic device and method for providing one or more items in response to user speech
KR102398386B1 (en) Method of filtering a plurality of messages and apparatus thereof
US20220059088A1 (en) Electronic device and control method therefor
CN113095072B (en) Text processing method and device
CN107562788B (en) Interaction method, interaction device and computer-readable storage medium
CN113807089A (en) Text data processing method, neural network training method and related equipment
CN112860995A (en) Interaction method, device, client, server and storage medium
CN117421403A (en) Intelligent dialogue method and device and electronic equipment
US11145290B2 (en) System including electronic device of processing user&#39;s speech and method of controlling speech recognition on electronic device
US20240119932A1 (en) Systems and Methods for Implementing Smart Assistant Systems
US20240112674A1 (en) Presenting Attention States Associated with Voice Commands for Assistant Systems
CN114281969A (en) Reply sentence recommendation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination