CN112732951A

CN112732951A - Man-machine interaction method and device

Info

Publication number: CN112732951A
Application number: CN202011607894.2A
Authority: CN
Inventors: 刘永霞; 邵星阳
Original assignee: Qingdao Hisense Smart Life Technology Co Ltd
Current assignee: Qingdao Hisense Smart Life Technology Co Ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-04-30

Abstract

The invention discloses a man-machine interaction method and device. Accurate recommendations are made by fusing multiple recommendation strategies, such as the voiceprint user profile, the knowledge graph and Query rewriting, and recommendation quality is improved by using the user preference information contained in the preceding conversation. The recommendation result is presented jointly through dialogue interaction and the interface, together with the reason for the recommendation. A language generation model is trained end to end to generate the recommendation text from the user's conversation scene, the user's mood, the recommendation basis and the knowledge graph information of the recommended resources. Compared with template-based replies, the generated recommendation sentences are richer, more vivid, more flexible and more varied, come closer to what the user has in mind, and improve the user experience.

Description

Man-machine interaction method and device

Technical Field

The invention relates to the technical field of intelligent hotels, and in particular to a human-computer interaction method and device.

Background

At present, many voice interaction scenarios on intelligent devices need the support of a recommendation system, for example: the user sits in front of the magic mirror without a clear intention; the semantic engine does not understand the user's intention; the user has a clear intention but does not name a specific resource; or no related resource is found by the query. In current voice interaction, an intelligent device such as a magic mirror can only respond to fixed control instructions and execute the corresponding operations, and cannot hold an intelligent voice conversation with the user, so the information it provides to the user is limited.

Disclosure of Invention

The embodiments of the invention provide a human-computer interaction method and device that make voice interaction more intelligent and, compared with template-based replies, improve the human-computer interaction experience.

In a first aspect, an embodiment of the present invention provides a method for human-computer interaction, including:

the intelligent equipment acquires voice information of the user in the process of interacting with the user;

the intelligent equipment identifies the voice information and determines an identification result;

the intelligent equipment determines a recommendation result according to the identification result and multiple preset recommendation strategies;

and the intelligent equipment recommends to the user in a voice and display mode based on the recommendation result.

According to this technical solution, accurate recommendations are made with multiple recommendation strategies in combination with the user's conversation scene, and the recommendation result is presented through both dialogue interaction and the interface. This improves recommendation quality, makes the recommended content rich, vivid and flexible and closer to what the user has in mind, and improves the user experience.

Optionally, the identification result includes: the user has no explicit intention, the user's intention cannot be resolved, the user has an intention but names no specific resource, a resource name is given but the resource cannot be found by the query, or the resource name does not exist.

Optionally, the recommendation policy includes one or any combination of the following policies:

recommendations based on Query rewrite;

rewriting the Query;

a recommendation based on the voiceprint user profile;

an association recommendation based on the knowledge graph of an entity in the preceding context or of the current entity;

recommendations based on new and trending resources;

a region-based recommendation;

time-based recommendations.
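The knowledge-graph association strategy in the list above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the graph is an adjacency list of (relation, entity) edges and that "association" means entities reachable within a small number of hops from an entity mentioned in the conversation. The graph contents and the function name `associated_entities` are hypothetical; the document does not specify how its graph is traversed.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical toy knowledge graph: entity -> list of (relation, neighbouring entity).
KNOWLEDGE_GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "Liu Dehua": [("sang", "song_1"), ("starred_in", "movie_1")],
    "movie_1": [("genre", "genre_crime"), ("co_star", "actor_2")],
    "actor_2": [("starred_in", "movie_2")],
}


def associated_entities(graph: Dict[str, List[Tuple[str, str]]], seed: str,
                        max_hops: int = 2) -> List[Tuple[str, int]]:
    """Breadth-first search: entities within `max_hops` of `seed`, nearest first, with their hop count."""
    seen = {seed}
    found: List[Tuple[str, int]] = []
    queue = deque([(seed, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbour in graph.get(entity, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append((neighbour, depth + 1))
                queue.append((neighbour, depth + 1))
    return found
```

A real system would presumably keep only nodes that are playable resources (songs, films) and weight them by relation type; breadth-first order is used here so that closer associations come first.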

Optionally, the determining, by the intelligent device, a recommendation result according to the identification result and multiple preset recommendation strategies includes:

the intelligent equipment determines a recommendation strategy corresponding to the identification result from the preset multiple recommendation strategies according to the identification result;

and the intelligent equipment determines the recommendation result from a resource library by using the recommendation strategy corresponding to the identification result and combining with the conversation content in the user interaction process.

Optionally, the recommendation result includes a recommended resource and recommendation words corresponding to the recommended resource;

the intelligent device recommends to the user in a voice and display mode based on the recommendation result, and the method comprises the following steps:

and the intelligent device recommends the recommended resource to the user by voice using the recommendation words, and displays the recommendation words and the recommended resource to the user on a display device of the intelligent device.

In a second aspect, an embodiment of the present invention provides a human-computer interaction apparatus, including:

the acquisition unit is used for acquiring the voice information of the user in the process of interacting with the user;

the processing unit is used for identifying the voice information and determining an identification result; determining a recommendation result according to the identification result and a plurality of preset recommendation strategies;

and the recommending unit is used for recommending to the user in a voice and display mode based on the recommending result.

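Optionally, the recommendation policy includes one or any combination of the following policies:

recommendations based on Query rewrite;

rewriting the Query;

a recommendation based on the voiceprint user profile;

an association recommendation based on the knowledge graph of an entity in the preceding context or of the current entity;

recommendations based on new and trending resources;

a region-based recommendation;

time-based recommendations.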

Optionally, the processing unit is specifically configured to:

determining a recommendation strategy corresponding to the identification result from the preset multiple recommendation strategies according to the identification result;

and determining the recommendation result from a resource library by using the recommendation strategy corresponding to the identification result and combining with the conversation content in the user interaction process.

Optionally, the recommendation result includes a recommended resource and recommendation words corresponding to the recommended resource;

the recommendation unit is specifically configured to:

recommend the recommended resource to the user by voice using the recommendation words, and display the recommendation words and the recommended resource to the user on a display device of the intelligent device.

In a third aspect, an embodiment of the present invention further provides a computing device, including:

a memory for storing program instructions;

and the processor is used for calling the program instructions stored in the memory and executing the human-computer interaction method according to the obtained program.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable non-volatile storage medium, which includes computer-readable instructions, and when the computer-readable instructions are read and executed by a computer, the computer is caused to execute the above-mentioned human-computer interaction method.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed to describe the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;

fig. 2 is a schematic flowchart of a human-computer interaction method according to an embodiment of the present invention;

fig. 3 is a schematic diagram of an identification result according to an embodiment of the present invention;

fig. 4 is a schematic structural diagram of a human-computer interaction device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 shows a system architecture provided by an embodiment of the present invention. As shown in Fig. 1, the system architecture may be implemented on a smart device and may include modules such as basic data 100, recall policies 200, ranking policies 300, data sources 400 and a computation and storage center 500. The intelligent device may be a magic mirror, a smart television, another intelligent interaction device, or the like.

The data source 400 provides the user's historical viewing data, such as user log data and the resource library. It can also provide manually labeled data for training in the computation and storage center 500.

The computation and storage center 500 is mainly used to store data in a data warehouse, process the data provided by the data sources on a Spark computing platform, and train the language generation model on a TensorFlow training platform; the trained model generates recommendation explanations for resources.

The basic data 100 mainly provides voiceprint user profile data, dialogue text data, knowledge graph data, user behavior data, the resource library, a Query rewrite library, service scenarios and historically recommended resources.

The recall policies 200, also called recommendation policies, may include a voiceprint user profile policy, a time policy, a knowledge graph policy, a geographic policy, a new-and-trending policy and a Query rewrite policy.
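One way to read the recall policies 200 is as independent functions that each return candidate resource IDs, whose outputs are merged before ranking. The sketch below assumes that reading; the strategy functions, the context fields `hour` and `trending`, and the morning/evening rule are hypothetical placeholders, not the policies the document actually uses.

```python
from typing import Callable, Dict, List

# A recall strategy takes the current context and returns candidate resource IDs.
RecallStrategy = Callable[[dict], List[str]]


def time_recall(ctx: dict) -> List[str]:
    # Hypothetical time-based rule: news in the morning, films otherwise.
    return ["news_daily"] if 6 <= ctx["hour"] < 12 else ["movie_evening"]


def new_hot_recall(ctx: dict) -> List[str]:
    # Hypothetical new-and-trending rule: take the first two trending items.
    return ctx.get("trending", [])[:2]


def multi_recall(ctx: dict, strategies: Dict[str, RecallStrategy]) -> Dict[str, List[str]]:
    """Merge candidates from every strategy, remembering which strategies recalled each one."""
    merged: Dict[str, List[str]] = {}
    for name, strategy in strategies.items():
        for resource_id in strategy(ctx):
            merged.setdefault(resource_id, []).append(name)
    return merged
```

Recording which strategies recalled each resource keeps the recommendation basis available later, for example when a recommendation reason has to be given to the user.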

The ranking policy 300 is mainly used to rank the recalled resources; ranking may be based on a deep learning ranking model, on regression, or on popularity.
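Two of the ranking options named above, popularity-based and regression-based, can be sketched in a few lines. The regression variant assumes a logistic-regression click model with hand-supplied weights; the feature vectors and weights are illustrative only, and the deep-learning ranking model is not shown.

```python
import math
from typing import Dict, List


def popularity_rank(play_counts: Dict[str, int]) -> List[str]:
    """Order resource IDs by play count, most popular first."""
    return sorted(play_counts, key=lambda rid: play_counts[rid], reverse=True)


def click_probability(features: List[float], weights: List[float], bias: float) -> float:
    """Logistic regression: sigmoid of a weighted feature sum (written to avoid exp overflow)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def regression_rank(candidates: Dict[str, List[float]], weights: List[float], bias: float) -> List[str]:
    """Order resource IDs by predicted click probability, highest first."""
    scores = {rid: click_probability(f, weights, bias) for rid, f in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)
```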

In the recommendation process, resources are first recalled, then filtered and ranked, and finally post-processed; after post-processing, the results are shown on the interface.

It should be noted that the structure shown in fig. 1 is only an example, and the embodiment of the present invention is not limited thereto.

Based on the above description, fig. 2 shows in detail a flow of a human-computer interaction method provided by an embodiment of the present invention, where the flow may be executed by a human-computer interaction device.

As shown in fig. 2, the process specifically includes:

step 201, the intelligent device obtains the voice information of the user in the process of interacting with the user.

In the embodiment of the invention, the intelligent device can acquire every piece of voice information during its interaction with the user, and a recommendation is made at any point in the voice interaction where a trigger condition is met.

Step 202, the intelligent device identifies the voice information and determines an identification result.

After obtaining the voice information, the intelligent device performs recognition on it to obtain a recognition result. The recognition result may be that the user has no explicit intention, that the user's intention cannot be parsed, that the user has an intention but names no specific resource, that a resource name is given but the resource cannot be found, or that the resource name does not exist.

As shown in Fig. 3, examples of the user having no explicit intention include chit-chat queries such as "I am bored" or "I am in a bad mood", and meaningless queries such as "that".

Cases where the semantic engine cannot resolve the user's intention include the user not finishing the request (for example, saying only "I want...") and queries that the semantic engine fails to parse; the latter can be divided into positioning problems, error-correction problems, labeling problems and recognition problems.

The user has a clear intention but names no specific resource, for example: "I want to watch a movie", "I want to listen to a song", "games", "I want to order takeout", "Liu Dehua", "I want to listen to something", "What good movies are there?", "Recommend me some movies", and so on.

There are also cases where the user has a clear intention but the resource cannot be found, for example "I want to listen to Night Piano Piece No. 5", and cases where resources exist but the user gives only a resource type, such as movies, music, games, food, poems or people.

The recognition result also serves as the trigger condition, also called the recommendation condition. Because these states span the whole voice interaction process, from parsing to feedback, recommending in each of them improves the human-computer interaction experience.
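A minimal sketch of how the recognition results could be turned into trigger states. It assumes a hypothetical `ParseResult` produced by the semantic engine with the fields shown; none of these fields are named in the document, and the order of the checks is an assumption. The function returns `None` when the request was fully served and no recommendation is needed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    NO_EXPLICIT_INTENT = "user has no explicit intention"
    INTENT_UNRESOLVED = "user's intention cannot be resolved"
    NO_RESOURCE_NAME = "intention but no specific resource name"
    RESOURCE_NOT_FOUND = "resource name given but no resource found"
    RESOURCE_NAME_UNKNOWN = "resource name does not exist"


@dataclass
class ParseResult:
    parsed: bool                  # did the semantic engine produce any parse?
    intent: Optional[str]         # e.g. "play_music"; None for chit-chat
    resource_name: Optional[str]  # a title the user named, if any
    name_known: bool = False      # is that name present in the catalogue?
    hits: int = 0                 # number of resources returned by the query


def detect_trigger(p: ParseResult) -> Optional[Trigger]:
    """Map a (hypothetical) parse result onto one of the recommendation trigger states."""
    if not p.parsed:
        return Trigger.INTENT_UNRESOLVED
    if p.intent is None:
        return Trigger.NO_EXPLICIT_INTENT
    if p.resource_name is None:
        return Trigger.NO_RESOURCE_NAME
    if not p.name_known:
        return Trigger.RESOURCE_NAME_UNKNOWN
    if p.hits == 0:
        return Trigger.RESOURCE_NOT_FOUND
    return None  # request fully served: no recommendation needed
```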

Step 203, the intelligent device determines a recommendation result according to the identification result and a plurality of preset recommendation strategies.

The preset multiple recommendation strategies can comprise one or any combination of the following strategies:

recommendations based on Query rewrite;

rewriting the Query;

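a recommendation based on the voiceprint user profile;

an association recommendation based on the knowledge graph of an entity in the preceding context or of the current entity;

recommendations based on new and trending resources;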

a region-based recommendation;

time-based recommendations.
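Query rewriting relies on the Query rewrite library listed among the basic data 100. The document does not describe the library's format; the sketch below simply assumes an exact-match lookup table from normalized user queries to rewritten queries, with hypothetical entries. A production system would more likely use fuzzy or model-based matching.

```python
from typing import Dict

# Hypothetical rewrite library: normalized user query -> rewritten query.
REWRITE_LIBRARY: Dict[str, str] = {
    "i am bored": "recommend funny videos",
    "i want to watch a movie": "recommend popular movies",
}


def normalize(query: str) -> str:
    """Lower-case, trim surrounding punctuation and collapse repeated spaces."""
    return " ".join(query.lower().strip(" .!?").split())


def rewrite_query(query: str, library: Dict[str, str] = REWRITE_LIBRARY) -> str:
    """Return the rewritten query, or the normalized original when no rule matches."""
    key = normalize(query)
    return library.get(key, key)
```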

The embodiment of the invention makes accurate recommendations by fusing the voiceprint user profile, the knowledge graph and Query rewriting. Because these recommendation strategies are designed from a voice perspective and mined from voice assistant user logs, they are better suited to voice interaction scenarios.
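Using a voiceprint user profile requires first identifying who is speaking. The document does not say how this is done; a common approach, assumed here, is to compare the speaker embedding of the current utterance with enrolled embeddings by cosine similarity and accept the best match above a threshold. The embeddings, the 0.8 threshold and the profile fields are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical fallback profile for speakers who are not enrolled.
DEFAULT_PROFILE: Dict[str, list] = {"preferred_genres": []}


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(embedding: List[float], enrolled: Dict[str, List[float]],
                     threshold: float = 0.8) -> Optional[str]:
    """Return the enrolled user most similar to `embedding`, or None if no score reaches `threshold`."""
    best_id: Optional[str] = None
    best_score = threshold
    for user_id, reference in enrolled.items():
        score = cosine(embedding, reference)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id


def profile_for(embedding: List[float], enrolled: Dict[str, List[float]],
                profiles: Dict[str, dict], threshold: float = 0.8) -> dict:
    """Look up the voiceprint user profile; unknown speakers get the default profile."""
    user_id = identify_speaker(embedding, enrolled, threshold)
    if user_id is None:
        return DEFAULT_PROFILE
    return profiles.get(user_id, DEFAULT_PROFILE)
```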

In the specific recommendation process, the intelligent device determines a recommendation strategy corresponding to the identification result from a plurality of preset recommendation strategies according to the identification result, and then determines the recommendation result from the resource library by combining the recommendation strategy corresponding to the identification result with the conversation content in the user interaction process.
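Selecting strategies per recognition result can be expressed as a lookup table. The document does not publish the actual mapping, so the table below is entirely hypothetical; it only illustrates the shape of the step and how entities from the conversation content could switch on knowledge-graph association.

```python
from typing import Dict, List

# Hypothetical mapping from trigger state to the recall strategies applied.
STRATEGIES_BY_TRIGGER: Dict[str, List[str]] = {
    "no_explicit_intent": ["voiceprint_profile", "time", "region", "new_hot"],
    "intent_unresolved": ["query_rewrite", "voiceprint_profile"],
    "no_resource_name": ["voiceprint_profile", "knowledge_graph", "new_hot"],
    "resource_not_found": ["query_rewrite", "knowledge_graph"],
    "resource_name_unknown": ["query_rewrite", "new_hot"],
}


def select_strategies(trigger: str, dialogue_entities: List[str]) -> List[str]:
    """Pick strategies for a trigger; fall back to new-and-trending for unknown triggers."""
    strategies = list(STRATEGIES_BY_TRIGGER.get(trigger, ["new_hot"]))
    # Entities mentioned earlier in the conversation make knowledge-graph association possible.
    if dialogue_entities and "knowledge_graph" not in strategies:
        strategies.append("knowledge_graph")
    return strategies
```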

The embodiment of the invention combines the recommendation system with the dialogue system so that each benefits the other: the recommendation results provide important information for sustaining multi-turn conversations, and the user preference information contained in the preceding dialogue can improve recommendation quality.
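To illustrate how preference information from the preceding dialogue might improve recommendation quality, the sketch below counts tags mentioned in earlier turns and adds a small bonus to candidates that share them. The turn format, the `tags` field and the 0.1 weight are assumptions for illustration, not the document's method.

```python
from collections import Counter
from typing import Dict, List


def context_preferences(turns: List[dict]) -> Counter:
    """Count the tags (genres, artists, ...) attached to earlier dialogue turns."""
    counts: Counter = Counter()
    for turn in turns:
        counts.update(turn.get("tags", []))
    return counts


def boost_scores(scores: Dict[str, float], resource_tags: Dict[str, List[str]],
                 prefs: Counter, weight: float = 0.1) -> Dict[str, float]:
    """Add `weight` for every earlier mention of each tag the resource carries."""
    return {
        rid: score + weight * sum(prefs[tag] for tag in resource_tags.get(rid, []))
        for rid, score in scores.items()
    }
```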

The recommendation result includes recommended resources and the recommendation words corresponding to them. The recommendation words are generated by a language generation model trained end to end on the conversation scene, the user's mood, the recommendation basis and the knowledge graph information of the recommended resources, and they accompany the whole voice interaction process.
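The language generation model itself is not described in enough detail to reproduce, but its inputs are: the conversation scene, the user's mood, the recommendation basis and the resource's knowledge graph information. The sketch below assumes a sequence-to-sequence model that takes a single flattened source string and shows how those four signals might be serialized; the field labels and the `[SEP]` separator are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

SEP = " [SEP] "  # hypothetical separator token between fields


@dataclass
class GenerationContext:
    scene: str      # e.g. "user says they are bored"
    mood: str       # e.g. "negative"
    basis: str      # why the resource was recalled, e.g. "voiceprint_profile"
    resource: str   # title of the recommended resource
    kg_facts: List[str] = field(default_factory=list)  # e.g. ["genre: comedy"]


def build_model_input(ctx: GenerationContext) -> str:
    """Flatten the conditioning signals into one source sequence for the generator."""
    parts = [f"scene: {ctx.scene}", f"mood: {ctx.mood}",
             f"basis: {ctx.basis}", f"resource: {ctx.resource}"]
    parts.extend(f"kg: {fact}" for fact in ctx.kg_facts)
    return SEP.join(parts)
```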

In essence, recommendation means efficiently retrieving the Top-k relevant resources from the full resource library. Combining the mainstream recommendation-system framework with practical application requirements, this scheme divides the recommendation process into the following four stages:

(1) Recall module stage: relevant resources are retrieved from the resource library based on the multiple recall strategies (recommendation strategies) and the current scene, forming a resource list.

(2) Filtering module stage: resources the user has already watched or listened to, and resources already recommended, are filtered out.

(3) Ranking module stage: the resource list formed by the recall module is sorted by the user's click probability.

(4) Post-processing module stage: a preset number (for example, the top 10) of the ranked resources is presented, and two new and trending resources are added. These added resources do not necessarily match the user's interest preferences, but they are not diametrically opposed to them. Adding such random resources from outside the user's preferences avoids narrowing the user's horizons and increases recommendation diversity and novelty.
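Stages (2) to (4) above can be sketched as one function operating on the list produced by the recall stage. It assumes click probabilities have already been predicted for each candidate; `recommend` and its parameter names are illustrative, and the defaults of 10 ranked items plus 2 new and trending items follow the example numbers in the text.

```python
import heapq
from typing import Dict, List, Set


def recommend(recalled: List[str], consumed: Set[str], previously_recommended: Set[str],
              click_prob: Dict[str, float], new_hot: List[str],
              top_n: int = 10, extra_new_hot: int = 2) -> List[str]:
    # (2) Filtering: de-duplicate recalls, drop resources already consumed or already recommended.
    candidates = [r for r in dict.fromkeys(recalled)
                  if r not in consumed and r not in previously_recommended]
    # (3) Ranking: keep the top_n candidates by predicted click probability.
    ranked = heapq.nlargest(top_n, candidates, key=lambda r: click_prob.get(r, 0.0))
    # (4) Post-processing: append new and trending items not already shown or consumed.
    extras = [r for r in new_hot if r not in ranked and r not in consumed][:extra_new_hot]
    return ranked + extras
```

`heapq.nlargest` avoids sorting the whole candidate list when only the top N are needed, which matches the Top-k framing of the recommendation task.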

Step 204, the intelligent device recommends to the user by voice and display based on the recommendation result.

The recommendation result includes the recommended resources and the recommendation words corresponding to them. The intelligent device can recommend the resources to the user by voice using the recommendation words, and display the recommendation words and the recommended resources on its display device.
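The joint voice-and-display presentation can be modeled as a response carrying both text for speech synthesis and cards for the screen. In the sketch below, the data classes are hypothetical, and speaking only the first recommendation's words while showing every item on screen is an assumed design choice that keeps the spoken reply short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Recommendation:
    resource_id: str
    title: str
    reason: str  # recommendation words, e.g. produced by the language generation model


@dataclass
class MultimodalResponse:
    speech: str        # text handed to speech synthesis
    cards: List[dict]  # items rendered on the display


def build_response(recs: List[Recommendation], max_spoken: int = 1) -> MultimodalResponse:
    """Speak the first `max_spoken` recommendation words; show every item on screen."""
    speech = " ".join(r.reason for r in recs[:max_spoken])
    cards = [{"id": r.resource_id, "title": r.title, "reason": r.reason} for r in recs]
    return MultimodalResponse(speech=speech, cards=cards)
```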

In the embodiment of the invention, the presentation form matters for the recommendation system. Existing recommendation displays are based on text and pictures, which are not vivid enough; moreover, text display has a major disadvantage when the user is busy, whereas voice interaction lets the user, for example, listen to music while washing up.

For example, the device says "The following makeup content is recommended for you" and then fills the interface with a dense grid of makeup posters. The user is likely not to see what these resources actually are.

In addition, recommendation words generated by a language generation model trained end to end on the conversation scene, the user's mood, the recommendation basis and the knowledge graph information of the recommended resources are richer, more vivid and more flexible than template-based replies, and come closer to what the user has in mind.

The recommendation result is presented jointly through dialogue interaction and the interface, together with the reason for the recommendation. Explaining a recommendation earns the user's trust and makes the user more willing to accept it. Table 1 shows interaction effects that the present invention can achieve.

TABLE 1

User query	Intelligent interactive answer
I am so bored	When you're bored, let's look at some outfit ideas together!
I feel awful	Put your bad mood aside; the mirror will watch some funny videos with you!
Liu Dehua	The mirror has Liu Dehua's music. Would you like to listen?
I want to listen to a song about a baby	If you like Yi Yang Qianxi, listen to his "Baby"!
I want to watch variety shows	Let's see which variety shows Qianxi has been on recently!
Why am I so fat?	Try eating less meat and more vegetables, and you'll look great!
I want to see beautiful makeup	Several new products have just come out; why not give them a try?

For example, a user on a long bathroom break wants to watch a show but cannot decide what to watch.

Magic mirror recommendation: "We've been together so long that the mirror knows you like [horror] movies. I recommend [women and a car] to relieve the boredom!"

The embodiment of the invention integrates multiple recommendation strategies, such as the voiceprint user profile, the knowledge graph and Query rewriting, to make accurate recommendations; combines the recommendation system with the dialogue system; presents the recommendation result through dialogue interaction and the interface together with the recommendation reason; and improves recommendation quality by using the user preference information in the preceding dialogue. An end-to-end language generation model produces recommendation words that are rich, vivid, flexible and varied, and close to what the user has in mind.

In the embodiment of the invention, the intelligent device acquires the user's voice information while interacting with the user, recognizes it to determine a recognition result, determines a recommendation result according to the recognition result and a plurality of preset recommendation strategies, and recommends to the user by voice and display based on that result. Recommendation thus covers the gaps that can arise throughout the voice interaction, from semantic parsing to querying. The multiple recommendation strategies are designed from a voice perspective and mined from voice assistant user log data. Fusing strategies such as the voiceprint user profile, the knowledge graph and Query rewriting yields accurate recommendations; combining the recommendation system with the dialogue system lets the user preference information in the preceding dialogue improve recommendation quality; and the recommendation result is presented jointly through dialogue interaction and the interface. A language generation model trained end to end on the user's conversation scene, the user's mood, the recommendation basis and the knowledge graph information of the recommended resources generates the recommendation words. Compared with template-based replies, these are rich, vivid, flexible and varied, closer to what the user has in mind, and make human-computer interaction warmer.

Based on the same technical concept, fig. 4 exemplarily shows a structure of a human-computer interaction device provided by an embodiment of the present invention, and the device can perform a flow of human-computer interaction.

As shown in fig. 4, the apparatus specifically includes:

an obtaining unit 401, configured to obtain voice information of a user in an interaction process with the user;

a processing unit 402, configured to identify the voice information and determine an identification result; determining a recommendation result according to the identification result and a plurality of preset recommendation strategies;

a recommending unit 403, configured to recommend to the user in a voice and display manner based on the recommendation result.

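Optionally, the recommendation policy includes one or any combination of the following policies:

recommendations based on Query rewrite;

rewriting the Query;

a recommendation based on the voiceprint user profile;

an association recommendation based on the knowledge graph of an entity in the preceding context or of the current entity;

recommendations based on new and trending resources;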

a region-based recommendation;

time-based recommendations.

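Optionally, the processing unit 402 is specifically configured to:

determine a recommendation strategy corresponding to the identification result from the plurality of preset recommendation strategies according to the identification result;

and determine the recommendation result from a resource library by using the recommendation strategy corresponding to the identification result in combination with the conversation content of the user interaction.

Optionally, the recommendation result includes a recommended resource and recommendation words corresponding to the recommended resource;

the recommending unit 403 is specifically configured to:

recommend the recommended resource to the user by voice using the recommendation words, and display the recommendation words and the recommended resource to the user on a display device of the intelligent device.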

Based on the same technical concept, an embodiment of the present invention further provides a computing device, including:

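a memory for storing program instructions;

and a processor for calling the program instructions stored in the memory and executing the human-computer interaction method according to the obtained program.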

Based on the same technical concept, the embodiment of the invention also provides a computer-readable non-volatile storage medium, which comprises computer-readable instructions, and when the computer-readable instructions are read and executed by a computer, the computer is enabled to execute the above human-computer interaction method.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

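1. A method of human-computer interaction, comprising:

acquiring, by an intelligent device, voice information of a user in the process of interacting with the user;

recognizing, by the intelligent device, the voice information and determining a recognition result;

determining, by the intelligent device, a recommendation result according to the recognition result and a plurality of preset recommendation strategies;

and recommending, by the intelligent device, to the user by voice and display based on the recommendation result.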

2. The method of claim 1, wherein the recognition result comprises: the user has no explicit intention, the user's intention cannot be resolved, the user has an intention but names no specific resource, a resource name is given but no resource is found by the query, or the resource name does not exist.

3. The method of claim 1, wherein the recommendation policy comprises one or any combination of the following policies:

recommendations based on Query rewrite;

rewriting the Query;

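a recommendation based on the voiceprint user profile;

an association recommendation based on the knowledge graph of an entity in the preceding context or of the current entity;

recommendations based on new and trending resources;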

a region-based recommendation;

time-based recommendations.

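4. The method of claim 1, wherein determining, by the intelligent device, the recommendation result according to the recognition result and the plurality of preset recommendation strategies comprises:

determining, by the intelligent device, the recommendation strategy corresponding to the recognition result from the plurality of preset recommendation strategies according to the recognition result;

and determining, by the intelligent device, the recommendation result from a resource library by using the recommendation strategy corresponding to the recognition result in combination with the conversation content of the user interaction.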

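5. The method of any one of claims 1 to 4, wherein the recommendation result comprises a recommended resource and recommendation words corresponding to the recommended resource;

recommending, by the intelligent device, to the user by voice and display based on the recommendation result comprises:

recommending, by the intelligent device, the recommended resource to the user by voice using the recommendation words, and displaying the recommendation words and the recommended resource to the user on a display device of the intelligent device.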

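6. A human-computer interaction device, comprising:

an acquisition unit, configured to acquire voice information of a user in the process of interacting with the user;

a processing unit, configured to recognize the voice information and determine a recognition result, and to determine a recommendation result according to the recognition result and a plurality of preset recommendation strategies;

and a recommending unit, configured to recommend to the user by voice and display based on the recommendation result.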

7. The apparatus of claim 6, wherein the recognition result comprises: the user has no explicit intention, the user's intention cannot be resolved, the user has an intention but names no specific resource, a resource name is given but no resource is found by the query, or the resource name does not exist.

8. The apparatus of claim 6, wherein the recommendation policy comprises one or any combination of the following policies:

recommendations based on Query rewrite;

rewriting the Query;

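a recommendation based on the voiceprint user profile;

an association recommendation based on the knowledge graph of an entity in the preceding context or of the current entity;

recommendations based on new and trending resources;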

a region-based recommendation;

time-based recommendations.

9. A computing device, comprising:

a memory for storing program instructions;

a processor for calling program instructions stored in said memory to execute the method of any one of claims 1 to 5 in accordance with the obtained program.

10. A computer-readable non-transitory storage medium including computer-readable instructions which, when read and executed by a computer, cause the computer to perform the method of any one of claims 1 to 5.