CN111899728A

CN111899728A - Training method and device for intelligent voice assistant decision strategy

Info

Publication number: CN111899728A
Application number: CN202010719035.6A
Authority: CN
Inventors: 朱飞; 连欢
Original assignee: Hisense Electronic Technology Wuhan Co ltd
Current assignee: Hisense Electronic Technology Wuhan Co ltd
Priority date: 2020-07-23
Filing date: 2020-07-23
Publication date: 2020-11-06
Anticipated expiration: 2040-07-23
Also published as: CN111899728B

Abstract

The training method and the training device for the intelligent voice assistant decision strategy can analyze the feature vector of the voice request of the user by utilizing log data; then taking the feature vector as input, and enabling the DDPG model to output decision content corresponding to the voice request; then, under the condition that the decision content inquires the opinion from the user, the inquired content is supplemented according to the real intention of the voice request; and finally, taking the supplemented decision content as a corresponding decision strategy of the voice request in the intelligent voice assistant. According to the technical scheme, the interaction behavior of the user and the intelligent voice assistant can be simulated in the offline state of the intelligent device, and the decision content supplemented during the simulation of the user interaction is used as the decision strategy of the intelligent voice assistant. After the intelligent voice assistant finishes training, if the voice request corresponds to the return results of the plurality of service modules, the return results desired by the user can be accurately determined and provided for the user, and the interaction accuracy between the user and the intelligent voice assistant is improved.

Description

Training method and device for intelligent voice assistant decision strategy

Technical Field

The application relates to the technical field of intelligent home, in particular to a training method and a training device for an intelligent voice assistant decision strategy.

Background

Currently, many intelligent devices implement intelligent interaction with users through intelligent voice assistants. The decision engine is a core part of the intelligent voice assistant, and has the main task of performing semantic analysis and comprehensive judgment on user requests of users under various services and outputting request results which can best meet the real intentions of the users. Therefore, the accuracy of the decision engine directly affects the intelligence of the intelligent voice assistant, and further affects the user experience.

The decision mode of the decision engine is mainly realized by a rule algorithm or a classification model algorithm and the like. The rule algorithm determines the output sequence of the services by setting the priority degrees of certain services, and determines the priority degree of each service in a mode of manually setting a threshold value, for example, video services such as television, movies and the like are main services at a television end, and video, audio and video services and the like are preferentially output when the threshold value is higher, but audio services such as music and the like may be preferentially output at a sound box end; the classification model algorithm mainly takes the analysis results of various services as input and the labels of the modules as output, judges the classification probability of the analysis results of the modules through the classification model and outputs the service with the highest probability.

The rule algorithm and the classification model algorithm play an important role in the early development stage of the intelligent voice assistant, but as the integrated services are more and more diversified, the result given by the decision engine is the same for the same request of the user no matter how many times the request is made, and the result is possibly right and possibly wrong. When the user request is ambiguous, that is, a plurality of service fields are involved, the decision engine cannot determine which service field the result of the user request is, and the accuracy of the given result is lower. Thereby affecting the experience of the user searching for content using the intelligent voice assistant.

Disclosure of Invention

The application provides a training method and a training device for an intelligent voice assistant decision strategy, which are used for solving the problem that the accuracy of searching contents of the intelligent voice assistant in the existing intelligent equipment is low, and further ensuring the experience effect of a user.

In a first aspect, the present application provides a method for training an intelligent voice assistant decision strategy, including:

acquiring log data of a user; the log data is used for representing behavior data and request data of a user in historical voice operation of interacting with the intelligent device;

analyzing a feature vector of the voice request of the user by using the log data; the characteristic vector is used for representing a vector formed by different return results received after a user sends a voice request;

taking the feature vector corresponding to the voice request as input, and outputting decision content corresponding to the voice request by using a depth certainty strategy gradient (DDPG) model; the decision content is used for representing a predicted return result which is predicted by the depth determination strategy gradient DDPG model and is corresponding to the voice request and should be received by the user;

in the case that the decision content is to ask the user for opinions, supplementing the content of inquiry according to the real intention of the voice request; the real intent is used to represent a pre-tagged user's expected result for the voice request;

and taking the supplemented decision content as a corresponding decision strategy of the voice request in the intelligent voice assistant.

In some embodiments, the step of analyzing the feature vector of the user voice request by using the log data comprises:

analyzing the request characteristics that the return result sent to the user by the intelligent equipment according to the voice request belongs to different service modules and the resource characteristics of the media resource to which the return result belongs;

analyzing the historical characteristics of the user according to the time when the user sends the voice request in the intelligent equipment and the attention content in the intelligent equipment;

and composing the feature vector by using the request feature, the resource feature and the historical feature.

In some embodiments, after the step of outputting the decision content corresponding to the voice request by using a depth deterministic policy gradient DDPG model with the feature vector corresponding to the voice request as an input, the method further includes:

accepting the decision content with a probability greater than a preset probability threshold if the decision content meets the true intent of the voice request;

and taking the decision content as a corresponding decision strategy of the voice request in the intelligent voice assistant.

In some embodiments, after the step of using the decision content as the corresponding decision policy of the voice request in the intelligent voice assistant, the method further comprises:

and feeding back the action of accepting the decision content as a decision result corresponding to the voice request to the depth determination strategy gradient DDPG model.

objecting to the decision content with a probability greater than a preset probability threshold if the decision content does not meet the true intent of the voice request;

feeding back an action against the decision content as a decision result corresponding to the voice request to the depth determination strategy gradient DDPG model;

enabling the DDPG model to adjust the decision content corresponding to the voice request according to the decision result;

and if the decision content meets the user intention of the voice request, taking the decision content as a corresponding decision strategy of the voice request in an intelligent voice assistant.

In a second aspect, an embodiment of the present application further provides an apparatus for training an intelligent voice assistant decision making strategy, including:

the data acquisition module is used for acquiring log data of a user; the log data is used for representing behavior data and request data of a user in historical voice operation of interacting with the intelligent device;

the characteristic simulation module is used for analyzing a characteristic vector of the user voice request by utilizing the log data; the characteristic vector is used for representing a vector formed by different return results received after a user sends a voice request;

the decision module is used for taking the feature vector corresponding to the voice request as input and outputting decision content corresponding to the voice request by utilizing a depth certainty strategy gradient DDPG model; the decision content is used for representing a predicted return result which is predicted by the depth determination strategy gradient DDPG model and is corresponding to the voice request and should be received by the user;

the behavior simulation module is used for supplementing inquired contents according to the real intention of the voice request under the condition that the decision content inquires the opinion from the user; the real intent is used to represent a pre-tagged user's expected result for the voice request;

and the decision module is used for taking the supplemented decision content as a corresponding decision strategy of the voice request in the intelligent voice assistant.

In some embodiments, the feature simulation module is further configured to:

In some embodiments, the behavior simulation module is further configured to accept the decision content with a probability greater than a preset probability threshold if the decision content meets the real intention of the voice request;

the decision module is also used for taking the decision content as a corresponding decision strategy of the voice request in the intelligent voice assistant.

In some embodiments, the apparatus further includes a feedback module, configured to feed back, as a decision result corresponding to the voice request, an action of accepting the decision content to the depth determination policy gradient DDPG model.

In some embodiments, the behavior simulation module is further configured to object the decision content with a probability greater than a preset probability threshold if the decision content does not conform to the true intent of the voice request;

the feedback module is further used for feeding back an action against the decision content as a decision result corresponding to the voice request to the depth determination strategy gradient DDPG model;

the decision module is further configured to enable the deep deterministic policy gradient DDPG model to adjust the decision content corresponding to the voice request according to the decision result;

According to the above content, the training method and device for the intelligent voice assistant decision strategy in the technical scheme of the application can analyze the feature vector of the voice request of the user by using the log data stored in the intelligent device; then, the feature vector is used as the input of a depth certainty strategy gradient DDPG model, so that the depth certainty strategy gradient DDPG model outputs decision content corresponding to the voice request; then, under the condition that the decision content inquires the opinion from the user, the inquired content is supplemented according to the real intention of the voice request; and finally, taking the supplemented decision content as a corresponding decision strategy of the voice request in the intelligent voice assistant. According to the technical scheme, the interaction behavior of the user and the intelligent voice assistant can be simulated in the offline state of the intelligent device, and the decision content supplemented during the simulation of the user interaction is used as the decision strategy of the intelligent voice assistant. After the intelligent voice assistant finishes training, if the voice request corresponds to the return results of the plurality of service modules, the return results desired by the user can be accurately determined and provided for the user, and the interaction accuracy between the user and the intelligent voice assistant is improved.

Drawings

In order to more clearly explain the technical solution of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart illustrating a training method of an intelligent voice assistant decision making strategy according to an embodiment of the present application;

FIG. 2 is a flow chart illustrating a method for obtaining feature vectors according to an embodiment of the present disclosure;

FIG. 3 is a flowchart illustrating another method for training an intelligent voice assistant decision making strategy according to an embodiment of the present application;

FIG. 4 is a schematic diagram illustrating training of an intelligent voice assistant decision making strategy according to an embodiment of the present application;

FIG. 5 is a block diagram of a training apparatus for an intelligent voice assistant decision making strategy according to an embodiment of the present application.

Detailed Description

To make the objects, embodiments and advantages of the present application clearer, the following description of exemplary embodiments of the present application will clearly and completely describe the exemplary embodiments of the present application with reference to the accompanying drawings in the exemplary embodiments of the present application, and it is to be understood that the described exemplary embodiments are only a part of the embodiments of the present application, and not all of the embodiments.

All other embodiments, which can be derived by a person skilled in the art from the exemplary embodiments described herein without inventive step, are intended to be within the scope of the claims appended hereto. In addition, while the disclosure herein has been presented in terms of one or more exemplary examples, it should be appreciated that aspects of the disclosure may be implemented solely as a complete embodiment.

It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.

The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and are not necessarily intended to limit the order or sequence of any particular one, Unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.

Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.

The term "module," as used herein, refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.

Currently, many intelligent devices implement intelligent interaction with users through intelligent voice assistants. For example, when the user uses the smart television with the smart voice assistant, the user may speak the desired content by voice, and the smart voice assistant may find the desired content and display the content on the smart television.

The decision engine is a core part of the intelligent voice assistant, and has the main task of performing semantic analysis and comprehensive judgment on user requests of users under various services and outputting request results which can best meet the real intentions of the users. For example, the user speaks the request content of "XX municipality" through the intelligent voice assistant, and the decision engine may search for some relevant information about "XX municipality" in the "encyclopedia" service according to the request content. Therefore, the accuracy of the decision engine directly affects the intelligence of the intelligent voice assistant, and further affects the user experience.

At present, the decision mode of the decision engine is mainly realized by a rule algorithm or a classification model algorithm and the like. The rule algorithm determines the output sequence of the services by setting the priority degrees of certain services, and determines the priority degree of each service in a mode of manually setting a threshold value, for example, video services such as television, movies and the like are main services at a television end, and video, audio and video services and the like are preferentially output when the threshold value is higher, but audio services such as music and the like may be preferentially output at a sound box end; the classification model algorithm mainly takes the analysis results of various services as input and the labels of the modules as output, judges the classification probability of the analysis results of the modules through the classification model and outputs the service with the highest probability.

The rule algorithm and the classification model algorithm play an important role in the early development stage of the intelligent voice assistant, but as the integrated services are more and more diversified, the result given by the decision engine is the same for the same request of the user no matter how many times the request is made, and the result is possibly right and possibly wrong. When a user requests to compare the results of a plurality of business fields (for example, "XX city leader" may relate to encyclopedia business and news business), the decision engine cannot determine which business field the user requests the results of, and thus the accuracy of the given results is low. Thereby affecting the experience of the user searching for content using the intelligent voice assistant.

Based on the above content, the embodiment of the application provides a training method and device for an intelligent voice assistant decision-making strategy, which can simulate the interaction behavior of a user and an intelligent voice assistant in an offline state of an intelligent device, and take the decision-making content supplemented during the simulation of the user interaction as the decision-making result of the intelligent voice assistant. After the intelligent voice assistant finishes training, if the voice request corresponds to the return results of the plurality of service modules, the return results desired by the user can be accurately determined and provided for the user, and the interaction accuracy between the user and the intelligent voice assistant is improved.

Fig. 1 is a flowchart of a training method for an intelligent voice assistant decision making strategy according to an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:

step S101, acquiring log data of a user; the log data is used for representing behavior data and request data of a user in historical voice operation of interacting with the intelligent device.

Generally, when a user uses the intelligent device, the intelligent voice assistant of the intelligent device collects a voice request of the user and searches a corresponding return result according to the request, the user gives a certain feedback action after receiving or seeing the return result, the ratio of the return result is that the user wants the return result, the user accepts the return result, if the return result is not that the user wants, the user may object to the return result and input the voice again, or the return result presents a plurality of search results, and the user can select one of the return results to accept.

The user log stores some operation records and operation results, etc. when the user uses the smart device, for example, the voice request in the embodiment of the present application, the return result corresponding to the voice request, the feedback of the user on the return result, etc., which may be summarized as behavior data and request data, etc.

It should be noted that the intelligent device in the embodiments of the present application refers to an intelligent device capable of implementing voice input and voice control, and an intelligent voice assistant is usually integrated in the intelligent device to implement functions such as voice input and voice control. The smart device may not be limited to smart televisions, smart phones, smart speakers, etc.

And step S102, analyzing the feature vector of the user voice request by using the log data.

The feature vector is used for representing a vector formed by different return results received after a user sends a voice request. The features in the feature vector specifically comprise request features, resource features, historical features and the like, wherein the request features are used for representing different probabilities that returned results sent to a user by the intelligent equipment according to a voice request belong to different service modules and confidence degrees matched with the service modules; the resource characteristics are used for representing the display data or historical click rate of the media resources to which the returned results sent to the user by the intelligent equipment according to the voice request are attributed; the historical characteristics are used for representing the time of the user operating in the intelligent device, the content with high attention and the like.

In the embodiment of the present application, the content of step S102 may be understood as performing feature simulation on the voice request of the user according to the log data of the smart device.

And step S103, taking the feature vector corresponding to the voice request as input, and outputting the decision content corresponding to the voice request by using a depth deterministic strategy gradient DDPG model.

And the decision content is used for representing a predicted return result which is predicted by the depth determination strategy gradient DDPG model and is corresponding to the voice request and should be received by the user.

The DDPG model is a common algorithm model formed based on a reinforcement learning algorithm, wherein the DDPG model is a reinforcement learning algorithm based on continuous determination of strategy gradients. The DDPG model can predict the return result corresponding to each voice request in the strategy through a self computing network and an algorithm according to the characteristic vector and the like, and the predicted return result is called the decision content output by the DDPG model.

The DDPG model comprises an actor network and a critic network, wherein the actor has the main task of acting according to the feature vectors, namely outputting a result returned by a certain service module according to a voice request; the main task of critic is to monitor whether the action of the actor is correct and reasonable, and play the role of a judge, and critic feeds back the monitoring result to the actor so as to optimize the action in the future.

And step S104, in the case that the decision content is to inquire the opinion of the user, supplementing the inquired content according to the real intention of the voice request.

The true intent is used to indicate the expected result of a pre-tagged user for a voice request, e.g., a user requesting "XX City captain," who actually wants to see news about XX City captain, then the true intent of the voice request should be tagged as "news.

In general, after receiving a voice request of a user, the intelligent voice assistant returns a returned result to the user through searching, and the content of the returned result generally has three situations, one is that the content of the returned result generally meets the requirements of the user, one is that the content of the returned result generally does not meet the requirements of the user, and the other is that the content of the returned result generally contains a plurality of search contents and needs to inquire the requirements of the user. Then, the user has three operation modes for returning the voice request result, namely accepting, objecting or supplementing the content. For example, the user may request, using voice input, that "XX city chief" the news content about XX city chief is desired, and when the intelligent voice assistant returns an encyclopedia content about "XX city chief", the user may object to the return, and when the intelligent voice assistant returns a news content about "XX city chief", the user may accept the return, and when the intelligent voice assistant returns both the encyclopedia content about "XX city chief" and the news content and asks the user which one wants, the user may select the news content.

Step S104 in the embodiment of the present application may also be understood as a process of simulating a user action. The DDPG model takes a voice request as input, and can output a predicted return result which is most likely to occur as decision content after some processing and calculation. In fact, the decision content includes three contents, which are accepted by the user, not accepted by the user, and ask the user for opinions, as the above returned result. And the action of the simulated user is to simulate whether the user accepts, opposes or supplements the decision content according to the content of the inquiry, namely to simulate the interaction process of the user and the intelligent voice assistant.

When a user request is ambiguous, that is, multiple service fields are involved, for example, "XX city leader" may involve encyclopedia service and news service, etc., a decision engine in the current intelligent voice assistant cannot determine which service field the result requested by the user is, and thus the accuracy of the given result is relatively low. However, with the training method in the embodiment of the present application, in the case that the decision content is an opinion inquiry to the user, the action of the user to supplement the content may be simulated, and the business content meeting the real intention may be selected as the real decision content, for example, "news" may be selected as the decision content of the voice request "XX city leader". Then, in an actual scene, when the user inputs the voice request "XX city leader" again, the intelligent voice assistant does not output ambiguous results any more, and the intelligent voice assistant can directly output news contents according with the real intention of the user according to the training contents.

And step S105, taking the supplemented decision content as a corresponding decision strategy of the voice request in the intelligent voice assistant.

It should be noted that, in the embodiment of the present application, the contents of steps S103 to S105 may also be regarded as a process of training the DDPG model in the intelligent voice assistant, and due to the network structure characteristics of the DDPG model, the DDPG model may continuously iterate the optimization algorithm by itself through feedback or reward of the output content, and further obtain more accurate output content through continuous learning and training.

Therefore, the embodiment of the application provides a training method for a decision strategy of an intelligent voice assistant, which can simulate the interaction behavior of a user and the intelligent voice assistant in an offline state of an intelligent device, and take the decision content supplemented during the simulation of the interaction of the user as the decision result of the intelligent voice assistant. After the intelligent voice assistant finishes training, if the voice request corresponds to the return results of the plurality of service modules, the return results desired by the user can be accurately determined and provided for the user, and the interaction accuracy between the user and the intelligent voice assistant is improved.

The feature vector is a vector formed by different return results received after a user sends a voice request, wherein the features specifically comprise request features, resource features and history features. Fig. 2 is a flowchart illustrating an embodiment of obtaining a feature vector, where as shown in fig. 2, in some embodiments, the step of analyzing the feature vector of the user voice request by using the log data includes:

step S201, analyzing the request characteristics that the return result sent to the user by the intelligent device according to the voice request belongs to different service modules and the resource characteristics of the media resource to which the return result belongs.

Step S202, analyzing the historical characteristics of the user according to the time when the user sends the voice request in the intelligent equipment and the attention content in the intelligent equipment.

Step S203, the request feature, the resource feature and the history feature are used for forming the feature vector.

The request characteristics are used for representing different probabilities that returned results sent to the user by the intelligent equipment according to the voice request belong to different service modules and the confidence coefficient matched with each service module; for example, for the content of "XX city captain" requested by the user, the encyclopedia result of XX city captain may be parsed in the "encyclopedia" service module, the probability that the returned result belongs to encyclopedia is p1, the confidence level of the matched returned result is c1, the latest news result of XX city captain may be parsed in the "news" service module, the probability that the returned result belongs to news is p2, the confidence level of the matched returned result is c2, and the request characteristics may be represented as [ p1, c1, p2, c2 ].

The resource characteristics are used for representing display data or historical click rate of media resources to which returned results sent by the intelligent device to the user according to the voice request, for example, news resources are high in popularity at present or click rate is high, and target popularity of encyclopedic resources is low or click rate is low in all media resources acquired by the intelligent television. The resource feature may be a numerical value obtained by counting the click rate of the news resource to which the returned result belongs and the click rate of the encyclopedia resource, and then performing normalization processing. The numerical value may represent a resource characteristic.

The historical characteristics are used for representing the operation time of the user in the intelligent device, the content with high attention degree and the like, for example, statistics is carried out on the record requested by the user in the past 10 minutes or 1 hour, resources or services with high attention degree of the user are analyzed, the behavior habits of the user are further reflected, and a numerical value representing the historical characteristics can be obtained through analysis and calculation.

In the mathematical calculation, there may be many common statistical methods or mathematical methods to obtain the request feature, the resource feature and the history feature in the embodiment of the present application, and the embodiment of the present application is not particularly limited. In addition, the dimension of each feature can be set according to different training requirements, for example, the requirement request feature has 22 dimensions, the resource feature has 1 dimension, the history feature has 1 dimension, and then the combined feature vector has 24 dimensions.

Of course, the training of the intelligent assistant decision strategy in the embodiment of the present application is not limited to only simulating the actions of the user to supplement the content, and in some embodiments, if the decision content output by the DDPG model is determined, the training method in the embodiment of the present application may still simulate the actions accepted or rejected by the user to process the decision content. Therefore, after the step of outputting the decision content corresponding to the voice request by using the deep deterministic strategy gradient DDPG model with the feature vector corresponding to the voice request as an input, the method further comprises the following steps:

step S301, under the condition that the decision content accords with the real intention of the voice request, the decision content is received with the probability larger than the preset probability threshold.

For example, if the user requests "i want to see bullet flying" for speech with the true intent of "movie", then when the decision content is a "bullet flying" movie, the decision content is accepted with 99% probability and objected to the decision content or supplemental content with 1% probability. It can be seen that if the decision content meets the real intention, it is shown with a high probability that the decision content is accurate.

And step S302, taking the decision content as a corresponding decision strategy of the voice request in the intelligent voice assistant.

If the decision content corresponding to the voice request meets the real intention of the voice request, the intelligent voice assistant can use the decision content as a final decision strategy, when the user requests the same content from the intelligent voice assistant again, the corresponding decision strategy is directly output to the user, and the decision strategy is closest to the real intention of the user during training, so that the decision strategy is also closest to the requirement of the user during the real request of the user.

In addition, in some embodiments, after the decision-making policy is determined, the accepting action of the user on the decision-making content in each simulation is fed back to the DDPG model as a decision-making result, so that the DDPG model performs self-optimization according to the decision-making result.

The preset frequency threshold may be set to different values, such as 99%, 95%, etc., according to actual requirements. Meanwhile, the content of the step S301 may also be regarded as a noise processing procedure, that is, after the DDPG model outputs the decision content for a voice request, an action corresponding to the real intention of the user is executed at a large probability and another action is executed at a small probability. For example, 99% acceptance, 1% rejection or supplementation, 95% acceptance, 5% rejection or supplementation, etc.

As described above, if the decision content output by the DDPG model is definite, the training method in the embodiment of the present application can still simulate the accepted or objected actions of the user to process the decision content. Furthermore, in some embodiments, after the step of outputting the decision content corresponding to the voice request by using the depth deterministic policy gradient DDPG model with the feature vector corresponding to the voice request as an input, the method further includes:

step S401, in case that the decision content does not conform to the real intention of the voice request, objecting to the decision content with a probability greater than a preset probability threshold.

For example, if the user requests "i want to see bullet flying" for voice with the true intention of "movie", when the decision content is music "flying bullet", the decision content is objected to with a probability of 99% and accepted or supplemented with a probability of 1%. It can be seen that if the decision content does not conform to the true intent, it is shown with a high probability that this decision content is inaccurate.

Step S402, the action against the decision content is fed back to the DDPG model of the depth determination strategy gradient as the decision result corresponding to the voice request.

The process of continuously feeding back the decision result to the DDPG model can be understood as the process of training the DDPG model and then training the intelligent voice assistant, and the DDPG model can make the output decision content more accurate through continuous learning and training.

Step S403, the DDPG model of the depth certainty strategy gradient adjusts the decision content corresponding to the voice request according to the decision result.

When the decision content does not meet the real intention of the voice request, it is indicated that in an actual application scene, the decision content does not meet the requirement of the user in a high probability, so that the DDPG model needs to readjust the algorithm of the DDPG model according to the feedback result and optimize the strategy, so that the strategy content output next time is more difficult to meet the real intention of the user. Still taking the above as an example, if the user wants to see the movie "let bullet fly", and the decision content returns music with the result "let bullet fly", the user object to the decision content and feeds back the object action to the DDPG model, which relearns, and the decision content of the movie "let bullet fly" may be output next time for the request.

And S404, taking the decision content as a corresponding decision strategy of the voice request in the intelligent voice assistant under the condition that the decision content accords with the user intention of the voice request.

If the decision content re-output by the DDPG model accords with the real intention of the voice request, the intelligent voice assistant can use the decision content as a final decision strategy, when the user requests the same content from the intelligent voice assistant again, the corresponding decision strategy is directly output to the user, and the decision strategy is closest to the real intention of the user during training, so that the decision strategy is also closest to the requirement of the user during the real request of the user.

Fig. 3 is a flowchart of another training method for an intelligent voice assistant decision making strategy according to an embodiment of the present application, and in some embodiments, the above steps S301 to S302, and steps S401 to S404 may be further combined into fig. 1, so as to form the training method for an intelligent voice assistant decision making strategy shown in fig. 3, which includes steps S501 to S511.

In the embodiment of the application, the decision strategy of the intelligent voice assistant is trained, and in fact, the DDPG model is trained. Fig. 4 is a training diagram of an intelligent voice assistant decision making strategy according to an embodiment of the present application, and as shown in fig. 4, the feature simulation and the action simulation in the above embodiments may be collectively regarded as being performed by a simulation module 601. As shown in fig. 4, the decision content and the real intention comparing process in the above embodiments can also be implemented by the arbiter 603, and the result of the comparison is fed back to the DDPG model 602.

In the training process of the DDPG model, firstly, the DDPG model is enabled to learn the preference of a user by utilizing the feature vector requested by the user, and then a predicted decision content is output, for example, the user requests 'i want to see to let the bullet fly' from the intelligent voice assistant, and the decision content can be 'let the bullet fly' movie, 'let the bullet fly' music or let the user select 'let the bullet fly' movie or music. Simulating the operation of the user on the decision content according to the real intention of the user, wherein the operation is accepting, resisting or supplementing the content, if the user accepts, the decision content is shown to meet the requirement of the user in practical application with high probability, the decision content can be used as a decision strategy corresponding to the voice request in the intelligent voice assistant, for example, a movie which is allowed to fly by a bullet is accepted, and the movie which is allowed to fly by a bullet is used as a decision strategy which requests that the user wants to see the bullet to fly by the bullet in the intelligent voice assistant; if music which lets the bullet fly is objected to, the DDPG model readjusts the decision content, the decision content output next time is probably the movie which lets the bullet fly, and the new decision content is used as a decision strategy for 'i want to see the bullet fly' in the intelligent voice assistant; for a movie or music that asks the user to select "fly bullet", if the user directly selects "movie", the "fly bullet" movie can be used as the decision strategy in the intelligent voice assistant that i want to see the fly bullet.

In order to make the decision content output by the model more accurate, the training of the model may be thousands of times, and if the content is supplemented manually in the case of ambiguous decision content (such as movie or music "fly bullet"), too much manpower is wasted and it is time-consuming. In order to complete the training of the DDPG model more efficiently, in the embodiment of the application, the operation action of a user on decision content is simulated, so that the interaction process of a person and the model is simulated, excessive manual work is avoided participating in the model training process, and the model training efficiency can be effectively improved.

After the decision strategy of the intelligent voice assistant is determined, in the process of practical application, when the user inputs the same voice request again, the result which the user wants can be directly given.

One of the most prominent advantages of the embodiment of the present application for simulating the user interaction process is that when the decision content includes a plurality of selection contents, the selection or supplement process of the user can be simulated, so that the uncertain contents in the decision content can be changed into the determined contents, and then in practical applications, when the intelligent voice assistant encounters the same voice request again, the determined contents in the decision content can be output, for example, when the user supplements a movie that the user wants to watch "let bullet fly" during training, then when in practical operation, the user inputs "i want to watch bullet fly" again, and then the movie that the user lets bullet fly "can be obtained directly from the intelligent voice assistant. Furthermore, in the interaction process with the user, the intelligent voice assistant can improve the accuracy of the searched content, and the user can often obtain the correct result by using the intelligent voice assistant.

It is worth mentioning that, in the training method of the intelligent voice assistant decision making strategy according to the embodiment of the present application, in an offline state of the intelligent device, a user interaction process may also be simulated by using only local log data of the intelligent device, so as to implement training of the decision making strategy.

According to the content, the embodiment of the application provides a training method for an intelligent voice assistant decision strategy, which can analyze the feature vector of a user voice request by using log data stored in intelligent equipment; then, the feature vector is used as the input of a depth certainty strategy gradient DDPG model, so that the depth certainty strategy gradient DDPG model outputs decision content corresponding to the voice request; then, under the condition that the decision content inquires the opinion from the user, the inquired content is supplemented according to the real intention of the voice request; and finally, taking the supplemented decision content as a corresponding decision strategy of the voice request in the intelligent voice assistant. According to the technical scheme, the interaction behavior of the user and the intelligent voice assistant can be simulated in the offline state of the intelligent device, and the decision content supplemented during the simulation of the user interaction is used as the decision strategy of the intelligent voice assistant. After the intelligent voice assistant finishes training, if the voice request corresponds to the return results of the plurality of service modules, the return results desired by the user can be accurately determined and provided for the user, and the interaction accuracy between the user and the intelligent voice assistant is improved.

Fig. 5 is a block diagram of a training apparatus for an intelligent voice assistant decision making strategy according to an embodiment of the present application, and as shown in fig. 5, the training apparatus for an intelligent voice assistant decision making strategy according to an embodiment of the present application includes:

a data obtaining module 701, configured to obtain log data of a user; the log data is used for representing behavior data and request data of a user in historical voice operation of interacting with the intelligent device; a feature simulation module 702, configured to analyze a feature vector of the user voice request by using the log data; the characteristic vector is used for representing a vector formed by different return results received after a user sends a voice request; a decision module 703, configured to take the feature vector corresponding to the voice request as an input, and output a decision content corresponding to the voice request by using a depth deterministic policy gradient DDPG model; the decision content is used for representing a predicted return result which is predicted by the depth determination strategy gradient DDPG model and is corresponding to the voice request and should be received by the user; a behavior simulation module 704, configured to supplement the queried content according to the real intention of the voice request in the case that the decision content is to query the user for an opinion; the real intent is used to represent a pre-tagged user's expected result for the voice request; and the decision module 705 is configured to use the supplemented decision content as a corresponding decision policy of the voice request in the intelligent voice assistant.

The feature simulation module 702 and the behavior simulation module 704 may implement the contents implemented by the simulation module 601 shown in fig. 4, the decision module 703 may implement the contents implemented by the DDPG model 602 shown in fig. 4, and the decision module 705 may implement the contents implemented by the discriminator 603 shown in fig. 4.

In some embodiments, the feature simulation module is further configured to: analyzing the request characteristics that the return result sent to the user by the intelligent equipment according to the voice request belongs to different service modules and the resource characteristics of the media resource to which the return result belongs; analyzing the historical characteristics of the user according to the time when the user sends the voice request in the intelligent equipment and the attention content in the intelligent equipment; and composing the feature vector by using the request feature, the resource feature and the historical feature.

In some embodiments, the behavior simulation module is further configured to accept the decision content with a probability greater than a preset probability threshold if the decision content meets the real intention of the voice request; the decision module is also used for taking the decision content as a corresponding decision strategy of the voice request in the intelligent voice assistant.

In some embodiments, the apparatus further comprises: and the feedback module is used for feeding back the action of receiving the decision content as a decision result corresponding to the voice request to the depth determination strategy gradient DDPG model.

In some embodiments, the behavior simulation module is further configured to object the decision content with a probability greater than a preset probability threshold if the decision content does not conform to the true intent of the voice request; the feedback module is further used for feeding back an action against the decision content as a decision result corresponding to the voice request to the depth determination strategy gradient DDPG model; the decision module is further configured to enable the deep deterministic policy gradient DDPG model to adjust the decision content corresponding to the voice request according to the decision result; and if the decision content meets the user intention of the voice request, taking the decision content as a corresponding decision strategy of the voice request in an intelligent voice assistant.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A training method of an intelligent voice assistant decision strategy is characterized by comprising the following steps:

2. The method of claim 1, wherein the step of analyzing the feature vector of the user voice request using the log data comprises:

3. The method according to claim 1, wherein after the step of outputting the decision content corresponding to the voice request by using a Deep Deterministic Policy Gradient (DDPG) model with the feature vector corresponding to the voice request as an input, the method further comprises:

4. The method of claim 3, wherein after the step of using the decision content as the corresponding decision policy of the voice request in the intelligent voice assistant, the method further comprises:

5. The method according to any one of claims 1-4, wherein after the step of outputting the decision content corresponding to the voice request by using a Deep Deterministic Policy Gradient (DDPG) model with the feature vector corresponding to the voice request as an input, the method further comprises:

6. An intelligent voice assistant decision making strategy training device, comprising:

7. The apparatus of claim 6, wherein the feature simulation module is further configured to:

8. The apparatus of claim 6,

the behavior simulation module is further used for receiving the decision content with a probability greater than a preset probability threshold under the condition that the decision content meets the real intention of the voice request;

9. The apparatus of claim 8, further comprising a feedback module configured to feed back an action of accepting the decision content as a decision result corresponding to the voice request to the Depth Determination Policy Gradient (DDPG) model.

10. The apparatus of claim 9,

the behavior simulation module is further used for objecting to the decision content with a probability greater than a preset probability threshold under the condition that the decision content does not accord with the real intention of the voice request;