CN111899728B - Training method and device for intelligent voice assistant decision strategy - Google Patents

Training method and device for intelligent voice assistant decision strategy

Info

Publication number
CN111899728B
Authority
CN
China
Prior art keywords
decision
user
voice request
request
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010719035.6A
Other languages
Chinese (zh)
Other versions
CN111899728A (en)
Inventor
朱飞
连欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Electronic Technology Wuhan Co ltd
Original Assignee
Hisense Electronic Technology Wuhan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Electronic Technology Wuhan Co ltd filed Critical Hisense Electronic Technology Wuhan Co ltd
Priority to CN202010719035.6A priority Critical patent/CN111899728B/en
Publication of CN111899728A publication Critical patent/CN111899728A/en
Application granted granted Critical
Publication of CN111899728B publication Critical patent/CN111899728B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/12 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
    • G10L2015/0638 Interactive procedures
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The training method and device for an intelligent voice assistant decision strategy can derive the feature vector of a user's voice request from log data; the feature vector is then fed as input to a deep deterministic policy gradient (DDPG) model, which outputs the decision content corresponding to the voice request; when the decision content is a query for the user's opinion, the queried content is supplemented according to the real intent of the voice request; finally, the supplemented decision content is used as the decision strategy corresponding to the voice request in the intelligent voice assistant. With this technical scheme, the interaction between the user and the intelligent voice assistant can be simulated while the intelligent device is offline, and the decision content supplemented during the simulated interaction is used as the assistant's decision strategy. After the intelligent voice assistant finishes training, even if a voice request matches return results from multiple service modules, the assistant can accurately determine and provide the result the user wants, improving the accuracy of user interaction with the intelligent voice assistant.

Description

Training method and device for intelligent voice assistant decision strategy
Technical Field
The present application relates to the technical field of smart homes, and in particular to a training method and device for an intelligent voice assistant decision strategy.
Background
Currently, many intelligent devices implement intelligent interaction with users through intelligent voice assistants. The decision engine is the core of the intelligent voice assistant: its main task is to perform semantic analysis and comprehensive judgment on a user's request across all services and to output the result that best matches the user's actual intent. The accuracy of the decision engine therefore directly affects how intelligent the voice assistant is, and in turn the user experience.
The decision engine's decisions are mainly made with a rule algorithm or a classification model algorithm. A rule algorithm determines the output order of services by assigning priorities to certain services, with each priority set through a manually chosen threshold. For example, on a television, video services such as films and TV series are the primary services, so if their threshold is higher, video and similar services are output first; on a smart speaker, audio services such as music may be output first instead. A classification model algorithm takes the parsing results of the various services as input and the labels of the service modules as output, estimates the classification probability of each module's parsing result through the classification model, and outputs the service with the highest probability.
Rule algorithms and classification model algorithms played an important role in the early development of intelligent voice assistants, but as the integrated services diversify, the decision engine gives the same result for the same user request no matter how many times it is made, and that result may be right or wrong. When a user's request is ambiguous, i.e. it relates to several business areas, the decision engine cannot determine which business area the user actually wants, so the accuracy of the returned result is low. This degrades the user's experience of searching for content with the intelligent voice assistant.
Disclosure of Invention
The present application provides a training method and device for an intelligent voice assistant decision strategy, to address the low accuracy of content search by intelligent voice assistants in existing intelligent devices and thereby preserve the user experience.
In a first aspect, the present application provides a training method for an intelligent voice assistant decision strategy, including:
acquiring log data of a user, where the log data represents the user's behavior data and request data from historical voice interactions with the intelligent device;
parsing the feature vector of the user's voice request from the log data, where the feature vector represents the different return results received after the user issues a voice request;
taking the feature vector corresponding to the voice request as input, and outputting the decision content corresponding to the voice request with a deep deterministic policy gradient (DDPG) model, where the decision content represents the return result that the DDPG model predicts the user should receive for the voice request;
when the decision content is a query for the user's opinion, supplementing the queried content according to the real intent of the voice request, where the real intent represents the pre-labeled expected result of the user for the voice request;
and taking the supplemented decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant.
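Taken together, the five steps above could be sketched as a single offline training loop. This is a minimal illustration only: every name below (`parse_feature_vector`, `model.decide`, the `"ask_user"` marker) is a hypothetical placeholder, not terminology or an implementation from the patent.

```python
# Hypothetical sketch of the five-step method as one offline training loop.

def parse_feature_vector(record):
    """Toy stand-in for step 2: map a log record to a feature vector."""
    return [record.get("confidence", 0.0)]

def train_decision_policy(log_records, model, true_intents):
    """Derive a per-request decision strategy from historical log data."""
    policy = {}
    for record in log_records:                  # step 1: acquired log data
        request = record["request"]
        state = parse_feature_vector(record)    # step 2: feature vector
        decision = model.decide(state)          # step 3: DDPG decision content
        if decision == "ask_user":              # step 4: model asks an opinion
            decision = true_intents[request]    # supplement with real intent
        policy[request] = decision              # step 5: store as the strategy
    return policy
```

Here `model` stands in for the DDPG model of the first aspect; any object with a `decide` method can be plugged in when exercising the loop.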
In some embodiments, the step of parsing the feature vector of the user's voice request from the log data includes:
parsing, from the return results that the intelligent device sends to the user for the voice request, the request features indicating which service modules the return results belong to, and the resource features of the media resources to which the return results belong;
parsing the user's history features from the times at which the user issued voice requests on the intelligent device and the content the user pays attention to on the device;
and composing the feature vector from the request features, the resource features, and the history features.
In some embodiments, after the step of taking the feature vector corresponding to the voice request as input and outputting the decision content corresponding to the voice request with the DDPG model, the method further includes:
accepting the decision content with a probability greater than a preset probability threshold when the decision content matches the real intent of the voice request;
and taking the decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant.
In some embodiments, after the step of taking the decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant, the method further includes:
feeding the action of accepting the decision content back to the DDPG model as the decision result corresponding to the voice request.
In some embodiments, after the step of taking the feature vector corresponding to the voice request as input and outputting the decision content corresponding to the voice request with the DDPG model, the method further includes:
rejecting the decision content with a probability greater than a preset probability threshold when the decision content does not match the real intent of the voice request;
feeding the action of rejecting the decision content back to the DDPG model as the decision result corresponding to the voice request;
causing the DDPG model to adjust the decision content corresponding to the voice request according to the decision result;
and taking the decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant once the decision content matches the user intent of the voice request.
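The accept-with-probability and reject-with-probability behaviour in the embodiments above could be sketched as a single simulated-feedback function. This is an illustrative assumption: the function name, the `p_consistent` threshold value, and the ±1 reward scheme are invented here, not taken from the patent.

```python
import random

def simulated_feedback(decision, true_intent, p_consistent=0.9):
    """Simulated user: act consistently with the labelled real intent with
    probability p_consistent (an assumed threshold). The resulting action
    is what gets fed back to the DDPG model as the decision result."""
    consistent = random.random() < p_consistent
    if decision == true_intent:
        action = "accept" if consistent else "reject"
    else:
        action = "reject" if consistent else "accept"
    reward = 1.0 if action == "accept" else -1.0  # assumed reward scheme
    return action, reward
```

With `p_consistent=1.0` the simulated user is fully deterministic, which is convenient for testing; values below 1.0 add the noise that makes the simulation resemble real user behaviour.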
In a second aspect, an embodiment of the present application further provides a training device for an intelligent voice assistant decision strategy, including:
a data acquisition module, configured to acquire log data of a user, where the log data represents the user's behavior data and request data from historical voice interactions with the intelligent device;
a feature simulation module, configured to parse the feature vector of the user's voice request from the log data, where the feature vector represents the different return results received after the user issues a voice request;
a decision module, configured to take the feature vector corresponding to the voice request as input and output the decision content corresponding to the voice request with a deep deterministic policy gradient (DDPG) model, where the decision content represents the return result that the DDPG model predicts the user should receive for the voice request;
a behavior simulation module, configured to supplement the queried content according to the real intent of the voice request when the decision content is a query for the user's opinion, where the real intent represents the pre-labeled expected result of the user for the voice request;
and the decision module, further configured to take the supplemented decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant.
In some embodiments, the feature simulation module is further configured to:
parse, from the return results that the intelligent device sends to the user for the voice request, the request features indicating which service modules the return results belong to, and the resource features of the media resources to which the return results belong;
parse the user's history features from the times at which the user issued voice requests on the intelligent device and the content the user pays attention to on the device;
and compose the feature vector from the request features, the resource features, and the history features.
In some embodiments, the behavior simulation module is further configured to accept the decision content with a probability greater than a preset probability threshold when the decision content matches the real intent of the voice request;
and the decision module is further configured to take the decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant.
In some embodiments, the device further includes a feedback module, configured to feed the action of accepting the decision content back to the DDPG model as the decision result corresponding to the voice request.
In some embodiments, the behavior simulation module is further configured to reject the decision content with a probability greater than a preset probability threshold when the decision content does not match the real intent of the voice request;
the feedback module is further configured to feed the action of rejecting the decision content back to the DDPG model as the decision result corresponding to the voice request;
the decision module is further configured to cause the DDPG model to adjust the decision content corresponding to the voice request according to the decision result;
and to take the decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant once the decision content matches the user intent of the voice request.
As can be seen from the above, the training method and device for the intelligent voice assistant decision strategy in the technical scheme of the present application can parse the feature vector of a user's voice request from the log data stored in the intelligent device; the feature vector is then fed as input to the deep deterministic policy gradient (DDPG) model, which outputs the decision content corresponding to the voice request; when the decision content is a query for the user's opinion, the queried content is supplemented according to the real intent of the voice request; finally, the supplemented decision content is used as the decision strategy corresponding to the voice request in the intelligent voice assistant. With this scheme, the interaction between the user and the intelligent voice assistant can be simulated while the intelligent device is offline, and the decision content supplemented during the simulated interaction is used as the assistant's decision strategy. After the intelligent voice assistant finishes training, even if a voice request matches return results from multiple service modules, the assistant can accurately determine and provide the result the user wants, improving the accuracy of user interaction with the intelligent voice assistant.
Drawings
To illustrate the technical scheme of the present application more clearly, the drawings needed in the embodiments are briefly described below; it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a training method of an intelligent voice assistant decision strategy according to an embodiment of the present application;
FIG. 2 is a flow chart of obtaining feature vectors according to an embodiment of the present application;
FIG. 3 is a flow chart of another method of training an intelligent voice assistant decision strategy according to an embodiment of the present application;
FIG. 4 is a training schematic diagram of an intelligent voice assistant decision strategy according to an embodiment of the present application;
fig. 5 is a block diagram of a training device for an intelligent voice assistant decision strategy according to an embodiment of the present application.
Detailed Description
To make the objects, embodiments, and advantages of the present application clearer, exemplary embodiments of the present application are described below with reference to the accompanying drawings in which they are shown; it should be understood that the described exemplary embodiments are only some, not all, of the examples of the application.
Based on the exemplary embodiments described herein, all other embodiments obtainable by one of ordinary skill in the art without inventive effort fall within the scope of the appended claims. Furthermore, while the present disclosure is described in terms of one or more exemplary embodiments, it should be understood that each aspect of the disclosure can be practiced separately from the others.
It should be noted that the brief description of the terminology in the present application is for the purpose of facilitating understanding of the embodiments described below only and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms "first", "second", "third", and the like in the description, the claims, and the above drawings distinguish similar objects or entities and do not necessarily describe a particular sequence or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application can, for example, operate in sequences other than those illustrated or described herein.
Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" as used in this disclosure refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the function associated with that element.
Currently, many intelligent devices implement intelligent interaction with users through intelligent voice assistants. For example, when a user uses a smart television equipped with an intelligent voice assistant, the user speaks what he or she wants to watch, and the assistant finds that content and displays it on the television.
The decision engine is the core of the intelligent voice assistant: its main task is to perform semantic analysis and comprehensive judgment on a user's request across all services and to output the result that best matches the user's actual intent. For example, if the user speaks the request "XX City Inquiries" through the intelligent voice assistant, the decision engine can search the "encyclopedia" service for information related to "XX City Inquiries". The accuracy of the decision engine therefore directly affects how intelligent the voice assistant is, and in turn the user experience.
Currently, the decision engine's decisions are mainly made with a rule algorithm or a classification model algorithm. A rule algorithm determines the output order of services by assigning priorities to certain services, with each priority set through a manually chosen threshold. For example, on a television, video services such as films and TV series are the primary services, so if their threshold is higher, video and similar services are output first; on a smart speaker, audio services such as music may be output first instead. A classification model algorithm takes the parsing results of the various services as input and the labels of the service modules as output, estimates the classification probability of each module's parsing result through the classification model, and outputs the service with the highest probability.
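The device-dependent priority behaviour of the rule algorithm could be sketched as follows. All names and priority orders here are invented for illustration; the patent does not specify a concrete table.

```python
# Hypothetical sketch of the rule algorithm: each device type carries a
# manually set service priority order, so the same request resolves
# differently on a television than on a speaker.

DEVICE_SERVICE_PRIORITY = {
    "tv":      ["video", "audio", "news", "encyclopedia"],
    "speaker": ["audio", "video", "news", "encyclopedia"],
}

def rule_decide(device, matched_services):
    """Return the highest-priority service that produced a parse result."""
    for service in DEVICE_SERVICE_PRIORITY[device]:
        if service in matched_services:
            return service
    return None
```

Note how the same set of matched services yields "video" on the television but "audio" on the speaker; this rigidity is exactly the limitation the next paragraph describes.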
Rule algorithms and classification model algorithms played an important role in the early development of intelligent voice assistants, but as the integrated services diversify, the decision engine gives the same result for the same user request no matter how many times it is made, and that result may be right or wrong. When a user's request is ambiguous, i.e. it relates to several business areas (for example, "Mayor of XX City" may relate to both the encyclopedia service and the news service), the decision engine cannot determine which business area the user actually wants, so the accuracy of the returned result is low. This degrades the user's experience of searching for content with the intelligent voice assistant.
Based on the above, the embodiments of the present application provide a training method and device for an intelligent voice assistant decision strategy, which can simulate the interaction between the user and the intelligent voice assistant while the intelligent device is offline, and take the decision content supplemented during the simulated interaction as the decision result of the intelligent voice assistant. After the intelligent voice assistant finishes training, even if a voice request matches return results from multiple service modules, the assistant can accurately determine and provide the result the user wants, improving the accuracy of user interaction with the intelligent voice assistant.
Fig. 1 is a flowchart of a training method of an intelligent voice assistant decision strategy according to an embodiment of the present application, as shown in fig. 1, the method includes the following steps:
Step S101, acquiring log data of a user; the log data is used to represent behavior data and request data of a user in historical voice operations of interacting with the smart device.
Typically, when a user uses an intelligent device, the intelligent voice assistant collects the user's voice request and searches for a corresponding return result. After receiving or seeing the return result, the user gives some feedback action: if the return result is what the user wants, the user accepts it; if not, the user may reject it and re-enter the voice request; and if the return result presents multiple search results, the user can select one of them to accept.
The user log stores operation records and operation results from the user's use of the intelligent device, for example the voice request, the return result corresponding to the voice request, and the user's feedback on that result in the embodiments of the present application; these can be summarized as behavior data and request data.
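One plausible shape for a single log record of this kind is sketched below. The field names and values are assumptions made for illustration; the patent does not define a log schema.

```python
# Hypothetical single user-log record combining request data (what was asked
# and what came back) with behavior data (what the user did with it).
example_log_record = {
    "request": "mayor of XX city",             # recognized voice request text
    "timestamp": "2020-07-24T20:15:00",        # when the request was issued
    "returned": [                              # return results per service module
        {"service": "encyclopedia", "confidence": 0.48},
        {"service": "news", "confidence": 0.46},
    ],
    "user_feedback": "rejected",               # accepted / rejected / selected one
}
```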
It should be noted that, in the embodiments of the present application, an intelligent device is a device that supports voice input and voice control; an intelligent voice assistant is generally integrated in the device to implement these functions. The intelligent device is not limited to a smart television, smart phone, smart speaker, or the like.
Step S102, the characteristic vector of the user voice request is analyzed by using the log data.
The feature vector represents the different return results received after the user issues a voice request. The features in the feature vector include request features, resource features, and history features. The request features represent the probabilities that the return result the intelligent device sends for the voice request belongs to the different service modules, and the confidence of the match with each service module; the resource features represent the display data, historical click-through rate, and the like of the media resources to which the return result belongs; the history features represent the times at which the user operates the intelligent device, the content the user is most interested in, and so on.
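Composing the state vector from these three feature groups could be sketched as a simple concatenation. The concrete fields and dimensions in the comments are illustrative assumptions, not the patent's feature definitions.

```python
import numpy as np

def compose_feature_vector(request_feats, resource_feats, history_feats):
    """Concatenate per-service probabilities and match confidences, media
    resource statistics (display data, click-through rate), and user
    history features (operation time, preferred content) into one vector."""
    return np.concatenate([
        np.asarray(request_feats, dtype=np.float32),   # e.g. [p_video, p_news, match_conf]
        np.asarray(resource_feats, dtype=np.float32),  # e.g. [display_count, click_rate]
        np.asarray(history_feats, dtype=np.float32),   # e.g. [hour_of_day / 24, prefers_news]
    ])
```

The resulting vector is what step S103 feeds to the decision model as its state.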
The content of step S102 in the embodiment of the present application may be understood as feature simulation of the voice request of the user according to the log data of the smart device.
Step S103, taking the feature vector corresponding to the voice request as input, and outputting the decision content corresponding to the voice request with a deep deterministic policy gradient (DDPG) model.
The decision content represents the return result that the DDPG model predicts the user should receive for the voice request.
The deep deterministic policy gradient (DDPG) model is a commonly used algorithm model built on reinforcement learning; DDPG is a reinforcement learning algorithm with a deterministic policy gradient over a continuous action space. Given the feature vector and related inputs, the DDPG model predicts, through its own networks and algorithm, the return result corresponding to each voice request; this predicted return result is called the decision content output by the DDPG model.
The DDPG model comprises an actor network and a critic network. The actor's main task is to act on the feature vector, i.e. to output the return result of a particular service module for a voice request; the critic's main task is to judge, like a referee, whether the actor's action is correct and reasonable, and to feed the result of that judgment back to the actor so that future actions can be optimized.
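The actor/critic split could be illustrated with a minimal numpy sketch: linear models and a bare TD(0) update. This is a structural illustration only; real DDPG additionally uses replay buffers, target networks, and policy-gradient updates for the actor, all omitted here, and nothing below is the patent's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class Actor:
    """Deterministic policy: maps the state (feature vector) to an action,
    here a vector of scores over the candidate service modules."""
    def __init__(self, state_dim, n_services):
        self.W = rng.normal(0.0, 0.1, (n_services, state_dim))
    def act(self, state):
        return self.W @ state

class Critic:
    """The 'referee': estimates Q(state, action) and moves its estimate
    toward the observed reward, the signal used to improve the actor."""
    def __init__(self, state_dim, n_services, lr=0.01):
        self.w = np.zeros(state_dim + n_services)
        self.lr = lr
    def q(self, state, action):
        return float(self.w @ np.concatenate([state, action]))
    def update(self, state, action, reward):
        x = np.concatenate([state, action])
        td_error = reward - self.q(state, action)
        self.w += self.lr * td_error * x           # one TD(0) step
        return td_error
```

After a positive reward, the critic's Q-estimate for that state-action pair rises, which is the feedback the actor would use to keep producing that decision.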
Step S104, in the case that the decision content is the query opinion of the user, the queried content is supplemented according to the real intention of the voice request.
The real intent represents the pre-labeled expected result of the user for the voice request. For example, if the user's request is "Mayor of XX City" and what the user means is news about the mayor of XX City, then the real intent of that voice request should be labeled "news".
In general, after the intelligent voice assistant receives the user's voice request, it searches and returns a result to the user. The content of the return result falls into three cases: it meets the user's need; it does not meet the user's need; or it contains several kinds of search content and the assistant must ask the user which is wanted. Correspondingly, the user has three ways of responding to the return result: accept it, reject it, or supplement the content. For example, suppose the user speaks the request "Mayor of XX City", wanting news content about the mayor. If the intelligent voice assistant returns encyclopedia content about the mayor of XX City, the user can reject the result; if it returns news content, the user can accept it; and if it returns both the encyclopedia content and the news content and asks which the user wants, the user can supplement the request by choosing one.
Step S104 in the embodiment of the present application can also be understood as simulating the user's actions. The DDPG model takes the voice request as input and, after processing and computation, outputs the most likely predicted return result as the decision content. The decision content falls into the same three cases as the return results described above: accepted by the user, not accepted by the user, or asking the user for an opinion. Simulating the user's actions means simulating the user accepting, rejecting, or supplementing the decision content according to the queried content, i.e. simulating the interaction between the user and the intelligent voice assistant.
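The three simulated user behaviours could be sketched as one dispatch function. The function name and the `"ask_user"` marker are invented here for illustration; the patent does not name them.

```python
def simulate_user_action(decision, true_intent):
    """Sketch of the simulated user: `decision` is either a concrete
    service label or the special marker "ask_user", meaning the model
    asked for the user's opinion."""
    if decision == "ask_user":
        return ("supplement", true_intent)     # supplement with the real intent
    if decision == true_intent:
        return ("accept", decision)            # matches the labelled intent
    return ("reject", None)                    # does not match: reject it
```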
When the user's request is ambiguous, i.e., it relates to multiple service domains — for example, "XX City mayor" may relate to both the encyclopedia service and the news service — the decision engine in a current intelligent voice assistant cannot determine which service domain the user actually wants, so the accuracy of the given result is low. With the training method in the embodiment of the present application, however, the user's supplementing action can be simulated when the decision content asks for an opinion, and the service content matching the true intent is selected as the actual decision content — for example, "news" is selected as the decision content of the voice request "XX City mayor". In an actual scene, when the user inputs the voice request "XX City mayor", the intelligent voice assistant no longer outputs an ambiguous result, but directly outputs the news content that matches the user's true intent according to the trained strategy.
Step S105, taking the supplemented decision content as a decision strategy corresponding to the voice request in the intelligent voice assistant.
It should be noted that, in the embodiment of the present application, steps S103 to S105 may also be regarded as the process of training the DDPG model in the intelligent voice assistant. Due to the network structure of the DDPG model, the method can continuously iterate and optimize its algorithm through the feedback or reward on the output content, and can obtain more accurate output through continuous learning and training.
As can be seen from the foregoing, the embodiment of the present application provides a training method for an intelligent voice assistant decision strategy, which can simulate the interaction between the user and the intelligent voice assistant while the intelligent device is offline, and uses the decision content supplemented during the simulated interaction as the decision result of the intelligent voice assistant. After the intelligent voice assistant finishes training, if a voice request corresponds to return results from multiple service modules, the return result the user wants can be accurately determined and provided, improving the accuracy of interaction between the user and the intelligent voice assistant.
The feature vector is a vector formed from the different returned results received after the user issues a voice request; the features specifically include request features, resource features, and history features. FIG. 2 is a flow chart of obtaining feature vectors according to an embodiment of the present application. As shown in FIG. 2, in some embodiments, the step of analyzing the feature vector of the user's voice request using the log data includes:
Step S201, analyzing, according to the voice request, the request features indicating which service modules the returned results sent by the intelligent device to the user belong to, and the resource features of the media resources to which the returned results belong.
Step S202, analyzing the history features of the user according to the times at which the user sends voice requests in the intelligent device and the content the user pays attention to in the intelligent device.
Step S203, composing the feature vector from the request features, the resource features, and the history features.
The request features represent the probabilities that the returned results sent by the intelligent device to the user according to the voice request belong to different service modules, together with the confidence of the match in each service module. For example, for the user request "XX City mayor", the "encyclopedia" service module may resolve an encyclopedia result about the XX City mayor, where the probability that the returned result belongs to the encyclopedia is p1 and the confidence of the matched returned result is c1; the "news" service module may resolve the latest news about the XX City mayor, where the probability that the returned result belongs to news is p2 and the confidence of the matched returned result is c2. The request feature can then be represented as [p1, c1, p2, c2].
The resource features represent the display data, historical click rate, and the like of the media resources to which the returned results belong. For example, among all the media resources acquired by the smart TV, the current popularity or click rate of the news resource may be high, while the popularity or click rate of the encyclopedia resource may be low. The resource feature can be a value obtained by counting the click rates of the news resource and the encyclopedia resource to which the returned results belong and then normalizing them; this value represents the resource feature.
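The normalization described above can be illustrated as follows. This is a sketch under the assumption that simple proportional scaling is used; the helper name is hypothetical.

```python
# Illustrative normalization of the click-rate statistic: the raw click
# counts of the candidate resources are scaled into [0, 1] so that a
# single value can serve as the resource feature.

def resource_feature(click_counts, target):
    """click_counts: dict mapping resource name -> raw click count."""
    total = sum(click_counts.values())
    return click_counts[target] / total if total else 0.0

# e.g. the news resource received 80 of 100 total clicks
feat = resource_feature({"news": 80, "encyclopedia": 20}, "news")  # 0.8
```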
The history features represent the times at which the user operates the intelligent device, the content the user pays most attention to, and the like. For example, the user's requests over the past 10 minutes or 1 hour are counted, and the resources or services the user pays most attention to are analyzed to reflect the user's behavior habits; a value obtained through this analysis and calculation represents the history feature.
Many common statistical or mathematical methods can be used to obtain the request features, resource features, and history features in the embodiment of the present application, which is not specifically limited here. In addition, the dimension of each feature can be set according to different training requirements: for example, 22 dimensions for the request features, 1 dimension for the resource feature, and 1 dimension for the history feature, giving a combined feature vector of 24 dimensions.
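Composing the three kinds of features into one fixed-width vector, using the example dimensions from the text (22 request + 1 resource + 1 history = 24), can be sketched as below. The helper name and the zero-padding scheme are assumptions for illustration.

```python
# Illustrative composition of the 24-dimensional feature vector.

def build_feature_vector(request_feat, resource_feat, history_feat,
                         request_dim=22):
    # Pad (or truncate) the per-domain [p1, c1, p2, c2, ...] values to
    # the fixed request-feature width, then append the two scalars.
    padded = (list(request_feat) + [0.0] * request_dim)[:request_dim]
    return padded + [resource_feat, history_feat]
```

For the two-domain example above, `build_feature_vector([p1, c1, p2, c2], 0.8, 0.5)` zero-pads the four request values out to 22 dimensions and appends the resource and history scalars.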
Of course, training the decision strategy of the intelligent assistant in the embodiment of the present application is not limited to simulating the user's supplementing action. In some embodiments, even if the decision content output by the DDPG model is definite, the training method can still simulate the user's accepting or rejecting action. Thus, after the step of outputting the decision content corresponding to the voice request by using the deep deterministic policy gradient DDPG model with the feature vector corresponding to the voice request as input, the method further comprises:
Step S301, in the case that the decision content matches the true intent of the voice request, accepting the decision content with a probability greater than a preset probability threshold.
For example, suppose the user's true intent for the voice request "I want to watch Let the Bullets Fly" is "movie". Then, when the decision content is the movie "Let the Bullets Fly", the simulated user accepts it with 99% probability and rejects it or supplements content with 1% probability. In other words, if the decision content matches the true intent, the decision content is, with high probability, accurate.
Step S302, the decision content is used as a decision strategy corresponding to the voice request in the intelligent voice assistant.
If the decision content corresponding to the voice request matches the true intent of the voice request, the intelligent voice assistant can use the decision content as the final decision strategy. When the user later requests the same content from the intelligent voice assistant, the corresponding decision strategy is output to the user directly. Since the decision strategy was closest to the user's true intent during training, it is also the result closest to the user's requirement during a real request.
In addition, in some embodiments, after the decision strategy is determined, it is fed back to the DDPG model as the decision result, so that the DDPG model performs self-optimization according to the decision result.
The preset probability threshold may be set to different values according to actual requirements, for example 99% or 95%. Meanwhile, the content of step S301 may also be regarded as a noise process: when the DDPG model outputs decision content for a voice request, the action corresponding to the user's true intent is executed with high probability and other actions with low probability — for example, accepting with 99% probability and rejecting or supplementing with 1% probability, or accepting with 95% probability and rejecting or supplementing with 5% probability.
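The noise process above can be sketched as a single probabilistic choice. The function and the injectable `rng` parameter are assumptions added so the behavior can be made deterministic for testing.

```python
import random

# Sketch of the "noise" process: the action that matches the user's
# true intent is taken with probability accept_prob (e.g. 0.99), and a
# different action with the remaining probability.

def noisy_action(intended_action, other_action, accept_prob=0.99,
                 rng=random):
    # rng.random() is uniform in [0, 1); below the threshold we keep
    # the intended action, otherwise we flip to the alternative.
    return intended_action if rng.random() < accept_prob else other_action
```

When the decision matches the true intent, `noisy_action("accept", "reject")` returns "accept" about 99% of the time; when it does not match, the roles are swapped.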
As described above, even if the decision content output by the DDPG model is definite, the training method in the embodiment of the present application can still simulate the user's accepting or rejecting action. Furthermore, in some embodiments, after the step of outputting the decision content corresponding to the voice request by using the deep deterministic policy gradient DDPG model, the method further includes:
Step S401, in the case that the decision content does not match the true intent of the voice request, rejecting the decision content with a probability greater than a preset probability threshold.
For example, suppose the user's true intent for the voice request "I want to watch Let the Bullets Fly" is "movie". Then, when the decision content is the music of "Let the Bullets Fly", the simulated user rejects it with 99% probability and accepts it or supplements content with 1% probability. In other words, if the decision content does not match the true intent, the decision content is, with high probability, inaccurate.
Step S402, feeding back the action of rejecting the decision content as the decision result corresponding to the voice request to the deep deterministic policy gradient DDPG model.
The process of continuously feeding decision results back to the DDPG model can be understood as training the DDPG model and hence the intelligent voice assistant; through continuous learning and training, the DDPG model makes its output decision content more and more accurate.
Step S403, the deep deterministic policy gradient DDPG model adjusts the decision content corresponding to the voice request according to the decision result.
When the decision content does not match the true intent of the voice request, it indicates that, with high probability, the decision content would not meet the user's requirement in an actual application scenario. The DDPG model therefore needs to readjust its algorithm and optimize its strategy according to the feedback result, so that the decision content output next time is more likely to meet the user's true intent. In the example above, if the user wants to watch the movie "Let the Bullets Fly" but the decision content returns the music of "Let the Bullets Fly", the simulated user rejects the decision content and the rejecting action is fed back to the DDPG model; the DDPG model relearns, and next time it may output the movie "Let the Bullets Fly" as the decision content for this request.
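The feedback step can be sketched as converting each simulated action into a scalar reward before handing it back to the DDPG model (for example, as a transition stored in its replay buffer). The specific reward values and names here are assumptions, not from the patent.

```python
# Hypothetical reward mapping for the simulated-user feedback step.
REWARDS = {"accept": 1.0, "supplement": 0.5, "reject": -1.0}

def feed_back(state, decision, action, replay_buffer):
    """Convert a simulated action into a reward and store the transition.

    state: the feature vector of the voice request;
    decision: the decision content the model output;
    action: 'accept', 'reject', or 'supplement' from the simulator.
    """
    reward = REWARDS[action]
    # (state, decision, reward) transitions drive the DDPG update step
    replay_buffer.append((state, decision, reward))
    return reward
```

A rejection thus arrives at the model as a negative reward, pushing the policy away from that decision on the next update.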
Step S404, in the case that the decision content matches the user intent of the voice request, using the decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant.
If the decision content re-output by the DDPG model matches the true intent of the voice request, the intelligent voice assistant can use it as the final decision strategy. When the user later requests the same content from the intelligent voice assistant, the corresponding decision strategy is output to the user directly; since it was closest to the user's true intent during training, it is also the result closest to the user's requirement during a real request.
Fig. 3 is a flowchart of another training method of an intelligent voice assistant decision strategy according to an embodiment of the present application. In some embodiments, the above steps S301-S302 and S401-S404 may be further combined with fig. 1 to form the training method of an intelligent voice assistant decision strategy comprising steps S501-S511 shown in fig. 3.
In the embodiment of the application, training the decision strategy of the intelligent voice assistant is in fact training the DDPG model. Fig. 4 is a training schematic diagram of an intelligent voice assistant decision strategy according to an embodiment of the present application. As shown in fig. 4, the feature simulation and action simulation in the above embodiments may collectively be regarded as performed by a simulation module 601. The process of comparing the decision content with the true intent in the above embodiments may be implemented by the arbiter 603, with the result of the comparison fed back to the DDPG model 602.
In the training process of the DDPG model, the model first learns the user's preference from the feature vector of the user's request and outputs a predicted decision content. For example, for the request "I want to watch Let the Bullets Fly", the decision content may be the movie "Let the Bullets Fly", the music of "Let the Bullets Fly", or a prompt asking the user to choose between the movie and the music. The user's operation on the decision content — accept, reject, or supplement — is then simulated according to the user's true intent. If the simulated user accepts the decision content, the decision content will, with high probability, meet the user's requirement in actual applications, and it is taken as the decision strategy corresponding to the voice request in the intelligent voice assistant; for example, if the movie "Let the Bullets Fly" is accepted, it becomes the decision strategy for the request "I want to watch Let the Bullets Fly". If the music of "Let the Bullets Fly" is rejected, the DDPG model readjusts the decision content — the next output is likely to be the movie "Let the Bullets Fly" — and the new decision content becomes the decision strategy for "I want to watch Let the Bullets Fly". And if the assistant asks the user to choose between the movie and the music of "Let the Bullets Fly" and the user directly selects "movie", the movie "Let the Bullets Fly" can be used as the decision strategy for "I want to watch Let the Bullets Fly".
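The flow above can be mirrored in a toy end-to-end loop. Here a plain dict stands in for the DDPG policy, and overwriting an entry on rejection crudely imitates the gradient update; none of this is the patent's actual implementation, only an illustration of the accept/reject feedback cycle.

```python
# Toy training loop: predict -> simulate user -> feed back -> record
# the accepted decision content as the final decision strategy.

def train(samples, policy, epochs=2):
    """samples: list of (request, true_intent); policy: request -> decision."""
    strategies = {}
    for _ in range(epochs):
        for request, intent in samples:
            decision = policy.get(request)      # model's predicted decision
            if decision == intent:
                strategies[request] = decision  # simulated user accepts
            else:
                policy[request] = intent        # rejection feedback stands
                                                # in for the DDPG update
    return strategies
```

Starting from a wrong prediction ("music"), one rejection round is enough for the toy policy to converge on the movie as the decision strategy for the request.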
To make the decision content output by the model more accurate, the model may need to be trained hundreds of times; if the decision content is ambiguous (such as the movie or the music of "Let the Bullets Fly") and the content has to be supplemented manually each time, excessive manpower is wasted and much time is consumed. To complete DDPG model training more efficiently, the embodiment of the application simulates the user's operation on the decision content, thereby simulating the interaction between a person and the model, avoiding excessive manual participation in model training and effectively improving training efficiency.
After the decision strategy of the intelligent voice assistant is determined, the result the user wants can be given directly when the user inputs the same voice request again in actual application.
The benefit of this is that when the decision content contains multiple candidate contents, the user's selecting or supplementing process can be simulated, turning the uncertain content in the decision content into determined content. In practical application, when the intelligent voice assistant encounters the same voice request again, it can output the determined content. For example, when the simulated user supplements that the movie "Let the Bullets Fly" is wanted, then in actual operation, when the user inputs "I want to watch Let the Bullets Fly" again, the movie "Let the Bullets Fly" can be obtained directly from the intelligent voice assistant. Furthermore, through interaction with the user, the intelligent voice assistant improves the accuracy of its search results, and the user more often obtains correct results when using the intelligent voice assistant.
It should be noted that the training method of the decision strategy of the intelligent voice assistant according to the embodiment of the application can simulate the user interaction process using only the local log data of the intelligent device while the device is offline, thereby training the decision strategy.
From the above, the embodiment of the present application provides a training method for an intelligent voice assistant decision strategy, which can analyze the feature vector of a user's voice request using the log data stored in the intelligent device; take the feature vector as input to the deep deterministic policy gradient DDPG model so that it outputs the decision content corresponding to the voice request; in the case that the decision content asks the user for an opinion, supplement the queried content according to the true intent of the voice request; and finally take the supplemented decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant. With this technical scheme, the interaction between the user and the intelligent voice assistant can be simulated while the intelligent device is offline, and the decision content supplemented during the simulated interaction is used as the decision strategy of the intelligent voice assistant. After the intelligent voice assistant finishes training, if a voice request corresponds to return results from multiple service modules, the return result the user wants can be accurately determined and provided, improving the accuracy of interaction between the user and the intelligent voice assistant.
Fig. 5 is a block diagram of a training device for an intelligent voice assistant decision strategy according to an embodiment of the present application, where, as shown in fig. 5, the training device for an intelligent voice assistant decision strategy in an embodiment of the present application includes:
A data acquisition module 701, configured to acquire the user's log data, the log data representing the user's behavior data and request data in historical voice interactions with the intelligent device; a feature simulation module 702, configured to analyze the feature vector of the user's voice request using the log data, the feature vector representing a vector formed from the different returned results received after the user issues a voice request; a decision module 703, configured to take the feature vector corresponding to the voice request as input and output the decision content corresponding to the voice request using the deep deterministic policy gradient DDPG model, the decision content representing the predicted return result for the voice request that the DDPG model predicts the user will receive; a behavior simulation module 704, configured to supplement the queried content according to the true intent of the voice request in the case that the decision content asks the user for an opinion, the true intent representing the pre-labeled expected result of the voice request; and a decision module 705, configured to use the supplemented decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant.
Wherein the feature simulation module 702 and the behavior simulation module 704 may together implement the content of the simulation module 601 shown in fig. 4, the decision module 703 may implement the content of the DDPG model 602 described in fig. 4, and the decision module 705 may implement the content of the arbiter 603 shown in fig. 4.
In some embodiments, the feature simulation module is further configured to: analyze, according to the voice request, request features indicating the different service modules to which the returned results sent by the intelligent device to the user belong, and resource features of the media resources to which the returned results belong; analyze the history features of the user according to the times at which the user sends voice requests in the intelligent device and the content the user pays attention to in the intelligent device; and compose the feature vector from the request features, the resource features, and the history features.
In some embodiments, the behavior simulation module is further configured to accept the decision content with a probability greater than a preset probability threshold if the decision content matches the true intent of the voice request; and the decision module is further configured to use the decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant.
In some embodiments, the apparatus further comprises a feedback module, configured to feed back the action of accepting the decision content as the decision result corresponding to the voice request to the deep deterministic policy gradient DDPG model.
In some embodiments, the behavior simulation module is further configured to reject the decision content with a probability greater than a preset probability threshold if the decision content does not match the true intent of the voice request; the feedback module is further configured to feed back the action of rejecting the decision content as the decision result corresponding to the voice request to the deep deterministic policy gradient DDPG model; and the decision module is further configured to cause the deep deterministic policy gradient DDPG model to adjust the decision content corresponding to the voice request according to the decision result, and, in the case that the decision content matches the user intent of the voice request, to use the decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present application, and not for limiting it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical schemes described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. The illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (8)

1. A training method for an intelligent voice assistant decision strategy, characterized by comprising the following steps:
Acquiring log data of a user; the log data is used for representing behavior data and request data of a user in historical voice operation interacted with the intelligent equipment;
analyzing the feature vector of the user voice request by utilizing the log data; the feature vector is used for representing a vector formed by different return results received after a user sends out a voice request;
taking the feature vector corresponding to the voice request as input, and outputting decision content corresponding to the voice request by using a deep deterministic policy gradient DDPG model; the decision content represents the predicted return result for the voice request that the deep deterministic policy gradient DDPG model predicts the user will receive;
Under the condition that the decision content is opinion inquired from a user, supplementing the inquired content according to the real intention of the voice request; the real intent is to represent an expected result of a pre-labeled user for the voice request;
taking the supplemented decision content as a decision strategy corresponding to the voice request in the intelligent voice assistant;
the step of analyzing the feature vector of the user voice request by using the log data comprises the following steps:
analyzing, according to the voice request, request features indicating the different service modules to which the returned results sent by the intelligent device to the user belong, and resource features of the media resources to which the returned results belong;
analyzing historical characteristics of a user according to the time of the user sending the voice request in the intelligent device and the attention content in the intelligent device;
the feature vector is composed using the request feature, the resource feature, and the history feature.
2. The method of claim 1, wherein after the step of outputting decision content corresponding to the voice request using a depth deterministic strategy gradient DDPG model with the feature vector corresponding to the voice request as input, further comprising:
accepting the decision content with a probability greater than a preset probability threshold under the condition that the decision content accords with the real intention of the voice request;
And taking the decision content as a decision strategy corresponding to the voice request in an intelligent voice assistant.
3. The method of claim 2, wherein after the step of using the decision content as a decision strategy corresponding to the voice request in an intelligent voice assistant, the method further comprises:
feeding back the action of accepting the decision content as a decision result corresponding to the voice request to the deep deterministic policy gradient DDPG model.
4. A method according to any one of claims 1-3, wherein after the step of outputting decision content corresponding to the voice request using a depth deterministic strategy gradient DDPG model, taking as input the feature vector corresponding to the voice request, further comprises:
in the case that the decision content does not match the true intent of the voice request, rejecting the decision content with a probability greater than a preset probability threshold;
feeding back the action of rejecting the decision content as a decision result corresponding to the voice request to the deep deterministic policy gradient DDPG model;
causing the deep deterministic policy gradient DDPG model to adjust the decision content corresponding to the voice request according to the decision result;
and in the case that the decision content matches the user intent of the voice request, using the decision content as the decision strategy corresponding to the voice request in an intelligent voice assistant.
5. An intelligent voice assistant decision strategy training device, comprising:
the data acquisition module is used for acquiring log data of a user; the log data is used for representing behavior data and request data of a user in historical voice operation interacted with the intelligent equipment;
the feature simulation module is used for analyzing the feature vector of the user's voice request using the log data; the feature vector represents a vector formed from the different returned results received after the user issues a voice request; in analyzing the feature vector of the user's voice request using the log data, the feature simulation module is configured to: analyze, according to the voice request, request features indicating the different service modules to which the returned results sent by the intelligent device to the user belong, and resource features of the media resources to which the returned results belong; analyze the history features of the user according to the times at which the user sends voice requests in the intelligent device and the content the user pays attention to in the intelligent device; and compose the feature vector from the request features, the resource features, and the history features;
the decision module is used for taking the feature vector corresponding to the voice request as input, and outputting decision content corresponding to the voice request by using a deep deterministic policy gradient DDPG model; the decision content represents the predicted return result for the voice request that the deep deterministic policy gradient DDPG model predicts the user will receive;
The behavior simulation module is used for supplementing the inquired content according to the real intention of the voice request under the condition that the decision content is inquired opinion of a user; the real intent is to represent an expected result of a pre-labeled user for the voice request;
and the decision module is used for taking the supplemented decision content as a decision strategy corresponding to the voice request in the intelligent voice assistant.
6. The apparatus of claim 5, characterized in that:
The behavior simulation module is further configured to accept the decision content with a probability greater than a preset probability threshold value, if the decision content meets the true intention of the voice request;
the decision module is further configured to use the decision content as a decision policy corresponding to the voice request in the intelligent voice assistant.
7. The apparatus of claim 6, further comprising a feedback module configured to feed back the action of accepting the decision content as a decision result corresponding to the voice request to the deep deterministic policy gradient DDPG model.
8. The apparatus of claim 7, characterized in that:
the behavior simulation module is further configured to reject the decision content with a probability greater than a preset probability threshold in the case that the decision content does not match the true intent of the voice request;
the feedback module is further configured to feed back the action of rejecting the decision content as the decision result corresponding to the voice request to the deep deterministic policy gradient DDPG model;
the decision module is further configured to cause the deep deterministic policy gradient DDPG model to adjust the decision content corresponding to the voice request according to the decision result;
and to use the decision content as the decision strategy corresponding to the voice request in the intelligent voice assistant in the case that the decision content matches the user intent of the voice request.
CN202010719035.6A 2020-07-23 2020-07-23 Training method and device for intelligent voice assistant decision strategy Active CN111899728B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010719035.6A CN111899728B (en) 2020-07-23 2020-07-23 Training method and device for intelligent voice assistant decision strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010719035.6A CN111899728B (en) 2020-07-23 2020-07-23 Training method and device for intelligent voice assistant decision strategy

Publications (2)

Publication Number Publication Date
CN111899728A CN111899728A (en) 2020-11-06
CN111899728B true CN111899728B (en) 2024-05-28

Family

ID=73190540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010719035.6A Active CN111899728B (en) 2020-07-23 2020-07-23 Training method and device for intelligent voice assistant decision strategy

Country Status (1)

Country Link
CN (1) CN111899728B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068661A (en) * 2015-09-07 2015-11-18 Baidu Online Network Technology (Beijing) Co., Ltd. Man-machine interaction method and system based on artificial intelligence
CN105529030A (en) * 2015-12-29 2016-04-27 Baidu Online Network Technology (Beijing) Co., Ltd. Speech recognition processing method and device
CN107342076A (en) * 2017-07-11 2017-11-10 South China University of Technology Intelligent home control system and method compatible with abnormal voice
US9947333B1 (en) * 2012-02-10 2018-04-17 Amazon Technologies, Inc. Voice interaction architecture with intelligent background noise cancellation
CN109523029A (en) * 2018-09-28 2019-03-26 Graduate School at Shenzhen, Tsinghua University Adaptive dual self-driven deep deterministic policy gradient reinforcement learning method for training an intelligent agent
CN110390108A (en) * 2019-07-29 2019-10-29 Industrial and Commercial Bank of China Task interaction method and system based on deep reinforcement learning
CN110704596A (en) * 2019-09-29 2020-01-17 Beijing Baidu Netcom Science and Technology Co., Ltd. Topic-based conversation method and device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050165607A1 (en) * 2004-01-22 2005-07-28 At&T Corp. System and method to disambiguate and clarify user intention in a spoken dialog system
US20150379416A1 (en) * 2014-06-27 2015-12-31 QuDec, Inc. Decision assistance system
CN109635917B (en) * 2018-10-17 2020-08-25 Peking University Multi-agent cooperation decision and training method
US10715869B1 (en) * 2018-12-20 2020-07-14 Rovi Guides, Inc. Deep reinforcement learning for personalized screen content optimization

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9947333B1 (en) * 2012-02-10 2018-04-17 Amazon Technologies, Inc. Voice interaction architecture with intelligent background noise cancellation
CN105068661A (en) * 2015-09-07 2015-11-18 Baidu Online Network Technology (Beijing) Co., Ltd. Man-machine interaction method and system based on artificial intelligence
CN105529030A (en) * 2015-12-29 2016-04-27 Baidu Online Network Technology (Beijing) Co., Ltd. Speech recognition processing method and device
CN107342076A (en) * 2017-07-11 2017-11-10 South China University of Technology Intelligent home control system and method compatible with abnormal voice
CN109523029A (en) * 2018-09-28 2019-03-26 Graduate School at Shenzhen, Tsinghua University Adaptive dual self-driven deep deterministic policy gradient reinforcement learning method for training an intelligent agent
CN110390108A (en) * 2019-07-29 2019-10-29 Industrial and Commercial Bank of China Task interaction method and system based on deep reinforcement learning
CN110704596A (en) * 2019-09-29 2020-01-17 Beijing Baidu Netcom Science and Technology Co., Ltd. Topic-based conversation method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Voice function testing practice for smart TV products; Shen Cheng; Information Technology and Standardization; 2018-04-10 (Issue 04); 17-20 *

Also Published As

Publication number Publication date
CN111899728A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN109345302B (en) Machine learning model training method and device, storage medium and computer equipment
US11847422B2 (en) System and method for estimation of interlocutor intents and goals in turn-based electronic conversational flow
WO2020135535A1 (en) Recommendation model training method and related apparatus
CN111008332A (en) Content item recommendation method, device, server and storage medium
CN107463701B (en) Method and device for pushing information stream based on artificial intelligence
US20220188661A1 (en) Stateful, Real-Time, Interactive, and Predictive Knowledge Pattern Machine
CN105930432B (en) Training method and device for sequence labeling tool
CN110807566A (en) Artificial intelligence model evaluation method, device, equipment and storage medium
US20230031522A1 (en) Recommendation method and apparatus based on automatic feature grouping
US20170017655A1 (en) Candidate services for an application
CN113642652A (en) Method, device and equipment for generating fusion model
CN112883265A (en) Information recommendation method and device, server and computer readable storage medium
CN112579031A (en) Voice interaction method and system and electronic equipment
CN113836406A (en) Information flow recommendation method and device
CN111899728B (en) Training method and device for intelligent voice assistant decision strategy
CN113032676A (en) Recommendation method and system based on micro-feedback
CN115545960B (en) Electronic information data interaction system and method
JP2023533723A (en) Evaluate interpretation of search queries
CN114491249B (en) Object recommendation method, device, equipment and storage medium
CN112447173A (en) Voice interaction method and device and computer storage medium
CN112200602B (en) Neural network model training method and device for advertisement recommendation
CN114625967A (en) User information mining method based on big data service optimization and artificial intelligence system
CN111353052B (en) Multimedia object recommendation method and device, electronic equipment and storage medium
CN114117239A (en) House resource pushing method, device and equipment
CN107766944A (en) A kind of analyzed using API carries out the flow-optimized system and method for systemic-function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant