CN106445147B

CN106445147B - Behavior management method and device for a conversational system based on artificial intelligence

Info

Publication number: CN106445147B
Application number: CN201610862653.XA
Authority: CN
Inventors: 高原; 李大任; 戴岱; 佘俏俏
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2016-09-28
Filing date: 2016-09-28
Publication date: 2019-05-10
Anticipated expiration: 2036-09-28
Also published as: CN106445147A

Abstract

Embodiments of the present invention disclose a behavior management method and device for a conversational system based on artificial intelligence. The method comprises: generating a current session feature according to a current system interaction state, a current user state, and a system action sequence; selecting, from the system action sequence, candidate system behaviors associated with the user according to the current session feature and a trained system action trigger model; and interacting with the user according to the candidate system behaviors. In this technical solution, candidate system behaviors are determined by the system action trigger model rather than, as in the prior art, by statically configured system action trigger rules, which improves the flexibility and generalization ability of the conversational system.

Description

Behavior management method and device for a conversational system based on artificial intelligence

Technical field

The present invention relates to the field of human-computer interaction technology, and in particular to a behavior management method and device for a conversational system based on artificial intelligence.

Background art

Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. AI is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Research in this field includes robotics, language recognition, image recognition, natural language processing, and expert systems.

Existing behavior management schemes for conversational systems rely on product experience and statically configured rules to generate candidate system behaviors and to select the optimal system behavior. That is, the system action trigger and ranking rules relevant to a specific product application are written into a configuration file, and when the selected behavior is executed, preconfigured static rules are used to predict the subsequent user behavior.

Because the existing rule-based candidate behavior trigger and ranking rules are all configured manually according to the characteristics of a specific product, the prior art has the following shortcomings: 1) a vertical-domain dialogue flow configured with static rules is relatively fixed and can only carry out the specific logic written into the rules, so it is inflexible; 2) static rules are configured for the specific logic of a specific product and have no generalization ability, so they cannot be reused in other vertical domains and products.

Summary of the invention

In view of this, embodiments of the present invention provide a behavior management method and device for a conversational system based on artificial intelligence, so as to improve the flexibility and generalization ability of such a conversational system.

In a first aspect, an embodiment of the present invention provides a behavior management method for a conversational system based on artificial intelligence, comprising:

generating a current session feature according to a current system interaction state, a current user state, and a system action sequence;

selecting, from the system action sequence, candidate system behaviors associated with the user according to the current session feature and a trained system action trigger model;

interacting with the user according to the candidate system behaviors.

In a second aspect, an embodiment of the present invention provides a behavior management device for a conversational system based on artificial intelligence, comprising:

a current feature generation module, configured to generate a current session feature according to a current system interaction state, a current user state, and a system action sequence;

a candidate behavior selection module, configured to select, from the system action sequence, candidate system behaviors associated with the user according to the current session feature and a trained system action trigger model;

a system action decision-making module, configured to interact with the user according to the candidate system behaviors.

In the technical solution provided by the embodiments of the present invention, the rules that trigger system behaviors are modeled in advance by machine learning to obtain the system action trigger model. The current session feature is then generated according to the current system interaction state, the current user state, and the system action sequence, and the candidate system behaviors are determined according to the current session feature and the system action trigger model. In other words, candidate system behaviors in this solution are determined by the system action trigger model rather than, as in the prior art, by statically configured system action trigger rules, which improves the flexibility and generalization ability of the conversational system based on artificial intelligence.

Brief description of the drawings

Fig. 1 is a flowchart of the behavior management method for a conversational system based on artificial intelligence provided by Embodiment one of the present invention;

Fig. 2 is a flowchart of the behavior management method for a conversational system based on artificial intelligence provided by Embodiment two of the present invention;

Fig. 3 is a flowchart of the behavior management method for a conversational system based on artificial intelligence provided by Embodiment three of the present invention;

Fig. 4 is a schematic diagram of the behavior management method for a conversational system based on artificial intelligence provided by Embodiment three of the present invention;

Fig. 5 is a structural diagram of the behavior management device for a conversational system based on artificial intelligence provided by Embodiment four of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

Embodiment one

Fig. 1 is a flowchart of the behavior management method for a conversational system based on artificial intelligence provided by Embodiment one of the present invention. The method of this embodiment may be executed by a behavior management device of the conversational system based on artificial intelligence, and the device may be implemented in hardware and/or software. The method is generally applicable to situations in which a conversational system carries out human-computer interaction with a user. With reference to Fig. 1, the behavior management method provided by this embodiment may specifically include the following:

S11, generating a current session feature according to the current system interaction state, the current user state, and the system action sequence.

The conversational system interacts with the user through a chatbot interaction framework. The framework includes: an NLU (Natural Language Understanding) module, which understands the user's natural language, such as the user's query, and converts it into a structured representation that a machine can understand; a UST (User Status Updates) module, which updates the user's dialog state information according to the output of the NLU module, where the dialog state information includes the system interaction state, the user intent, the user state, and so on; a system action trigger (Action-Trigger) module, which selects, according to the dialog state information updated by the UST module, a series of candidate system behaviors that may be executed next and assembles them into a candidate system behavior list; a behavior decision (Policy) module, which ranks the candidate system behaviors triggered by the system action trigger module, selects one optimal system behavior, and predicts the subsequent user behavior; a best-behavior execution (Action-Exe) module, which executes the optimal system behavior selected by the behavior decision module; and an NLG (Natural Language Generation) module, which performs natural language generation according to the execution result of the best-behavior execution module and produces the natural language result that is finally presented to the user.
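The module chain described above can be sketched as a sequence of function calls. The following is a minimal illustration only: the function names, the keyword-matching NLU, the fixed trigger and policy logic, and the reply strings are all hypothetical stand-ins for the trained components described in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DialogState:
    """Dialog state information maintained by the UST module (illustrative fields)."""
    interaction_state: str = "start"
    intent: str = ""
    slots: Dict[str, str] = field(default_factory=dict)


def nlu(utterance: str) -> Dict[str, str]:
    """Toy NLU: keyword matching stands in for real language understanding."""
    parsed: Dict[str, str] = {}
    text = utterance.lower()
    if "restaurant" in text:
        parsed["intent"] = "find_restaurant"
    for flavor in ("sichuan", "cantonese"):
        if flavor in text:
            parsed["flavor"] = flavor
    return parsed


def ust(state: DialogState, parsed: Dict[str, str]) -> DialogState:
    """Update intent, slots, and interaction state from the NLU output."""
    if "intent" in parsed:
        state.intent = parsed["intent"]
    for key, value in parsed.items():
        if key != "intent":
            state.slots[key] = value
    state.interaction_state = "recommend" if state.slots else "clarify"
    return state


def action_trigger(state: DialogState) -> List[str]:
    """Hard-coded stand-in for the trained system action trigger model."""
    if state.interaction_state == "clarify":
        return ["clarify", "recommend"]
    return ["recommend", "inform"]


def policy(state: DialogState, candidates: List[str]) -> str:
    """Stand-in for the ranking model: simply take the first candidate."""
    return candidates[0]


def action_exe(state: DialogState, action: str) -> Dict[str, str]:
    """Execute the chosen behavior and return a structured result."""
    return {"action": action, **state.slots}


def nlg(result: Dict[str, str]) -> str:
    """Turn the execution result into the text shown to the user."""
    if result["action"] == "clarify":
        return "Which flavor would you like?"
    return f"Here is a {result.get('flavor', 'nearby')} restaurant."


def run_turn(state: DialogState, utterance: str) -> str:
    """One dialog turn: NLU -> UST -> Action-Trigger -> Policy -> Action-Exe -> NLG."""
    state = ust(state, nlu(utterance))
    candidates = action_trigger(state)
    best = policy(state, candidates)
    return nlg(action_exe(state, best))
```

Calling `run_turn(DialogState(), "Find me a restaurant")` walks one utterance through all six stages; in the method described here, `action_trigger` and `policy` would be replaced by the learned models.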

In this embodiment, the current system interaction state characterizes the stage of system interaction that the user is currently in, such as a start state, a clarification state, or a recommendation state. The current user state may include the user's demand intent in the current round, such as a food-ordering intent aimed at obtaining restaurant information, as well as the values the user has supplied over multiple rounds in different demand dimensions; for example, in a restaurant-search scenario, the user may select different cuisines such as Sichuan or Cantonese in the restaurant's flavor slot. The system action sequence is a sequence composed of system behaviors, and a system behavior is a behavior that the conversational system is able to execute, such as a recommendation action, a clarification action, or information fulfillment.

Specifically, the NLU module obtains the user's current natural language input and converts it into a structured representation, and the UST module determines the current user state from the structured representation. The system action trigger module then obtains the current user state, determines the current system interaction state and the system action sequence, and generates the current session feature. Because the current session feature is generated from the current system interaction state, the current user state, and the system action sequence, it contains the features of the current system interaction state, the current user state, and each system behavior.
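As an illustration of step S11, the sketch below encodes the three inputs as a sparse feature dictionary. The function name `build_session_features`, the one-hot encoding, and the key format are assumptions made for this example; the document does not specify how the current session feature is represented.

```python
from typing import Dict, List


def build_session_features(interaction_state: str,
                           intent: str,
                           slots: Dict[str, str],
                           action_sequence: List[str]) -> Dict[str, float]:
    """Encode interaction state, user state, and available actions as sparse features."""
    features: Dict[str, float] = {}
    # Current system interaction state, e.g. "start", "clarify", "recommend".
    features[f"state={interaction_state}"] = 1.0
    # Current user state: the intent of this round plus slot values from earlier rounds.
    features[f"intent={intent}"] = 1.0
    for slot, value in sorted(slots.items()):
        features[f"slot:{slot}={value}"] = 1.0
    # One indicator per system behavior in the system action sequence.
    for action in action_sequence:
        features[f"action_available={action}"] = 1.0
    return features
```

A dictionary like this can be passed to any model that accepts sparse categorical input; a production system would likely use a fixed vocabulary and vectorized representation instead.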

S12, selecting, from the system action sequence, candidate system behaviors associated with the user according to the current session feature and the trained system action trigger model.

In this embodiment, the system action trigger model may be obtained in advance through offline machine learning training, and is used to select a series of candidate system behaviors that may be executed next.

Illustratively, the system action trigger model may be trained as follows: based on manually labeled data, determining a first association relationship between business scenarios and system behaviors and a second association relationship between user states and system behaviors; extracting general interaction features according to the first association relationship and the second association relationship, where the general interaction features include the system interaction state, the user state, the user intent, and the execution result of the previous round's system behavior; and constructing the system action trigger model from the extracted general interaction features.

Here, the user intent refers to the user's demand intent, such as a food-ordering intent aimed at obtaining restaurant information. Specifically, wherever a system behavior needs to be executed during a dialog, annotators manually label the first association relationship between the business scenario and the system behavior and the second association relationship between the user state and the system behavior. Annotators may also label the dialog logic within a business scenario, where the business scenario may be a travel scenario, a food-ordering scenario, a ticket-booking scenario, a leisure and entertainment scenario, and so on. General interaction features are then extracted from the manually labeled information such as the association relationships, and a machine learning model is used to model the system action trigger rules offline; for example, the system action trigger model may be built from the general interaction features using a decision tree model.
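The offline training step can be illustrated with a small decision tree over categorical features. This is a sketch under stated assumptions: the Gini-based splitting, the feature names, the "trigger"/"skip" labels, and the six hand-written samples are all invented for illustration, and the document does not say which decision tree algorithm or feature encoding is actually used.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], str]  # (general interaction features, label)


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(samples: List[Sample], features: List[str]):
    """Return a leaf label (str) or a node (feature, children, default_label)."""
    labels = [label for _, label in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def split_impurity(feature: str) -> float:
        groups: Dict[str, List[str]] = {}
        for x, label in samples:
            groups.setdefault(x[feature], []).append(label)
        return sum(len(g) / len(samples) * gini(g) for g in groups.values())

    # Split on the feature whose partition has the lowest weighted impurity.
    best = min(features, key=split_impurity)
    partitions: Dict[str, List[Sample]] = {}
    for x, label in samples:
        partitions.setdefault(x[best], []).append((x, label))
    remaining = [f for f in features if f != best]
    children = {value: build_tree(part, remaining) for value, part in partitions.items()}
    return (best, children, majority)


def predict(tree, x: Dict[str, str]) -> str:
    """Walk the tree; unseen feature values fall back to the node's majority label."""
    while not isinstance(tree, str):
        feature, children, default = tree
        tree = children.get(x[feature], default)
    return tree


FEATURES = ["interaction_state", "intent", "last_result", "action"]

# Hypothetical manually labeled data: should `action` be triggered in this context?
LABELED: List[Sample] = [
    ({"interaction_state": "clarify", "intent": "find_restaurant", "last_result": "ok", "action": "clarify"}, "trigger"),
    ({"interaction_state": "clarify", "intent": "find_restaurant", "last_result": "ok", "action": "recommend"}, "skip"),
    ({"interaction_state": "recommend", "intent": "find_restaurant", "last_result": "ok", "action": "recommend"}, "trigger"),
    ({"interaction_state": "recommend", "intent": "find_restaurant", "last_result": "ok", "action": "clarify"}, "skip"),
    ({"interaction_state": "recommend", "intent": "book_ticket", "last_result": "ok", "action": "recommend"}, "trigger"),
    ({"interaction_state": "clarify", "intent": "book_ticket", "last_result": "failed", "action": "clarify"}, "trigger"),
]

trigger_model = build_tree(LABELED, FEATURES)
```

At dialog time, each behavior in the system action sequence would be scored with `predict(trigger_model, {...})`, and those labeled "trigger" would form the candidate list. With so few samples the tree only memorizes the labels; real generalization depends on the volume and coverage of the labeled data.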

In this embodiment, the candidate system behaviors are determined by a system action trigger model obtained through machine learning. Compared with purely static configuration, this machine learning approach gives the behavior trigger logic flexibility and generalization ability, so that it can be extended to vertical domains in different fields.

S13, interacting with the user according to the candidate system behaviors.

In the technical solution provided by this embodiment, the rules that trigger system behaviors are modeled in advance by machine learning to obtain the system action trigger model. The current session feature is then generated according to the current system interaction state, the current user state, and the system action sequence, and the candidate system behaviors are determined according to the current session feature and the system action trigger model. In other words, candidate system behaviors in this solution are determined by the system action trigger model rather than, as in the prior art, by statically configured system action trigger rules, which improves the flexibility and generalization ability of the conversational system.

Illustratively, before the candidate system behaviors associated with the user are selected from the system action sequence, the method may include: performing pre-screening on the system behaviors included in the system action sequence based on preset behavior configuration rules.

Illustratively, after the candidate system behaviors associated with the user are selected from the system action sequence, the method may include: performing addition and deletion intervention on the candidate system behaviors based on preset behavior configuration rules.

The triggering of candidate behaviors is thus modeled by machine learning and completed jointly with manually configured static behavior configuration rules. Compared with purely static configuration, the trigger logic has a degree of flexibility and generalization ability and can be extended to vertical domains in different fields.
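The way static rules can wrap the learned model may be sketched as follows. The rule format (a dictionary with `blocked`, `force_add`, and `force_remove` entries) and all function names are hypothetical; the document only states that preset behavior configuration rules may pre-screen the system action sequence or add and delete candidates afterward. The sketch applies both steps for illustration, although either may be used on its own.

```python
from typing import Callable, Dict, List, Set


def prescreen(action_sequence: List[str], blocked: Set[str]) -> List[str]:
    """Drop behaviors that the static rules exclude before the model runs."""
    return [a for a in action_sequence if a not in blocked]


def intervene(candidates: List[str], force_add: List[str], force_remove: Set[str]) -> List[str]:
    """Apply addition and deletion intervention to the model's candidates."""
    kept = [a for a in candidates if a not in force_remove]
    for action in force_add:
        if action not in kept:
            kept.append(action)
    return kept


def trigger_candidates(action_sequence: List[str],
                       model_select: Callable[[List[str]], List[str]],
                       rules: Dict[str, object]) -> List[str]:
    """Pre-screen with rules, select with the model, then apply the intervention rules."""
    screened = prescreen(action_sequence, rules["blocked"])
    selected = model_select(screened)
    return intervene(selected, rules["force_add"], rules["force_remove"])


# Illustrative rules and a stand-in for the trained trigger model.
rules = {"blocked": {"place_order"}, "force_add": ["clarify"], "force_remove": {"inform"}}
model_select = lambda actions: [a for a in actions if a != "chitchat"]
candidates = trigger_candidates(["recommend", "inform", "place_order", "chitchat"], model_select, rules)
```

Keeping the rules outside the model lets product owners block or force specific behaviors without retraining.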

Embodiment two

On the basis of Embodiment one above, this embodiment provides a further behavior management method for a conversational system based on artificial intelligence. Fig. 2 is a flowchart of the behavior management method provided by Embodiment two of the present invention. With reference to Fig. 2, the behavior management method provided by this embodiment may specifically include the following:

S21, generating a current session feature according to the current system interaction state, the current user state, and the system action sequence.

S22, selecting, from the system action sequence, candidate system behaviors associated with the user according to the current session feature and the trained system action trigger model.

S23, ranking the candidate system behaviors with a reinforcement learning ranking model obtained through online incremental training, and determining the optimal system behavior from the ranking result.

After the candidate behaviors are triggered, the reinforcement learning ranking model obtains each triggered candidate system behavior and ranks the candidates according to the current system state and the current user state to obtain the optimal system behavior.

Illustratively, the reinforcement learning ranking model is trained online as follows: the model is obtained through online incremental training according to the system interaction state, the user state, the user intent, the candidate system behaviors, and the environmental feedback information of the candidate system behaviors.

In this embodiment, the environmental feedback information of a candidate system behavior may include user click behavior, user order-placing behavior, user reply information, and user evaluation information. For example, if the user clicks during the current round of interaction, this is treated as positive feedback for the round; if the user does not click, it is treated as negative feedback for the round.

A reinforcement learning model is chosen to model the ranking of candidate behaviors. Reinforcement learning has been one of the research hotspots in machine learning and intelligent control in recent years. It aims to let an intelligent system (agent), without the participation of an external "teacher", continually interact with the environment or learn by trial and error, adjusting its actions according to feedback evaluation signals so as to obtain an optimal strategy adapted to the environment. Compared with supervised learning, the reinforcement learning process has several characteristics: 1) adaptivity, meaning the intelligent system continually uses environmental feedback to improve model performance; 2) reactivity, meaning the intelligent system can obtain state-action rules directly from experience; 3) incrementality, meaning reinforcement learning is a form of incremental learning and can be used online.

In summary, through continuous dialog with users, the conversational system obtains the users' environmental feedback on candidate system behaviors, learns and adjusts itself, and completes online incremental learning to obtain the reinforcement learning ranking model. As the amount of learning grows, the effectiveness of the ranking model keeps improving.
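A heavily simplified picture of online incremental ranking is a per-behavior linear value estimate that is nudged toward the observed reward after each round. This is a bandit-style stand-in rather than the reinforcement learning ranking model of this document, whose algorithm is not specified; the class name `OnlineRanker`, the learning rate, and the +1/-1 reward mapping for click and no click are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


class OnlineRanker:
    """Linear value estimate per (behavior, feature), updated one interaction at a time."""

    def __init__(self, learning_rate: float = 0.1):
        self.learning_rate = learning_rate
        self.weights = defaultdict(float)  # (behavior, feature name) -> weight

    def score(self, features: Dict[str, float], behavior: str) -> float:
        """Estimated reward of executing `behavior` in the current context."""
        return sum(self.weights[(behavior, name)] * value for name, value in features.items())

    def rank(self, features: Dict[str, float], candidates: List[str]) -> List[str]:
        """Order candidates from highest to lowest estimated reward."""
        return sorted(candidates, key=lambda b: self.score(features, b), reverse=True)

    def update(self, features: Dict[str, float], behavior: str, reward: float) -> None:
        """Incremental update: move the estimate toward the observed feedback."""
        error = reward - self.score(features, behavior)
        for name, value in features.items():
            self.weights[(behavior, name)] += self.learning_rate * error * value


ranker = OnlineRanker()
context = {"state=recommend": 1.0, "intent=find_restaurant": 1.0}
ranker.update(context, "recommend", 1.0)   # user clicked: positive feedback
ranker.update(context, "clarify", -1.0)    # user did not click: negative feedback
ranking = ranker.rank(context, ["clarify", "recommend"])
```

Because each update touches only the weights of the behavior that was shown, the model can be trained while serving traffic. A fuller treatment would also handle exploration and delayed rewards such as order placement.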

S24, interacting with the user according to the optimal system behavior.

Specifically, after the behavior decision module selects the optimal system behavior from the candidate system behaviors, the optimal system behavior is executed, and the NLG module generates, from the execution result of the optimal system behavior, the natural language result that is finally presented to the user.

In the technical solution provided by this embodiment, the conversational system determines the candidate system behaviors with a system action trigger model trained by machine learning, ranks the candidate system behaviors with a reinforcement learning ranking model obtained through online incremental training to obtain the optimal system behavior, and interacts with the user according to the optimal system behavior. Because the reinforcement learning ranking model is obtained by the conversational system through continuous dialog with users, collecting environmental feedback information and using it for self-learning and adjustment, the ranking method is flexible, accurate, and general.

Embodiment three

On the basis of Embodiment one and Embodiment two above, this embodiment provides a further behavior management method for a conversational system based on artificial intelligence. Fig. 3 is a flowchart of the behavior management method provided by Embodiment three of the present invention. With reference to Fig. 3, the behavior management method provided by this embodiment may specifically include the following:

S31, generating a current session feature according to the current system interaction state, the current user state, and the system action sequence.

S32, selecting, from the system action sequence, candidate system behaviors associated with the user according to the current session feature and the trained system action trigger model.

S33, ranking the candidate system behaviors with a reinforcement learning ranking model obtained through online incremental training, and determining the optimal system behavior from the ranking result.

S34, interacting with the user according to the optimal system behavior.

S35, determining the candidate guidance options corresponding to the optimal system behavior.

In this embodiment, candidate guidance options are candidate actions used to guide the user.

S36, selecting the best guidance option from the candidate guidance options with a reinforcement learning behavior prediction model obtained through online incremental training.

Specifically, after the candidate guidance options are obtained, the reinforcement learning behavior prediction model ranks the candidate guidance options according to the current system state and the current user state to obtain the best guidance option; that is, it predicts the user's behavior in the next round.

Illustratively, the reinforcement learning behavior prediction model is trained online as follows: the model is obtained through online incremental training according to the system interaction state, the user state, the user intent, the candidate guidance options, and the environmental feedback information of the candidate guidance options.

In this embodiment, the environmental feedback information of a candidate guidance option may include user reply information and user evaluation information. For example, a user's click on a displayed guidance option is treated as positive feedback for that option, and a displayed option that is not clicked receives negative feedback.
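The feedback loop for guidance options can be illustrated with a running-mean estimate per option. This is a simplified tabular sketch, not the reinforcement learning behavior prediction model itself; the class name `GuidanceOptionPredictor`, the string-valued context, the example option texts, and the +1/-1 reward mapping are assumptions made for illustration.

```python
from collections import defaultdict
from typing import List


class GuidanceOptionPredictor:
    """Tracks the mean click feedback of each guidance option in each context."""

    def __init__(self):
        self.value = defaultdict(float)  # (context, option) -> running mean reward
        self.count = defaultdict(int)    # (context, option) -> number of observations

    def best_option(self, context: str, options: List[str]) -> str:
        """Pick the option with the highest estimated feedback (ties: first listed)."""
        return max(options, key=lambda option: self.value[(context, option)])

    def feedback(self, context: str, option: str, clicked: bool) -> None:
        """Incrementally fold one round of feedback into the running mean."""
        key = (context, option)
        reward = 1.0 if clicked else -1.0
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]


predictor = GuidanceOptionPredictor()
options = ["see the menu", "book a table"]
predictor.feedback("recommend", "see the menu", clicked=False)
predictor.feedback("recommend", "book a table", clicked=True)
predicted = predictor.best_option("recommend", options)
```

Here the context is just the name of the optimal system behavior; a richer context would combine the interaction state, user state, and user intent listed above.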

In summary, with reference to Fig. 4, before the conversational system interacts with users, the system action trigger model is trained offline as follows: a manually labeled data set is obtained, a feature extraction module extracts general interaction features from the labeled data, and the system action trigger model is trained offline on the general interaction features. During interaction with a user, the NLU module obtains the user's natural language and converts it into a structured representation, and passes the result to the UST module. The UST module updates the user's state information and passes it to the system action trigger module. The system action trigger module determines the candidate system behaviors according to the system action trigger model, the preconfigured static rules, the user's state information and intent, and the pre-execution information of the system behaviors, and passes a candidate action list containing the candidate system behaviors to the behavior decision module. On the one hand, the behavior decision module learns a ranking model online from users' feedback on candidate system behaviors, ranks the candidate system behaviors with the ranking model, and determines the optimal system behavior from the ranking result. On the other hand, the behavior decision module determines the candidate guidance options of the optimal system behavior, learns a behavior prediction model online from users' feedback on candidate guidance options, and predicts the user's behavior in the next round with the behavior prediction model. The best-behavior execution module executes the optimal system behavior, and the NLG module generates, from the execution result of the best-behavior execution module, the natural language result that is finally presented to the user.

In the technical solution provided by this embodiment, the conversational system determines the candidate system behaviors with a system action trigger model trained by machine learning, ranks the candidate system behaviors with a reinforcement learning ranking model obtained through online incremental training to obtain the optimal system behavior, and interacts with the user according to the optimal system behavior. In addition, a reinforcement learning behavior prediction model obtained through online incremental training ranks the candidate guidance options to obtain the best guidance option. Because the reinforcement learning ranking model is obtained by the conversational system through continuous dialog with users, collecting environmental feedback information and using it for self-learning and adjustment, the behavior prediction method is flexible, accurate, and general.

Embodiment four

Fig. 5 is a structural diagram of the behavior management device for a conversational system based on artificial intelligence provided by Embodiment four of the present invention. The device is generally applicable to situations in which a conversational system based on artificial intelligence carries out human-computer interaction with a user. Referring to Fig. 5, the specific structure of the behavior management device provided by this embodiment is as follows:

a current feature generation module 41, configured to generate a current session feature according to the current system interaction state, the current user state, and the system action sequence;

a candidate behavior selection module 42, configured to select, from the system action sequence, candidate system behaviors associated with the user according to the current session feature and the trained system action trigger model;

a system action decision-making module 43, configured to interact with the user according to the candidate system behaviors.

Illustratively, the above device may include a behavior trigger model training module, which may be configured to:

determine, based on manually labeled data, a first association relationship between business scenarios and system behaviors and a second association relationship between user states and system behaviors;

extract general interaction features according to the first association relationship and the second association relationship, where the general interaction features include the system interaction state, the user state, the user intent, and the execution result of the previous round's system behavior;

construct the system action trigger model from the extracted general interaction features.

Illustratively, the above device may include:

a pre-screening module, configured to perform, before the candidate system behaviors associated with the user are selected from the system action sequence, pre-screening on the system behaviors included in the system action sequence based on preset behavior configuration rules; or

an addition and deletion intervention module, configured to perform, after the candidate system behaviors associated with the user are selected from the system action sequence, addition and deletion intervention on the candidate system behaviors based on preset behavior configuration rules.

Illustratively, the system action decision-making module 43 may include:

an optimal system behavior determination unit, configured to rank the candidate system behaviors with a reinforcement learning ranking model obtained through online incremental training, and to determine the optimal system behavior from the ranking result;

a system dialog unit, configured to interact with the user according to the optimal system behavior.

Illustratively, the above device may include a ranking model training module, which may be configured to:

obtain the reinforcement learning ranking model through online incremental training according to the system interaction state, the user state, the user intent, the candidate system behaviors, and the environmental feedback information of the candidate system behaviors.

Illustratively, the environmental feedback information of the candidate system behaviors may include user click behavior, user order-placing behavior, user reply information, and user evaluation information.

Illustratively, the above device may include:

a candidate guidance option determination module, configured to determine, after the optimal system behavior is determined from the ranking result, the candidate guidance options corresponding to the optimal system behavior;

a best guidance option selection module, configured to select the best guidance option from the candidate guidance options with a reinforcement learning behavior prediction model obtained through online incremental training.

Illustratively, the above device may include a behavior prediction model training module, which may be configured to:

obtain the reinforcement learning behavior prediction model through online incremental training according to the system interaction state, the user state, the user intent, the candidate guidance options, and the environmental feedback information of the candidate guidance options.

The behavior management device provided by this embodiment belongs to the same inventive concept as the behavior management method for a conversational system based on artificial intelligence provided by any embodiment of the present invention. It can execute the behavior management method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to executing that method. For technical details not described in detail in this embodiment, refer to the behavior management method provided by any embodiment of the present invention.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims

1. A behavior management method of a conversational system based on artificial intelligence, characterized by comprising:

According to the current session feature and a system action trigger model obtained by training, selecting, from the system action sequence, the candidate system behavior associated with the user;

Interacting with the user according to the candidate system behavior;

Wherein the system action trigger model is obtained by training in the following way:

Based on manually annotated data, determining a first association between business scenarios and system behaviors, and a second association between user states and system behaviors;

According to the first association and the second association, extracting generic interaction features, wherein the generic interaction features include the system interaction state, the user state, the user intent, and the execution result of the previous round's system behavior;
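As a rough illustration of how the four interaction features listed above could feed a trigger model, the sketch below one-hot encodes them and scores each behavior in the sequence with a logistic function. The function names, the per-behavior weight tables, the 0.5 threshold, and the example behaviors are hypothetical; the claim does not specify the model family, so logistic scoring is an assumption made only for this example.

```python
import math
from typing import Dict, List


def dialogue_features(system_state: str, user_state: str,
                      user_intent: str, last_result: str) -> Dict[str, float]:
    """One-hot encode the four interaction features named in the claim."""
    return {
        f"system_state={system_state}": 1.0,
        f"user_state={user_state}": 1.0,
        f"user_intent={user_intent}": 1.0,
        f"last_result={last_result}": 1.0,
    }


def trigger_probability(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Logistic score; `weights` stands in for parameters learned from annotated data."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def select_candidates(features: Dict[str, float], behavior_sequence: List[str],
                      model: Dict[str, Dict[str, float]], threshold: float = 0.5) -> List[str]:
    """Keep the behaviors whose trigger probability exceeds the threshold."""
    return [b for b in behavior_sequence
            if trigger_probability(features, model.get(b, {})) > threshold]


# Hypothetical weights, as if learned from manually annotated dialogues.
model = {
    "recommend_restaurant": {"user_intent=find_food": 2.0},
    "ask_for_rating": {"last_result=order_completed": 2.0, "user_intent=find_food": -1.0},
    "greet": {"system_state=session_start": 2.0},
}
features = dialogue_features("in_dialogue", "active", "find_food", "none")
candidates = select_candidates(features, list(model), model)
```

Because the behaviors are scored from shared features rather than matched against hand-written rules, a behavior can still be triggered in a scenario that no rule anticipated, which is the generalization benefit the abstract describes.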

2. The method according to claim 1, characterized in that:

Before selecting the candidate system behavior associated with the user from the system action sequence, the method comprises: performing pre-screening on the system behaviors included in the system action sequence based on preset behavior configuration rules; or,

After selecting the candidate system behavior associated with the user from the system action sequence, the method comprises: performing addition and deletion intervention on the candidate system behavior based on preset behavior configuration rules.
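The two rule-based hooks in this claim can be sketched as plain list filters. The helper names `prescreen` and `intervene`, and the representation of the configuration rules as simple block/add/remove sets, are assumptions made for illustration; the claim only requires that preset behavior configuration rules be applied either before or after the model-based selection.

```python
from typing import List, Set


def prescreen(behaviors: List[str], blocked: Set[str]) -> List[str]:
    """Before model selection: drop behaviors that the configured rules rule out."""
    return [b for b in behaviors if b not in blocked]


def intervene(candidates: List[str], force_add: List[str],
              force_remove: Set[str]) -> List[str]:
    """After model selection: delete and add behaviors as the configured rules require."""
    kept = [b for b in candidates if b not in force_remove]
    for behavior in force_add:
        if behavior not in kept:
            kept.append(behavior)
    return kept


sequence = ["greet", "recommend", "upsell"]
screened = prescreen(sequence, blocked={"upsell"})
adjusted = intervene(["recommend", "greet"], force_add=["show_help"], force_remove={"greet"})
```

Keeping the rules outside the learned model lets operators enforce hard business constraints without retraining.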

3. The method according to claim 1, characterized in that interacting with the user according to the candidate system behavior comprises:

Ranking the candidate system behaviors according to a reinforcement learning ranking model obtained by online incremental training, and determining the optimal system behavior according to the ranking result;

Interacting with the user according to the optimal system behavior.
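The rank-then-pick step in this claim reduces to sorting candidates by a learned score and taking the top entry. In the sketch below, `rank_behaviors` and the `learned_scores` table are hypothetical stand-ins for the trained ranking model, whose internals the claim does not specify.

```python
from typing import Callable, List, Tuple


def rank_behaviors(candidates: List[str],
                   score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Return (behavior, score) pairs sorted from highest to lowest score."""
    scored = [(behavior, score(behavior)) for behavior in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores standing in for the output of a trained ranking model.
learned_scores = {"recommend": 0.7, "ask_clarify": 0.4, "show_coupon": 0.9}
ranking = rank_behaviors(list(learned_scores), lambda b: learned_scores.get(b, 0.0))
top_behavior = ranking[0][0]
```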

4. The method according to claim 3, characterized in that the reinforcement learning ranking model is obtained by online training in the following way:

According to the system interaction state, the user state, the user intent, the candidate system behavior, and the environmental feedback information of the candidate system behavior, obtaining the reinforcement learning ranking model through online incremental training.

5. The method according to claim 4, characterized in that the environmental feedback information of the candidate system behavior includes user click behavior, user order-placing behavior, user reply information, and user evaluation information.
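One way to picture how the four feedback signals could drive online training is to fold them into a scalar reward and apply a small incremental update per interaction. Everything specific below is an assumption for illustration: the `Feedback` fields, the reward weights, the 1 to 5 rating scale, and the learning rate are not given in the source, and the dialogue context is omitted to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Feedback:
    clicked: bool = False          # user click behavior
    ordered: bool = False          # user order-placing behavior
    replied: bool = False          # user reply information (reduced to a flag here)
    rating: Optional[int] = None   # user evaluation, assumed to be on a 1..5 scale


def reward(fb: Feedback) -> float:
    """Combine feedback into one number; the weights are illustrative only."""
    value = 0.0
    if fb.clicked:
        value += 0.25
    if fb.ordered:
        value += 1.0
    if fb.replied:
        value += 0.25
    if fb.rating is not None:
        value += (fb.rating - 3) * 0.25  # ratings below 3 count as negative feedback
    return value


def incremental_update(scores: Dict[str, float], behavior: str,
                       fb: Feedback, learning_rate: float = 0.5) -> None:
    """Move the behavior's score a fraction of the way toward the observed reward."""
    old = scores.get(behavior, 0.0)
    scores[behavior] = old + learning_rate * (reward(fb) - old)


scores: Dict[str, float] = {}
incremental_update(scores, "recommend", Feedback(clicked=True, ordered=True))
after_first = scores["recommend"]
incremental_update(scores, "recommend", Feedback(rating=1))
after_second = scores["recommend"]
```

The score rises after the click and order, then falls after the poor rating, which is the behavior one would expect from feedback-driven incremental training.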

6. The method according to claim 3, characterized in that, after determining the optimal system behavior according to the ranking result, the method comprises:

Determining the candidate guidance options corresponding to the optimal system behavior;

Selecting the best guidance option from the candidate guidance options according to a reinforcement learning behavior prediction model obtained by online incremental training.

7. The method according to claim 6, characterized in that the reinforcement learning behavior prediction model is obtained by online training in the following way:

According to the system interaction state, the user state, the user intent, the candidate guidance options, and the environmental feedback information of the candidate guidance options, obtaining the reinforcement learning behavior prediction model through online incremental training.

8. A behavior management device of a conversational system based on artificial intelligence, characterized by comprising:

A current feature generation module, configured to generate the current session feature according to the current system interaction state, the current user state, and the system action sequence;

A candidate behavior selection module, configured to select, from the system action sequence, the candidate system behavior associated with the user according to the current session feature and a system action trigger model obtained by training;

A system behavior decision module, configured to interact with the user according to the candidate system behavior;

Wherein the device includes a behavior trigger model training module, and the behavior trigger model training module is configured to:

9. The device according to claim 8, characterized by comprising:

A pre-screening module, configured to perform, before the candidate system behavior associated with the user is selected from the system action sequence, pre-screening on the system behaviors included in the system action sequence based on preset behavior configuration rules; or,

An addition and deletion intervention module, configured to perform, after the candidate system behavior associated with the user is selected from the system action sequence, addition and deletion intervention on the candidate system behavior based on preset behavior configuration rules.

10. The device according to claim 8, characterized in that the system behavior decision module includes:

An optimal system behavior determination unit, configured to rank the candidate system behaviors according to a reinforcement learning ranking model obtained by online incremental training, and to determine the optimal system behavior according to the ranking result;

11. The device according to claim 10, characterized by comprising a ranking model training module, wherein the ranking model training module is configured to:

12. The device according to claim 11, characterized in that the environmental feedback information of the candidate system behavior includes user click behavior, user order-placing behavior, user reply information, and user evaluation information.

13. The device according to claim 10, characterized by comprising:

A candidate guidance option determination module, configured to determine, after the optimal system behavior is determined according to the ranking result, the candidate guidance options corresponding to the optimal system behavior;

A best guidance option selection module, configured to select the best guidance option from the candidate guidance options according to a reinforcement learning behavior prediction model obtained by online incremental training.

14. The device according to claim 13, characterized by comprising a behavior prediction model training module, wherein the behavior prediction model training module is configured to: