CN111061846A - Electric power new installation and capacity increase conversation customer service system and method based on layered reinforcement learning - Google Patents
- Publication number
- CN111061846A (application CN201911137278.2A)
- Authority
- CN
- China
- Prior art keywords
- strategy
- conversation
- class
- standard
- dialogue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
- G06Q30/012—Providing warranty services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
- G06Q30/015—Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
- G06Q30/016—After-sales
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0281—Customer communication at a business location, e.g. providing product or service information, consulting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention belongs to the technical field of strategy optimization for online text dialogue systems, and in particular relates to a customer service dialogue system and method for new electric power installation and capacity increase based on hierarchical (layered) reinforcement learning. More specifically, it relates to an online implementation method for an electricity customer service dialogue system with mixed task attributes, which targets task-oriented dialogue systems and is based on hierarchical reinforcement learning. The system comprises a power service understanding module, a dialogue state tracker, a dialogue strategy and a power service feedback module. The invention decomposes subtasks that require a professional background into multiple layers and connects them to a database for that professional background, so that the corresponding slot value information can be looked up at any time. Customer service dialogue with a professional background is thereby realized, and the dialogue success rate and coherence are significantly improved. The invention can save cost, improve the success rate of new-installation and capacity-increase dialogue services, make the dialogue more fluent and improve the user experience.
Description
Technical Field
The invention belongs to the technical field of strategy optimization for online text dialogue systems, and in particular relates to a customer service dialogue system and method for new electric power installation and capacity increase based on hierarchical (layered) reinforcement learning. More specifically, it relates to an online implementation method for an electricity customer service dialogue system with mixed task attributes, which targets task-oriented dialogue systems and is based on hierarchical reinforcement learning.
Background
With the rapid development of artificial intelligence technology, dialogue systems are widely used in smart phones, smart homes, driverless vehicles and other fields, and internet companies and research institutions at home and abroad have invested substantial resources in them as a research hotspot. In general there are three types of dialogue system: question-answering, task-oriented and open-domain. A task-oriented dialogue system focuses on a specific task goal and is the main technical type used for customer service dialogue systems. New installation and capacity increase of electricity supply is a main service item of the power business hall and currently occupies considerable human resources. Traditional multi-task dialogue systems can only handle simple preset tasks and struggle with customer service dialogues that have a professional character.
Traditional customer service dialogue systems therefore have the following defects: the relations between subtasks are insufficiently considered, the reward values are sparse, subtasks cannot be satisfied because of semantic constraints, and frequent switching between subtasks leads to poor user experience and even dialogue failure.
Disclosure of Invention
In view of the technical problems above, the invention provides a customer service dialogue system and method for new electric power installation and capacity increase based on hierarchical reinforcement learning. It is an intelligent customer service system that completes dialogue tasks under a hybrid framework based on hierarchical reinforcement learning. It aims to solve the problem of professional customer service dialogue that requires professional background knowledge, and provides a customer service dialogue system for multiple subtasks with a professional background that are related to one another and must all be completed. The system adapts well to different users and can incorporate professional knowledge.
To achieve this purpose, the invention adopts the following technical scheme:
The customer service dialogue system for new electric power installation and capacity increase based on hierarchical reinforcement learning comprises a power service understanding module, a dialogue state tracker, a dialogue strategy and a power service feedback module, wherein:
the power service understanding module is used for understanding and identifying the specific demand information of the power customer and passing this information to the dialogue state tracker;
the dialogue state tracker is used for tracking and recording the current dialogue state so that the state information can be retrieved at any time;
the dialogue strategy is used for generating an optimized response to the power customer and for continuously updating itself through iterative optimization;
the power service feedback module is used for translating the response generated by the dialogue strategy into information the user can understand and feeding it back to the power customer.
The customer service dialogue method for new electric power installation and capacity increase based on hierarchical reinforcement learning comprises the following steps:
step 1, the dialogue system obtains the service corpus from the power service understanding module, wherein slot values can be extracted from text obtained by converting speech to text, or extracted directly from online text;
step 2, when the power customer talks with the intelligent customer service, the dialogue data of the agent is obtained from a shared multi-domain general dialogue corpus and an electric power English item corpus;
step 3, once the application for new installation or capacity increase has been received, the dialogue agent responds to the power customer according to the multi-standard hierarchical reinforcement learning dialogue strategy until the customer's requirement is met, at which point the dialogue is judged successful.
The dialogue system is used for extracting the text corpus of the electricity customer. Because professional knowledge about electricity use is involved, the dialogue strategy is decomposed into a class 1 standard strategy and a class 2 standard strategy, each with its own reward value. The class 1 standard strategy is associated with the external reward value and may comprise multiple layers: the electric power expertise is decomposed repeatedly until the knowledge in the corpus and the database covers all of the content. The class 2 standard strategy is associated with the internal reward value and comprises the decomposed subtasks and actions. The two reward values are optimized separately by reinforcement learning and guide the learning of the customer service system.
The corpus information comprises the dialogue number, the flag marking the success or failure of the dialogue, the power-related information provided by the user and the power-related information replied by the system.
The slot value information decomposes the goal of the power customer in the dialogue into a series of slot values, including:
the new-installation and capacity-increase slot value, which expresses the customer's requirement for new installation or capacity increase;
the request slot value, which expresses the information the power customer asks the dialogue system for;
the slot values needed for the power customer's goal come from a database of dialogues between service staff in day-to-day power business halls and real power customers;
all slot values appearing in a dialogue passage are extracted. If a slot has several values, it is treated as a soft constraint of the power customer, who may later change the choice to explore other options in the dialogue; if a slot has only one option, it is a hard constraint that cannot be negotiated; if a slot value is empty, it may be something the power customer is requesting; and if a value does not exist in the database, that slot value is removed from the customer's possible goals. The whole capacity-increase process needs at least two procedures, determining the capacity-increase value and determining the engineering design unit, and both the capacity-increase value and the engineering design unit can take several values.
The multi-standard hierarchical reinforcement learning dialogue strategy comprises a multi-layer class 1 standard dialogue strategy $\pi_{g_n}$ and a single-layer class 2 standard dialogue strategy $\pi_{a,g_n}$.
The class 1 standard strategy $\pi_{g_n}$ obtains a state $s$ from the environment and selects a subtask $g$; the subtask can be further decomposed, and the number of decomposition layers is denoted by $n$. Every executable subtask that has a reward value and a termination condition is carried out by the class 2 standard strategy $\pi_{a,g_n}$.
The class 2 standard strategy takes the state $s$ and the subtask $g_n$ as input and outputs a basic action $a$. The subtask $g_n$ remains a constant input to $\pi_{a,g_n}$ until the termination condition is reached and the subtask ends. An internal evaluation mechanism in the dialogue manager provides the internal reward value $r_t^i(g_{n,t})$; this reward signal indicates whether the subtask $g_n$ is about to be completed and is also used to optimize the class 2 standard strategy $\pi_{a,g_n}$. The state $s$ contains the global information of the dialogue and the tracking information of all subtasks. To optimize the class 2 standard strategy $\pi_{a,g_n}$, the accumulated expected internal reward is maximized at each step $t$:
$$\max_{\pi_{a,g_n}} \mathbb{E}\left[\sum_{k \ge 0} \gamma^k r_{t+k}^i \,\middle|\, s_t = s,\ g_t = g_n\right]$$
In the above formula, $r_{t+k}^i$ denotes the internal evaluation reward at step $t+k$ and $\gamma$ the discount factor. The class 1 standard strategy $\pi_{g_n}$ optimizes the accumulated external reward value at step $t$:
$$\max_{\pi_{g_n}} \mathbb{E}\left[\sum_{k \ge 0} \gamma^k r_{t+k}^e \,\middle|\, s_t = s\right]$$
In the above formula, $r_{t+k}^e$ denotes the reward value received from the external environment at step $t+k$ when a new subtask starts. The internal and external reward values work together so that the dialogue learning strategy selects appropriate dialogue actions.
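The class 1 standard strategy $\pi_{g_n}$ and the class 2 standard strategy $\pi_{a,g_n}$ are both learned by deep Q-learning. The optimal Q function of the class 1 standard dialogue strategy must satisfy:
$$Q_1^*(s, g_n) = \mathbb{E}\left[\sum_{k=0}^{N-1} \gamma^k r_{t+k}^e + \gamma^N \max_{g_n'} Q_1^*(s_{t+N}, g_n') \,\middle|\, s_t = s,\ g_t = g_n\right]$$
In the above formula, $N$ denotes the number of steps the class 2 standard dialogue strategy $\pi_{a,g_n}$ needs to complete the subtask, and $g_n'$ denotes the next subtask, selected in state $s_{t+N}$.
The optimal Q function of the class 2 standard dialogue strategy $\pi_{a,g_n}$ satisfies:
$$Q_2^*(s, a, g_n) = \mathbb{E}\left[r_t^i + \gamma \max_{a_{t+1}} Q_2^*(s_{t+1}, a_{t+1}, g_n) \,\middle|\, s_t = s,\ a_t = a,\ g_t = g_n\right]$$
In the above formulas, $Q_1^*(s, g_n)$ and $Q_2^*(s, a, g_n)$ are represented by neural networks parameterized by $\theta_1$ and $\theta_2$, written $Q_1(s, g_n; \theta_1)$ and $Q_2(s, a, g_n; \theta_2)$.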
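To optimize the performance of the dialogue system, a loss function is defined for training the networks; minimizing it increases the probability of actions with positive reward values and decreases the probability of actions with negative reward values.
The loss function of the class 1 standard dialogue strategy at each iteration $i$ is:
$$L_1(\theta_{1,i}) = \mathbb{E}_{(s, g_n, r^e, s') \sim D}\left[\left(y_i - Q_1(s, g_n; \theta_{1,i})\right)^2\right], \qquad y_i = r^e + \gamma^N \max_{g_n'} Q_1(s', g_n'; \theta_{1,i-1})$$
In the above formula, $r^e = \sum_{k=0}^{N-1} \gamma^k r_{t+k}^e$ is the discounted sum of the rewards received by the time the sub-goal $g_n$ is completed, and $N$ is the number of steps at completion.
The loss function of the class 2 standard dialogue strategy is:
$$L_2(\theta_{2,i}) = \mathbb{E}_{(s, g_n, a, r^i, s') \sim D}\left[\left(y_i - Q_2(s, a, g_n; \theta_{2,i})\right)^2\right]$$
where $y_i = r^i + \gamma \max_{a'} Q_2(s', a', g_n; \theta_{2,i-1})$ is the reward value including the discount factor. The loss function is minimized by stochastic gradient descent.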
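As a worked illustration of how the class 1 training target combines the discounted external reward with the bootstrapped Q value, the numbers below are purely hypothetical: a discount factor $\gamma = 0.9$, a subtask that finishes in $N = 2$ steps with an external reward of $-1$ per step, a best next-subtask value of $10$ and a current estimate $Q_1(s, g_n) = 5$. The target form $y = r^e + \gamma^N \max_{g_n'} Q_1(s', g_n')$ is assumed from the definitions of $r^e$ and $N$ given in the text.

$$r^e = \sum_{k=0}^{1} \gamma^k r_{t+k}^e = (-1) + 0.9 \cdot (-1) = -1.9$$
$$y = r^e + \gamma^2 \max_{g_n'} Q_1(s', g_n') = -1.9 + 0.81 \cdot 10 = 6.2$$
$$\left(y - Q_1(s, g_n)\right)^2 = (6.2 - 5)^2 = 1.44$$

The squared error of 1.44 is the contribution of this single transition to the class 1 loss; a gradient step would move $Q_1(s, g_n)$ from 5 toward 6.2.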
The dialogue strategy is updated through the accumulated reward captured by the Q values, which enables effective dialogue with the customer.
The loss functions are minimized by stochastic gradient descent. For the class 1 standard dialogue strategy the gradient is:
$$\nabla_{\theta_{1,i}} L_1(\theta_{1,i}) = -2\,\mathbb{E}_{(s, g_n, r^e, s') \sim D}\left[\left(r^e + \gamma^N \max_{g_n'} Q_1(s', g_n'; \theta_{1,i-1}) - Q_1(s, g_n; \theta_{1,i})\right) \nabla_{\theta_{1,i}} Q_1(s, g_n; \theta_{1,i})\right]$$
In the formula, $\nabla_{\theta_{1,i}} L_1$ denotes the gradient of the loss function, along whose negative direction the parameters are updated, $\mathbb{E}$ the expectation, $D$ the experience replay buffer, $\gamma$ the discount factor, and $\nabla_{\theta_{1,i}} Q_1$ the gradient of the $Q_1$ function.
For the class 2 standard dialogue strategy the gradient is:
$$\nabla_{\theta_{2,i}} L_2(\theta_{2,i}) = -2\,\mathbb{E}_{(s, g_n, a, r^i, s') \sim D}\left[\left(r^i + \gamma \max_{a'} Q_2(s', a', g_n; \theta_{2,i-1}) - Q_2(s, a, g_n; \theta_{2,i})\right) \nabla_{\theta_{2,i}} Q_2(s, a, g_n; \theta_{2,i})\right]$$
In the formula, $\nabla_{\theta_{2,i}} L_2$ denotes the gradient of the loss function, $\mathbb{E}$ the expectation, $D$ the experience replay buffer, $\gamma$ the discount factor, and $\nabla_{\theta_{2,i}} Q_2$ the gradient of the $Q_2$ function.
To further improve the performance of the dialogue strategy, two heuristic techniques are used: a target network and experience replay. The experience replay tuples are $(s, g_n, r^e, s')$ and $(s, g_n, a, r^i, s')$. As the Q functions are updated with each round of dialogue, the dialogue strategy function is updated iteratively until it finally converges.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention realizes customer service dialogue with a professional background, and significantly improves both the success rate and the coherence of the dialogue. Labor cost is saved. By replacing the professional database, the same implementation method can provide intelligent customer service dialogue in other professional fields. Because the strategy is trained on a large amount of data, the dialogue success rate can improve further as more knowledge data becomes available in the field.
The customer service system for the power grid's new-installation and capacity-increase service comprises at least the following two subtasks. The first subtask determines the capacity amount for the new installation or capacity increase. The second subtask selects a design company and determines the approximate engineering quantity according to that capacity. The subtasks are related in time, cost and logic, yet all of them must be completed within a single dialogue; the dialogue task cannot be completed if any step is missing. The invention adopts a dialogue manager built with a hierarchical deep reinforcement learning method, which can solve the complex tasks in a dialogue at different spatial scales, improves the relevance of the dialogue, adapts well to different users and can incorporate professional knowledge.
The invention relates to a method for implementing an online electricity customer service dialogue system with mixed task-completion attributes based on hierarchical reinforcement learning. In this method the dialogue strategy decomposes the target task into several layers until it is broken down into subtasks that the system can understand and execute, and learning and training are carried out with deep reinforcement learning. The greatest advantage of the method is that subtasks with a professional background are decomposed into multiple layers and connected to a database for that professional background, so that the corresponding slot value information can be looked up at any time. The invention can improve the success rate of new-installation and capacity-increase dialogue services, make the dialogue more fluent and improve the user experience.
Drawings
To help those of ordinary skill in the art understand and practice the invention, it is described in detail below with reference to the accompanying drawings and specific embodiments. The following examples illustrate the invention, and the scope of the invention is not limited by these embodiments.
FIG. 1 is a system block diagram of the present invention;
FIG. 2 is an overview of the dialog method of the present invention;
FIG. 3 is a schematic diagram of a class 1 standard dialog strategy learner of the present invention;
FIG. 4 is a schematic diagram of the class 2 standard dialog strategy learner of the present invention.
Detailed Description
The invention relates to a customer service dialogue system and method for new electric power installation and capacity increase based on hierarchical reinforcement learning. The system comprises a power service understanding module, a dialogue state tracker, a dialogue strategy and a power service feedback module, wherein:
the power service understanding module is used for understanding and identifying the specific demand information of the power customer and passing this information to the dialogue state tracker;
the dialogue state tracker is used for tracking and recording the current dialogue state so that the state information can be retrieved at any time;
the dialogue strategy is used for generating an optimized response to the power customer and for continuously updating itself through iterative optimization;
the power service feedback module is used for translating the response generated by the dialogue strategy into information the user can understand and feeding it back to the power customer.
In the implementation method of the system, the power service understanding module converts the text information from the electricity customer into slot value information and passes it to the dialogue manager. The dialogue manager, composed of the dialogue state tracker and the dialogue strategy, passes the response information to the power service feedback module, which generates semantic text that the customer can understand and feeds it back to the customer. Professional knowledge can be queried and updated through the database. FIG. 1 is a system block diagram of the invention.
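The data flow between the four modules can be sketched as follows. This is a minimal illustration, not the patented implementation: the `slot=value` text format, the `design_unit` slot name, the rule-based `policy` function and the reply templates are all assumptions made for the example (only `dst_cap` appears in the source), and a learned dialogue strategy would replace the rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DialogueState:
    """State kept by the dialogue state tracker: filled slots plus turn history."""
    slots: Dict[str, str] = field(default_factory=dict)
    history: List[Dict[str, str]] = field(default_factory=list)


def understand(text: str) -> Dict[str, str]:
    """Toy power service understanding: read 'slot=value' tokens from the text."""
    slots: Dict[str, str] = {}
    for token in text.split():
        if "=" in token:
            name, value = token.split("=", 1)
            slots[name] = value
    return slots


def track(state: DialogueState, slots: Dict[str, str]) -> DialogueState:
    """Dialogue state tracker: merge the new slots and record the turn."""
    state.slots.update(slots)
    state.history.append(dict(slots))
    return state


def policy(state: DialogueState) -> str:
    """Placeholder strategy (assumption): request the first missing slot, else confirm."""
    for required in ("dst_cap", "design_unit"):
        if required not in state.slots:
            return f"request({required})"
    return "confirm(application)"


def feedback(action: str) -> str:
    """Power service feedback: turn a dialogue action into customer-readable text."""
    if action.startswith("request("):
        slot = action[len("request("):-1]
        return f"Please provide {slot}."
    return "Your application has been recorded."


def run_turn(state: DialogueState, text: str) -> str:
    """One turn: understanding -> state tracking -> strategy -> feedback."""
    return feedback(policy(track(state, understand(text))))
```

Calling `run_turn` repeatedly with the same `DialogueState` carries the filled slots from one turn to the next, which is the role the text assigns to the dialogue state tracker.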
The implementation method of the customer service dialogue system for new electric power installation and capacity increase based on hierarchical reinforcement learning comprises the following steps:
Step 1: the dialogue system obtains the service corpus from the power service understanding module; slot values can be extracted from text obtained by converting speech to text, or extracted directly from online text.
Step 2: when the power customer talks with the intelligent customer service, the dialogue data of the agent is obtained from a shared multi-domain general dialogue corpus and an electric power English item corpus.
Step 3: once the application for new installation or capacity increase has been received, the dialogue agent responds to the power customer according to the multi-standard hierarchical reinforcement learning dialogue strategy until the customer's requirement is met, at which point the dialogue is regarded as successful.
According to step 1, the dialogue system is used for extracting the text corpus of the electricity customer. Because professional knowledge about electricity use is involved, the dialogue strategy is decomposed into a class 1 standard strategy $\pi_{g_n}$ and a class 2 standard strategy $\pi_{a,g_n}$, each with its own reward value. The class 1 standard strategy $\pi_{g_n}$ is associated with the external reward value and may contain multiple layers: the electric power expertise is decomposed repeatedly until the knowledge in the corpus and the database covers the entire content. The class 2 standard strategy $\pi_{a,g_n}$ is associated with the internal reward value and contains the decomposed subtasks and actions. The two reward values are optimized separately by reinforcement learning and guide the learning of the customer service system.
According to step 1, the slot value information decomposes the goal of the power customer in the dialogue into a series of slot values. For example, the new-installation and capacity-increase slot value dst_cap = 10KVA expresses the customer's requirement for new installation or capacity increase. A request slot value, e.g. Price?, expresses information the power customer asks the dialogue system for. The slot values needed for the power customer's goal come from a database of dialogues between service staff in day-to-day power business halls and real power customers. All slot values appearing in a dialogue passage are extracted. If a slot has multiple values, for example or_cap = 20KVA, it is treated as a soft constraint of the power customer, who may later change the choice to explore other options in the dialogue. If a slot has only one option, it is a hard constraint that cannot be negotiated. If a slot value is empty, it may be something the power customer is requesting, and if a value does not exist in the database, that slot value is removed from the customer's possible goals. The whole capacity-increase process needs at least two procedures: determining the capacity-increase value and determining the engineering design unit. Both the capacity value and the engineering design unit can take several values.
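The soft, hard and request rules above can be sketched as a small classification function. This is an illustrative reading of the rules rather than the patent's code: the function name `classify_slots`, the label strings and the choice to count only values that survive the database check are assumptions, as are the `design_unit` and `price` slot names and all sample values other than `dst_cap = 10KVA`.

```python
from typing import Dict, List, Set


def classify_slots(
    observed: Dict[str, List[str]],
    database: Dict[str, Set[str]],
) -> Dict[str, str]:
    """Label each slot as 'soft', 'hard' or 'request'.

    observed maps a slot name to every value seen for it in the dialogue passage.
    database maps a slot name to the values known to the power service database.
    """
    labels: Dict[str, str] = {}
    for slot, values in observed.items():
        if not values:
            # Empty slot: the customer is asking the system for this information.
            labels[slot] = "request"
            continue
        # Values missing from the database are removed from the possible goal.
        known = [v for v in values if v in database.get(slot, set())]
        if not known:
            continue
        # Several remaining values: negotiable; exactly one: not negotiable.
        labels[slot] = "soft" if len(known) > 1 else "hard"
    return labels


example_observed = {
    "dst_cap": ["10KVA", "20KVA"],  # two candidate capacities
    "design_unit": ["UnitA"],       # a single fixed choice
    "price": [],                    # the customer asks for the price
    "colour": ["red"],              # not a slot the database knows about
}
example_database = {
    "dst_cap": {"10KVA", "20KVA", "30KVA"},
    "design_unit": {"UnitA", "UnitB"},
}
example_labels = classify_slots(example_observed, example_database)
```

Here `example_labels` marks `dst_cap` as a soft constraint, `design_unit` as a hard constraint and `price` as a request, while `colour` is dropped because the database has no matching value.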
According to step 2, the corpus information comprises the dialogue number, the flag marking the success or failure of the dialogue, the power-related information provided by the user and the power-related information replied by the system. Because professional power information keeps evolving over time, the power database needs to be queried during the dialogue to complete and supplement this information.
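One possible shape for a corpus entry with these four fields is sketched below. The class name `DialogueRecord`, the field names and the `success_rate` helper are hypothetical; the source only lists which kinds of information the corpus holds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DialogueRecord:
    """One corpus entry (field names are assumptions for illustration)."""
    dialogue_id: int               # dialogue number
    success: bool                  # success-or-failure flag
    user_info: Dict[str, str]      # power-related information given by the user
    system_info: Dict[str, str]    # power-related information replied by the system


def success_rate(records: List[DialogueRecord]) -> float:
    """Fraction of dialogues flagged as successful; 0.0 for an empty corpus."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.success) / len(records)
```

Keeping the success flag on every record makes the dialogue success rate, the metric the text aims to improve, directly computable from the corpus.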
According to step 3, the multi-standard hierarchical reinforcement learning dialogue strategy is updated with a deep reinforcement learning method. Deep reinforcement learning needs a large amount of data and corpora for training, so a simulator can be used to train the network.
According to step 3, the multi-standard hierarchical reinforcement learning dialogue strategy comprises a multi-layer class 1 standard dialogue strategy $\pi_{g_n}$ and a single-layer class 2 standard dialogue strategy $\pi_{a,g_n}$.
The class 1 standard strategy $\pi_{g_n}$ obtains the state $s$ from the environment and selects a subtask $g$, which can be further decomposed; the number of decomposition layers is denoted by $n$. Every executable subtask that has a reward value and a termination condition is carried out by the class 2 standard strategy $\pi_{a,g_n}$.
The class 2 standard strategy takes the state $s$ and the subtask $g_n$ as input and outputs a basic action $a$. The subtask $g_n$ remains a constant input to $\pi_{a,g_n}$ until the termination condition is reached and the subtask ends. An internal evaluation mechanism in the dialogue manager provides the internal reward value $r_t^i(g_{n,t})$; this reward signal indicates whether the subtask $g_n$ is about to be completed and is also used to optimize the strategy $\pi_{a,g_n}$. The state $s$ contains the global information of the dialogue and the tracking information of all subtasks. To optimize the class 2 standard strategy $\pi_{a,g_n}$, the accumulated expected internal reward is maximized at each step $t$:
$$\max_{\pi_{a,g_n}} \mathbb{E}\left[\sum_{k \ge 0} \gamma^k r_{t+k}^i \,\middle|\, s_t = s,\ g_t = g_n\right]$$
In the above formula, $r_{t+k}^i$ denotes the internal evaluation reward at step $t+k$ and $\gamma$ the discount factor. The class 1 standard strategy $\pi_{g_n}$ optimizes the accumulated external reward value at step $t$:
$$\max_{\pi_{g_n}} \mathbb{E}\left[\sum_{k \ge 0} \gamma^k r_{t+k}^e \,\middle|\, s_t = s\right]$$
In the above formula, $r_{t+k}^e$ denotes the reward value received from the external environment at step $t+k$ when a new subtask starts. The internal and external reward values work together so that the dialogue learning strategy selects appropriate dialogue actions.
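The interplay of the two strategies can be sketched as a two-level control loop. Everything below is a toy stand-in under stated assumptions: the state is simply the set of filled slots, `toy_step` is a simulated customer that always answers, the rewards (-1 external per turn, +1 or -0.1 internal) are invented, and `top_policy` and `low_policy` are fixed rules used in place of the learned class 1 and class 2 strategies.

```python
from typing import FrozenSet, List, Tuple

State = FrozenSet[str]  # toy state: names of the slots already filled


def toy_step(state: State, action: str) -> Tuple[State, float]:
    """Toy environment: asking for a slot fills it; each turn costs -1 external reward."""
    slot = action[len("ask_"):]
    return state | {slot}, -1.0


def toy_critic(state: State, subtask: str) -> Tuple[float, bool]:
    """Internal evaluation mechanism: +1 and terminate once the subtask's slot is filled."""
    done = subtask in state
    return (1.0 if done else -0.1), done


def low_policy(state: State, subtask: str) -> str:
    """Stand-in for the class 2 strategy: a fixed rule, not a learned Q2 network."""
    return "ask_" + subtask


def top_policy(state: State, subtasks: List[str]) -> str:
    """Stand-in for the class 1 strategy: pick the first unfinished subtask."""
    return next(g for g in subtasks if g not in state)


def run_dialogue(subtasks: List[str], gamma: float = 0.9, max_steps: int = 10):
    """Run one dialogue and collect the transitions of both levels."""
    state: State = frozenset()
    top_log, low_log = [], []
    while any(g not in state for g in subtasks):
        g = top_policy(state, subtasks)
        start, r_e, n, done = state, 0.0, 0, False
        # The subtask g stays fixed as input until the critic signals termination.
        while not done and n < max_steps:
            a = low_policy(state, g)
            nxt, r_ext = toy_step(state, a)
            r_int, done = toy_critic(nxt, g)
            low_log.append((state, g, a, r_int, nxt))   # (s, g, a, r_i, s')
            r_e += (gamma ** n) * r_ext                 # discounted external reward
            state, n = nxt, n + 1
        top_log.append((start, g, r_e, state))          # (s, g, r_e, s')
    return state, top_log, low_log
```

The two logs have the same shape as the two kinds of experience tuple named in the text, so a learning version would push them into replay buffers instead of plain lists.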
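The class 1 standard strategy $\pi_{g_n}$ and the class 2 standard strategy $\pi_{a,g_n}$ are both learned by deep Q-learning. The optimal Q function of the class 1 standard dialogue strategy $\pi_{g_n}$ must satisfy:
$$Q_1^*(s, g_n) = \mathbb{E}\left[\sum_{k=0}^{N-1} \gamma^k r_{t+k}^e + \gamma^N \max_{g_n'} Q_1^*(s_{t+N}, g_n') \,\middle|\, s_t = s,\ g_t = g_n\right]$$
In the above formula, $N$ denotes the number of steps the class 2 standard dialogue strategy $\pi_{a,g_n}$ needs to complete the subtask, and $g_n'$ denotes the next subtask, selected in state $s_{t+N}$.
The optimal Q function of the class 2 standard dialogue strategy satisfies:
$$Q_2^*(s, a, g_n) = \mathbb{E}\left[r_t^i + \gamma \max_{a_{t+1}} Q_2^*(s_{t+1}, a_{t+1}, g_n) \,\middle|\, s_t = s,\ a_t = a,\ g_t = g_n\right]$$
In the above formulas, $Q_1^*(s, g_n)$ and $Q_2^*(s, a, g_n)$ are represented by neural networks parameterized by $\theta_1$ and $\theta_2$, written $Q_1(s, g_n; \theta_1)$ and $Q_2(s, a, g_n; \theta_2)$. The neural network selected in the present invention is the DQN (deep Q-network).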
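The two bootstrapped targets implied by these Q functions can be written as plain functions. This sketch assumes the usual deep Q-learning form: the class 1 target sums the discounted external rewards over the N steps of the subtask and adds the discounted best next-subtask value, and the class 2 target is a one-step backup. The function names and the `terminal` flag are assumptions made for the example.

```python
from typing import Sequence


def q1_target(ext_rewards: Sequence[float], gamma: float,
              next_q1_values: Sequence[float]) -> float:
    """Class 1 target: sum_k gamma^k * r_e[t+k] + gamma^N * max_g' Q1(s_{t+N}, g').

    ext_rewards holds the external rewards of the N steps the subtask took.
    next_q1_values holds Q1(s_{t+N}, g') for every candidate next subtask;
    pass an empty sequence when the dialogue has ended.
    """
    n = len(ext_rewards)
    discounted = sum((gamma ** k) * r for k, r in enumerate(ext_rewards))
    bootstrap = max(next_q1_values) if next_q1_values else 0.0
    return discounted + (gamma ** n) * bootstrap


def q2_target(int_reward: float, gamma: float,
              next_q2_values: Sequence[float], terminal: bool) -> float:
    """Class 2 target: r_i + gamma * max_a' Q2(s', a', g), with no bootstrap at subtask end."""
    if terminal or not next_q2_values:
        return int_reward
    return int_reward + gamma * max(next_q2_values)
```

Note that the class 1 bootstrap term is discounted by `gamma ** N` rather than `gamma`, because N low-level steps elapse between two consecutive subtask choices.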
In order to optimize the performance of the dialogue system, a loss function of a training network is defined, the action probability with positive reward value is amplified, and the action probability with negative reward value is reduced. Class 1 standard dialog strategy pignThe minimum loss function at each iteration i is:
In the above formula, re=∑γkrt+k eRepresenting the discount value of the prize sum when the sub-target gn is completed. N represents the number of steps at completion.
The loss function minimized for the class 2 standard dialogue strategy π_{a,gn} is:
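By analogy with the class 1 case, and again assuming the usual squared-error loss of deep Q-learning with a one-step target (D is the experience replay buffer, g_n stands for the subtask gn), this loss could be written as:

```latex
L_2(\theta_{2,i}) = \mathbb{E}_{(s,\,g_n,\,a,\,r^{i},\,s') \sim D}\!\left[ \left( y_i - Q_2(s, a, g_n; \theta_{2,i}) \right)^{2} \right],
\qquad
y_i = r^{i} + \gamma \max_{a'} Q_2(s', a', g_n; \theta_{2,i-1})
```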
where r^i represents the reward value, including the discount factor. The loss function is minimized by the stochastic gradient descent method.
The dialogue strategy is updated through the cumulative reward reflected in the Q value, thereby achieving effective dialogue with the customer.
When the loss function is minimized with the stochastic gradient descent method, the gradient for the class 1 standard dialogue strategy π_gn is:
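As a worked step under stated assumptions: if the loss is the squared error L_1(θ_{1,i}) = E[(y_i − Q_1(s, g_n; θ_{1,i}))²] with the target y_i = r^e + γ^N max Q_1(s', g_n'; θ_{1,i−1}) held fixed, differentiating with respect to θ_{1,i} gives the expression below. The constant factor −2 is commonly absorbed into the learning rate and the sign of the update.

```latex
\nabla_{\theta_{1,i}} L_1(\theta_{1,i}) = -2\, \mathbb{E}_{(s,\,g_n,\,r^{e},\,s') \sim D}\!\left[ \left( r^{e} + \gamma^{N} \max_{g_n'} Q_1(s', g_n'; \theta_{1,i-1}) - Q_1(s, g_n; \theta_{1,i}) \right) \nabla_{\theta_{1,i}} Q_1(s, g_n; \theta_{1,i}) \right]
```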
In the formula, ∇L1 represents the gradient of the loss function, E represents the expectation, D represents the experience replay buffer, γ represents the discount factor, and ∇Q1 represents the gradient of the Q1 function.
For the class 2 standard dialogue strategy π_{a,gn}, the gradient is:
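Under the same assumptions as for the class 1 case (squared-error loss L_2(θ_{2,i}) = E[(y_i − Q_2(s, a, g_n; θ_{2,i}))²], one-step target y_i = r^i + γ max Q_2(s', a', g_n; θ_{2,i−1}) held fixed), differentiating with respect to θ_{2,i} gives:

```latex
\nabla_{\theta_{2,i}} L_2(\theta_{2,i}) = -2\, \mathbb{E}_{(s,\,g_n,\,a,\,r^{i},\,s') \sim D}\!\left[ \left( r^{i} + \gamma \max_{a'} Q_2(s', a', g_n; \theta_{2,i-1}) - Q_2(s, a, g_n; \theta_{2,i}) \right) \nabla_{\theta_{2,i}} Q_2(s, a, g_n; \theta_{2,i}) \right]
```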
In the formula, ∇L2 represents the gradient of the loss function, E represents the expectation, D represents the experience replay buffer, γ represents the discount factor, and ∇Q2 represents the gradient of the Q2 function.
To further improve the performance of the dialogue strategy, two heuristic techniques are used: a target network and experience replay. The experience replay tuples are (s, g, r^e, s') and (s, g, a, r^i, s'). With each round of dialogue the Q function is updated, and the dialogue strategy function is updated iteratively until it finally converges.
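The two techniques can be illustrated with a minimal sketch. It is not the patent's implementation: a plain dictionary stands in for the class 2 Q-network, only the class 2 replay tuple (s, g, a, r^i, s') is shown, and the names `ReplayBuffer`, `td_target`, `sync_target`, and the example states and actions are hypothetical.

```python
import random
from collections import deque, namedtuple

# Replay tuple for the class 2 strategy: (s, g, a, r_i, s_next), as in the text.
LowTransition = namedtuple("LowTransition", "s g a r_i s_next")


class ReplayBuffer:
    """Fixed-size experience replay buffer (hypothetical helper)."""

    def __init__(self, capacity, seed=0):
        self.items = deque(maxlen=capacity)  # oldest tuples are dropped first
        self.rng = random.Random(seed)

    def add(self, transition):
        self.items.append(transition)

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive turns.
        return self.rng.sample(list(self.items), min(batch_size, len(self.items)))


def td_target(q_target, tr, actions, gamma):
    """y = r_i + gamma * max_a' Q2(s', a', g), using the frozen target table."""
    best_next = max(q_target.get((tr.s_next, a, tr.g), 0.0) for a in actions)
    return tr.r_i + gamma * best_next


def sync_target(q_online):
    """Copy the online parameters into the target network (here: a dict copy)."""
    return dict(q_online)


actions = ["ask_capacity", "confirm"]
q_online = {("s1", "confirm", "capacity"): 2.0}  # keyed by (state, action, subtask)
q_target = sync_target(q_online)

buffer = ReplayBuffer(capacity=2)
buffer.add(LowTransition("s0", "capacity", "ask_capacity", 0.5, "s1"))
y = td_target(q_target, buffer.sample(1)[0], actions, gamma=0.9)

# Changing the online table does not move the target until the next sync.
q_online[("s1", "confirm", "capacity")] = 10.0
y_after = td_target(q_target, buffer.sample(1)[0], actions, gamma=0.9)
```

Keeping the target table frozen between syncs is what stabilizes the regression target; in a neural implementation the dictionary copy would be replaced by copying network weights at a fixed interval.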
Fig. 2 is an overall view of the dialogue method of the present invention, giving a general overview of the hybrid-task-based dialogue completion method.
Two types of hierarchical reinforcement learning agents form the dialogue learning strategy. The class 2 standard strategy is associated with an internal evaluation mechanism, which can be updated iteratively in single steps, receives the dialogue actions of the class 2 standard strategy, and provides the strategy with the internal reward value r^i. The class 1 standard strategy receives the external reward value r^e fed back by the power customer and, at the same time, the dialogue state s of the power customer. The class 1 standard strategy π_gn can decompose sub-targets deeply over multiple layers, and the number of layers can be greater than 2, until the class 2 standard strategy π_{a,gn} can receive a sub-target and process it in single steps. The dialogue action selected by the class 2 standard strategy π_{a,gn} is then delivered to the power customer. A general overview of the hybrid-task-based dialogue completion method is shown in fig. 1.
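The interaction between the two strategies can be sketched as a two-level control loop. This is an illustrative sketch under stated assumptions, not the patent's implementation: the strategies are fixed stub functions rather than trained Q-networks, the customer is simulated, and all names (`top_strategy`, `low_strategy`, `internal_critic`, the slot names, the reward values) are hypothetical.

```python
# Required slots per subtask (hypothetical example values).
SUBTASK_SLOTS = {
    "capacity": ["capacity_value"],
    "design_company": ["company_name", "contact"],
}


def top_strategy(state):
    """Class 1 stub: pick the first subtask that is not finished yet."""
    for subtask in SUBTASK_SLOTS:
        if subtask not in state["done"]:
            return subtask
    return None


def low_strategy(state, subtask):
    """Class 2 stub: request the first slot still missing for this subtask."""
    for slot in SUBTASK_SLOTS[subtask]:
        if slot not in state["filled"]:
            return slot
    return None


def internal_critic(state, subtask):
    """Internal reward r_i: +1 when the subtask is complete, a small cost otherwise."""
    complete = all(s in state["filled"] for s in SUBTASK_SLOTS[subtask])
    return (1.0 if complete else -0.1), complete


def run_dialogue():
    state = {"filled": set(), "done": set()}
    log = []
    while (subtask := top_strategy(state)) is not None:
        finished = False
        while not finished:  # the subtask stays fixed until it terminates
            slot = low_strategy(state, subtask)
            state["filled"].add(slot)  # simulated customer answer
            r_i, finished = internal_critic(state, subtask)
            log.append((subtask, slot, r_i))
        state["done"].add(subtask)
    # External reward r_e, given once at the end of the dialogue in this sketch.
    r_e = 1.0 if len(state["done"]) == len(SUBTASK_SLOTS) else -1.0
    return log, r_e


log, r_e = run_dialogue()
```

The inner loop mirrors the description: the subtask is a constant input to the lower-level strategy until the internal critic signals completion, and only then does the upper-level strategy choose again.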
As shown in FIGS. 3 and 4, FIG. 3 is a schematic diagram of the learner for the class 1 standard dialogue strategy π_gn of the present invention, and FIG. 4 is a schematic diagram of the learner for the class 2 standard dialogue strategy π_{a,gn}. The two agents learn the hierarchical dialogue strategies of the class 1 standard and the class 2 standard, respectively.
For example, a customer applies to increase power capacity by 10 kVA to 20 kVA. The dialogue strategy treats this as a composite task. It first selects one subtask, determining the capacity. The subtasks are multi-layered: a series of actions is taken to collect the relevant information until everything the customer needs has been collected, and the subtask then ends. This process covers handling 20 kVA in a single application or increasing the capacity by 10 kVA in each of two applications. The strategy then moves to the next subtask, selecting a design company, and continues until the information for all subtasks has been collected. The dialogue strategy is implemented by combining deep reinforcement learning with a hierarchical value function; the hierarchical decomposition breaks the specialized electric power content down into sequential options that can be decided directly.
An option here is a generalized notion of action that comprises a policy π and a state-dependent termination function (decay coefficient) γ.
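As a rough illustration of the option concept, the sketch below pairs a policy with a termination test. It makes simplifying assumptions: termination is a deterministic boolean test on the state rather than a coefficient, and the names `Option`, `run_option`, and the example subtask are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    """A temporally extended action: a policy plus a termination test."""
    name: str
    policy: Callable[[dict], str]       # maps a state to a basic action
    terminates: Callable[[dict], bool]  # state-dependent termination


def run_option(option, state, step, max_steps=10):
    """Apply the option's policy until its termination condition holds."""
    actions = []
    while not option.terminates(state) and len(actions) < max_steps:
        action = option.policy(state)
        state = step(state, action)
        actions.append(action)
    return state, actions


determine_capacity = Option(
    name="determine_capacity",
    policy=lambda s: "ask_capacity",
    terminates=lambda s: "capacity" in s,
)

# Hypothetical environment step: the customer answers the question at once.
final_state, taken = run_option(
    determine_capacity, {}, step=lambda s, a: {**s, "capacity": "20 kVA"}
)
```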
π is the policy function of the customer service dialogue system, s is its state, a is its action, γ is its decay factor, and k is the exponent of γ. r is the reward or penalty value in the customer service dialogue system. A slot is an attribute that is well defined for the agent.
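The roles of γ, k, and r can be made concrete with a short calculation of the discounted cumulative reward ∑ γ^k r_{t+k}. The function name and the example reward values are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], the discounted cumulative reward."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


# Three turns with rewards -0.1, -0.1, +1.0 and decay factor 0.5:
# -0.1 + 0.5 * (-0.1) + 0.25 * 1.0
total = discounted_return([-0.1, -0.1, 1.0], gamma=0.5)
```

With a decay factor below 1, a reward obtained later in the dialogue contributes less than the same reward obtained earlier, which favors shorter successful dialogues.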
The invention decomposes tasks deeply into subtasks and, compared with conventional deep reinforcement learning, greatly improves the dialogue success rate. The subtask decomposition is more thorough, semantic communication is more continuous, learning is faster, and convergence is better.
Embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, also features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (10)
1. An electric power new-installation and capacity-increase dialogue customer service system based on hierarchical reinforcement learning, characterized in that the system comprises a power service understanding module, a dialogue state tracker, a dialogue strategy, and a power service feedback module, wherein:
the power service understanding module is used to understand and identify the specific demand information of the power customer and to pass that information to the dialogue state tracker;
the dialogue state tracker is used to track and record the current dialogue state so that state information can be retrieved at any time;
the dialogue strategy is used to generate an optimized response to the power customer and is continuously updated through iterative optimization;
the power service feedback module is used to translate the response generated by the dialogue strategy into information the user can understand and to feed it back to the power customer.
2. An electric power new-installation and capacity-increase dialogue customer service method based on hierarchical reinforcement learning, characterized in that the method comprises the following steps:
step 1: the dialogue system obtains service corpora from the power service understanding module, wherein slot values can be extracted from text converted from speech or extracted directly from online text;
step 2: when the power customer talks with the intelligent customer service, the agent's dialogue data are obtained from a shared multi-domain general dialogue corpus and an electric power English item corpus;
step 3: after the new-installation or capacity-increase electricity application has been received successfully, the intelligent dialogue agent responds to the power customer's information according to the multi-standard hierarchical reinforcement learning dialogue strategy until the customer's requirement is met, at which point the dialogue is judged successful.
3. The electric power new-installation and capacity-increase dialogue customer service method based on hierarchical reinforcement learning as claimed in claim 2, characterized in that: the dialogue system is used to extract the text corpora of the electricity customer; because specialized electricity knowledge is involved, the dialogue strategy is decomposed into two reward values, that of a class 1 standard strategy and that of a class 2 standard strategy; the reward value of the class 1 standard strategy is called the external reward value, and the class 1 standard strategy comprises multiple layers, the electric power expertise being decomposed repeatedly until the knowledge in the corpus and the database covers all the content; the reward value of the class 2 standard strategy is called the internal reward value, and the class 2 standard strategy comprises the decomposed subtasks and actions; the two reward values are optimized separately by reinforcement learning and guide the learning of the customer service system.
4. The electric power new-installation and capacity-increase dialogue customer service method based on hierarchical reinforcement learning as claimed in claim 2, characterized in that: the corpus information comprises the number of dialogues, the number of dialogue success and failure flags, the power-related information from the user, and the power-related information in the system's replies.
5. The electric power new-installation and capacity-increase dialogue customer service method based on hierarchical reinforcement learning as claimed in claim 2, characterized in that: the slot value information decomposes the goal of the power customer in the dialogue into a series of slot values, comprising:
new-installation and capacity-increase slot values, which express the power customer's new-installation and capacity-increase requirements;
request slot values, which express the information the power customer asks the dialogue system for;
the slot values required for the power customer's goal come from a database built from daily electric power business hall services and real power customer dialogues;
all slot values appearing in a dialogue passage are extracted; if a slot has several values, it is regarded as a soft constraint of the power customer, who may later change the choice and look for other options in the dialogue; if a slot has only one value, it is a hard constraint that cannot be negotiated; if a slot value is empty, it may be something the power customer is requesting, and if the value does not exist in the database, the slot value is removed from the power customer's possible goals; the whole capacity-increase process requires at least two subprocesses, determining the capacity-increase value and determining the engineering design unit, and both the capacity-increase value and the engineering design unit value include several values.
6. The electric power new-installation and capacity-increase dialogue customer service method based on hierarchical reinforcement learning as claimed in claim 2, characterized in that: the multi-standard hierarchical reinforcement learning dialogue strategy comprises a multi-layer class 1 standard dialogue strategy π_gn and a single-layer class 2 standard dialogue strategy π_{a,gn}.
7. The electric power new-installation and capacity-increase dialogue customer service method based on hierarchical reinforcement learning as claimed in claim 6, characterized in that: the class 1 standard strategy π_gn obtains the state s from the environment and selects a subtask g, wherein the subtask can be further decomposed and n denotes the number of decomposition layers; every executable subtask, which has a reward value and a termination condition, is carried out using the class 2 standard strategy π_{a,gn}.
8. The electric power new-installation and capacity-increase dialogue customer service method based on hierarchical reinforcement learning as claimed in claim 6, characterized in that: the class 2 standard strategy takes the state s and the subtask gn as input and outputs a basic action a; the subtask gn is held as a constant input to the class 2 standard strategy π_{a,gn} until a termination condition is reached and the subtask gn ends; an internal evaluation mechanism in the dialogue manager provides the internal reward value r_t^i(gn_t), a reward signal that indicates whether the subtask gn is about to be completed and that is also used to optimize the class 2 standard strategy π_{a,gn}; the state s contains the global information of the dialogue and the tracking information of all subtasks; the class 2 standard strategy π_{a,gn} is optimized to maximize the expected cumulative internal reward at each step t:
in the above formula, r_{t+k}^i represents the internal evaluation reward at step t + k; the class 1 standard strategy π_gn optimizes the cumulative reward value at step t:
in the above formula, r_{t+k}^e represents the reward value received from the external environment at step t + k, when a new subtask starts; the internal and external reward values work together so that the dialogue learning strategy selects appropriate dialogue actions.
9. The electric power new-installation and capacity-increase dialogue customer service method based on hierarchical reinforcement learning as claimed in claim 2, characterized in that: the class 1 standard strategy π_gn and the class 2 standard strategy π_{a,gn} are learned with the deep Q-learning method; the optimal Q function of the class 1 standard dialogue strategy must satisfy:
in the above formula, N represents the number of steps the class 2 standard dialogue strategy π_{a,gn} needs to complete a subtask, and gn' represents the next subtask, selected in state s_{t+N};
the optimal Q function of the class 2 standard dialogue strategy π_{a,gn} satisfies:
in the above formulas, Q1*(s, gn) and Q2*(s, a, gn) are represented by neural networks, parameterized by θ1 and θ2 as Q1(s, gn; θ1) and Q2(s, a, gn; θ2).
10. The electric power new-installation and capacity-increase dialogue customer service method based on hierarchical reinforcement learning as claimed in claim 1, characterized in that: to optimize the performance of the dialogue system, a loss function is defined for training the networks, which increases the probability of actions with positive reward values and decreases the probability of actions with negative reward values;
the loss function minimized for the class 1 standard dialogue strategy at each iteration i is:
in the above formula, r^e = ∑ γ^k r_{t+k}^e represents the discounted sum of rewards when the sub-target gn is completed, and N represents the number of steps at completion;
the loss function minimized for the class 2 standard dialogue strategy is:
where r^i represents the reward value, including the discount factor; the loss function is minimized by the stochastic gradient descent method;
the dialogue strategy is updated through the cumulative reward reflected in the Q value, thereby achieving effective dialogue with the customer;
when the loss function is minimized with the stochastic gradient descent method, the gradient for the class 1 standard dialogue strategy is:
in the formula, ∇L1 represents the gradient of the loss function, E represents the expectation, D represents the experience replay buffer, γ represents the discount factor, and ∇Q1 represents the gradient of the Q1 function;
the gradient for the class 2 standard dialogue strategy is:
in the formula, ∇L2 represents the gradient of the loss function, E represents the expectation, D represents the experience replay buffer, γ represents the discount factor, and ∇Q2 represents the gradient of the Q2 function;
to further improve the performance of the dialogue strategy, two heuristic techniques are used, namely a target network and experience replay; the experience replay tuples are (s, g, r^e, s') and (s, g, a, r^i, s'); with each round of dialogue the Q function is updated, and the dialogue strategy function is updated iteratively until it finally converges.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911137278.2A CN111061846A (en) | 2019-11-19 | 2019-11-19 | Electric power new installation and capacity increase conversation customer service system and method based on layered reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911137278.2A CN111061846A (en) | 2019-11-19 | 2019-11-19 | Electric power new installation and capacity increase conversation customer service system and method based on layered reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111061846A (en) | 2020-04-24 |
Family
ID=70298560
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911137278.2A Pending CN111061846A (en) | 2019-11-19 | 2019-11-19 | Electric power new installation and capacity increase conversation customer service system and method based on layered reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111061846A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112860869A (en) * | 2021-03-11 | 2021-05-28 | 中国平安人寿保险股份有限公司 | Dialogue method, device and storage medium based on hierarchical reinforcement learning network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108255934A (en) * | 2017-12-07 | 2018-07-06 | 北京奇艺世纪科技有限公司 | A kind of sound control method and device |
CN108282587A (en) * | 2018-01-19 | 2018-07-13 | 重庆邮电大学 | Mobile customer service dialogue management method under being oriented to strategy based on status tracking |
CN109817329A (en) * | 2019-01-21 | 2019-05-28 | 暗物智能科技(广州)有限公司 | A kind of medical treatment interrogation conversational system and the intensified learning method applied to the system |
CN109829044A (en) * | 2018-12-28 | 2019-05-31 | 北京百度网讯科技有限公司 | Dialogue method, device and equipment |
US20190324795A1 (en) * | 2018-04-24 | 2019-10-24 | Microsoft Technology Licensing, Llc | Composite task execution |
2019
- 2019-11-19 CN CN201911137278.2A patent/CN111061846A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112860869A (en) * | 2021-03-11 | 2021-05-28 | 中国平安人寿保险股份有限公司 | Dialogue method, device and storage medium based on hierarchical reinforcement learning network |
CN112860869B (en) * | 2021-03-11 | 2023-02-03 | 中国平安人寿保险股份有限公司 | Dialogue method, device and storage medium based on hierarchical reinforcement learning network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||