CN111061846A - Electric power new installation and capacity increase conversation customer service system and method based on layered reinforcement learning - Google Patents

Info

Publication number
CN111061846A
Authority
CN
China
Prior art keywords
strategy
conversation
class
standard
dialogue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911137278.2A
Other languages
Chinese (zh)
Inventor
高曦莹
张冶
蔡颖凯
王浩淼
曹世龙
李强
田睿
宋晓文
张雯舒
李丹
宋锦春
叶宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Original Assignee
State Grid Corp of China SGCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC
Priority to CN201911137278.2A
Publication of CN111061846A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/01 Customer relationship services
    • G06Q30/012 Providing warranty services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/01 Customer relationship services
    • G06Q30/015 Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
    • G06Q30/016 After-sales
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0281 Customer communication at a business location, e.g. providing product or service information, consulting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Game Theory and Decision Science (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of strategy optimization for online text dialogue systems, and in particular relates to a dialogue customer service system and method for new power installation and capacity increase based on hierarchical reinforcement learning. More specifically, it relates to an online implementation method, based on hierarchical reinforcement learning, for a task-oriented electricity customer service dialogue system with mixed task attributes. The system comprises a power service understanding module, a dialogue state tracker, a dialogue strategy, and a power service feedback module. The invention decomposes subtasks that require a professional background into multiple layers and connects them to a database of the relevant professional knowledge, so that the corresponding slot value information can be looked up at any time. Customer service dialogue with a professional background is thereby realized, and the dialogue success rate and coherence are improved. The invention can save cost, improve the success rate of new power installation and capacity increase dialogue services, make the dialogue more fluent, and improve the user experience.

Description

Electric power new installation and capacity increase conversation customer service system and method based on layered reinforcement learning
Technical Field
The invention belongs to the technical field of strategy optimization for online text dialogue systems, and in particular relates to a dialogue customer service system and method for new power installation and capacity increase based on hierarchical reinforcement learning. More specifically, it relates to an online implementation method, based on hierarchical reinforcement learning, for a task-oriented electricity customer service dialogue system with mixed task attributes.
Background
With the rapid development of artificial intelligence technology, dialogue systems are widely used in smartphones, smart homes, unmanned vehicles and other fields, and internet companies and research institutions at home and abroad have invested substantial resources in dialogue systems as a research hotspot. Dialogue systems generally fall into three types: question-answering, task-oriented, and open-domain. A task-oriented dialogue system focuses on a specific task goal and is the main technical type used for customer service dialogue systems. New installation and capacity increase is a main service item of the electric power business hall and currently occupies considerable human resources. A traditional multi-task dialogue system can only process simple preset tasks and struggles to complete customer service dialogues that require professional knowledge.
Traditional customer service dialogue systems therefore have the following defects: they give insufficient consideration to the relations between subtasks, their reward values are sparse, semantic constraints prevent subtasks from being satisfied, and frequent switching between subtasks leads to poor user experience or even dialogue failure.
Disclosure of Invention
To address the above technical problems, the invention provides a dialogue customer service system and method for new power installation and capacity increase based on hierarchical reinforcement learning. It is an intelligent customer service system that completes dialogue tasks under a mixed framework based on hierarchical reinforcement learning. It aims to solve the problem of customer service dialogue that requires professional background knowledge, and provides a customer service dialogue system for multiple interrelated subtasks with a professional background, all of which must be completed. The system adapts well to different users and can incorporate professional knowledge.
To achieve this purpose, the invention adopts the following technical scheme:
The dialogue customer service system for new power installation and capacity increase based on hierarchical reinforcement learning comprises a power service understanding module, a dialogue state tracker, a dialogue strategy, and a power service feedback module, wherein:
the power service understanding module is used for understanding and identifying the specific demand information of the power customer and passing it to the dialogue state tracker;
the dialogue state tracker is used for tracking and recording the current dialogue state, so that the state information can be retrieved at any time;
the dialogue strategy is used for generating an optimized response to the power customer and for continuously updating the dialogue strategy through iterative optimization;
the power service feedback module is used for translating the response generated by the dialogue strategy into information the user can understand and feeding it back to the power customer.
The dialogue customer service method for new power installation and capacity increase based on hierarchical reinforcement learning comprises the following steps:
step 1, the dialogue system obtains the service corpus from the power service understanding module, where slot values are extracted either from text converted from speech or directly from online text;
step 2, when the power customer talks with the intelligent customer service, the agent's dialogue data is obtained from a shared multi-domain general dialogue corpus and an electric power English item corpus;
step 3, once the new installation or capacity increase application has been received, the dialogue agent responds to the power customer's information according to the multi-standard hierarchical reinforcement learning dialogue strategy until the customer's requirement is met, at which point the dialogue is judged successful.
The dialogue system extracts the text corpus of the electricity customer. Because professional knowledge of electricity use is involved, the dialogue strategy is decomposed into a class 1 standard strategy and a class 2 standard strategy, each with its own reward value. The class 1 standard strategy is associated with the external reward value and may comprise multiple layers: the power expertise is decomposed repeatedly until the knowledge in the corpus and the database covers the entire content. The class 2 standard strategy is associated with the internal reward value and covers the decomposed subtasks and actions. The two reward values are optimized separately by reinforcement learning and guide the learning of the customer service system.
The corpus information comprises the dialogue number, the success-or-failure flag of the dialogue, the power-related information from the user, and the power-related information replied by the system.
The slot value information decomposes the goal of the power customer in the dialogue into a series of slot values, as follows:
new-installation/capacity-increase slot values express the new installation or capacity increase requirement of the power customer;
request slot values express information that the power customer asks the dialogue system for;
the slot values needed for power customer goals come from a dataset of real conversations between business hall service staff and power customers;
all slot values appearing in a dialogue passage are extracted. If a slot has multiple values, it is regarded as a soft constraint of the power customer, who may later change the choice and explore other options in the dialogue. If a slot has only one value, it is a hard constraint that cannot be negotiated. If a slot value is empty, it may be a request of the power customer; if a value is not present in the database, that slot value is removed from the power customer's possible goals. The whole capacity increase process needs at least two steps, determining the capacity increase value and determining the engineering design unit, and both the capacity increase value and the engineering design unit slot can take multiple values.
The multi-standard hierarchical reinforcement learning dialogue strategy comprises a multi-layer class 1 standard dialogue strategy π_{g_n} and a single-layer class 2 standard dialogue strategy π_{a,g_n}.
The class 1 standard strategy π_{g_n} obtains a state s from the environment and selects a subtask g; the subtask can be decomposed further, and the number of decomposition layers is denoted by n. Every executable subtask that has a reward value and a termination condition is handled by the class 2 standard strategy π_{a,g_n}.
The class 2 standard strategy takes the state s and the subtask g_n as input and outputs a basic action a. The subtask g_n remains a constant input to the class 2 standard strategy π_{a,g_n} until the termination condition is reached and the subtask g_n ends. The internal evaluation mechanism in the dialog manager provides the internal reward value r^i_t(g_{n,t}); this reward signal indicates whether the subtask g_n is about to be completed and is also used to optimize the class 2 standard strategy π_{a,g_n}. The state s contains the global information of the dialogue and the tracking information of all subtasks. The class 2 standard strategy π_{a,g_n} is optimized by maximizing the expected cumulative internal reward at each step t:
Figure BDA0002279913120000031
In the above formula, r^i_{t+k} denotes the internal evaluation reward at step t + k. The class 1 standard strategy π_{g_n} optimizes the cumulative reward value at step t:
Figure BDA0002279913120000032
In the above formula, r^e_{t+k} denotes the reward value received from the external environment at step t + k, when a new subtask starts. The internal and external reward values work together so that the dialogue strategy learns to select the appropriate dialogue action.
The class 1 standard strategy π_{g_n} and the class 2 standard strategy π_{a,g_n} are learned with the deep Q-learning method. The optimal Q function of the class 1 standard dialogue strategy satisfies:
Figure BDA0002279913120000033
In the above formula, N denotes the number of steps the class 2 standard dialogue strategy π_{a,g_n} needs to complete a subtask, and g_n' denotes the next subtask, selected in state s_{t+N}.
The optimal Q function of the class 2 standard dialogue strategy π_{a,g_n} satisfies:
Figure BDA0002279913120000041
In the above formulas, Q_1^*(s, g_n) and Q_2^*(s, a, g_n) are approximated by neural networks with parameters θ_1 and θ_2, written Q_1(s, g_n; θ_1) and Q_2(s, a, g_n; θ_2).
To optimize the performance of the dialogue system, a loss function for training the networks is defined, which increases the probability of actions with positive reward values and decreases the probability of actions with negative reward values.
The loss function minimized for the class 1 standard dialogue strategy at each iteration i is:
Figure BDA0002279913120000042
wherein
Figure BDA0002279913120000043
In the above formula, r^e = Σ_k γ^k r^e_{t+k} denotes the discounted sum of rewards collected by the time the subtask g_n is completed, and N denotes the number of steps to completion.
The loss function minimized for the class 2 standard dialogue strategy is:
Figure BDA0002279913120000044
wherein
Figure BDA0002279913120000045
r^i denotes the reward value including the discount factor. The loss function is minimized by stochastic gradient descent.
The dialogue strategy is updated through the cumulative reward captured by the Q values, thereby realizing effective dialogue with the customer.
When the loss function is minimized by stochastic gradient descent, the gradient for the class 1 standard dialogue strategy is:
Figure BDA0002279913120000046
in the formula:
Figure BDA0002279913120000047
denotes the gradient of the loss function, E denotes the expectation, D denotes the experience replay buffer, γ denotes the discount factor, and
Figure BDA0002279913120000051
denotes the gradient of the Q_1 function.
The gradient for the class 2 standard dialogue strategy is:
Figure BDA0002279913120000052
in the formula:
Figure BDA0002279913120000053
denotes the gradient of the loss function, E denotes the expectation, D denotes the experience replay buffer, γ denotes the discount factor, and
Figure BDA0002279913120000054
denotes the gradient of the Q_2 function.
To further improve the performance of the dialogue strategy, two heuristic techniques are used: a target network and experience replay. The experience replay tuples are (s, g, r^e, s') and (s, g, a, r^i, s'). The dialogue strategy function is updated iteratively as the Q function is updated in each round of dialogue, until it finally converges.
Compared with the prior art, the invention has the advantages and beneficial effects that:
The invention realizes customer service dialogue with a professional background. The success rate and the coherence of the dialogue are improved, and labor cost is saved. By swapping the professional database, the same implementation method can provide intelligent customer service dialogue in other professional fields. Because the strategy is trained on a large amount of data, the dialogue success rate can improve further as more knowledge data becomes available in the field.
The customer service system for the new installation and capacity increase service of the power grid comprises at least the following two subtasks. The first subtask determines the capacity of the new installation or capacity increase. The second subtask selects a design company and determines the approximate engineering quantity according to that capacity. The subtasks are related in time, cost and logic, and all of them must be completed within a single conversation; if any step is missing, the dialogue task cannot be completed. The invention adopts a dialogue manager built with a hierarchical deep reinforcement learning method, which can solve complex tasks in the dialogue at different scales, improves the relevance of the dialogue, adapts well to different users, and can incorporate professional knowledge.
The invention is an online implementation method, based on hierarchical reinforcement learning, for an electricity customer service dialogue system with mixed task-completion attributes. In the method, the dialogue strategy decomposes a target task over several layers until it is broken down into subtasks the system can understand and execute, and learning and training are carried out with a deep reinforcement learning method. The main advantage of the method is that subtasks requiring a professional background are decomposed over multiple layers and connected to a database of the relevant professional knowledge, so that the corresponding slot value information can be looked up at any time. The invention can improve the success rate of new power installation and capacity increase dialogue services, make the dialogue more fluent, and improve the user experience.
Drawings
To help those of ordinary skill in the art understand and practice the invention, it is described in further detail below with reference to the accompanying drawings and specific embodiments. The following examples illustrate the invention; the scope of the invention is not limited by the detailed description.
FIG. 1 is a system block diagram of the present invention;
FIG. 2 is an overview of the dialog method of the present invention;
FIG. 3 is a schematic diagram of a class 1 standard dialog strategy learner of the present invention;
FIG. 4 is a schematic diagram of the class 2 standard dialog strategy learner of the present invention.
Detailed Description
The invention relates to a dialogue customer service system and method for new power installation and capacity increase based on hierarchical reinforcement learning. The system comprises a power service understanding module, a dialogue state tracker, a dialogue strategy, and a power service feedback module, wherein:
the power service understanding module is used for understanding and identifying the specific demand information of the power customer and passing it to the dialogue state tracker;
the dialogue state tracker is used for tracking and recording the current dialogue state, so that the state information can be retrieved at any time;
the dialogue strategy is used for generating an optimized response to the power customer and for continuously updating the dialogue strategy through iterative optimization;
the power service feedback module is used for translating the response generated by the dialogue strategy into information the user can understand and feeding it back to the power customer.
In the implementation method of the dialogue customer service system for new power installation and capacity increase based on hierarchical reinforcement learning, the text information from the electricity customer is converted into slot value information by the power service understanding module and passed to the dialogue manager. The dialogue manager, which consists of the dialogue state tracker and the dialogue strategy, passes the response information to the power service feedback module, which generates semantic text the electricity customer can understand and feeds it back to the customer. Professional knowledge can be queried and updated through the database. FIG. 1 shows the system structure of the present invention.
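The data flow described above can be sketched in a few lines of Python. This is a minimal illustration only, not the patent's implementation: the function names, the `slot=value` input convention, the `design_unit` slot name, and the rule-based placeholder policy are all assumptions introduced here to show how the four modules hand data to one another.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DialogueState:
    """Running record kept by the dialogue state tracker."""
    slots: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


def understand(text: str) -> Dict[str, str]:
    """Toy understanding module: parse 'slot=value' tokens from the customer's text."""
    pairs = (tok.split("=", 1) for tok in text.split() if "=" in tok)
    return {name: value for name, value in pairs}


def track(state: DialogueState, text: str, slots: Dict[str, str]) -> DialogueState:
    """State tracker: merge newly extracted slot values into the dialogue state."""
    state.slots.update(slots)
    state.history.append(text)
    return state


# Hypothetical slot names standing in for the two subtasks (capacity, design unit).
REQUIRED = ("dst_cap", "design_unit")


def policy(state: DialogueState) -> str:
    """Placeholder rule-based policy: request the first missing slot, else confirm."""
    for name in REQUIRED:
        if name not in state.slots:
            return f"request({name})"
    return "confirm()"


def feedback(action: str) -> str:
    """Feedback module: translate a dialogue action into customer-readable text."""
    if action.startswith("request("):
        return f"Please provide: {action[len('request('):-1]}"
    return "Your application has been recorded."


def turn(state: DialogueState, text: str) -> str:
    """One dialogue turn: understanding -> tracker -> policy -> feedback."""
    slots = understand(text)
    state = track(state, text, slots)
    return feedback(policy(state))
```

In the patent, the placeholder `policy` is replaced by the learned hierarchical strategy; the sketch only fixes the interfaces between the modules.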
The implementation method of the dialogue customer service system for new power installation and capacity increase based on hierarchical reinforcement learning comprises the following steps:
Step 1, the dialogue system obtains the service corpus from the power service understanding module, where slot values are extracted either from text converted from speech or directly from online text.
Step 2, when the power customer talks with the intelligent customer service, the agent's dialogue data is obtained from a shared multi-domain general dialogue corpus and an electric power English item corpus.
Step 3, once the new installation or capacity increase application has been received, the dialogue agent responds to the power customer's information according to the multi-standard hierarchical reinforcement learning dialogue strategy until the customer's requirement is met, at which point the dialogue is regarded as successful.
According to step 1, the dialogue system extracts the text corpus of the electricity customer. Because professional knowledge of electricity use is involved, the dialogue strategy is decomposed into a class 1 standard strategy π_{g_n} and a class 2 standard strategy π_{a,g_n}, each with its own reward value. The class 1 standard strategy π_{g_n} is associated with the external reward value and may contain multiple layers: the power expertise is decomposed repeatedly until the knowledge in the corpus and the database covers the entire content. The class 2 standard strategy π_{a,g_n} is associated with the internal reward value and covers the decomposed subtasks and actions. The two reward values are optimized separately by reinforcement learning and guide the learning of the customer service system.
According to step 1, the slot value information decomposes the goal of the power customer in the dialogue into a series of slot values. A new-installation/capacity-increase slot value, for example dst_cap = 10KVA, expresses the customer's new installation or capacity increase requirement. A request slot value, for example Price?, expresses information that the power customer asks the dialogue system for. The slot values needed for power customer goals come from a dataset of real conversations between business hall service staff and power customers. All slot values appearing in a dialogue passage are extracted. If a slot has multiple values, for example or_cap = 20KVA, it is regarded as a soft constraint of the power customer, who may later change the choice and explore other options in the dialogue. If a slot has only one value, it is a hard constraint that cannot be negotiated. If a slot value is empty, it may be a request of the power customer; if a value is not present in the database, that slot value is removed from the power customer's possible goals. The whole capacity increase process needs at least two steps, determining the capacity increase value and determining the engineering design unit, and both the capacity increase value and the engineering design unit slot can take multiple values.
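The slot rules in this paragraph can be written down as a small classifier. The sketch below is one possible reading, not the patent's code: the function name, the label strings, and the choice to filter out values missing from the database before counting them are assumptions made for illustration.

```python
from typing import Dict, List, Set


def classify_slots(
    observed: Dict[str, List[str]],
    database: Dict[str, Set[str]],
) -> Dict[str, str]:
    """Label each slot seen in a dialogue as soft, hard, request, or removed."""
    labels: Dict[str, str] = {}
    for slot, values in observed.items():
        if not values:
            # Empty slot: treated as something the customer is asking for.
            labels[slot] = "request"
            continue
        # Assumption: values absent from the database are dropped first.
        known = database.get(slot, set())
        valid = [value for value in values if value in known]
        if not valid:
            labels[slot] = "removed"   # no value exists in the database
        elif len(valid) > 1:
            labels[slot] = "soft"      # several options, still negotiable
        else:
            labels[slot] = "hard"      # single option, not negotiable
    return labels
```

For example, a dialogue that mentions two capacities, one design unit, an open price question, and a value unknown to the database would yield one soft constraint, one hard constraint, one request, and one removed slot.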
According to step 2, the corpus information comprises the dialogue number, the success-or-failure flag of the dialogue, the power-related information from the user, and the power-related information replied by the system. Because electric power professional information keeps evolving over time, the power database needs to be queried during the dialogue to complete and supplement this information.
According to step 3, the multi-standard hierarchical reinforcement learning dialogue strategy is updated with a deep reinforcement learning method. Deep reinforcement learning needs a large amount of data and corpus material for training, and a simulator can be used to train the network.
According to step 3, the multi-standard hierarchical reinforcement learning dialogue strategy comprises a multi-layer class 1 standard dialogue strategy π_{g_n} and a single-layer class 2 standard dialogue strategy π_{a,g_n}.
The class 1 standard strategy π_{g_n} obtains a state s from the environment and selects a subtask g; the subtask can be decomposed further, and the number of decomposition layers is denoted by n. Every executable subtask that has a reward value and a termination condition is handled by the class 2 standard strategy π_{a,g_n}.
The class 2 standard strategy takes the state s and the subtask g_n as input and outputs a basic action a. The subtask g_n remains a constant input to the class 2 standard strategy π_{a,g_n} until the termination condition is reached and the subtask g_n ends. The internal evaluation mechanism in the dialog manager provides the internal reward value r^i_t(g_{n,t}); this reward signal indicates whether the subtask g_n is about to be completed and is also used to optimize the strategy π_{a,g_n}. The state s contains the global information of the dialogue and the tracking information of all subtasks. The class 2 standard strategy π_{a,g_n} is optimized by maximizing the expected cumulative internal reward at each step t:
Figure BDA0002279913120000081
In the above formula, r^i_{t+k} denotes the internal evaluation reward at step t + k. The class 1 standard strategy π_{g_n} optimizes the cumulative reward value at step t:
Figure BDA0002279913120000082
In the above formula, r^e_{t+k} denotes the reward value received from the external environment at step t + k, when a new subtask starts. The internal and external reward values work together so that the dialogue strategy learns to select the appropriate dialogue action.
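The two formula images referenced above are not reproduced in this text. Under the definitions given here (internal reward r^i_{t+k}, external reward r^e_{t+k}) and the discount factor γ defined later in the description, objectives of this kind are commonly written as follows in hierarchical deep Q-learning. This is a hedged reconstruction for orientation, not the patent's verbatim formulas; the conditioning terms in particular are assumptions.

```latex
\begin{align}
\max_{\pi_{a,g_n}} \;& \mathbb{E}\Big[ \sum_{k \ge 0} \gamma^{k}\, r^{i}_{t+k}
  \;\Big|\; s_t = s,\ g_{n,t} = g_n,\ a_{t+k} = \pi_{a,g_n}(s_{t+k}) \Big] \\
\max_{\pi_{g_n}} \;& \mathbb{E}\Big[ \sum_{k \ge 0} \gamma^{k}\, r^{e}_{t+k}
  \;\Big|\; s_t = s,\ a_{t+k} = \pi_{a,g_n}(s_{t+k}),\ g_{n,t+k} = \pi_{g_n}(s_{t+k}) \Big]
\end{align}
```

The first line is the class 2 objective (cumulative internal reward while a subtask g_n is held fixed); the second is the class 1 objective (cumulative external reward over the whole dialogue).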
The class 1 standard strategy π_{g_n} and the class 2 standard strategy π_{a,g_n} are learned with the deep Q-learning method. The optimal Q function of the class 1 standard dialogue strategy π_{g_n} satisfies:
Figure BDA0002279913120000083
In the above formula, N denotes the number of steps the class 2 standard dialogue strategy π_{a,g_n} needs to complete the subtask, and g_n' denotes the next subtask, selected in state s_{t+N}.
The optimal Q function of the class 2 standard dialogue strategy satisfies:
Figure BDA0002279913120000091
In the above formulas, Q_1^*(s, g_n) and Q_2^*(s, a, g_n) are approximated by neural networks with parameters θ_1 and θ_2, written Q_1(s, g_n; θ_1) and Q_2(s, a, g_n; θ_2). The neural network used in the present invention is a DQN (deep Q-network).
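The formula images for the two optimal Q functions are likewise missing from this text. Given the stated meaning of N (steps the class 2 strategy needs to finish a subtask) and g_n' (the next subtask in state s_{t+N}), Bellman equations of the following form would be consistent with the description. They are offered as a sketch under those assumptions, not as the patent's exact formulas.

```latex
\begin{align}
Q_1^{*}(s, g_n) &= \mathbb{E}\Big[ \sum_{k=0}^{N-1} \gamma^{k}\, r^{e}_{t+k}
  + \gamma^{N} \max_{g_n'} Q_1^{*}(s_{t+N}, g_n')
  \;\Big|\; s_t = s,\ g_{n,t} = g_n \Big] \\
Q_2^{*}(s, a, g_n) &= \mathbb{E}\Big[ r^{i}_{t}
  + \gamma \max_{a'} Q_2^{*}(s_{t+1}, a', g_n)
  \;\Big|\; s_t = s,\ a_t = a,\ g_{n,t} = g_n \Big]
\end{align}
```

The class 1 backup spans the N steps a subtask takes, which is why its bootstrap term is discounted by γ^N, while the class 2 backup is the usual one-step form with the subtask held fixed.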
To optimize the performance of the dialogue system, a loss function for training the networks is defined, which increases the probability of actions with positive reward values and decreases the probability of actions with negative reward values. The loss function minimized for the class 1 standard dialogue strategy π_{g_n} at each iteration i is:
Figure BDA0002279913120000092
wherein
Figure BDA0002279913120000093
In the above formula, r^e = Σ_k γ^k r^e_{t+k} denotes the discounted sum of rewards collected by the time the subtask g_n is completed, and N denotes the number of steps to completion.
The loss function minimized for the class 2 standard dialogue strategy π_{a,g_n} is:
Figure BDA0002279913120000094
wherein
Figure BDA0002279913120000095
r^i denotes the reward value including the discount factor. The loss function is minimized by stochastic gradient descent.
The dialogue strategy is updated through the cumulative reward captured by the Q values, thereby realizing effective dialogue with the customer.
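As a numeric illustration of the two losses, the sketch below computes temporal-difference targets and squared errors for a single transition. It assumes the standard deep Q-learning form (target = reward plus discounted maximum of the next Q values), with the class 1 target discounted by γ^N as the text's definition of N suggests. The function names and the value of `GAMMA` are assumptions, and plain floats stand in for network outputs.

```python
from typing import Sequence

GAMMA = 0.9  # discount factor; the value is an assumption for illustration


def top_level_target(r_e: float, n_steps: int, next_q1: Sequence[float]) -> float:
    """Class 1 target: discounted external reward plus gamma^N times the best next subtask value."""
    return r_e + (GAMMA ** n_steps) * max(next_q1)


def low_level_target(r_i: float, next_q2: Sequence[float]) -> float:
    """Class 2 target: internal reward plus gamma times the best next action value."""
    return r_i + GAMMA * max(next_q2)


def squared_loss(target: float, q_value: float) -> float:
    """Squared temporal-difference error for one sampled transition."""
    return (target - q_value) ** 2
```

In training, `next_q1` and `next_q2` would come from the networks evaluated at the next state, and the loss would be averaged over a batch sampled from the replay buffer.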
When the loss function is minimized by stochastic gradient descent, the gradient for the class 1 standard dialogue strategy π_{g_n} is:
Figure BDA0002279913120000101
In the formula,
Figure BDA0002279913120000102
denotes the gradient of the loss function, E denotes the expectation, D denotes the experience replay buffer, γ denotes the discount factor, and
Figure BDA0002279913120000103
denotes the gradient of the Q1 function.
For the class 2 standard dialogue strategy π_{a,gn}, the gradient is:
Figure BDA0002279913120000104
In the formula,
Figure BDA0002279913120000105
denotes the gradient of the loss function, E denotes the expectation, D denotes the experience replay buffer, γ denotes the discount factor, and
Figure BDA0002279913120000106
denotes the gradient of the Q2 function.
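To make the gradient step concrete, here is a small sketch of one stochastic gradient update on a squared error (y − Q)², with the target y held fixed. A linear Q function stands in for the DQN purely so that the gradient of Q can be written by hand; the feature vector, learning rate, and target value are hypothetical.

```python
from typing import List


def q_linear(theta: List[float], features: List[float]) -> float:
    """Linear stand-in for Q(s, gn; theta): dot product of weights and features."""
    return sum(w * x for w, x in zip(theta, features))


def sgd_step(theta: List[float], features: List[float],
             target: float, lr: float) -> List[float]:
    """One gradient step on (target - Q)^2 with the target treated as a constant.

    For a linear Q the gradient of Q with respect to theta is the feature
    vector, so the update is theta + lr * (target - Q) * features. The constant
    factor 2 from the square is folded into the learning rate.
    """
    td_error = target - q_linear(theta, features)
    return [w + lr * td_error * x for w, x in zip(theta, features)]


theta = [0.0, 0.0]
features = [1.0, 2.0]  # hypothetical encoding of one (s, gn) pair
target = 5.0           # hypothetical target y from the previous iteration
first = sgd_step(theta, features, target, lr=0.1)
for _ in range(50):
    theta = sgd_step(theta, features, target, lr=0.1)
print(first, q_linear(theta, features))
```

In a real DQN the same update is applied to mini-batches drawn from the replay buffer, and the gradient of Q is obtained by backpropagation rather than written out.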
To further improve dialogue strategy performance, two heuristic techniques are used: a target network and experience replay. The experience replay tuples are (s, g, r^e, s') and (s, g, a, r^i, s'). The dialogue strategy function is updated iteratively as the Q function is updated in each round of dialogue, until it finally converges.
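The two techniques can be sketched as follows, assuming one replay buffer per strategy level and a target network that is simply a periodically refreshed copy of the online parameters. The class name, capacities, parameter dictionary, and the sample state, subtask, and action labels are hypothetical.

```python
import random
from collections import deque
from typing import Deque, Dict, List, Tuple


class ReplayBuffer:
    """Fixed-capacity experience store; the oldest tuples are dropped first."""

    def __init__(self, capacity: int) -> None:
        self._items: Deque[Tuple] = deque(maxlen=capacity)

    def add(self, transition: Tuple) -> None:
        self._items.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Tuple]:
        """Draw a random mini-batch without replacement."""
        return rng.sample(list(self._items), min(batch_size, len(self._items)))

    def __len__(self) -> int:
        return len(self._items)


def sync_target(online: Dict[str, float]) -> Dict[str, float]:
    """Copy the online parameters into a frozen target copy."""
    return dict(online)


rng = random.Random(0)
d1 = ReplayBuffer(capacity=3)  # class 1 tuples (s, g, r_e, s_next)
d2 = ReplayBuffer(capacity=3)  # class 2 tuples (s, g, a, r_i, s_next)
for t in range(5):
    d2.add((f"s{t}", "determine_capacity", "ask_capacity", -0.1, f"s{t + 1}"))
d1.add(("s0", "determine_capacity", 1.0, "s5"))

theta2 = {"w": 0.3}
theta2_target = sync_target(theta2)
theta2["w"] = 0.7              # online parameters keep changing between syncs
batch = d2.sample(2, rng)
print(len(d1), len(d2), len(batch), theta2_target["w"])
```

Sampling at random breaks the correlation between consecutive dialogue turns, and the frozen copy keeps the targets from shifting on every update.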
Fig. 2 is an overall view of the dialogue method of the present invention, that is, a general overview of the hybrid-task completion dialogue method.
Two types of hierarchical reinforcement learning agents form the dialogue learning strategy. The class 2 standard strategy is associated with an internal judgment mechanism, which can be updated iteratively in single steps, receives the dialogue actions of the class 2 standard strategy, and provides the strategy with the internal reward value r^i. The class 1 standard strategy receives the external reward value r^e fed back by the power customer and, at the same time, the dialogue state s of the power customer. The class 1 standard strategy π_gn can decompose sub-goals in depth over multiple layers (the number of layers may exceed 2) until the sub-goals can be handled in single steps by the class 2 standard strategy π_{a,gn}. The class 2 standard strategy π_{a,gn} receives the sub-goal and delivers the selected dialogue action to the power customer. A general overview of the hybrid-task completion dialogue method is shown in fig. 1.
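The interaction between the two levels can be sketched as a nested loop: the class 1 strategy picks a subtask, the class 2 strategy acts until that subtask terminates, and each level logs its own experience tuple. Everything concrete below is an assumption for illustration: the state is just the tuple of slots filled so far, the customer is assumed to answer every request, the reward values are placeholders, and the subtask and slot names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[str, ...]  # hypothetical state: names of the slots filled so far


def run_episode(slots_needed: Dict[str, List[str]],
                pick_subtask: Callable[[State, List[str]], str],
                pick_action: Callable[[State, str, List[str]], str]):
    """Two-level loop: class 1 picks a subtask, class 2 acts until it ends."""
    state: State = ()
    class1_log, class2_log = [], []
    remaining = list(slots_needed)
    while remaining:
        goal = pick_subtask(state, remaining)           # class 1 strategy
        start_state = state
        while True:
            missing = [s for s in slots_needed[goal] if s not in state]
            if not missing:                             # termination condition
                break
            slot = pick_action(state, goal, missing)    # class 2 strategy
            next_state = state + (slot,)                # customer is assumed to answer
            finishes = len(missing) == 1
            r_int = 1.0 if finishes else -0.1           # internal reward (assumed values)
            class2_log.append((state, goal, slot, r_int, next_state))
            state = next_state
        r_ext = 1.0                                     # placeholder external reward
        class1_log.append((start_state, goal, r_ext, state))
        remaining.remove(goal)
    return class1_log, class2_log


slots = {"determine_capacity": ["capacity_kva"],
         "select_design_company": ["region", "design_company"]}
c1, c2 = run_episode(slots,
                     pick_subtask=lambda s, rem: rem[0],
                     pick_action=lambda s, g, missing: missing[0])
print(len(c1), len(c2))
```

The two lambdas stand in for trained strategies; in a learning system they would be replaced by policies derived from Q1 and Q2, and the two logs would feed the corresponding replay buffers.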
As shown in FIGS. 3 and 4, FIG. 3 is a schematic diagram of the learner for the class 1 standard dialogue strategy π_gn of the present invention, and FIG. 4 is a schematic diagram of the learner for the class 2 standard dialogue strategy π_{a,gn} of the present invention. These are the agents that learn the class 1 and class 2 levels of the hierarchical dialogue strategy, respectively.
For example, a customer applies for a capacity increase of 10 kVA to 20 kVA. The dialogue strategy treats this as a composite task. It first selects one subtask, determining the capacity. The subtasks are multi-layered, and a series of actions is taken to collect the relevant information until all the information the customer needs has been collected and the subtask ends; this process includes handling 20 kVA in one application and increasing the capacity by 10 kVA in each of two applications. The strategy then looks for the next subtask, selecting a design company, and continues until the information for all subtasks has been collected. The dialogue strategy is realized by combining deep reinforcement learning with a hierarchical value function, and the hierarchical decomposition allows specialized electric power content to be broken down into options that can be judged directly and in sequence.
An option here is a generalized notion of action that consists of a policy π and a state-dependent termination function (decay coefficient) γ.
In the customer service dialogue system, π is the policy function, s is the state, a is the action, γ is the decay factor and k is its exponent, and r is the reward or penalty value. A slot is an attribute that the agent has explicitly defined.
The invention decomposes tasks into subtasks in depth. Compared with traditional deep reinforcement learning, the method greatly improves the dialogue success rate: the subtask decomposition is more thorough, the continuity of semantic communication is better, learning is faster, and convergence is better.
Embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, also features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. An electric power new installation and capacity increase conversation customer service system based on layered reinforcement learning, characterized in that the system comprises: a power service understanding module, a dialogue state tracker, a dialogue strategy, and a power service feedback module; wherein:
the power service understanding module is used for understanding and identifying the specific demand information of the power customer and transmitting the information to the dialogue state tracker;
the dialogue state tracker is used for tracking and recording the current dialogue state so that the state information can be called at any time;
the dialogue strategy is used for generating an optimized response to the power customer and for updating itself through continuous iterative optimization;
the power service feedback module is used for translating the response generated by the dialogue strategy into information the user can understand and feeding it back to the power customer.
2. An electric power new installation and capacity increase conversation customer service method based on layered reinforcement learning, characterized in that the method comprises the following steps:
step 1, the dialogue system obtains service corpora from the power service understanding module, wherein slot values can be extracted from text converted from speech, or extracted directly from online text;
step 2, when the power customer talks with the intelligent customer service, agent dialogue data is obtained from a shared multi-domain general dialogue corpus and an electric power English item corpus;
step 3, after the new installation or capacity increase electricity application is successfully received, the dialogue agent feeds information back to the power customer according to the multi-standard layered reinforcement learning dialogue strategy until the customer's requirement is met, at which point the dialogue is judged successful.
3. The new electric power capacity increasing conversation customer service method based on layered reinforcement learning as claimed in claim 2, characterized in that: the dialogue system is used for extracting the text corpus of the power customer; because electricity use involves specialized knowledge, the dialogue strategy is decomposed into a class 1 standard strategy and a class 2 standard strategy, each with its own reward value, wherein the reward of the class 1 standard strategy is called the external reward value, and the class 1 standard strategy comprises multiple layers and decomposes the electric power expertise repeatedly until the knowledge in the corpus and the database covers all the content; the reward of the class 2 standard strategy is called the internal reward value, and the class 2 standard strategy comprises the decomposed subtasks and actions; the two reward values are optimized separately by reinforcement learning and guide the learning of the customer service system.
4. The new electric power capacity increasing conversation customer service method based on layered reinforcement learning as claimed in claim 2, characterized in that: the corpus information comprises the number of dialogues, the flags marking dialogue success or failure, the power-related information provided by the user, and the power-related information replied by the system.
5. The new electric power capacity increasing conversation customer service method based on layered reinforcement learning as claimed in claim 2, characterized in that: the slot value information is used for decomposing the goal of the power customer in the dialogue into a series of slot values, comprising:
the new installation and capacity increase slot value, which indicates the power customer's new installation or capacity increase requirement;
the request slot value, which indicates the information the power customer asks the dialogue system for;
the slot value required by the electric power customer target is from a database set of daily electric power business hall service and real electric power customer conversation;
extracting all the slot values appearing in the dialogue paragraph, if one slot has a plurality of values, the slot is regarded as soft constraint of the power customer, and the user may change his option later to search for other options in the dialogue; if a slot value has only one option, then this is a hard constraint that cannot be negotiated; if a slot value is empty, it may be a demand of the power consumer, and if the value is not present in the database, the slot value is removed from the power consumer's possible target; the whole capacity increasing process at least needs 2 processes, capacity increasing value determination and engineering design unit determination, and both the capacity increasing value and the engineering design unit value comprise a plurality of numerical values.
6. The new electric power capacity increasing conversation customer service method based on layered reinforcement learning as claimed in claim 2, characterized in that: the multi-standard layered reinforcement learning dialogue strategy comprises a multi-layer class 1 standard dialogue strategy π_gn and a single-layer class 2 dialogue strategy π_{a,gn}.
7. The new electric power capacity increasing conversation customer service method based on layered reinforcement learning as claimed in claim 6, wherein: the class 1 standard strategy π_gn obtains a state s from the environment and selects a subtask g, wherein the subtask can be further decomposed and n denotes the number of decomposition layers; every executable subtask, each having a reward value and a termination condition, is carried out using the class 2 standard strategy π_{a,gn}.
8. The new electric power capacity increasing conversation customer service method based on layered reinforcement learning as claimed in claim 6, wherein: the class 2 standard strategy takes a state s and a subtask gn as input and outputs a basic action a; the subtask gn remains a constant input to the class 2 standard strategy π_{a,gn} until a termination condition is reached and the subtask gn ends; an internal evaluation mechanism in the dialogue manager provides the internal reward value r^i_t(gn_t), a reward signal that indicates whether the subtask gn is about to be completed and that is also used to optimize the class 2 standard strategy π_{a,gn}; the state s contains global information about the dialogue and tracking information for all subtasks; to optimize the class 2 standard strategy π_{a,gn}, the accumulated expected internal reward is maximized at each step t:
Figure FDA0002279913110000021
In the above formula, r^i_{t+k} denotes the internal evaluation reward at step t+k; the class 1 standard strategy π_gn optimizes the accumulated reward value at step t:
Figure FDA0002279913110000031
in the above formula, r^e_{t+k} denotes the reward value received from the external environment at step t+k when a new subtask starts; the internal and external reward values work together so that the dialogue learning strategy selects appropriate dialogue actions.
9. The new electric power capacity increasing conversation customer service method based on layered reinforcement learning as claimed in claim 2, characterized in that: the class 1 standard strategy π_gn and the class 2 standard strategy π_{a,gn} are learned by deep Q-learning; wherein the optimal Q function of the class 1 standard dialogue strategy must satisfy:
Figure FDA0002279913110000032
in the above formula, N is the number of steps the class 2 standard dialogue strategy π_{a,gn} needs to complete a subtask, and gn' is the next subtask chosen in state s_{t+N};
the optimal Q function of the class 2 standard dialogue strategy π_{a,gn} satisfies:
Figure FDA0002279913110000033
in the above formulas, Q1*(s, gn) and Q2*(s, a, gn) are represented by neural networks parameterized by θ1 and θ2, written Q1(s, gn; θ1) and Q2(s, a, gn; θ2).
10. The new electric power capacity increasing conversation customer service method based on layered reinforcement learning as claimed in claim 1, characterized in that: to optimize the performance of the dialogue system, a loss function is defined for training the networks, increasing the probability of actions with a positive reward value and reducing the probability of actions with a negative reward value;
the loss function for a class 1 standard dialog strategy at each iteration i is:
Figure FDA0002279913110000034
wherein
Figure FDA0002279913110000035
in the above formula, r^e = Σ γ^k r^e_{t+k} is the discounted sum of the rewards received by the time the sub-goal gn is completed; N is the number of steps taken to complete it;
for the class 2 standard dialogue strategy, the loss function to be minimized is:
Figure FDA0002279913110000041
wherein
Figure FDA0002279913110000042
r^i denotes the reward value, which includes the discount factor; the loss function is minimized by stochastic gradient descent;
the dialogue strategy is updated through the accumulated reward captured by the Q value, which enables an effective dialogue with the customer;
when the loss function is minimized by stochastic gradient descent, the gradient for the class 1 standard dialogue strategy is:
Figure FDA0002279913110000043
in the formula,
Figure FDA0002279913110000044
denotes the gradient of the loss function, E denotes the expectation, D denotes the experience replay buffer, γ denotes the discount factor, and
Figure FDA0002279913110000045
denotes the gradient of the Q1 function;
for the class 2 standard dialogue strategy, the gradient is:
Figure FDA0002279913110000046
in the formula,
Figure FDA0002279913110000047
denotes the gradient of the loss function, E denotes the expectation, D denotes the experience replay buffer, γ denotes the discount factor, and
Figure FDA0002279913110000048
denotes the gradient of the Q2 function;
to further improve dialogue strategy performance, two heuristic techniques are used, namely a target network and experience replay; the experience replay tuples are (s, g, r^e, s') and (s, g, a, r^i, s'); the dialogue strategy function is updated iteratively as the Q function is updated in each round of dialogue, until final convergence.
CN201911137278.2A 2019-11-19 2019-11-19 Electric power new installation and capacity increase conversation customer service system and method based on layered reinforcement learning Pending CN111061846A (en)

Publications (1)

Publication Number Publication Date
CN111061846A true CN111061846A (en) 2020-04-24

Family

ID=70298560

Country Status (1)

Country Link
CN (1) CN111061846A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108255934A (en) * 2017-12-07 2018-07-06 北京奇艺世纪科技有限公司 A kind of sound control method and device
CN108282587A (en) * 2018-01-19 2018-07-13 重庆邮电大学 Mobile customer service dialogue management method under being oriented to strategy based on status tracking
US20190324795A1 (en) * 2018-04-24 2019-10-24 Microsoft Technology Licensing, Llc Composite task execution
CN109829044A (en) * 2018-12-28 2019-05-31 北京百度网讯科技有限公司 Dialogue method, device and equipment
CN109817329A (en) * 2019-01-21 2019-05-28 暗物智能科技(广州)有限公司 A kind of medical treatment interrogation conversational system and the intensified learning method applied to the system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860869A (en) * 2021-03-11 2021-05-28 中国平安人寿保险股份有限公司 Dialogue method, device and storage medium based on hierarchical reinforcement learning network
CN112860869B (en) * 2021-03-11 2023-02-03 中国平安人寿保险股份有限公司 Dialogue method, device and storage medium based on hierarchical reinforcement learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination