CN113366510A - Performing multi-objective tasks via trained primal and dual networks - Google Patents

Performing multi-objective tasks via trained primal and dual networks

Info

Publication number
CN113366510A
CN113366510A (application CN202080010330.2A)
Authority
CN
China
Prior art keywords
network
training
computer
response
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080010330.2A
Other languages
Chinese (zh)
Inventor
A. Kantor
G. Uziel
A. Anaby-Tavor
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/266,078 external-priority patent/US11281867B2/en
Priority claimed from US16/266,080 external-priority patent/US11151324B2/en
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Publication of CN113366510A publication Critical patent/CN113366510A/en
Pending legal-status Critical Current

Classifications

    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G06N 3/006: Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 7/01: Probabilistic graphical models, e.g. probabilistic networks
    • G06F 40/20: Natural language analysis

Abstract

An example system includes a processor to receive data for a multi-objective task. The processor also performs the multi-objective task on the received data via a trained primal network. The primal network and a dual network are trained for the multi-objective task using a Lagrangian loss function representing the multiple objectives: the primal network is trained to minimize the Lagrangian loss function, and the dual network is trained to maximize it.

Description

Performing multi-objective tasks via trained primal and dual networks
Background
The present technology relates to artificial neural networks. More particularly, the present technology relates to training and using neural networks to perform multi-objective tasks.
Disclosure of Invention
According to embodiments described herein, a system includes a processor to receive data for a multi-objective task. The processor may further perform the multi-objective task on the received data via a trained primal network, wherein the primal network and a dual network are trained for the multi-objective task using a Lagrangian loss function representing the multiple objectives. The primal network is trained to minimize the Lagrangian loss function, and the dual network is trained to maximize the Lagrangian loss function.
According to another embodiment described herein, a method includes training a primal network and a dual network for a multi-objective task using a Lagrangian loss function representing the multiple objectives. Training the primal network and the dual network includes training the primal network to minimize the Lagrangian loss function and training the dual network to maximize it. The method may also include receiving data for the multi-objective task and performing the multi-objective task on the received data via the trained primal network.
According to an embodiment, a computer program product is provided comprising program code adapted to perform the method as described in paragraph 3 or paragraph 7 when the program is run on a computer.
According to another embodiment described herein, a computer program product for training a neural network to perform a multi-objective task includes a computer-readable storage medium having program code embodied therewith. The computer-readable storage medium is not itself a transitory signal. The program code is executable by a processor to cause the processor to train a primal network and a dual network for the multi-objective task using a Lagrangian loss function representing the multiple objectives. The program code may also cause the processor to train the primal network to minimize the Lagrangian loss function and to train the dual network to maximize it. The program code may further cause the processor to receive data for the multi-objective task and to perform the multi-objective task on the received data via the trained primal network.
According to one embodiment, there is provided a system comprising a processor configured to: receive a prefix of a conversation and a text input; and generate a completed response based on the prefix of the conversation and the text input via a trained primal network, wherein the primal network is trained to minimize a Lagrangian loss function representing multiple objectives and a dual network is trained to maximize the Lagrangian loss function.
According to one embodiment, there is provided a computer-implemented method comprising: receiving a prefix of a conversation and a text input; and generating a completed response based on the prefix of the conversation and the text input via a trained primal network, wherein the trained primal network is trained to minimize a Lagrangian loss function representing multiple objectives, and a dual network is trained to maximize the Lagrangian loss function.
According to one embodiment, there is provided a computer program product for completed-response generation, the computer program product comprising a computer-readable storage medium having program code embodied therewith, wherein the computer-readable storage medium is not itself a transitory signal, the program code executable by a processor to cause the processor to: train a primal network to minimize a Lagrangian loss function representing multiple objectives, and train a dual network to maximize the Lagrangian loss function; receive a prefix of a conversation and a text input; and generate, via the trained primal network, a completed response based on the prefix of the conversation and the text input.
Drawings
Preferred embodiments of the present invention will now be described, by way of example only, with reference to the following drawings:
FIG. 1 is a block diagram of an example min-max neural network that may train a primal network to perform multi-objective tasks;
FIG. 2 is a process flow diagram of an example method that may perform multi-objective tasks using a trained primal network;
FIG. 3 is a process flow diagram of an example method that may perform automatic response generation using a trained primal network;
FIG. 4 is a process flow diagram of an example method for training a primal network to perform multi-objective tasks;
FIG. 5 is a process flow diagram of an example method for training a primal network to perform automatic response generation;
FIG. 6 is an example primal neural network including three long short-term memory (LSTM) units;
FIG. 7A is a diagram of an example chat display including a generated set of completed responses;
FIG. 7B is a diagram of an example chat display including an updated generated set of completed responses;
FIG. 8 is a block diagram of an example computing device that may perform multi-objective tasks using a primal network trained with a min-max neural network architecture;
FIG. 9 is a block diagram of an example computing device that may perform automatic response generation using a primal network trained with a min-max neural network architecture;
FIG. 10 is a diagram of an example cloud computing environment, according to embodiments described herein;
FIG. 11 is a diagram of example abstraction model layers according to embodiments described herein;
FIG. 12 is a block diagram of an example tangible, non-transitory computer-readable medium that may perform multi-objective tasks using a trained min-max neural network; and
FIG. 13 is a block diagram of an example tangible, non-transitory computer-readable medium that may train a primal network to perform automatic response generation.
Detailed Description
A neural network is trained on a training data set using an objective function, or loss function, that represents the particular objective of the task to be performed. In some real-world applications, several objectives may need to be satisfied simultaneously. One way to incorporate several objective functions into the learning framework is to assign relative weights to the objectives, which transforms the multi-objective problem into a single-objective problem. However, such weights may be difficult to optimize, as it may not be clear how to compare objectives measured in different units. Furthermore, the single-objective approach may be unsuitable where the learner needs to meet thresholds on some objectives. Finally, tuning the weights used to merge several objectives into a single objective may be computationally difficult.
In accordance with the techniques described herein, a system may include a processor to receive data for a multi-objective task including multiple objectives. The processor may perform the multi-objective task on the received data via a trained primal network, where the primal network and a dual network are alternately trained using a Lagrangian loss function representing the multiple objectives. In one example, the processor may receive a prefix of a conversation and a text input. The processor may then generate a completed response via the trained primal network based on the prefix of the conversation and the text input. Thus, the techniques described herein enable training neural networks with multiple objectives without setting predefined relative weights. Further, the techniques provide improved automatic and semi-automatic responses to queries in conversations using neural networks trained on multiple objectives. The techniques may also be used in a variety of other applications, including automated summarization of text and autonomous machine-learning agents interacting with their environment.
In some scenarios, the techniques described herein may be implemented in a cloud computing environment. As discussed in more detail below with reference to at least fig. 8-13, computing devices configured to perform multi-objective tasks may be implemented in a cloud computing environment. It is to be understood in advance that although the present disclosure may include descriptions with respect to cloud computing, implementation of the teachings referenced herein is not limited to a cloud computing environment. Rather, embodiments of the invention can be implemented in connection with any other type of computing environment now known or later developed.
Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with the provider of the service. This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.
Characteristics are as follows:
On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service provider.
Broad network access: capabilities are available over a network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and personal digital assistants (PDAs)).
Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or data center).
Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
Service models are as follows:
Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS): the capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure, including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly the application hosting environment configurations.
Infrastructure as a Service (IaaS): the capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
The deployment model is as follows:
private cloud: the cloud infrastructure operates solely for an organization. The cloud infrastructure may be managed by the organization or a third party and may exist inside or outside the organization.
Community cloud: the cloud infrastructure is shared by several organizations and supports a specific community of common interest relationships, such as mission missions, security requirements, policy and compliance considerations. A community cloud may be managed by multiple organizations or third parties within a community and may exist within or outside of the community.
Public cloud: the cloud infrastructure is offered to the public or large industry groups and owned by organizations that sell cloud services.
Mixing cloud: the cloud infrastructure consists of two or more clouds (private, community, or public) of deployment models that remain unique entities but are bound together by standardized or proprietary technologies that enable data and application portability (e.g., cloud bursting traffic sharing technology for load balancing between clouds).
A cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure comprising a network of interconnected nodes.
Referring now to FIG. 1, a block diagram illustrates an example min-max neural network that may train a primal network to perform multi-objective tasks. A min-max neural network 100 is illustrated. FIG. 1 includes a primal network 102 and a dual network 104, both communicatively coupled to a Lagrangian function 106. The dual network 104 is shown outputting a pair of Lagrange multipliers 108. The primal network 102 includes LSTM units 110, and the dual network 104 includes LSTM units 112. The primal network is shown receiving values 114A, 114B, and 114C and outputting values 116A-116D. The dual network is shown receiving values 114A, 114B, and 114C.
In the example of FIG. 1, the primal network 102 and the dual network 104 are alternately trained against each other using the Lagrangian function 106. For example, the primal network 102 may be trained to minimize the Lagrangian function 106 while the dual network 104 is held constant. Similarly, the dual network 104 may be trained to maximize the Lagrangian function while the primal network 102 is held constant. Thus, alternating iterations of gradient descent with respect to the policy and gradient ascent with respect to the Lagrange multipliers may be performed.
In the example of FIG. 1, the primal network 102 is generative. Thus, given an input utterance including the words represented by the values 114A, 114B, and 114C, the primal network 102 generates an output response with the words represented by the values 116A-116D. Given the same input utterance, the dual network 104 outputs the Lagrange multipliers 108 used in the Lagrangian function 106.
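As a minimal sketch of this alternating scheme, consider a toy constrained problem rather than the patent's networks (all names here are illustrative): minimize f(θ) = θ² subject to θ ≥ 1. Its Lagrangian L(θ, λ) = θ² + λ(1 − θ) is minimized over θ and maximized over λ ≥ 0 with alternating gradient steps, the same descent/ascent pattern used to train the primal and dual networks.

```python
# Toy illustration of alternating primal-dual gradient updates on a
# Lagrangian (not the patent's LSTM networks).
# Problem: minimize f(theta) = theta^2 subject to theta >= 1.
# Lagrangian: L(theta, lam) = theta^2 + lam * (1 - theta), lam >= 0.
# Saddle point: theta = 1, lam = 2.

def train(steps=20000, lr_primal=0.01, lr_dual=0.01):
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient DESCENT on L w.r.t. theta (dual held fixed).
        grad_theta = 2.0 * theta - lam
        theta -= lr_primal * grad_theta
        # Dual step: gradient ASCENT on L w.r.t. lambda (primal held fixed),
        # projected back onto lam >= 0.
        grad_lam = 1.0 - theta
        lam = max(0.0, lam + lr_dual * grad_lam)
    return theta, lam

theta, lam = train()  # converges near the saddle point theta = 1, lam = 2
```

With small, equal step sizes the iterates contract toward the saddle point; in the patent's setting, the same roles are played by the primal network's policy parameters and the multipliers produced by the dual network.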
In particular, the decision problem may be a Markov decision process (MDP) with finite state and action spaces. In general, a finite MDP can be represented as a tuple (X, A, R, D, P, P_0), where X = {1, ..., n, x_Ter} and A = {1, ..., m} are the state and action spaces, respectively, and x_Ter is a recurrent terminal state. For a state x and an action a, R(x, a) may be a bounded reward function, and D_1(x, a), ..., D_n(x, a) are constraint cost functions. P(.|x, a) may be a transition probability distribution, and P_0(.) may be an initial state distribution. A stationary policy mu(.|x) for the MDP is a probability distribution over actions, conditioned on the current state. In the policy gradient method, the policy may be parameterized by a k-dimensional vector theta, so the space of policies can be written as mu(.|x; theta), x in X, theta in R^k. Since in this setup the policy mu is uniquely defined by its parameter vector theta, policy-dependent functions can be written as functions of either mu or theta; mu(.|x; theta) denotes the policy and theta denotes the dependence on the policy parameters. For a multi-objective MDP, the optimization can be represented as:

maximize_theta v_theta(x_0)   subject to   v_theta^{D_i}(x_0) <= gamma_i,  i = 1, ..., n     (Equation 1)

where gamma_1, ..., gamma_n are user-defined thresholds and v_theta^{D_i} denotes the value of the constraint cost D_i under the policy theta. To solve the above problem, Equation 1 may be transformed using a Lagrangian relaxation procedure. The result is an unconstrained problem over the Lagrangian function, also referred to herein as the Lagrangian:

L(theta, lambda) = -v_theta(x_0) + sum_{i=1}^{n} lambda_i * ( v_theta^{D_i}(x_0) - gamma_i )     (Equation 2)

where the lambda_i are the Lagrange multipliers. To satisfy the multiple objectives, the primal network can be trained to converge to the min-max policy:

min_theta max_{lambda >= 0} L(theta, lambda)     (Equation 3)

In particular, alternating policy gradient updates, that is, gradient descent with respect to the policy and gradient ascent with respect to the lambda multipliers, may be used to converge to the optimal policy. Furthermore, the Lagrangian variables may be conditioned on the state space. In other words, the dual variables can be viewed as a data-dependent model parameterized by a vector zeta, with lambda = lambda_zeta(x), giving:

min_theta max_zeta L(theta, lambda_zeta)     (Equation 4)

The formulations of Equations 3 and 4 only enlarge the search space, and include the case where the lambda variables are constant.
Using the expression in Equation 4, two policies are modeled simultaneously using two different networks, the primal network 102 and the dual network 104, as can be seen in FIG. 1. In particular, the primal network 102 and the dual network 104 of FIG. 1 may be modeled as two LSTM models with different parameters for use with a chatbot or any other generative task. The objective function of the networks is the Lagrangian function, where the primal network 102 is configured to find a policy that minimizes the Lagrangian function, while the dual network is configured to find a policy that maximizes it.
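A minimal sketch of the Lagrangian objective the two networks share (function and variable names are illustrative, not from the patent): given estimates of the reward value and the constraint-cost values under the current policy, the scalar loss that the primal network minimizes and the dual network maximizes over the multipliers is the relaxed objective described above.

```python
# Sketch of the Lagrangian relaxation of the constrained objective
# (names are hypothetical): L = -v_R + sum_i lam[i] * (v_D[i] - gamma[i]).

def lagrangian(v_reward, v_costs, multipliers, thresholds):
    """Relaxed loss from the reward value, constraint-cost values,
    Lagrange multipliers, and user-defined thresholds."""
    assert len(v_costs) == len(multipliers) == len(thresholds)
    penalty = sum(l * (d - g)
                  for l, d, g in zip(multipliers, v_costs, thresholds))
    return -v_reward + penalty

# A policy meeting every threshold (v_D[i] <= gamma[i]) incurs a lower
# loss than one with the same reward that violates them:
feasible = lagrangian(5.0, [0.2, 0.1], [1.0, 2.0], [0.5, 0.5])
violating = lagrangian(5.0, [0.9, 0.8], [1.0, 2.0], [0.5, 0.5])
```

The dual network raises the multipliers on violated constraints, increasing the penalty the primal network must then drive back down.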
Any number of objectives can be represented through the Lagrange multipliers and incorporated into the training of the primal network. Reward functions that may be incorporated in a response generation task include a redundancy non-likelihood objective, a semantic dissimilarity objective, and a semantic coherence objective. The redundancy non-likelihood objective may improve the ease of answering in a dialog. For example, one desired attribute of a well-run dialog is that in each turn a semi-automated or automated agent generates a response that the other party finds easy to respond to. Training a common Seq2Seq model may assign high probability to dull answers such as "I don't know what you are talking about" or "I don't know." While these answers may be appropriate in terms of language-model perplexity and suitable for a wide range of questions, it is difficult to continue the conversation after receiving such responses. Thus, a look-ahead function may be used to measure the ease with which an agent may answer a generated turn. In some examples, this is done using the negative log likelihood of responding to the utterance with a dull response. To build this function, a set S of dull responses that occur very frequently in Seq2Seq conversation models can be constructed manually. Although it may be very difficult, if not impossible, to find all such responses manually, it may be assumed that similar responses will be embedded close to the responses in S. Thus, a system that is unlikely to generate the utterances in the list is also unlikely to generate other dull responses. The redundancy non-likelihood objective can be calculated using the formula:

r_1 = -(1 / ||S||) * sum_{s in S} (1 / N_s) * log P_seq2seq(s | a)

where ||S|| denotes the cardinality of S, N_s denotes the number of tokens in the response s, and P_seq2seq is the probability under the language model.
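The non-likelihood objective can be sketched as follows; `toy_log_prob` is a hypothetical stand-in for the trained Seq2Seq log-likelihood log P(s | a), which a real system would obtain from the model:

```python
# Sketch of the redundancy (ease-of-answering) reward: penalize
# responses after which the dull utterances in S become likely.

DULL_SET = ["i don't know", "i don't know what you are talking about"]

def redundancy_reward(log_prob, response, dull_set=DULL_SET):
    """-1/||S|| * sum over dull s of (1/N_s) * log P(s | response)."""
    total = 0.0
    for s in dull_set:
        n_tokens = len(s.split())            # N_s: token count of s
        total += log_prob(s, response) / n_tokens
    return -total / len(dull_set)            # ||S||: cardinality of S

# Hypothetical stand-in: dull follow-ups are likelier after a dull response.
def toy_log_prob(s, response):
    return -1.0 if response in DULL_SET else -5.0

informative = redundancy_reward(toy_log_prob, "the build failed at step 3")
dull = redundancy_reward(toy_log_prob, "i don't know")
```

The informative response earns the larger reward because the dull follow-ups are less likely after it.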
In some examples, the semantic dissimilarity objective may improve the flow of information. For example, another desirable attribute of a conversational agent is that in each turn new information is added to the dialog. To generate a long dialog, each agent must contribute new information in each turn; in other words, the dialog should flow, and repeated sequences should be avoided. Thus, in some examples, semantic similarity between successive turns from the same agent may be penalized. Formally, given e_{p_i} and e_{p_{i+1}}, the embedded representations of two consecutive turns p_i and p_{i+1} obtained from the encoder, the reward can be calculated as the negative log of the cosine similarity between them:

r_2 = -log( <e_{p_i}, e_{p_{i+1}}> / (||e_{p_i}|| * ||e_{p_{i+1}}||) )

where <.,.> is the Euclidean inner product and ||.|| is the Euclidean norm.
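A minimal sketch of this reward (assuming, as the formula does, a positive cosine so the logarithm is defined):

```python
import math

# Information-flow reward: negative log cosine similarity between the
# embeddings of two consecutive turns from the same agent.
# Near-duplicate turns (cosine near 1) get a reward near 0; dissimilar
# turns, which add new information, earn a larger reward.

def dissimilarity_reward(e1, e2):
    dot = sum(a * b for a, b in zip(e1, e2))
    norm = (math.sqrt(sum(a * a for a in e1))
            * math.sqrt(sum(b * b for b in e2)))
    return -math.log(dot / norm)

same = dissimilarity_reward([1.0, 0.0], [1.0, 0.0])   # repeated turn
diff = dissimilarity_reward([1.0, 0.1], [0.1, 1.0])   # new information
```

The toy embeddings are illustrative; a real system would take them from the encoder's hidden states.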
In some examples, a semantic coherence objective may be included in training to improve semantic coherence. In addition to the previous rewards, another goal is to ensure that the generated responses are coherent with and related to the topic of the conversation, and more precisely the topic of the previous turns of the conversation. This requirement can be measured using the mutual information between the action a and the previous turns of the history, to ensure that the generated response is coherent and appropriate:

r_3 = (1 / N_a) * log P_seq2seq(a | q_i, p_i) + (1 / N_{q_i}) * log P_backward(q_i | a)

where P_seq2seq(a | q_i, p_i) denotes the probability of generating the response a given the previous dialog utterances [p_i, q_i], and P_backward(q_i | a) denotes the backward probability of generating the previous dialog utterance q_i given the response a. To train the backward model, the same Seq2Seq model (an LSTM model with attention) can be trained with source and target swapped. Each log probability is divided by the length of the corresponding utterance to normalize the measure.
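The coherence reward can be sketched as below. Both scoring functions are hypothetical stand-ins: `toy_fwd` plays the role of the forward Seq2Seq model P(a | p_i, q_i) and `toy_bwd` the backward model P(q_i | a) trained with source and target swapped.

```python
# Sketch of the semantic-coherence reward: length-normalized forward and
# backward log probabilities between the response and the previous turn.

def coherence_reward(log_p_forward, log_p_backward, response, history):
    n_a = len(response.split())          # length of the response a
    n_q = len(history[-1].split())       # length of the previous turn q_i
    return (log_p_forward(response, history) / n_a
            + log_p_backward(history[-1], response) / n_q)

# Hypothetical stand-ins: on-topic responses score higher in both models.
def toy_fwd(a, history):
    return -2.0 if "deploy" in a else -8.0

def toy_bwd(q, a):
    return -2.0 if "deploy" in a else -8.0

history = ["how do i deploy the model"]
on_topic = coherence_reward(toy_fwd, toy_bwd, "deploy it with the cli", history)
off_topic = coherence_reward(toy_fwd, toy_bwd, "nice weather today", history)
```

Dividing by the utterance lengths keeps long responses from being rewarded merely for accumulating probability mass.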
To merge all of these rewards, one of them may be selected as the primary loss, with the other objectives limited by thresholds. The problem thereby becomes a multi-objective problem:

maximize_theta v_theta^{r_1}(x_0)   subject to   v_theta^{r_i}(x_0) >= gamma_i,  i = 2, 3

where the gamma_i are user-defined thresholds on the remaining reward values.
It should be understood that the block diagram of FIG. 1 is not intended to indicate that the min-max neural network 100 is to include all of the components shown in FIG. 1. Rather, the min-max neural network 100 may include fewer or additional components not illustrated in FIG. 1 (e.g., additional inputs, outputs, models, neural networks, units, Lagrange multipliers, etc.).
FIG. 2 is a process flow diagram of an example method that may perform multi-objective tasks using a trained primal network. The method 200 may be implemented with any suitable computing device, such as the computing device 800 of FIG. 8. For example, the method 200 may be implemented using the processor 802 of the computing device 800 of FIG. 8, or using the processor 1202 and the computer-readable medium 1200 of FIG. 12.
At block 202, the primal network and the dual network of the min-max neural network are trained for the multi-objective task using a Lagrangian loss function representing the multiple objectives. The primal network is trained to minimize the Lagrangian loss function, and the dual network is trained to maximize it. In some examples, the multi-objective task is a Markov decision process that includes a finite state space and a finite action space. In various examples, during training the primal network is either pre-trained using a general policy learned from another setting or randomly initialized. In some examples, the dual network is randomly initialized during training. In some examples, the gradients of the primal network and the dual network are estimated based on likelihood ratios. In various examples, the policy gradients of the primal network and the dual network are alternately updated using different step sizes for the two networks. In some examples, the primal network and the dual network are trained using pre-existing data sets, simulators, feedback from the environment, or any combination thereof. For example, the min-max neural network may be trained using the method 400 of FIG. 4.
At block 204, data for a multi-objective task is received. For example, in the case of automated response generation, the data may include words from the input text and prefixes of the conversation.
At block 206, the multi-objective task is performed on the received data via the trained primal network. For example, the multi-objective task may be automatic response generation, selection, classification, or any other multi-objective task that may be performed using a neural network.
As illustrated by arrow 208, in some examples, additional data for the multi-objective task may be received, and additional multi-objective tasks may be performed based on the additional data. For example, the additional data may be additional input text and an additional prefix of a conversation.
The process flow diagram of fig. 2 is not intended to indicate that the operations of method 200 are to be performed in any particular order, or that all of the operations of method 200 are to be included in each case. Additionally, method 200 may include any suitable number of additional operations.
FIG. 3 is a process flow diagram of an example method that may perform automatic response generation using a trained primal network. The method 300 may be implemented with any suitable computing device, such as the computing device 900 of FIG. 9. For example, the method 300 may be implemented using the processor 802 of the computing device 900 of FIG. 9, or using the processor 1302 and the computer-readable medium 1300 of FIG. 13.
At block 302, a prefix of a conversation and a text input are received. For example, the prefix of the conversation may include one or more turns of dialog between a first user and a second user. The text input includes one or more words entered by the first user in response to a query from the second user.
At block 304, a completed response is generated via the trained primal network based on the prefix of the conversation and the text input. The trained primal network is trained to minimize a Lagrangian loss function representing multiple objectives, while the dual network is trained to maximize it; for example, the primal network and the dual network may be alternately trained to minimize and maximize the Lagrangian loss function. In various examples, the primal network is trained using a first limit on the number of turns per conversation, which is incrementally increased to a second limit. In some examples, the primal network is trained using sequences that are less likely to generate redundant responses than other sequences in the training data set. In some examples, multiple completed responses are generated. In some examples, a completed response is iteratively constructed word by word starting from the text input. In some examples, a beam search is used to generate multiple completed responses.
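The word-by-word construction with a beam search can be sketched as follows; `toy_next_words` is a hypothetical stand-in for the trained primal network's next-word distribution:

```python
import math

# Sketch of completing a response word by word with a beam search.

def beam_search(next_words, prefix_tokens, beam_width=2, max_len=4):
    beams = [(0.0, list(prefix_tokens))]            # (log-prob, tokens)
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            if tokens and tokens[-1] == "<eos>":
                candidates.append((score, tokens))  # finished beam, keep as-is
                continue
            for word, logp in next_words(tokens):
                candidates.append((score + logp, tokens + [word]))
        # Keep only the beam_width highest-scoring partial responses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return [" ".join(tokens) for _, tokens in beams]

# Hypothetical toy model: after "thanks", prefer "a lot" over "anyway".
def toy_next_words(tokens):
    table = {
        "thanks": [("a", math.log(0.6)), ("anyway", math.log(0.4))],
        "a": [("lot", math.log(0.9)), ("bit", math.log(0.1))],
    }
    return table.get(tokens[-1], [("<eos>", math.log(0.99))])

completions = beam_search(toy_next_words, ["thanks"])
```

The returned list, ordered by score, corresponds to the multiple completed responses that block 306 then presents to the user.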
At block 306, a plurality of completed responses, including the completed response, are presented to the first user for selection. For example, the completed responses may be displayed as a list, as shown in figs. 7A and 7B.
At block 308, a selected response is received from among the completed responses. For example, the user may select a response by clicking on it, by scrolling down and selecting it from the list of responses, or by accepting a completed response word by word.
At block 310, the selected response is sent to the second user. For example, the selected response may be sent to the second user as if the first user typed the response and sent the response. Thus, the selected response may be sent via a communication tool or application. In some examples, additional queries may be received from the second user, and the method may begin again at block 302.
The process flow diagram of fig. 3 is not intended to indicate that the operations of method 300 are to be performed in any particular order, or that all of the operations of method 300 are to be included in each case. Additionally, method 300 may include any suitable number of additional operations. For example, method 300 may be repeated for additional received prefixes of the conversation and text input. In some examples, method 300 may include sending a completed response as a response to the query in response to detecting that a confidence score of the completed response exceeds a threshold score.
FIG. 4 is a process flow diagram of an example method for training an original network to perform multi-objective tasks. Method 400 may be implemented with any suitable computing device, such as computing device 800 of fig. 8. For example, the method 400 may be implemented using the processor 802 of the computing device 800 of fig. 8 or using the processor 1202 and the computer-readable medium 1200 of fig. 12.
At block 402, a training data set and multiple objectives are received. The training data set may include data that depends on the particular multi-objective task to be performed. For example, the data set for a generative text task may include conversations, as discussed with respect to fig. 5. The objectives may include relevance, risk reduction, reduced redundancy, reduced semantic similarity, and semantic consistency, among other possible objectives. In some examples, the objectives may be received in the form of loss functions.
At block 404, the original network and the dual network of the min-max neural network to be trained are initialized. For example, the original network may be pre-trained using a general response policy learned from a fully supervised setting, or randomly initialized. The dual network may be randomly initialized.
At block 406, the original network and the dual network are alternately trained using a Lagrangian loss function representing the multiple objectives. In some examples, the multi-objective task may be a Markov decision process that includes a finite state space and a finite action space. The original network may be trained to minimize the Lagrangian loss function, and the dual network may be alternately trained to maximize the Lagrangian loss function.
At block 408, policy gradients of the original network and the dual network are updated using different step sizes for the original network and the dual network. In some examples, the gradients of the original network and the dual network are estimated using a likelihood ratio estimator.
At decision diamond 410, a determination is made as to whether training is complete. For example, a preset stopping condition on a validation split may be set before training, or a manual check of some metric may be performed.
At block 412, training ends. The original network may then be used to perform multi-objective tasks on the received data (as shown in FIG. 2).
The process flow diagram of fig. 4 is not intended to indicate that the operations of method 400 are to be performed in any particular order, or that all of the operations of method 400 are to be included in each case. Additionally, method 400 may include any suitable number of additional operations. For example, additional decision diamonds or conditions, or even manual inspection of different metrics measured during training, may be included in the method 400.
FIG. 5 is a process flow diagram of an example method for training an original network to perform automatic response generation. Method 500 may be implemented with any suitable computing device, such as computing device 900 of fig. 9. For example, the method 500 may be implemented using the processor 802 of the computing device 900 of fig. 9 or using the processor 1302 and the computer-readable medium 1300 of fig. 13.
At block 502, a training data set and multiple objectives are received. For example, the training data set may be the OpenSubtitles data set of movie conversations. The OpenSubtitles data set contains sentences uttered by characters in movies. In the data set, each utterance can be treated both as a response to the previous utterance and as context for the next response. The training and validation split may thus include 62 million sentences (923 million tokens) as training examples, and the test set may have 26 million sentences (395 million tokens). The split is performed such that each pair of sentences appears together in either the training set or the test set, but not in both. Given the broad range of movies, this is an open-domain conversation data set. Each turn in the data set may be treated as a target, and the concatenation of the two previous sentences may be treated as the source input.
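The pair-preserving split described above might be sketched as follows. The deterministic modulo assignment is an illustrative stand-in for however the actual split was produced; the point is only that each pair lands wholly in one split:

```python
def split_pairs(pairs, test_fraction=0.2):
    """Assign each (context, response) sentence pair wholly to the training
    split or the test split, so that no pair straddles both sets."""
    period = round(1 / test_fraction)        # e.g. 0.2 -> every 5th pair
    train, test = [], []
    for i, pair in enumerate(pairs):
        # deterministic round-robin assignment; a real split might hash ids
        (test if i % period == 0 else train).append(pair)
    return train, test
```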
At block 504, the original network is pre-trained based on a pre-selected model, and the dual network is randomly initialized. For example, the original network may be initialized with a sequence-to-sequence (Seq2Seq) language model. In some examples, a reinforcement learning (RL) system is initialized using a general response generation policy learned from a fully supervised setting. The model selected for pre-training may be a simple model and may be replaced with any other model. The generated sentences may be viewed as actions taken according to a policy defined by the Seq2Seq language model. The policy is defined by a probability distribution over actions given a state, for example pRL(pi+1 | pi, qi). In some examples, this probability distribution is modeled using a Seq2Seq LSTM model. In some examples, the LSTM model may be replaced with any other suitable language generation model.
Thus, an action may be a generated utterance. Because the training of method 500 involves dialogue, the state space may include information about past turns of the conversation. For example, the state may include the previous two dialogue turns [pi, qi]. The vector representation of the state is thus encoded by the concatenation of the previously generated response pi and the response qi from the second agent.
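The state encoding [pi, qi] can be sketched as the concatenation of two turn encodings. The bag-of-words `embed` function below is an invented stand-in for the LSTM encoder used in practice:

```python
def embed(utterance, dim=8):
    """Toy bag-of-words embedding (stands in for an LSTM encoder)."""
    vec = [0.0] * dim
    for word in utterance.split():
        vec[hash(word) % dim] += 1.0     # bucket each word into one slot
    return vec

def encode_state(p_i, q_i):
    """Dialogue state [p_i, q_i]: concatenation of the encoding of the
    previously generated response and the second agent's response."""
    return embed(p_i) + embed(q_i)       # list concatenation, length 2*dim
```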
At block 506, the original network and the dual network of the min-max neural network are alternately trained on the training data set using a Lagrangian loss function representing the multiple objectives. In some examples, the min-max neural network is trained using an initial limit of two conversation turns, and the limit is gradually increased to five turns. In some examples, the min-max neural network is trained using a predetermined number of sequences, including sequences that have a lower likelihood of generating redundant responses than other sequences in the training data set. For example, as another helpful step in conversation simulation, a subset of 1 million messages may be retrieved from the OpenSubtitles data set, and 8 million sequences with the lowest likelihood of generating redundant responses may be extracted from the set, to ensure that the initial inputs are easy to respond to.
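Selecting the sequences least likely to provoke redundant responses might be sketched as follows; `toy_dull_likelihood` and the dull-word set are illustrative assumptions standing in for a model-based likelihood of generating a dull response:

```python
import heapq

DULL_WORDS = {"know", "sorry", "nothing"}   # illustrative "dull" vocabulary

def toy_dull_likelihood(seq):
    """Toy score: fraction of words in a dull-word set (stands in for a
    model's probability of producing a redundant response)."""
    words = seq.split()
    return sum(w in DULL_WORDS for w in words) / len(words)

def select_easy_to_answer(sequences, dull_likelihood, k):
    """Keep the k sequences least likely to provoke a redundant (dull)
    response, per the scoring function `dull_likelihood`."""
    return heapq.nsmallest(k, sequences, key=dull_likelihood)
```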
At block 508, the policy gradients of the original network and the dual network are alternately updated using different step sizes. For example, a two-timescale method may be used to alternate the policy gradient updates. Because min-max network training includes two different (alternating) policy gradient updates, the original network and the dual network each have their own step size. For example, during training, the step size of the original network may differ from the step size of the dual network by an order of magnitude. Thus, the original network converges at a higher rate, while the dual network, which receives the smaller step size, converges more slowly. In some examples, a likelihood ratio estimator is used to estimate the gradient of the policy. The likelihood ratio estimator estimates the gradient based on statistical theory. For example, the likelihood ratio estimator may use the REINFORCE algorithm introduced in 1992.
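The likelihood-ratio (REINFORCE) estimator updates parameters in the direction of the gradient of log π(a|s) scaled by the received reward. A minimal sketch on a two-armed bandit with a softmax policy (the bandit, rewards, seed, and learning rate are illustrative assumptions, not the dialogue task):

```python
import math
import random

def reinforce_bandit(rewards=(1.0, 0.0), steps=3000, lr=0.1, seed=0):
    """Likelihood-ratio (REINFORCE) policy gradient on a 2-armed bandit:
    theta[i] += lr * reward * d/dtheta[i] log pi(a)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]                        # softmax logits
    for _ in range(steps):
        z = [math.exp(t) for t in theta]
        s = sum(z)
        probs = [v / s for v in z]
        a = 0 if rng.random() < probs[0] else 1   # sample an action
        r = rewards[a]
        for i in range(2):
            # grad of log pi(a) w.r.t. theta[i] is 1{i == a} - pi(i)
            grad = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += lr * r * grad
    z = [math.exp(t) for t in theta]
    return [v / sum(z) for v in z]            # final action probabilities
```

The policy concentrates on the rewarding arm, illustrating how sampled rewards alone, without differentiating through the environment, drive the gradient estimate.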
At decision diamond 510, a determination is made as to whether training is complete. For example, a preset number of training iterations may be set prior to training.
At block 512, training ends. The original network may then be used to generate an automatic response to the received query, as described above in FIG. 3.
The process flow diagram of fig. 5 is not intended to indicate that the operations of method 500 are to be performed in any particular order, or that all of the operations of method 500 are to be included in each case. Additionally, method 500 may include any suitable number of additional operations. For example, additional decision diamonds or conditions may be included in the method 500, or even different metrics measured during training may be checked manually.
FIG. 6 is an example original neural network that includes three Long Short-Term Memory (LSTM) units. The example neural network 600 may be trained using the method 500 and may be used to generate completed responses in the method 300 and the computing device 900 of fig. 9. For example, the neural network 600 may be a sequence-to-sequence deep learning architecture, with or without an attention mechanism. Fig. 6 includes three LSTM units 602, 604, and 606. The first LSTM unit 602 includes words 608A, 608B, 608C, and 608D corresponding to a customer query. The second LSTM unit 604 includes text inputs 610A and 610B corresponding to text input from a human agent. The third LSTM unit 606 includes words 612A, 612B, 612C, and 612D corresponding to the completion portion of the completed response.
As shown in fig. 6, the first LSTM 602 receives the full prefix of the client's query or conversation word by word and encodes the words into a fixed-length hidden state vector hA. The second LSTM 604 receives the text input from the human agent and transforms the vector hA into a hidden state vector hB by encoding the text input word by word. The third LSTM 606 transforms (decodes) the vector hB into a sequence of output words 612A-612D, which form the completion portion of the completed response. A completed response may be generated by concatenating the text inputs 610A, 610B with the completions 612A-612D. In some examples, rather than generating one completed answer, the neural network 600 may extract several completed responses. For example, a beam search may be used to extract several completed responses.
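The encode-encode-decode flow of fig. 6 can be sketched with stand-in functions: `lstm_encode` is a toy stand-in for an LSTM that folds words into a fixed-length state, and the `decode` argument stands in for the third LSTM.

```python
def lstm_encode(words, h=None, dim=4):
    """Toy stand-in for an LSTM: folds a word sequence into a fixed-length
    hidden state vector, optionally continuing from a previous state."""
    h = list(h) if h is not None else [0.0] * dim
    for w in words:
        for i in range(dim):
            h[i] = 0.5 * h[i] + 0.5 * ((hash(w) >> i) % 3 - 1)
    return h

def complete_response(query_words, input_words, decode):
    """Mirror of fig. 6: encode the query (hA), fold in the agent's partial
    input (hB), decode a completion, and concatenate input + completion."""
    hA = lstm_encode(query_words)            # first LSTM: client query -> hA
    hB = lstm_encode(input_words, h=hA)      # second LSTM: hA + input -> hB
    completion = decode(hB)                  # third LSTM: hB -> output words
    return input_words + completion          # the completed response
```

For example, `complete_response(["how", "are", "you"], ["i", "am"], lambda h: ["fine", "thanks"])` yields `["i", "am", "fine", "thanks"]`; in the actual network the decoder would condition on hB rather than ignore it.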
In the training phase, all historical conversations are converted into training triples that include the client's query, the start of the corresponding human agent's response, and the end of the agent's response. The human agent's response is divided into all combinations of beginning and ending. That is, the response may be segmented at each word to generate different training samples. The neural network 600 may be trained with multiple objectives via a Lagrangian loss function combining all of the objectives. For example, objectives used during training may include the word-by-word probabilities, the perplexity and relevance of the end of the agent's response, and other objectives described herein.
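Segmenting each historical agent response at every word boundary to form (query, response start, response end) training triples can be sketched as:

```python
def make_training_triples(query, agent_response):
    """Split the agent's response at every word boundary, producing one
    (query, response_start, response_end) triple per split point, from an
    empty start (nothing typed yet) to the full response."""
    words = agent_response.split()
    return [(query, " ".join(words[:i]), " ".join(words[i:]))
            for i in range(len(words) + 1)]
```

A response of n words thus yields n + 1 training samples, one for every point at which the agent could pause typing.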
FIG. 7A is a diagram of an example chat display including a generated set of completed responses. The example chat display 700A may be generated using the computing device 900 of fig. 9 and the methods 300 and 500 of figs. 3 and 5.
In FIG. 7A, a chat display 700A of a human customer service agent over a communication channel is shown. The first message 702 in fig. 7A is automatically generated by the company. A second message 704 is received from the client. At the bottom of the screen, above the horizontal line, the human agent is typing a response. The agent's text input 706A reads "I'll Be Happy". Below the text input 706A, three suggestions 708A, 708B, and 708C for completing the response are displayed to the agent. For example, the suggestions 708A, 708B, and 708C may be generated using the techniques described herein. The selected suggestion 708A also appears above the line, continuing the text input. As can be seen in the example chat display 700A, the three automated suggestions 708A, 708B, and 708C can be based on both the conversation context (including the first message 702 and the second message 704) and the agent's text input 706A.
FIG. 7B is a diagram of an example chat display including an updated set of generated completed responses. The example chat display 700B may be generated using the computing device 900 of fig. 9 and the methods 300 and 500 of figs. 3 and 5.
In fig. 7B, as the agent continues to type, a new set of suggested responses 708D, 708E, and 708F is displayed below the updated text input 706B in the updated chat display 700B. Thus, new suggested responses may be generated in real time as the agent types. For example, each time the agent enters an additional word into the text input 706B, a new set of suggested responses may be generated. If one of the suggested responses (such as the selected response 708E) is correct, the agent may select that response, and the response 708E will be sent to the user.
Thus, given a prefix of a conversation between the human agent and the customer that contains zero or more text messages, and given an initial text input by the agent that corresponds to a partial response, the application may suggest one or more completed responses. The completions may correspond to the particular text input and the ongoing conversation. In some examples, other types of data and metadata besides text may be included as part of the conversation prefix, the text input, or even the completed response. For example, the metadata may include images, videos, web links, and the like. In some examples, completed responses may be automatically learned from historical conversations. For example, historical conversations may be used as training data to train a neural network for generating completed responses. Similarly, historical conversations can also be used to train neural networks to create end-to-end solutions, such as chatbots. The application may continue to improve with each new conversation, because the neural network can be trained on additional conversations. In some examples, in those portions of the conversation where a response is identified with high confidence, the completed response may be sent fully automatically, without human intervention, even before the agent types a single word. Thus, time may be saved by using suggested responses, or by automatically sending high-confidence responses, rather than typing responses in full. In addition, the quality of the responses may be improved by removing opportunities for typographical and other errors.
FIG. 8 is a block diagram of an example computing device that may perform multi-objective tasks using an original network trained with a min-max neural network architecture. Computing device 800 may be, for example, a server, a desktop computer, a laptop computer, a tablet computer, or a smartphone. In some examples, computing device 800 may be a cloud computing node. Computing device 800 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. Computing device 800 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Computing device 800 may include a processor 802 for executing stored instructions, a memory device 804 for providing temporary memory space for the operation of the instructions during operation. The processor may be a single core processor, a multi-core processor, a compute cluster, or any number of other configurations. The memory 804 may include Random Access Memory (RAM), read-only memory, flash memory, or any other suitable memory system.
The processor 802 may be connected through a system interconnect 806 (e.g., PCI, PCI-Express, etc.) to an input/output (I/O) device interface 808 adapted to connect the computing device 800 to one or more I/O devices 810. The I/O devices 810 may include, for example, a keyboard and a pointing device, wherein the pointing device may include a touchpad or a touchscreen, among others. The I/O devices 810 may be built-in components of the computing device 800, or may be devices that are externally connected to the computing device 800.
The processor 802 may also be linked through the system interconnect 806 to a display interface 812 adapted to connect the computing device 800 to a display device 814. The display device 814 may include a display screen that is a built-in component of the computing device 800. The display device 814 may also include a computer monitor, television, or projector, among others, that is externally connected to the computing device 800. In addition, a network interface controller (NIC) 816 may be adapted to connect the computing device 800 through the system interconnect 806 to a network 818. In some embodiments, the NIC 816 can communicate data using any suitable interface or protocol, such as the Internet Small Computer System Interface (iSCSI), among others. The network 818 may be a cellular network, a radio network, a wide area network (WAN), a local area network (LAN), or the Internet, among others. An external computing device 820 may connect to the computing device 800 through the network 818. In some examples, the external computing device 820 may be an external web server 820. In some examples, the external computing device 820 may be a cloud computing node.
The processor 802 may also be linked through the system interconnect 806 to a storage device 822, which may include a hard disk drive, an optical disk drive, a USB flash drive, an array of drives, or any combination thereof. In some examples, the storage device may include a receiver 824, an original network 826, and a training network 828. The receiver 824 may receive data for a multi-objective task. The multiple objectives may also be received, for example, in the form of loss functions. The multi-objective task may be a selection task, a classification task, or a generation task, among other possible tasks. For example, multi-objective tasks may include selection, classification, regression, recommendation, generation, or any other type of predictive task. The original network 826 may be trained to perform the multi-objective task on the received data via the training network 828. For example, the training network 828 may be a min-max neural network. For example, the training network 828 may include an original network and a dual network. The training network 828 may use a Lagrangian loss function representing the multiple objectives to train the original network and the dual network for the multi-objective task. The training network 828 trains the original network to minimize the Lagrangian loss function, and the dual network to maximize the Lagrangian loss function. In some examples, the multi-objective task is a Markov decision process that includes a finite state space and a finite action space. In some examples, the training network 828 pre-trains the original network using a general policy learned from another setting, or initializes it randomly. The training network 828 may randomly initialize the dual network during training. In some examples, the original network has a step size during training that is smaller than the step size of the dual network. For example, the step size of the original network may be an order of magnitude or more smaller than the step size of the dual network.
In some examples, training network 828 may estimate the gradient based on likelihood ratio estimates. For example, the training network 828 may use the method 400 of fig. 4 to train the original network.
It should be understood that the block diagram of fig. 8 is not intended to indicate that the computing device 800 will include all of the components shown in fig. 8. Rather, computing device 800 may include fewer or additional components (e.g., additional memory components, embedded controllers, modules, additional network interfaces, etc.) not illustrated in fig. 8. Further, any of the functions of receiver 824, original network 826, and training network 828 may be partially or fully implemented in hardware and/or processor 802. For example, the functionality may be implemented with an application specific integrated circuit, logic implemented in an embedded controller, logic implemented in the processor 802, or the like. In some embodiments, the functionality of receiver 824, original network 826, and training network 828 may be implemented in logic, where logic as referred to herein may comprise any suitable hardware (e.g., processor, etc.), software (e.g., application, etc.), firmware, or any suitable combination of hardware, software, and firmware.
FIG. 9 is a block diagram of an example computing device that may perform automatic response generation using an original network trained with a min-max neural network architecture. Computing device 900 of fig. 9 includes like-numbered elements of fig. 8. Additionally, computing device 900 includes a response display 902 and a response transmitter 904.
In the example computing device 900, the receiver 824 may receive a prefix of a conversation and a text input. For example, the prefix of the conversation may include a dialogue between a first user and a second user, and the text input may include a portion of a completed response. A completed response is generated based on the prefix of the conversation and the text input via the original network 826, which is trained by the training network 828 using a pre-existing data set. For example, the training network 828 may be a min-max neural network. For example, the training network 828 may include an original network and a dual network trained using a Lagrangian loss function representing multiple objectives. The multiple objectives may include a perplexity objective, a relevance objective, a redundancy dis-likelihood objective, a semantic dissimilarity objective, a semantic consistency objective, other objectives, or any combination thereof. In some examples, the original network and the dual network are Long Short-Term Memory (LSTM) models with different parameters. The trained original network 826 is trained to minimize the Lagrangian loss function by generating completed responses, while the dual network is trained to maximize the Lagrangian loss function. The response display 902 can display the completed responses generated by the original network 826. For example, the response display 902 can present a plurality of completed responses, including the completed response, to the user for selection. For example, the completed responses may be displayed as a list in an application, as in figs. 7A and 7B. The response transmitter 904 may receive a selected response from among the completed responses and send the selected response to the second user.
Referring now to fig. 10, an illustrative cloud computing environment 1000 is depicted. As shown, cloud computing environment 1000 includes one or more cloud computing nodes 1002, with which local computing devices used by cloud consumers, such as Personal Digital Assistants (PDAs) or cellular telephones 1004A, desktop computers 1004B, laptop computers 1004C, and/or automobile computer systems 1004N, may communicate. The nodes 1002 may communicate with each other. They may be physically or virtually grouped (not shown) in one or more networks, such as private, community, public, or hybrid clouds described above, or a combination thereof. This allows the cloud computing environment 1000 to provide infrastructure, platforms, and/or software as a service without the cloud consumer needing to maintain resources on the local computing device. It should be appreciated that the types of computing devices 1004A-N shown in fig. 10 are intended to be illustrative only, and that computing node 1002 and cloud computing environment 1000 may communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
Referring now to fig. 11, a set of functional abstraction layers provided by cloud computing environment 1000 (fig. 10) is illustrated. It should be understood in advance that the components, layers, and functions shown in fig. 11 are intended to be illustrative only and embodiments of the invention are not limited thereto. As depicted, the following layers and corresponding functionality are provided.
The hardware and software layer 1100 includes hardware and software components. Examples of hardware components include: mainframes, in one example IBM zSeries systems; RISC (Reduced Instruction Set Computer) architecture based servers, in one example IBM pSeries systems; IBM xSeries systems; IBM BladeCenter systems; storage devices; and networks and networking components. Examples of software components include web application server software, in one example IBM WebSphere application server software; and database software, in one example IBM DB2 database software. (IBM, zSeries, pSeries, xSeries, BladeCenter, WebSphere, and DB2 are trademarks of International Business Machines Corporation registered in many jurisdictions worldwide.)
The virtualization layer 1102 provides an abstraction layer from which the following examples of virtual entities may be provided: a virtual server; virtual storage; virtual networks, including virtual private networks; virtual applications and operating systems; and a virtual client. In one example, the management layer 1104 can provide the functionality described below. Resource provisioning provides dynamic acquisition of computing resources and other resources for performing tasks within a cloud computing environment. Metering and pricing provide cost tracking when resources are utilized within a cloud computing environment and account or invoice for the consumption of these resources. In one example, these resources may include application software licenses. Security provides authentication for cloud consumers and tasks, as well as protection of data and other resources. The user portal provides access to the cloud computing environment for consumers and system administrators. Service level management provides cloud computing resource allocation and management such that a desired service level is met. Service Level Agreement (SLA) planning and fulfillment provides for prearrangement and procurement of cloud computing resources, whose future requirements are anticipated according to the SLA.
Workload layer 1106 provides an example of the functionality that may utilize a cloud computing environment. Examples of workloads and functions that may be provided from this layer include: maps and navigation; software development and lifecycle management; virtual classroom education delivery; analyzing and processing data; transaction processing; and multi-objective task processing.
The present technology may be a system, method or computer program product. The computer program product may include a computer-readable storage medium (or media) having computer-readable program instructions thereon for causing a processor to perform various aspects of the present invention.
The computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present technology may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present technology.
Various aspects of the present technology are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the technology. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
Referring now to fig. 12, a block diagram of an example tangible, non-transitory computer-readable medium 1200 that can train a primal network to perform multi-objective tasks is depicted. The tangible, non-transitory computer-readable medium 1200 is accessible by the processor 1202 through a computer interconnect 1204. Further, the tangible, non-transitory computer-readable medium 1200 may include code to direct the processor 1202 to perform the operations of the methods 200 and 400 of figs. 2 and 4.
The different software components discussed herein may be stored on the tangible, non-transitory computer-readable medium 1200, as indicated in fig. 12. For example, the network training module 1206 includes code for training the primal network and the dual network for the multi-objective task using a Lagrangian loss function representing multiple objectives. The network training module 1206 also includes code for training the primal network to minimize the Lagrangian loss function and training the dual network to maximize the Lagrangian loss function. The network training module 1206 may also include code for processing the multi-objective task as a Markov decision process that includes a finite state space and a finite action space. In various examples, the network training module 1206 includes code to alternately train the primal network and the dual network using a pre-existing dataset, a simulator, feedback from the environment, or any combination thereof. In some examples, the network training module 1206 includes code to pre-train the primal network using a generic policy learned from another setting, or to randomly initialize the primal network during training. In various examples, the network training module 1206 includes code to randomly initialize the dual network during training. In some examples, the network training module 1206 includes code to estimate the gradients of the primal network and the dual network based on likelihood ratios. In some examples, the network training module 1206 includes code to update the policy gradients of the primal network and the dual network based on different step sizes for the primal network and the dual network. The receiver module 1208 includes code for receiving data for a multi-objective task that includes multiple objectives. The primal neural network module 1210 includes code for performing the multi-objective task on the received data via the trained primal network.
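The alternating primal-dual training implemented by the network training module 1206 can be illustrated with a toy saddle-point problem. The sketch below is a simplified, hypothetical example: the "primal network" is reduced to a plain parameter vector and the "dual network" to a single Lagrange multiplier, the gradients are computed analytically rather than estimated from likelihood ratios, and the step sizes are chosen arbitrarily. It shows only the min-max structure (primal gradient descent, dual gradient ascent, with different step sizes), not the patented method itself.

```python
import numpy as np

# Toy problem: minimize f(theta) subject to g(theta) <= 0, via the
# Lagrangian L(theta, lam) = f(theta) + lam * g(theta).
# Everything below (f, g, step sizes) is hypothetical, chosen for illustration.

def f(theta):                 # primary objective to minimize
    return float(np.sum(theta ** 2))

def g(theta):                 # constraint theta[0] >= 1, written as g(theta) <= 0
    return 1.0 - float(theta[0])

def grad_f(theta):
    return 2 * theta

def grad_g(theta):
    d = np.zeros_like(theta)
    d[0] = -1.0
    return d

theta = np.array([0.0, 2.0])          # "primal" parameters, arbitrary initialization
lam = 0.0                             # "dual" variable, initialized at zero
primal_step, dual_step = 0.05, 0.5    # different step sizes for primal and dual updates

for _ in range(2000):
    # Primal update: gradient DESCENT on the Lagrangian with respect to theta.
    theta -= primal_step * (grad_f(theta) + lam * grad_g(theta))
    # Dual update: gradient ASCENT on the Lagrangian with respect to lam
    # (projected back to lam >= 0).
    lam = max(0.0, lam + dual_step * g(theta))

final_objective = f(theta)
# At the saddle point the constraint is active: theta[0] -> 1, theta[1] -> 0,
# and the multiplier settles at lam -> 2 (from stationarity 2*theta[0] - lam = 0).
```

In the patent's setting the analytic gradients would be replaced by likelihood-ratio (policy-gradient) estimates and the parameter vector and multiplier by the primal and dual networks, but the descent/ascent alternation is the same.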
It should be understood that any number of additional software components not shown in fig. 12 may be included within tangible, non-transitory computer-readable medium 1200 depending on the particular application.
Referring now to fig. 13, a block diagram of an example tangible, non-transitory computer-readable medium 1300 that can train a primal network to perform automatic response generation is depicted. The tangible, non-transitory computer-readable medium 1300 is accessible by the processor 1302 through a computer interconnect 1304. Further, the tangible, non-transitory computer-readable medium 1300 may include code to direct the processor 1302 to perform the operations of the methods 300 and 500 of figs. 3 and 5 above.
The different software components discussed herein may be stored on the tangible, non-transitory computer-readable medium 1300, as indicated in fig. 13. For example, the network training module 1306 includes code for training the primal network to minimize a Lagrangian loss function representing multiple objectives and training the dual network to maximize the Lagrangian loss function. The network training module 1306 may also include code for training the primal network using a first limit on the number of conversation turns and gradually increasing that limit to a second limit on the number of conversation turns. As one example, the first limit may be two turns of conversation and the second limit may be five turns of conversation. The receiver module 1308 includes code for receiving a prefix of a conversation and a text input. The primal neural network module 1310 includes code for generating a completed response based on the prefix of the conversation and the text input. For example, the primal neural network module 1310 may include code for iteratively constructing a sentence word by word, starting with the text input. The primal neural network module 1310 also includes code for generating a plurality of completed responses. For example, the primal neural network module 1310 may include code for generating a plurality of completed responses, including the completed response, using a beam search. The response display module 1312 includes code for presenting the plurality of completed responses, including the completed response, to a user for selection. The response sending module 1314 includes code for receiving a selected response from the completed responses and sending the selected response to a second user. The response sending module 1314 may also include code for automatically sending the completed response as a response to a query in response to detecting that a confidence score of the completed response exceeds a threshold score.
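The word-by-word construction and beam search described for the primal neural network module 1310 can be sketched as follows. This is an illustrative toy example, not the patent's LSTM model: the next-word scorer is a hard-coded, hypothetical table of log-probabilities, and the function and variable names are invented for illustration.

```python
import math  # not strictly needed here; log-probs are hard-coded below

# Hypothetical next-word model over a tiny vocabulary. In the patent's setting
# this table would be replaced by the trained primal (LSTM) network's
# next-word log-probabilities.
NEXT = {
    "how":    {"can": -0.4, "may": -1.1},
    "can":    {"i": -0.2, "we": -1.6},
    "may":    {"i": -0.3, "we": -1.4},
    "i":      {"help": -0.3, "assist": -1.3},
    "we":     {"help": -0.5, "assist": -1.0},
    "help":   {"<eos>": -0.1},
    "assist": {"<eos>": -0.2},
}

def beam_search(text_input, beam_width=2, max_len=6):
    """Iteratively extend `text_input` word by word, keeping only the
    `beam_width` highest-scoring partial responses at each step."""
    beams = [([text_input], 0.0)]        # (tokens, cumulative log-probability)
    completed = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for word, logp in NEXT.get(tokens[-1], {}).items():
                if word == "<eos>":      # sentence finished: move to completed list
                    completed.append((tokens, score + logp))
                else:                    # otherwise extend the partial response
                    candidates.append((tokens + [word], score + logp))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune to the beam width
    completed.sort(key=lambda c: c[1], reverse=True)
    return [" ".join(tokens) for tokens, _ in completed]

responses = beam_search("how")
# With this toy table, the top-scoring completion is "how can i help",
# followed by "how may i help".
```

The returned list corresponds to the "plurality of completed responses" that the response display module 1312 would present for selection.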
It should be understood that any number of additional software components not shown in fig. 13 may be included within the tangible, non-transitory computer-readable medium 1300 depending on the particular application.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present technology. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The description of different embodiments of the present technology has been presented for purposes of illustration but is not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or technical improvements found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (41)

1. A system comprising a processor configured to:
receiving data for a multi-objective task; and
performing the multi-objective task on the received data via a trained primal network, wherein the primal network and a dual network are trained on the multi-objective task using a Lagrangian loss function representing multiple objectives, wherein training the primal network minimizes the Lagrangian loss function and training the dual network maximizes the Lagrangian loss function.
2. The system of claim 1, wherein the multi-objective task comprises a Markov decision process comprising a finite state space and a finite action space.
3. The system of claim 1, wherein the primal network is pre-trained using a generic policy learned from another setting or by random initialization.
4. The system of claim 1, wherein the dual network is randomly initialized during training.
5. The system of claim 1, wherein the primal network is trained with a step size that is different from a step size of the dual network.
6. The system of claim 1, wherein the processor is operable to estimate a gradient based on a likelihood ratio estimate.
7. The system of claim 1, wherein the multi-objective tasks include selection, classification, regression, recommendation, generation, or prediction tasks.
8. The system of claim 1, wherein the data received by the processor is a prefix of a conversation and a text input, and wherein the processor is operable to generate a completed response based on the prefix of the conversation and the text input via the trained primal network.
9. The system of claim 8, wherein the processor is operable to:
generating a plurality of completed responses;
presenting the plurality of completed responses, including the completed response, to a user for selection;
receiving a selected response from the completed responses; and
sending the selected response to a second user.
10. The system of claim 8, wherein the prefix of the conversation comprises a dialog between a first user and a second user, and the text input comprises a portion of the completed response.
11. The system of claim 8, wherein the primal network and the dual network comprise long short-term memory (LSTM) models with different parameters and, optionally, additional network elements.
12. The system of claim 8, wherein the plurality of objectives comprise complex objectives or relevance objectives.
13. The system of claim 8, wherein the multiple objectives comprise a redundant non-likelihood objective or a semantic dissimilarity objective.
14. The system of claim 13, wherein the multiple targets comprise semantic consistency targets.
15. A computer-implemented method, comprising:
training a primal network and a dual network for a multi-objective task using a Lagrangian loss function representing multiple objectives, wherein training the primal network and the dual network comprises training the primal network to minimize the Lagrangian loss function and training the dual network to maximize the Lagrangian loss function;
receiving data for the multi-objective task; and
performing the multi-objective task on the received data via the trained primal network.
16. The computer-implemented method of claim 15, comprising processing the multi-objective task as a Markov decision process, the Markov decision process comprising a finite state space and a finite action space.
17. The computer-implemented method of claim 15, comprising pre-training the primal network using a generic policy learned from another setting, or randomly initializing the primal network during training.
18. The computer-implemented method of claim 15, comprising randomly initializing the dual network during training.
19. The computer-implemented method of claim 15, wherein training the primal network and the dual network comprises estimating gradients of the primal network and the dual network based on likelihood ratios.
20. The computer-implemented method of claim 15, comprising updating policy gradients for the primal network and the dual network based on different step sizes of the primal network and the dual network.
21. The computer-implemented method of claim 15, wherein training the primal network and the dual network comprises alternately training the primal network and the dual network.
22. The computer-implemented method of claim 15, wherein the received data is a prefix of a conversation and a text input, and wherein the method comprises:
generating, via the trained primal network, a completed response based on the prefix of the conversation and the text input.
23. The computer-implemented method of claim 22, comprising:
generating a plurality of completed responses;
presenting the plurality of completed responses, including the completed response, to a user for selection;
receiving a selected response from the completed responses; and
sending the selected response to a second user.
24. The computer-implemented method of claim 22, comprising:
in response to detecting that the confidence score of the completed response exceeds a threshold score, sending the completed response as a response to a query.
25. The computer-implemented method of claim 22, wherein generating the completed response comprises iteratively constructing the completed response starting with the text input word by word.
26. The computer-implemented method of claim 22, wherein generating the completed response comprises beam searching to generate a plurality of completed responses.
27. The computer-implemented method of claim 22, comprising training the primal network using a first limit on the number of conversation turns, and incrementally increasing the first limit to a second limit on the number of conversation turns.
28. The computer-implemented method of claim 22, comprising training the primal network using sequences that have a lower likelihood of generating redundant responses among all sequences in a training data set.
29. A computer program product for training a neural network to perform multi-objective tasks, the computer program product comprising a computer-readable storage medium having program code embodied therewith, wherein the computer-readable storage medium is not itself a transitory signal, the program code executable by a processor to cause the processor to:
training a primal network and a dual network for a multi-objective task using a Lagrangian loss function representing multiple objectives;
training the primal network to minimize the Lagrangian loss function and training the dual network to maximize the Lagrangian loss function;
receiving data for the multi-objective task; and
performing the multi-objective task on the received data via the trained primal network.
30. The computer program product of claim 29, further comprising program code executable by the processor to train the primal network and the dual network using a pre-existing data set, a simulator, feedback from an environment, or any combination thereof.
31. The computer program product of claim 29, further comprising program code executable by the processor to pre-train the primal network using a generic policy learned from another setting, or by randomly initializing the primal network during training.
32. The computer program product of claim 29, further comprising program code executable by the processor to estimate gradients of the primal network and the dual network based on likelihood ratios.
33. The computer program product of claim 29, further comprising program code executable by the processor to update policy gradients of the primal network and the dual network based on different step sizes of the primal network and the dual network.
34. The computer program product of claim 29, further comprising program code executable by the processor to randomly initialize the dual network during training.
35. The computer program product of claim 29, wherein the received data is a prefix of a conversation and a text input, and wherein the program code is executable by the processor to cause the processor to:
generate, via the trained primal network, a completed response based on the prefix of the conversation and the text input.
36. The computer program product of claim 35, further comprising program code executable by the processor for:
generating a plurality of completed responses;
presenting the plurality of completed responses, including the completed response, to a user for selection;
receiving a selected response from the completed responses; and
sending the selected response to a second user.
37. The computer program product of claim 35, further comprising program code executable by the processor for:
in response to detecting that the confidence score of the completed response exceeds a threshold score, sending the completed response as a response to a query.
38. The computer program product of claim 35, further comprising program code executable by the processor to iteratively construct a sentence starting with the text input word by word.
39. The computer program product of claim 35, further comprising program code executable by the processor to generate a plurality of completed responses including the completed response using beam searching.
40. The computer program product of claim 35, further comprising program code executable by the processor to train the primal network using a first limit on the number of conversation turns, and gradually increase the first limit to a second limit on the number of conversation turns.
41. A computer program comprising program code means adapted to perform the method of any of claims 15 to 28 when said program is run on a computer.
CN202080010330.2A 2019-02-03 2020-02-03 Performing multi-objective tasks via trained raw network and dual network Pending CN113366510A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US16/266,078 US11281867B2 (en) 2019-02-03 2019-02-03 Performing multi-objective tasks via primal networks trained with dual networks
US16/266,080 2019-02-03
US16/266,080 US11151324B2 (en) 2019-02-03 2019-02-03 Generating completed responses via primal networks trained with dual networks
US16/266,078 2019-02-03
PCT/IB2020/050827 WO2020157731A1 (en) 2019-02-03 2020-02-03 Performing multi-objective tasks via primal networks trained with dual networks

Publications (1)

Publication Number Publication Date
CN113366510A true CN113366510A (en) 2021-09-07

Family

ID=71840289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080010330.2A Pending CN113366510A (en) 2019-02-03 2020-02-03 Performing multi-objective tasks via trained raw network and dual network

Country Status (4)

Country Link
JP (1) JP7361121B2 (en)
CN (1) CN113366510A (en)
GB (1) GB2595123A (en)
WO (1) WO2020157731A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113114585B (en) * 2021-04-13 2022-10-18 网络通信与安全紫金山实验室 Method, equipment and storage medium for joint optimization of task migration and network transmission
CN117435928A (en) * 2023-12-20 2024-01-23 粤港澳大湾区数字经济研究院(福田) Training method of entity relation extraction model, entity relation extraction method and equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8626677B2 (en) * 2010-05-28 2014-01-07 Microsoft Corporation Training SVMs with parallelized stochastic gradient descent
US10192551B2 (en) * 2016-08-30 2019-01-29 Google Llc Using textual input and user state information to generate reply content to present in response to the textual input
US10146768B2 (en) 2017-01-25 2018-12-04 Google Llc Automatic suggested responses to images received in messages using language model
US10776716B2 (en) 2017-06-13 2020-09-15 Microsoft Technology Licensing, Llc Unsupervised learning utilizing sequential output statistics
CN108804612B (en) * 2018-05-30 2021-11-02 武汉烽火普天信息技术有限公司 Text emotion classification method based on dual neural network model

Also Published As

Publication number Publication date
JP2022518671A (en) 2022-03-16
GB2595123A (en) 2021-11-17
JP7361121B2 (en) 2023-10-13
WO2020157731A1 (en) 2020-08-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination