CN112189192A - System and method for training and using chat robots - Google Patents


Info

Publication number
CN112189192A
Authority
CN
China
Prior art keywords
model
emotion
message
chat robot
machine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201880093787.7A
Other languages
Chinese (zh)
Inventor
车正平 (Che Zhengping)
刘燕 (Liu Yan)
江嵩 (Jiang Song)
姜波 (Jiang Bo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Publication of CN112189192A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Manipulator (AREA)

Abstract

A system and method for training an emotion-soothing chat robot are provided. The method may include obtaining a corpus; applying one or more machine learning processes to the corpus to train a chat robot model, thereby obtaining a machine-learned chat robot model; applying one or more machine learning processes to the corpus to train an emotion prediction model, thereby obtaining a machine-learned emotion prediction model; and applying one or more machine learning processes to the corpus to train an emotion-soothing chat robot model, thereby obtaining a machine-learned emotion-soothing chat robot model. The emotion-soothing chat robot model may be constructed based on the machine-learned emotion prediction model and the machine-learned chat robot model.

Description

System and method for training and using chat robots
Technical Field
The present application relates to the field of computer technology, and in particular, to systems and methods for an emotion-soothing chat robot.
Background
In recent years, chat robots, also called conversation systems, have come to play an important role in daily life, particularly in customer service, due to rapidly growing customer demand. Compared with traditional manual customer service systems, chat robot solutions have many advantages, such as 24/7 availability, instant response, and low labor cost, and are therefore used in many business scenarios, for example Microsoft XiaoIce, Facebook Messenger Bots, and AliMe from Alibaba. Typically, customers communicate with a customer service department to resolve a problem, but in many cases they also wish to relieve negative emotions, such as strong dissatisfaction with a flight delay or stress from working overtime. Negative emotions can affect not only the customers themselves but also customer service personnel, thereby degrading the quality of the overall customer service. Accordingly, it is desirable to provide systems and methods for training and using chat robots that can soothe the negative emotions of customers.
Disclosure of Invention
One aspect of the present application includes a system for training an emotion-soothing chat robot model, comprising a computer-readable storage medium storing executable instructions for training the emotion-soothing chat robot model and at least one processor in communication with the computer-readable storage medium. The executable instructions, when executed, may direct the at least one processor to cause the system to: obtaining a corpus, wherein the corpus comprises at least two message exchange pairs, and the at least two message exchange pairs comprise input messages and response messages; applying one or more machine learning processes to the corpus to train a chat robot model to obtain a machine-learned chat robot model, wherein, upon input of the input message, the chat robot model generates a first response message; applying one or more machine learning processes to the corpus to train an emotion prediction model to obtain a machine-learned emotion prediction model, wherein, upon input of the message exchange pair, the emotion prediction model generates an emotional state result; and applying one or more machine learning processes to the corpus to train the emotion-soothing chat robot model to obtain a machine-learned emotion-soothing chat robot model, wherein the emotion-soothing chat robot model is constructed based on the machine-learned emotion prediction model and the machine-learned chat robot model, and wherein, upon input of the input message, the emotion-soothing chat robot model generates a second response message determined based at least in part on an emotional state result produced by the machine-learned emotion prediction model.
In some embodiments, a chat robot model may be constructed based on a sequence-to-sequence model and an attention model. The emotion prediction model may be constructed based on a dual RNN model.
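As an illustration only (not part of the patent), the attention step that a sequence-to-sequence chat robot model relies on can be sketched in a few lines of pure Python: each encoder hidden state is scored against the decoder query, the scores are softmax-normalized into attention weights, and the context vector is the weighted sum of the encoder states. The toy states and query below are invented for illustration.

```python
import math

def attention_context(encoder_states, decoder_query):
    """Dot-product attention over the encoder hidden states."""
    # Score each encoder state by its similarity to the decoder query.
    scores = [sum(h * q for h, q in zip(state, decoder_query))
              for state in encoder_states]
    # Softmax-normalize the scores into attention weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Three encoder hidden states (time steps) of dimension 2, one decoder query.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = [1.0, 0.0]
weights, context = attention_context(H, q)
```

The states that align with the query receive the larger weights, and a decoder would consume `context` when emitting the next response token.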
In some embodiments, the format of the input message may include at least one of text, images, sound, and video.
In some embodiments, to apply one or more machine learning processes to the corpus to train the emotion prediction model to obtain the machine-learned emotion prediction model, the at least one processor may be further instructed to cause the system to: for each input message of the at least two message exchange pairs, generating, for the input message, an emotional state result indicating an emotion estimate of the input message, to obtain a labeled corpus; and applying one or more machine learning processes to the labeled corpus to train the emotion prediction model to obtain the machine-learned emotion prediction model.
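The labeling step just described can be sketched as follows. This is an illustrative assumption, not the patent's implementation: `annotate` is a trivial stand-in for the emotion annotator model, and the corpus is toy data.

```python
def build_labeled_corpus(corpus, annotate):
    """Attach an emotional state result to the input message of each
    message exchange pair, producing the labeled corpus."""
    return [{"input": inp, "response": resp, "emotion": annotate(inp)}
            for inp, resp in corpus]

# Stand-in annotator; the patent uses a fusion "emotion annotator model".
def annotate(message):
    return "negative" if "delayed" in message.lower() else "neutral"

corpus = [
    ("My flight was delayed again!", "I am sorry to hear that."),
    ("What is my account balance?", "Your balance is 100."),
]
labeled = build_labeled_corpus(corpus, annotate)
```

Each labeled pair then serves as a training example for the emotion prediction model.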
In some embodiments, to generate an emotional state result for the input message, the at least one processor may be further instructed to cause the system to: generating an emotional state result for the input message using an emotion annotator model. The emotion annotator model may be a fusion model built based on at least two emotion estimation models. Upon input of the input message, the emotion annotator model may generate an emotional state result for the input message.
In some embodiments, the at least two emotion estimation models may include at least one of a Bayesian model and a dictionary-based model. The dictionary-based model may be configured to: classifying at least two target words associated with emotion into at least two categories representing different types of emotions; filtering the input message to obtain one or more words contained in the target words; and determining the emotional state result for the input message based on the one or more types of emotions corresponding to the one or more words.
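The dictionary-based model's three steps (grouping target words by emotion type, filtering the message against them, and mapping the matched words back to an emotion) can be sketched as below. The lexicon entries are invented examples, not from the patent.

```python
# Target words classified into categories representing types of emotion
# (invented example entries).
EMOTION_LEXICON = {
    "anger":   {"angry", "furious", "unacceptable"},
    "sadness": {"sad", "disappointed", "upset"},
    "joy":     {"happy", "great", "thanks"},
}

def dictionary_emotion(message):
    """Return the emotion type whose target words best match the message."""
    cleaned = message.lower()
    for ch in "!.,?":
        cleaned = cleaned.replace(ch, "")
    words = set(cleaned.split())
    # Filter the message: keep only words contained in the target words.
    counts = {emo: len(words & vocab) for emo, vocab in EMOTION_LEXICON.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "neutral"
```

A Bayesian model could replace the word counts with per-class likelihoods; the fusion annotator would then combine the two estimates.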
In some embodiments, to apply one or more machine learning processes to the labeled corpus to train the emotion prediction model to obtain the machine-learned emotion prediction model, the at least one processor may be further instructed to cause the system to, for each of the at least two message exchange pairs: obtaining a predicted emotional state result for the next input message of the message exchange pair by inputting the message exchange pair to the emotion prediction model; and obtaining the true emotional state result of the next input message based on the labeled corpus. The at least one processor may be further instructed to cause the system to obtain the machine-learned emotion prediction model by adjusting parameters of the emotion prediction model to minimize the difference between the at least two predicted emotional state results and the at least two true emotional state results.
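The training objective just described can be illustrated with a toy loss computation. The fixed-output predictor and the squared distance are assumptions for illustration only; the patent does not prescribe a specific difference measure.

```python
def emotion_prediction_loss(pairs, true_states, predict, distance):
    """Sum, over the message exchange pairs, of the difference between the
    predicted and the true emotional state result of the next input message."""
    total = 0.0
    for pair, true_state in zip(pairs, true_states):
        predicted = predict(pair)  # emotional state of the next input message
        total += distance(predicted, true_state)
    return total

def squared_distance(p, t):
    return sum((a - b) ** 2 for a, b in zip(p, t))

def predict(pair):
    # Stub standing in for the dual-RNN emotion prediction model:
    # a fixed distribution over (negative, non-negative).
    return [0.8, 0.2]

pairs = [("input 1", "response 1"), ("input 2", "response 2")]
true_states = [[1.0, 0.0], [0.0, 1.0]]  # from the labeled corpus
loss = emotion_prediction_loss(pairs, true_states, predict, squared_distance)
```

Training would adjust the model's parameters to drive this total difference down.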
In some embodiments, the second response message may include a soothing element that reacts to the emotional element of the input message.
In some embodiments, to apply one or more machine learning processes to the corpus to train the emotion-soothing chat robot model to obtain the machine-learned emotion-soothing chat robot model, the at least one processor may be further directed to cause the system to, for each of the at least two message exchange pairs: generating a provisional response message by inputting the input message of the message exchange pair into the machine-learned chat robot model; generating a provisional predicted emotional state result with respect to the next input message of the message exchange pair by inputting the input message and the provisional response message of the message exchange pair into the machine-learned emotion prediction model; determining a first difference between the provisional response message and the true response message included in the message exchange pair; determining a second difference between the provisional predicted emotional state result and a target emotional state result; and determining a combined difference based on the first difference and the second difference. The at least one processor may be further directed to cause the system to obtain the machine-learned emotion-soothing chat robot model by adjusting parameters of the machine-learned chat robot model to minimize the sum of the at least two combined differences.
In some embodiments, to determine the combined difference based on the first difference and the second difference, the at least one processor may be further instructed to cause the system to: combining the first difference and the second difference according to a predetermined ratio to obtain the combined difference.
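The "predetermined ratio" combination reduces to a weighted sum of the two differences, as sketched below. The 0.7 default is an assumed value, since the patent does not disclose a specific ratio.

```python
def combined_difference(first_diff, second_diff, ratio=0.7):
    """Combine the response-accuracy difference (first) and the
    emotional-state difference (second) by a predetermined ratio.
    ratio=0.7 is an assumed default, not a value from the patent."""
    return ratio * first_diff + (1.0 - ratio) * second_diff
```

A larger ratio prioritizes response accuracy; a smaller ratio prioritizes soothing the predicted emotion of the customer's next message.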
According to another aspect of the present application, a method for training an emotion-soothing chat robot model may include: obtaining a corpus, wherein the corpus comprises at least two message exchange pairs, wherein at least one message exchange pair comprises an input message and a response message; applying one or more machine learning processes to the corpus to train a chat robot model to obtain a machine-learned chat robot model, wherein, upon input of the input message, the chat robot model generates a first response message; applying one or more machine learning processes to the corpus to train an emotion prediction model to obtain a machine-learned emotion prediction model, wherein, upon input of the message exchange pair, the emotion prediction model produces an emotional state result; and applying one or more machine learning processes to the corpus to train an emotion-soothing chat robot model, wherein the emotion-soothing chat robot model is constructed based on the machine-learned emotion prediction model and the machine-learned chat robot model, and wherein, upon input of the input message, the emotion-soothing chat robot model generates a second response message determined based at least in part on the emotional state result generated by the machine-learned emotion prediction model.
According to another aspect of the present application, a non-transitory computer-readable medium may include at least one set of instructions for training an emotion-soothing chat robot model. When executed by at least one processor of an electronic terminal, the at least one set of instructions may direct the at least one processor to: obtaining a corpus, wherein the corpus comprises at least two message exchange pairs, wherein at least one of the at least two message exchange pairs comprises an input message and a response message; applying one or more machine learning processes to the corpus to train a chat robot model to obtain a machine-learned chat robot model, wherein, upon input of the input message, the chat robot model generates a first response message; applying one or more machine learning processes to the corpus to train an emotion prediction model to obtain a machine-learned emotion prediction model, wherein, upon input of the message exchange pair, the emotion prediction model produces an emotional state result; and applying one or more machine learning processes to the corpus to train an emotion-soothing chat robot model, wherein the emotion-soothing chat robot model is constructed based on the machine-learned emotion prediction model and the machine-learned chat robot model, and wherein, upon input of the input message, the emotion-soothing chat robot model generates a second response message determined based at least in part on the emotional state result generated by the machine-learned emotion prediction model.
According to another aspect of the present application, a chat robot system may include a computer-readable storage medium storing executable instructions, and at least one processor in communication with the computer-readable storage medium. The executable instructions, when executed, may direct the at least one processor to cause the system to: receiving an input message from an input device, wherein the input message includes an emotional element indicating a negative emotion level of a user using the input device; applying an emotion-soothing chat robot model to the input message to generate a response message based on the emotional element, wherein the response message includes a soothing element that reacts to the emotional element of the input message; and transmitting the response message to the output device.
According to another aspect of the present application, a method may include: receiving an input message from an input device, wherein the input message includes an emotional element indicating a negative emotion level of a user using the input device; applying an emotion-soothing chat robot model to the input message to generate a response message based on the emotional element, wherein the response message includes a soothing element that reacts to the emotional element of the input message; and sending the response message to the output device.
According to another aspect of the present application, a chat robot system may include a computer-readable storage medium storing executable instructions, and at least one processor in communication with the computer-readable storage medium. The executable instructions, when executed, may direct the at least one processor to cause the system to: sending an input message to a processor, wherein the input message includes an emotional element indicating a negative emotion level of the user; and receiving a response message from the processor, wherein the response message is generated by applying an emotion-soothing chat robot model to the input message based on the emotional element, and wherein the response message includes a soothing element that reacts to the emotional element of the input message.
According to another aspect of the present application, a method may comprise: sending an input message to a processor, wherein the input message includes an emotional element indicating a negative emotion level of the user; and receiving a response message from the processor, wherein the response message is generated by applying an emotion-soothing chat robot model to the input message based on the emotional element, and wherein the response message includes a soothing element that reacts to the emotional element of the input message.
Additional features of the present application will be set forth in part in the description which follows. Additional features of some aspects of the present application will be apparent to those of ordinary skill in the art in view of the following description and accompanying drawings, or in view of the production or operation of the embodiments. The features of the present application may be realized and attained by practice or use of the methods, instrumentalities and combinations of the various aspects of the specific embodiments described below.
Drawings
The present application will be further described by way of exemplary embodiments. These exemplary embodiments will be described in detail by means of the accompanying drawings. These embodiments are non-limiting exemplary embodiments in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
FIG. 1 is a schematic diagram of an exemplary emotion-soothing chat robot (SAC) system, shown in accordance with some embodiments of the present application;
FIG. 2 is a schematic diagram of exemplary hardware and software components of a computing device on which processing engine 112 may be implemented according to some embodiments of the present application;
FIG. 3 is a schematic diagram of exemplary hardware and/or software components of a mobile device on which user terminal 130 may be implemented according to some embodiments of the present application;
FIG. 4 is a block diagram of an exemplary processing engine of a server shown in accordance with some embodiments of the present application;
FIGS. 5A-5D are schematic diagrams of exemplary models used in the present application, shown in accordance with some embodiments of the present application;
FIG. 6 is a flow diagram of an exemplary process for training a SAC model, shown in accordance with some embodiments of the present application;
FIG. 7 is a schematic diagram of an exemplary architecture of a recurrent neural network, shown in accordance with some embodiments of the present application;
FIG. 8 is an exemplary chat robot model, shown in accordance with some embodiments of the present application;
FIG. 9 illustrates an exemplary architecture for combining a machine-learned chat robot model and a machine-learned SP model;
FIG. 10 is a flow diagram of an exemplary process of training an SP model, shown in accordance with some embodiments of the present application;
FIG. 11 is an architecture of an exemplary SP model shown in accordance with some embodiments of the present application;
FIG. 12 is a flow diagram of an exemplary process for operating a SAC model, shown in accordance with some embodiments of the present application; and
FIG. 13 is a flow diagram of an exemplary process for operating a SAC model, shown in accordance with some embodiments of the present application.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a particular application and its requirements. It will be apparent to those of ordinary skill in the art that various changes can be made to the disclosed embodiments and that the general principles defined in this application can be applied to other embodiments and applications without departing from the principles and scope of the application. Thus, the present application is not limited to the described embodiments, but should be accorded the widest scope consistent with the claims.
The terminology used in the description presented herein is for the purpose of describing particular example embodiments only and is not intended to limit the scope of the present application. As used herein, the singular forms "a", "an" and "the" may include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
These and other features, aspects, and advantages of the present application, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description of the accompanying drawings, all of which form a part of this specification. It is to be understood, however, that the drawings are designed solely for the purposes of illustration and description and are not intended as a definition of the limits of the application. It should be understood that the drawings are not to scale.
Flowcharts are used herein to illustrate operations performed by systems according to some embodiments of the present application. It should be understood that the operations in the flowcharts may be performed out of the order shown. Instead, various steps may be processed in reverse order or simultaneously. Also, one or more other operations may be added to the flowcharts, and one or more operations may be deleted from them.
Further, while the systems and methods herein are described primarily with respect to training and using an emotion-soothing chat robot (SAC) model in the context of customer service, it should be understood that this is only one exemplary embodiment. The systems and methods in this application may be applied to any other scenario in which an emotion-soothing chat robot (SAC) model may be desirable. For example, the systems and methods of the present application may be applied in different scenarios including help desks, website navigation, guided sales, technical support, or the like, or any combination thereof.
Customer service may involve responding to customer questions about products and services, for example answering a question about the price of a mobile phone. A help desk may involve answering internal employee questions, for example human resources questions. Website navigation may involve directing a customer to the relevant portion of a complex website. Guided sales may involve providing answers and guidance during the sales process, particularly for complex products sold to novice customers. Technical support may involve solving technical problems, such as diagnosing equipment problems.
In commerce, effective communication may be critical to acquiring, serving, and retaining customers. Companies typically introduce potential customers to their products and services while improving customer satisfaction and customer retention by gaining a clear understanding of customer needs. However, in some cases customers become frustrated, for example by searching a website without results, waiting a long time in a call queue to talk to a customer service representative, or waiting several days for an email reply. Customers may complain and may send angry messages to the service provider. Therefore, it is highly desirable to improve the quality of responses to customer messages, particularly with respect to calming negative emotions.
The emotion-soothing chat robot (SAC) model may take into account the mood of the customer. When generating a response message, the SAC model may consider both the accuracy of the response message relative to the customer's message and the customer's experience when reading the response message.
One aspect of the present application relates to systems and methods for training a SAC model. To this end, a processor of the system may obtain a corpus. The processor may further apply one or more machine learning processes to the corpus to train a chat robot model and an emotion prediction (SP) model, obtaining a machine-learned chat robot model and a machine-learned SP model. The processor may then construct a SAC model based on the machine-learned chat robot model and the machine-learned SP model, and apply one or more machine learning processes to the corpus to train the SAC model.
After training the SAC model, the processor, or a processor of another system, may use the machine-learned SAC model to interact with a user terminal that includes an input device and an output device. The processor may obtain the input message directly or indirectly through the input device, and may operate the machine-learned SAC model to generate a response message based on the input message. The response message may include a soothing element that reacts to the emotional element of the input message. The processor may further send the response message, directly or indirectly, to the output device.
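The serve-time flow above (receive the input message, run the machine-learned SAC model, forward the response to the output device) might look like the following sketch; `sac_model` here is a trivial stand-in for the trained model, and `outbox` stands in for the output device.

```python
def serve_message(input_message, sac_model, send):
    """Receive an input message, generate a response with the SAC model,
    and forward the response to the output device."""
    response = sac_model(input_message)
    send(response)
    return response

def sac_model(message):
    # Stand-in: prepend a soothing element when the message looks upset.
    soothing = "I understand how frustrating that is. " if "!" in message else ""
    return soothing + "Let me help you with that."

outbox = []  # stand-in for the output device
reply = serve_message("My flight is delayed!", sac_model, outbox.append)
```

In deployment, `send` would transmit the response over the network to the user terminal's output device rather than append to a list.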
FIG. 1 is a schematic diagram of an exemplary emotion-soothing chat robot (SAC) system, shown in accordance with some embodiments of the present application. The SAC system 100 may include a server 110, a network 120, a user terminal 130, and a memory 160. The server 110 may include a processing engine 112.
In some embodiments, the server 110 may be a single server or a group of servers. The group of servers can be centralized or distributed (e.g., the servers 110 can be a distributed system). In some embodiments, the server 110 may be local or remote. For example, server 110 may access messages and/or data stored in user terminal 130 or memory 160 via network 120. As another example, server 110 may be coupled to user terminal 130 and/or memory 160 to access stored messages and/or data. In some embodiments, the server 110 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, a multi-tiered cloud, or the like, or any combination thereof. In some embodiments, server 110 may be implemented on the computing device 200 described in FIG. 2, which includes one or more components.
In some embodiments, the server 110 may include a processing engine 112. Processing engine 112 may process messages and/or data related to an incoming message to perform one or more functions described herein. For example, processing engine 112 may generate a response message based on the input message. In some embodiments, the processing engine 112 may comprise one or more processing engines (e.g., a single-chip processing engine or a multi-chip processing engine). By way of example only, the processing engine 112 may include one or more hardware processors, such as a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), an Application-Specific Instruction-set Processor (ASIP), a Graphics Processing Unit (GPU), a Physics Processing Unit (PPU), a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.
Network 120 may facilitate the exchange of messages and/or data. In some embodiments, one or more components of the SAC system 100 (e.g., the server 110, the user terminal 130, and the memory 160) may send messages and/or data to other components of the SAC system 100 over the network 120. For example, server 110 may receive an incoming message from user terminal 130 via network 120. In some embodiments, the network 120 may be any form of wired or wireless network, or any combination thereof. By way of example only, network 120 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, a ZigBee network, a Near Field Communication (NFC) network, or the like, or any combination thereof. In some embodiments, network 120 may include one or more network access points. For example, the network 120 may include wired or wireless network access points, such as base stations and/or internet exchange points 120-1, 120-2, through which one or more components of the SAC system 100 may connect to the network 120 to exchange data and/or messages.
In some embodiments, the user terminal 130 may include a mobile device 130-1, a tablet computer 130-2, a laptop computer 130-3, a Personal Computer (PC) 130-4, or the like, or any combination thereof. In some embodiments, user terminal 130 may include an input device and an output device. The user terminal may interact with the processing engine 112. For example, an input device of the user terminal 130 may send a message to the processing engine 112, and an output device of the user terminal 130 may receive a response message from the processing engine 112. In some embodiments, mobile device 130-1 may include a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart footwear, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a Personal Digital Assistant (PDA), a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, virtual reality glasses, virtual reality eyewear, an augmented reality helmet, augmented reality glasses, augmented reality eyewear, or the like, or any combination thereof. For example, the virtual reality device and/or augmented reality device may include Google Glass™, Oculus Rift™, Fragments™, Gear VR™, and the like.
Memory 160 may store data and/or instructions. In some embodiments, memory 160 may store data retrieved from user terminal 130. In some embodiments, memory 160 may store data and/or instructions that the server 110 may execute or use to perform the exemplary methods described herein. In some embodiments, memory 160 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), or the like, or any combination thereof. Exemplary mass storage devices may include magnetic disks, optical disks, solid state disks, and the like. Exemplary removable memory may include flash drives, floppy disks, optical disks, memory cards, compact disks, magnetic tape, and the like. Exemplary volatile read-write memory may include Random Access Memory (RAM). Exemplary RAM may include Dynamic Random Access Memory (DRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Static Random Access Memory (SRAM), Thyristor Random Access Memory (T-RAM), Zero Capacitance Random Access Memory (Z-RAM), and the like. Exemplary read-only memories may include Mask Read-Only Memory (MROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc Read-Only Memory (DVD-ROM), and the like. In some embodiments, the memory 160 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, a multi-tiered cloud, or the like, or any combination thereof.
In some embodiments, memory 160 may be connected to network 120 to communicate with one or more components of SAC system 100 (e.g., server 110, user terminal 130). One or more components in the SAC system 100 may access data or instructions stored in the memory 160 via the network 120. In some embodiments, memory 160 may be directly connected to or in communication with one or more components in SAC system 100 (e.g., server 110, user terminal 130). In some embodiments, memory 160 may be part of server 110.
In some embodiments, one or more components of the SAC system 100 (e.g., server 110, user terminal 130) may access the memory 160. In some embodiments, one or more components of the SAC system 100 may read and/or modify messages related to the user and/or the public when one or more conditions are satisfied. For example, server 110 may read and/or modify one or more users' messages during a conversation.
Those of ordinary skill in the art will appreciate that when a component of the SAC system 100 executes, the component may execute via electrical and/or electromagnetic signals. For example, when the user terminal 130 processes a task such as transmitting/receiving a message from the server 110, the user terminal 130 may operate a logic circuit in its processor to process such a task. When user terminal 130 sends a message to server 110, a processor serving user terminal 130 may generate an electrical signal encoding the user message. The processor of the user terminal 130 may then send the electrical signal to the output port. If user terminal 130 communicates with server 110 via a wired network, the output port may be physically connected to a cable, which may also send electrical signals to the input port of server 110. If user terminal 130 communicates with server 110 via a wireless network, the output port of user terminal 130 may be one or more antennas that may convert electrical signals to electromagnetic signals. Within an electronic device, such as user terminal 130 and/or server 110, when its processor processes instructions, issues instructions, and/or performs actions, the instructions and/or actions are performed by electrical signals. For example, when the processor retrieves or stores data from a storage medium (e.g., memory 160), it may send electrical signals to the storage medium's read/write device, which may read or write structured data in the storage medium. The structured data may be transmitted to the processor in the form of electrical signals via a bus of the electronic device. Herein, an electrical signal may refer to one electrical signal, a series of electrical signals, and/or at least two discrete electrical signals.
FIG. 2 is a schematic diagram of exemplary hardware and software components of a computing device on which processing engine 112 may be implemented according to some embodiments of the present application. As shown in FIG. 2, computing device 200 may include a processor 210, memory 220, input/output (I/O)230, and communication ports 240.
The processor 210 (e.g., logic circuits) may execute computer instructions (e.g., program code) and perform the functions of the processing engine 112 in accordance with the techniques described herein. For example, the processor 210 may include an interface circuit 210-a and a processing circuit 210-b therein. The interface circuit may be configured to receive electronic signals from a bus (not shown in fig. 2), wherein the electronic signals encode structured data and/or instructions for the processing circuit to process. The processing circuit may perform logical computations and then encode its conclusions, results, and/or instructions as electronic signals. The interface circuit may then send out the electronic signals from the processing circuit via the bus.
Computer instructions may include, for example, routines, programs, objects, components, data structures, procedures, modules, and functions that perform particular functions described herein. In some embodiments, processor 210 may include one or more hardware processors, such as microcontrollers, microprocessors, Reduced Instruction Set Computers (RISC), Application Specific Integrated Circuits (ASICs), application specific instruction set processors (ASIPs), Central Processing Units (CPUs), Graphics Processing Units (GPUs), Physical Processing Units (PPUs), microcontroller units, Digital Signal Processors (DSPs), Field Programmable Gate Arrays (FPGAs), high-order RISC machines (ARMs), Programmable Logic Devices (PLDs), any circuit or processor capable of executing one or more functions, or the like, or any combination thereof.
For illustration only, only one processor is depicted in computing device 200. It should be noted, however, that the computing device 200 may also include multiple processors, and that operations and/or method steps described herein as being performed by one processor may also be performed jointly or separately by multiple processors. For example, if in the present application the processor of computing device 200 performs both step A and step B, it should be noted that steps A and B may also be performed jointly or independently by two or more different processors of computing device 200 (e.g., a first processor performs step A and a second processor performs step B, or the first and second processors jointly perform steps A and B).
The memory 220 may store data/messages retrieved from the user terminal 130, the memory 160, and/or any other component of the SAC system 100. In some embodiments, memory 220 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), or the like, or any combination thereof. For example, mass storage may include magnetic disks, optical disks, solid-state drives, and so on. Removable memory may include flash drives, floppy disks, optical disks, memory cards, compact disks, magnetic tape, and the like. Volatile read-write memory may include random access memory (RAM). RAM may include dynamic RAM (DRAM), double data rate synchronous dynamic RAM (DDR SDRAM), static RAM (SRAM), thyristor RAM (T-RAM), zero-capacitance RAM (Z-RAM), and the like. Exemplary read-only memory may include mask read-only memory (MROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM), digital versatile disc read-only memory (DVD-ROM), and the like. In some embodiments, memory 220 may store data and/or instructions that the server may execute or use to perform the exemplary methods described in this application. For example, memory 220 may store a program for the processing engine 112 for training and using SAC models.
I/O230 may input and/or output signals, data, messages, etc. In some embodiments, I/O230 may enable a user to interact with processing engine 112. In some embodiments, I/O230 may include input devices and output devices. Examples of input devices may include a keyboard, mouse, touch screen, microphone, etc., or a combination thereof. Examples of output devices may include a display, speakers, printer, projector, etc., or a combination thereof. Exemplary display devices may include Liquid Crystal Displays (LCDs), Light Emitting Diode (LED) based displays, flat panel displays, curved displays, television devices, Cathode Ray Tubes (CRTs), and the like, or any combination thereof.
The communication port 240 may be connected to a network (e.g., network 120) to facilitate data communication. The communication port 240 may establish a connection between the processing engine 112, the user terminal 130, or the memory 160. The connection may be a wired connection, a wireless connection, any other communication connection that may enable data transmission and/or reception, and/or any combination of these connections. The wired connection may include, for example, an electrical cable, an optical cable, a telephone line, etc., or any combination thereof. The wireless connection may include, for example, a bluetooth link, a Wi-Fi link, a WiMax link, a WLAN link, a ZigBee link, a mobile network link (e.g., 3G, 4G, 5G, etc.), and the like, or combinations thereof. In some embodiments, the communication port 240 may be and/or include a standardized communication port, such as RS232, RS485, and the like.
Fig. 3 is a schematic diagram of exemplary hardware and/or software components of a mobile device on which the user terminal 130 may be implemented according to some embodiments of the present application. As shown in FIG. 3, mobile device 300 may include a communication platform 310, a display 320, a graphics processing unit (GPU) 330, a central processing unit (CPU) 340, I/O 350, memory 360, and storage 390. In some embodiments, any other suitable component, including but not limited to a system bus or a controller (not shown), may also be included in mobile device 300. In some embodiments, a mobile operating system 370 (e.g., iOS™, Android™, Windows Phone™, etc.) and one or more applications 380 may be loaded from storage 390 into memory 360 and executed by CPU 340. Applications 380 may include a browser or any other suitable mobile application for receiving response messages from server 110. User interaction with the message stream may be enabled through I/O 350 and provided to processing engine 112 and/or other components of the SAC system 100 via network 120.
To implement the various modules, units, and their functions described herein, a computer hardware platform may be used as the hardware platform for one or more of the components described herein. A computer with user interface components may be used to implement a Personal Computer (PC) or any other type of workstation or terminal device. If programmed properly, the computer may also act as a server.
FIG. 4 is a block diagram of an exemplary processing engine of a server shown in accordance with some embodiments of the present application. In some embodiments, processing engine 112 may include a data collection module 410, a chat robot module 420, an emotion annotator module 430, an emotion prediction module 440, a SAC module 450, and a response delivery module 460. These modules may also be implemented as an application or set of instructions that are read and executed by the processing engine 112. Further, a module may be any combination of hardware circuitry and applications/instructions. For example, a module may be part of processing engine 112 when the processing engine executes an application/set of instructions.
The data collection module 410 may retrieve data from one or more components in the system 100, such as the user terminal 130 or the memory 160. For example, the data collection module 410 can retrieve a corpus from the memory 160. As another example, the data collection module 410 may obtain an input message sent from the user terminal 130 or other input device.
Chat robot module 420 may apply one or more machine learning processes to the corpus to train chat robot models to obtain machine-learned chat robot models. The chat bot model can be stored in memory 160 and can be invoked by chat bot module 420 when needed. Once the input message is fed in, the chat robot model can generate a response message based on the input message. Details regarding training the chat robot model can be found elsewhere in this application (e.g., operation 604 in FIG. 6).
Emotion annotator module 430 can generate an emotional state result based on the input message. In some embodiments, the emotional state result of the input message may be generated based on an emotion annotator model. Once the input message is fed in, the emotion annotator model may generate an emotional state result for the input message. Details regarding the retrieval of the tagged corpus by emotion annotator module 430 can be found elsewhere in the application (e.g., operation 1002 in FIG. 10).
Emotion prediction module 440 may apply one or more machine learning processes to a corpus to train an emotion predictor (SP) model to obtain a machine-learned SP model. In some embodiments, the corpus used to train the SP model may be a pre-labeled corpus that is different from the corpus used to train the chat robot model. The SP model may be stored in memory 160 and may be invoked by emotion prediction module 440 when needed. For example, once a message exchange pair comprising an input message and a response message is fed in, the emotion prediction model may generate an emotional state result for the next input message. The SP model may include a set of parameters denoted as φ. The emotion prediction module 440 may adjust the parameters of the SP model to obtain a machine-learned SP model. Details regarding the training of the SP model by the emotion prediction module may be found elsewhere in this application (e.g., operation 606 in FIG. 6 and operation 1004 in FIG. 10).
SAC module 450 may apply one or more machine learning processes to the corpus to train a SAC model to obtain a machine-learned SAC model. The SAC model is constructed based on the machine-learned chat robot model and the machine-learned SP model. Once an input message is fed in, the machine-learned SAC model may generate a response message that takes into account the user's mood. Details regarding training of the SAC model by the SAC module may be found elsewhere in this application (e.g., operation 608 in fig. 6). In some embodiments, SAC module 450 may be configured to apply the machine-learned SAC model to an input message from an input device to generate an emotion-soothing response message. Details regarding the generation of the emotion-soothing response message may be found elsewhere in this application (e.g., operation 1204 in fig. 12).
The response delivery module 460 may be configured to transmit the emotion-soothing response message generated by the SAC module 450 to the user terminal 130 or other output device. For example, system 100 may be a local system and processing engine 112 may receive an input message from an input device of user terminal 130. After generating the response message, the processing engine 112 may send it to an output device of the user terminal.
It should be noted that the above description of processing engine 112 is provided for illustrative purposes only and is not intended to limit the scope of the present application. Various modifications and changes may occur to those skilled in the art in light of the description herein. For example, processing engine 112 may also include a storage module that facilitates data storage. However, such changes and modifications do not depart from the scope of the present application.
Fig. 5A-5D are schematic diagrams of exemplary models used in the present application, shown according to some embodiments of the present application. Four types of models are basic to the proposed technology: the chat robot model, the emotion annotator model, the emotion prediction model, and the emotion-soothing chat robot (SAC) model. Each of the four types of basic models may be constructed based on various architectures, such as a recurrent neural network (RNN) model, a convolutional neural network (CNN) model, or the like, or a combination thereof. The following description of the four basic models may present one or more possible architectures, such as RNNs, for illustration purposes. It should be noted that any other architecture that can achieve the same functionality of the four basic models may also be included in the present application.
Fig. 5A illustrates a chat robot model. The chat bot model can be stored in memory 160 and can be invoked by processing engine 112 as needed. For example, when an input device sends a message to processing engine 112, processing engine 112 can invoke the chat bot model to generate a response message based on the input message and further transmit it to an output device. For the chat robot model, once an input message is fed in, the chat robot model can generate a response message from the input message. If the chat robot is trained, the response message may be strongly correlated with the input message. For example, if the incoming message is a question, the response message may be the answer to the question. As another example, if the incoming message is a request message, the response message may be an acknowledgement message regarding acceptance of the request.
Fig. 5B shows an emotion annotator model. The emotion annotator model can be stored in the memory 160 and can be invoked by the processing engine 112 when needed. For example, during the training process of any one of the four basic models, e.g., the training of the emotion prediction model, a corpus comprising at least two dialogs may be required. Messages contained in the at least two dialogs may need to be tagged with certain emotional state results. As used herein, an emotional state result may refer to an estimate of the emotion of an input message, which may reflect the emotional condition of the user who sent the message. The emotional state results may be numbers, symbols, descriptions, or any other form that may be used to distinguish one emotion from another. For example, the emotional state result may be a value in [0,1], with 1 referring to the most positive emotion and 0 referring to the most negative emotion. As another example, the emotional state result may be a textual description related to the emotion. A positive emotion may correspond to the text "good", while a negative emotion may correspond to the text "bad". The emotion annotator model is operable to generate emotional state results for the messages in the at least two dialogs. For example, once an input message is fed in, the emotion annotator model may generate an emotional state result for the input message. In some embodiments, the emotion annotator model may be a fusion model constructed based on at least two emotion estimation models (e.g., Bayesian models, dictionary-based models, etc.). Details regarding the emotion annotator model can be found elsewhere in the application (e.g., fig. 10 and its description).
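As an illustration of how such a fusion of emotion estimation models could work, the sketch below combines a toy dictionary-based estimator into a weighted average that returns a value in [0,1]. The word lists, weights, and function names are hypothetical and not part of the disclosed model.

```python
# Hypothetical word lists, weights, and function names -- for illustration only.
POSITIVE = {"great", "thanks", "good", "satisfied"}
NEGATIVE = {"dead", "dissatisfied", "bad", "complaint"}

def dictionary_score(message):
    """Dictionary-based estimator: fraction of emotion-bearing words that are
    positive, mapped into [0, 1] (1 = most positive, 0 = most negative)."""
    words = message.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.5  # neutral when no emotion-bearing words are found
    return pos / (pos + neg)

def fused_score(message, estimators, weights):
    """Fuse several estimators into one emotional state result by a
    weighted average, mimicking a simple fusion model."""
    total = sum(weights)
    return sum(w * est(message) for est, w in zip(estimators, weights)) / total

score = fused_score("the service is bad and I am dissatisfied",
                    [dictionary_score], [1.0])
print(score)  # 0.0 -> most negative emotion
```

A real annotator would fuse stronger estimators (e.g., a Bayesian classifier) in place of the second list entry; the weighted-average fusion step stays the same.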
Fig. 5C shows an emotion prediction (SP) model. The SP model may be stored in memory 160 and may be invoked by processing engine 112 as needed. As used herein, the emotion prediction model is used to predict the emotional state result of a future input message that a user may send. For example, once a message exchange pair comprising an input message and a response message is fed in, the emotion prediction model may generate a predicted emotional state result for the next input message. In some embodiments, a response message corresponding to the received customer message may be sent by the service provider. The predicted emotional state result may reflect, to some extent, whether the customer is satisfied with the service provider's response message. In other words, the predicted emotional state result may be used to test the quality of the service provider's response message, which may be important for improving customer satisfaction.
Fig. 5D illustrates an emotion-soothing chat robot (SAC) model. The SAC model may be stored in memory 160 and may be invoked by processing engine 112 as needed. The SAC model may be constructed based on a chat robot model and an emotion prediction model. Once an input message is fed in, the SAC model may generate a response message. However, compared with the chat robot model, the response message generated by the SAC model may take the user's emotion into account, because the emotion prediction model is included in the SAC model. In some embodiments, the input message may include an emotive element indicating a negative level of the user's emotion. The SAC model may generate a response message based on the input message and the emotive element. In this case, the response message may soothe the user's negative emotion. In some embodiments, the emotive element may include characteristic words capable of describing the user's emotion (e.g., dead, great, dissatisfied, etc.). The soothing element may include characteristic words or phrases that may be used to calm down the user's emotion (e.g., apologies, thanks, etc.).
In some embodiments, the response message generated by the SAC model may be converted into a different language relative to the input message. For example, when a foreign client unfamiliar with Chinese sends a complaint message using Chinese, the SAC model may generate and translate a response message into the client's native language. Details about SAC models can be found elsewhere in this application (e.g., fig. 6 and 9, and descriptions thereof).
FIG. 6 is a flow diagram of an exemplary process for training a SAC model, shown in accordance with some embodiments of the present application. Process 600 may be performed by the SAC system 100. For example, process 600 may be implemented as a set of instructions (e.g., an application program) stored in a storage device (e.g., memory 160). Processor 210 and/or the modules in fig. 4 may execute the set of instructions and, when executing the instructions, may be configured to perform process 600. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, process 600, when implemented, may include one or more additional operations not described and/or omit one or more operations described herein. Additionally, the order of the process operations as illustrated in FIG. 6 and described below is not intended to be limiting.
As shown in fig. 6, at 602, processing engine 112 may obtain a corpus. The processing engine 112 may access the memory 160 via the network 120 to retrieve the corpus. The corpus can be used to train SAC models. In some embodiments, the corpus may include at least two dialogs. For example, the corpus may include M dialogs D^[1], …, D^[M] between customers and the service provider. Each dialog D^[m] may include at least two message exchange pairs. A message exchange pair refers to one exchange between a customer and the service provider. For example, dialog D^[m] may include N_m message exchange pairs, which may be denoted as {<x_n^[m], y_n^[m]>}_{n=1}^{N_m}.

As used herein, x refers to an input message and y refers to the response message to that input message. In some embodiments, the input message or the response message may include various data formats (e.g., text, images, video, sound, symbols, etc.). In some embodiments, the input message may include at least two sentences. For example, a customer may complain to a service provider by sending several messages to the service provider's processor. These messages may be contained in a single input message. In the context of the present application, x = (x_1, …, x_{T_x}) may be used to express an input message of length T_x, in which x_t represents the token of the input message at time t. Similarly, y = (y_1, …, y_{T_y}) represents a response message from the service provider. The response message from the service provider may be generated by a chat robot or a person. Based on the above description, the total number of message exchange pairs in the corpus may be N = Σ_{m=1}^{M} N_m.
In some embodiments, the corpus may be obtained from a database. The database may include redundant data that should not be used in the corpus. Thus, the processing engine 112 may perform a filtering process on the database to obtain the corpus. For example, an exemplary filtering process may include the following five steps.
In step one, the processing engine 112 may filter out all private or sensitive messages, e.g., cell phone numbers, birthdays, etc., and remove all corrupted or non-textual conversations.

In step two, the processing engine 112 may combine and discard duplicate and redundant messages from the customer and customer service.

In step three, the processing engine 112 may process each dialog into the format of <x_n, y_n> pairs, and merge consecutive posts (i.e., input messages) or response messages published by the same person into a single input message or response message. The processing engine 112 may remove dialogs having fewer than two <x_n, y_n> pairs (i.e., groups).
In step four, the processing engine 112 may segment all messages into words. An exemplary segmentation algorithm for Chinese characters is "jieba," which includes three segmentation modes: an exact mode, a full mode, and a search engine mode. The exact mode tries to cut the sentence into the most accurate segments, which is suitable for text analysis. The full mode takes all possible words from the sentence. The search engine mode, which is based on the exact mode, attempts to cut long words into several shorter words, which may improve recall. "Jieba" can implement efficient word-graph scanning based on a prefix dictionary structure. "Jieba" can build a directed acyclic graph (DAG) for all possible word combinations. "Jieba" may use dynamic programming to find the most likely combination based on word frequency. For unknown words, an HMM-based model with the Viterbi algorithm is used.
In step five, the processing engine 112 may count the frequency of each word and split rare words (frequency ≤ 3) into characters. The processing engine 112 may further recalculate the frequency of all tokens (i.e., words and characters) and replace rare tokens (frequency ≤ 3) with a rare label.
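The merging and rare-token steps above can be sketched as follows. This is a minimal stand-in, not the disclosed implementation: a plain whitespace split replaces a real segmenter such as jieba, and the speaker labels and frequency threshold are illustrative assumptions.

```python
from collections import Counter

RARE = "<rare>"

def merge_consecutive(turns):
    """Step three: merge consecutive messages by the same speaker into one.
    `turns` is a list of (speaker, text) pairs."""
    merged = []
    for speaker, text in turns:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged

def replace_rare_tokens(dialogs, min_freq=4):
    """Steps four/five: tokenize, count token frequencies over the whole
    corpus, and replace rare tokens (frequency <= 3) with a rare label.
    A whitespace split stands in for a real segmenter such as jieba."""
    freq = Counter(tok for d in dialogs for _, text in d for tok in text.split())
    keep = {t for t, c in freq.items() if c >= min_freq}
    return [[(s, " ".join(t if t in keep else RARE for t in text.split()))
             for s, text in d] for d in dialogs]

turns = [("customer", "my order is late"), ("customer", "please help"),
         ("service", "sorry for the delay")]
print(merge_consecutive(turns)[0][1])  # "my order is late please help"
```

Splitting rare words into characters (natural for Chinese) is omitted here; with a character-level fallback the recount in `replace_rare_tokens` would simply run a second time over the expanded tokens.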
Based on the filtering process, the processing engine 112 may obtain the corpus. Example statistics for a corpus of Chinese characters are shown in Table 1. The corpus may be shuffled and divided into several groups for training, validation, and testing, respectively.
Table 1: Statistics of the corpus
In 604, processing engine 112 may apply one or more machine learning processes to the corpus to train the chat robot model to obtain a machine-learned chat robot model. In some embodiments, at least a portion of the corpus obtained in operation 602 may be used to train a chat robot model. In some embodiments, the chat robot model may be built on top of a sequence-to-sequence (seq2seq) model and an attention model. The exemplary seq2seq model and attention model that may be used to construct the chat robot model disclosed in the present application are for illustration purposes and are not intended to limit the scope of the present application.
In some embodiments, the chat robot model may be built on top of a seq2seq model containing an encoder-decoder architecture. The basic sequence-to-sequence model consists of two Recurrent Neural Networks (RNNs): an encoder that processes an input and a decoder that generates an output. The basic architecture is shown in fig. 7. Each circle in fig. 7 represents a cell of the RNN. The encoder and decoder may share weights or use different sets of parameters.
In the seq2seq model, the length of an input message x can be represented as T_x and the length of the response message as T_y. The encoder may map the input message x = (x_1, …, x_{T_x}) to a context vector c. The response message y = (y_1, …, y_{T_y}) may be generated by the decoder from the context vector c. The seq2seq model may model the conditional probability p(y|x), which may be decomposed as in equation (1) below:

p(y|x) = ∏_{t=1}^{T_y} p(y_t | y_1, …, y_{t-1}, c),  (1)
In some embodiments, the encoder of the seq2seq model may be a recurrent neural network (RNN). An RNN may be a neural network consisting of a hidden state h and an optional output y, which operates on a variable-length sequence x = (x_1, …, x_T). At each time step t, the hidden state h^(t) of the RNN is updated by h^(t) = f(h^(t-1), x_t), where f is a non-linear activation function. The RNN can learn a probability distribution over a sequence by being trained to predict the next symbol in the sequence. In this case, the output at each time step t is the conditional distribution p(x_t | x_{t-1}, …, x_1).
In this application, the hidden state of the RNN is updated by h_t = f(h_{t-1}, x_t) as the encoder reads each symbol of the input message x sequentially. In some embodiments, the decoder may also be an RNN model. The decoder can be trained to generate a response message y by predicting each y_t step by step given the hidden state h_t through the equation h_t = f(h_{t-1}, y_{t-1}, c). Decoding each y_t can be parameterized based on equation (2) as follows:

p(y_t | y_1, …, y_{t-1}, c) = g(h_t, y_{t-1}, c),  (2)

where g is a non-linear activation function, such as a softmax function.
To train the seq2seq model, the encoder and decoder can be jointly trained to maximize the conditional log-likelihood based on equation (3), as follows:

max_θ (1/N) Σ_{n=1}^{N} log p_θ(y_n | x_n),  (3)

where N represents the total number of message exchange pairs in the corpus and θ represents the model parameters.
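A worked illustration of the decomposition in equations (1) and (3): the per-step conditional distributions below are made-up numbers from a hypothetical decoder, and the function simply sums the per-token log probabilities to obtain log p(y|x).

```python
import math

def sequence_log_likelihood(step_distributions, response):
    """Equations (1)/(3): log p(y|x) = sum_t log p(y_t | y_1..y_{t-1}, c).
    step_distributions[t] is the decoder's conditional distribution over the
    vocabulary at step t (already conditioned on the context vector c and
    the previously generated tokens); response[t] is the token emitted."""
    return sum(math.log(step_distributions[t][y_t])
               for t, y_t in enumerate(response))

# Toy per-step distributions from a hypothetical decoder.
steps = [{"yes": 0.7, "no": 0.2, "<eos>": 0.1},
         {"yes": 0.1, "no": 0.1, "<eos>": 0.8}]
ll = sequence_log_likelihood(steps, ["yes", "<eos>"])
print(round(ll, 4))  # log(0.7) + log(0.8) = -0.5798
```

Training per equation (3) maximizes the average of this quantity over all N message exchange pairs with respect to θ.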
In some embodiments, a stacked RNN architecture may also be used to construct the chat robot model. The stacked RNN may include long short-term memory (LSTM) units (or blocks), which are the building units for the layers of a recurrent neural network (RNN). The stacked RNN may include two different LSTMs: one for the input sequence and one for the output sequence. This may increase the number of model parameters at negligible computational cost, and makes it natural to train the LSTM on multiple language pairs simultaneously.
Gated recurrent units (GRUs) can be used to construct the encoder and decoder. The GRU is related to the LSTM (long short-term memory), but each uses a different gating mechanism to prevent long-range dependency problems. The GRU exposes its complete hidden state without any control. The GRU has two gates: a reset gate r and an update gate z. Intuitively, the reset gate determines how to combine the new input with the previous memory, and the update gate defines how much of the previous memory to keep.
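The gating described above can be sketched for a scalar hidden state; real GRUs use weight matrices and vector-valued states, and the parameter values here are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h, p):
    """One GRU step for a scalar input and hidden state (illustrative only;
    real GRU cells operate on vectors with weight matrices)."""
    z = sigmoid(p["Wz"] * x + p["Uz"] * h)                # update gate: how much old memory to keep
    r = sigmoid(p["Wr"] * x + p["Ur"] * h)                # reset gate: how to mix input with memory
    h_tilde = math.tanh(p["Wh"] * x + p["Uh"] * (r * h))  # candidate hidden state
    return (1.0 - z) * h + z * h_tilde                    # interpolate old state and candidate

params = {"Wz": 0.5, "Uz": 0.3, "Wr": 0.4, "Ur": 0.2, "Wh": 0.9, "Uh": 0.7}
h = 0.0
for x in [1.0, -0.5, 0.25]:  # a toy input sequence
    h = gru_cell(x, h, params)
print(-1.0 < h < 1.0)  # gating keeps the state bounded -> True
```

When the update gate z is near 0, the cell simply carries the previous memory forward unchanged, which is what lets gradients flow across long ranges.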
In some embodiments, an attention model may be integrated into the seq2seq model to address the alignment issue in the encoder-decoder architecture. Instead of decoding the encoded input message x with a fixed context vector c, the attention model may, at the current decoding time step, find the parts of the input message x most relevant to the current decoding state. For example, \bar{h}_{t'} may represent a hidden state from the encoder, and h_t the decoding hidden state. The attention model may link the current decoding state h_t with each input state \bar{h}_{t'} through a weight vector a_{tt'}. The weight vector a_{tt'} may be derived based on various scoring functions (e.g., a global attention model, a local attention model, etc.).

In some embodiments, the dot product between the two vectors, i.e., score(h_t, \bar{h}_{t'}) = h_t^T \bar{h}_{t'}, can be used as the scoring function, and the weight vector a_{tt'} can be determined as

a_{tt'} = exp(score(h_t, \bar{h}_{t'})) / Σ_{s} exp(score(h_t, \bar{h}_{s})).

Given the weight vector a_{tt'}, the attention vector c_t for decoding at step t is determined as a weighted average over all input hidden states:

c_t = Σ_{t'} a_{tt'} \bar{h}_{t'}.

The attentional hidden state \tilde{h}_t can be generated from \tilde{h}_t = tanh(W_c [h_t, c_t]), where [h_t, c_t] is a concatenation of the current decoding hidden state and the attention vector. Then \tilde{h}_t is fed into the softmax function to obtain the predicted distribution

p(y_t | y_{<t}, x) = softmax(W_s \tilde{h}_t).
The attention mechanism can be used to improve the performance of the seq2seq model. By applying the attention mechanism in the decoder, the decoder decides which parts of the source sentence to focus on. Giving the decoder an attention mechanism relieves the encoder of the burden of encoding all the information in the source sentence into a single fixed-length vector, which can improve the performance of the seq2seq model.
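The dot-product scoring, softmax weighting, and weighted averaging described above can be sketched in a few lines. The 2-dimensional states and their values are toy numbers for illustration only.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dot_attention(h_t, encoder_states):
    """Dot-product attention: score each encoder hidden state against the
    current decoder state, softmax the scores into weights, and return the
    weighted average of the encoder states as the attention vector c_t."""
    scores = [dot(h_t, h_bar) for h_bar in encoder_states]  # score(h_t, h_bar) = h_t . h_bar
    weights = softmax(scores)                               # attention weights a_{tt'}
    c_t = [sum(w * h_bar[i] for w, h_bar in zip(weights, encoder_states))
           for i in range(len(h_t))]                        # c_t = sum_{t'} a_{tt'} h_bar_{t'}
    return weights, c_t

encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # encoder states for a 3-token source
h_t = [1.0, 0.0]                                       # current decoder state
weights, c_t = dot_attention(h_t, encoder_states)
print(abs(sum(weights) - 1.0) < 1e-9)  # the weights form a distribution -> True
```

The subsequent tanh projection of the concatenated [h_t, c_t] and the output softmax are straightforward matrix operations omitted here.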
Referring back to 604, fig. 8 illustrates an exemplary chat robot model in accordance with some embodiments of the present application. In the chat robot model, the encoder may convert an input message x into a continuous representation vector c. The decoder may decode c to generate a response message y. In some embodiments, the encoder and the decoder may each be a 2-layer GRU architecture. A 2-layer GRU can be developed by adding a second GRU layer that captures higher-level feature interactions between different time steps. The attention model may be a dot-product scoring attention model.
Both messages x and y may pass through the same embedding layer before they are fed to the GRUs. The chat robot model can estimate the conditional probability p_θ(y | x) so as to maximize the conditional log-likelihood defined in equation (1). For example only, once fed an input message, the chat robot model may generate a predicted response message. The predicted response may be expressed as ŷ = (ŷ_1, …, ŷ_T), where ŷ_t represents a predicted word. The predicted word ŷ_t can be expressed as a distribution ô_t over the vocabulary, and the real word y_t can be represented as a one-hot vector o_t. The loss of a message exchange pair <x, y> may be the cross entropy between ô_t and o_t, which can be parameterized as equation (4) as follows:

l(θ) = −Σ_{t=1}^{T} o_t·log ô_t, (4)

where θ represents the parameters of the chat robot model.
Thus, the total loss over the corpus may be the sum of the losses of all message exchange pairs included in the corpus, as shown in equation (5) below:

L(θ) = Σ_{m=1}^{M} Σ_{n=1}^{N_m} l_{n,m}(θ), (5)

where l_{n,m}(θ) refers to the loss function l(θ) of the nth message exchange pair in the mth dialog. For the decoder, a teacher forcing algorithm may be used during training. The teacher forcing algorithm is a method for quickly and efficiently training recurrent neural network models. Teacher forcing uses the ground-truth output y(t) from the training data set at the current time step as the input x(t+1) at the next time step, rather than the output generated by the network. It is a network training method that is crucial to developing deep learning language models for machine translation, text summarization and image captioning, as well as many other applications.
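By way of illustration only, the per-pair cross-entropy loss and the teacher forcing scheme just described may be sketched in plain Python. Here `step_fn`, the start-of-sequence token id 0, and plain probability lists standing in for softmax outputs are assumptions for illustration:

```python
import math

def cross_entropy_loss(pred_dists, target_ids):
    """Cross entropy between predicted word distributions o-hat_t and
    one-hot targets o_t, summed over one message exchange pair."""
    return -sum(math.log(dist[t]) for dist, t in zip(pred_dists, target_ids))

def teacher_forced_decode(step_fn, state, targets):
    """Teacher forcing: feed the ground-truth word y_t as the next
    decoder input instead of the model's own prediction.

    step_fn(state, token) -> (new_state, distribution over vocabulary)
    """
    dists = []
    token = 0  # assumed start-of-sequence id
    for y_t in targets:
        state, dist = step_fn(state, token)
        dists.append(dist)
        token = y_t  # ground truth, not argmax(dist)
    return dists
```

The total corpus loss of equation (5) would then sum this per-pair loss over all message exchange pairs in all dialogs.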
Processing engine 112 may adjust a parameter θ of the chat robot model to minimize the overall loss of the corpus to obtain a machine-learned chat robot model. The machine-learned chat robot model can be used as a baseline model for the SAC model.
After the training process of the chat robot model, the machine-learned chat robot model may be evaluated using a beam search algorithm with beam size k = 5. The beam search algorithm is a heuristic search algorithm that explores a graph by expanding the most promising nodes in a limited set. Beam search is an optimization of best-first search that reduces its memory requirements. Best-first search is a graph search that orders all partial solutions (states) according to some heuristic that attempts to predict how close a partial solution is to a complete solution (the target state). In beam search, however, only a predetermined number of best partial solutions are kept as candidates. It is a greedy algorithm.
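By way of illustration only, a minimal beam search over token sequences may be sketched as follows. The `step_fn` interface and the log-probability scoring are assumptions for illustration, not the application's implementation:

```python
import math

def beam_search(step_fn, start_token, max_len, k=5):
    """Heuristic beam search keeping only the k best partial hypotheses.

    step_fn(prefix) -> list of (token, probability) continuations.
    Returns the highest-scoring sequence by sum of log-probabilities.
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            for tok, p in step_fn(seq):
                candidates.append((seq + [tok], logp + math.log(p)))
        # keep only the k most promising partial solutions
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:k]
    return beams[0][0]
```

With k = 1 this degenerates to greedy decoding; larger k trades memory for a wider search.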
In 606, the processing engine 112 may apply one or more machine learning processes to the corpus to train an emotion predictor (SP) model to obtain a machine-learned SP model. In some embodiments, at least a portion of the corpus acquired in operation 602 may be used to train the SP model. The corpus used to train the SP model can be different from the corpus (e.g., a pre-labeled corpus) used to train the chat robot model. The SP model may include parameters defined as φ. The processing engine 112 may adjust the parameters φ of the SP model to obtain a machine-learned SP model. After training, once fed a message exchange pair <x, y>, the SP model may generate a predicted emotional state result for the next input message. The predicted emotional state result may indicate whether the response message in the message exchange pair can soothe the customer's emotion. By utilizing this function of the SP model, a SAC model including the SP model can generate response messages that are not only accurate but also take the customer's emotion into account. Details regarding SP model training can be found elsewhere in the present application (e.g., figs. 10 and 11 and the descriptions thereof).
In 608, the processing engine 112 may apply one or more machine learning processes to the corpus to train a SAC model to obtain a machine-learned SAC model. In some embodiments, at least a portion of the corpus acquired in operation 602 may be used to train the SAC model. The corpus used to train the SAC model may be different from the corpora used to train the chat robot model and the SP model. The SAC model is constructed based on the machine-learned chat robot model and the machine-learned SP model. The response message generated by the SAC model may be generated based at least in part on the emotional state result generated by the SP model. By including both the chat robot model and the SP model, the SAC model can, once given an input message, generate response messages that promote the emotional state result.
Fig. 9 illustrates an exemplary architecture combining the machine-learned chat robot model and the machine-learned SP model. Taking a message exchange pair <x_n, y_n> as an example, the input message x_n can be fed into the machine-learned chat robot model to obtain a provisional response message ŷ_n. The processing engine 112 may feed the input message x_n and the provisional response message ŷ_n to the machine-learned SP model to obtain a tentative predicted emotional state result ŝ_{n+1} for the next input message x_{n+1}. To placate the customer's emotion, processing engine 112 may adjust the parameters of the machine-learned chat robot model to promote the tentative predicted emotional state result ŝ_{n+1}.
In some embodiments, the predicted emotional state result lies in [0, 1], where 1 refers to the most positive emotion and 0 refers to the most negative emotion. Thus, to promote the tentative predicted emotional state result, processing engine 112 may adjust the parameters of the chat robot model to minimize the mean square error (MSE) between the tentative predicted emotional state result and 1 (the most positive score).
Considering that the response message should also have correct syntax and be related to the input message, the processing engine 112 may combine the MSE loss with the chat robot objective function of equation (4) to obtain the total loss function of the SAC model, as shown in equation (6):

l_SAC(θ) = l(θ) + λ·(ŝ_{n+1} − 1)^2, (6)

where l(θ) is defined in equation (4), and λ is a hyper-parameter representing how much the SP model contributes to the combined total loss function. As shown in equation (6), the processing engine 112 may adjust the parameters θ to minimize the total loss function to obtain the machine-learned SAC model.
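By way of illustration only, the combined SAC objective, i.e. the chat robot loss plus λ times the MSE between the tentative predicted emotional state result and 1 (the most positive score), may be sketched as follows. The function name is an assumption for illustration:

```python
def sac_loss(chatbot_loss, predicted_sentiment, lam):
    """Combined SAC loss for one message exchange pair: the chat robot
    cross-entropy loss plus lambda times the squared distance between
    the predicted emotional state result and 1.0 (most positive)."""
    mse = (predicted_sentiment - 1.0) ** 2
    return chatbot_loss + lam * mse
```

Setting lam = 0 recovers the plain chat robot objective, matching the text's observation that a SAC model with λ = 0 equals the chat robot model.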
It should be noted that the foregoing is provided for illustrative purposes only and is not intended to limit the scope of the present application. It will be apparent to those skilled in the art that various changes and modifications can be made in the light of the description of the application. However, such changes and modifications do not depart from the scope of the present application. For example, operation 606 of training the SP model may be performed before operation 604 of training the chat robot model.
FIG. 10 is a flow diagram of an exemplary process for training an SP model, shown in accordance with some embodiments of the present application. Process 1000 may be performed by SAC system 100. For example, process 1000 may be implemented as a set of instructions (e.g., an application program) stored in a storage device (e.g., storage device 160). Processor 210 and/or the modules in fig. 4 may execute a set of instructions, and when executing the instructions, processor 210 and/or the modules may be configured to perform process 1000. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, process 1000, when implemented, may add one or more additional operations not described herein and/or subtract one or more operations described herein.
To train the SP model, the corpus may need to be a labeled corpus, in which at least a portion of the input messages correspond to emotional state results. However, the corpus may only include text messages that are not tagged with emotional state results.
At 1002, the processing engine 112 may generate an emotional state result for each input message in the corpus to obtain a labeled corpus. In some embodiments, the labeled corpus may be obtained manually, which may increase labor costs. In some embodiments, the labeled corpus may be obtained based on an emotion annotator (SA) model. Once an input message is fed in, the SA model may generate an emotional state result for the input message. In some embodiments, the corpus may include multiple languages, and the SA model may identify the language of the input message and generate a corresponding emotional state result. As described above, the emotional state results may lie in the range [0, 1], with 1 referring to the most positive emotion and 0 referring to the most negative emotion. The processing engine 112 may operate the SA model to generate an emotional state result for each input message in the corpus.
In some embodiments, the SA model may be constructed based on at least two emotion estimation models. Exemplary emotion estimation models can include Bayesian models, dictionary-based models, the like, or combinations thereof. For example, the SA model may be a fusion model combining a Bayesian model and a dictionary-based model, as shown in equation (7) below:
s = μ·s_bayes + (1 − μ)·s_dict, (7)

where s_bayes represents the output emotional state result determined by the Bayesian model, s_dict represents the output emotional state result of the dictionary-based model, and μ represents a combination coefficient between 0 and 1.
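By way of illustration only, the fusion of equation (7) may be sketched as a simple convex combination. The function name and default coefficient are assumptions for illustration:

```python
def fused_sentiment(s_bayes, s_dict, mu=0.5):
    """Equation (7): convex combination of the Bayesian and
    dictionary-based emotional state results, with combination
    coefficient mu in [0, 1]."""
    assert 0.0 <= mu <= 1.0
    return mu * s_bayes + (1.0 - mu) * s_dict
```

Because both component scores lie in [0, 1] and μ is in [0, 1], the fused score also stays in [0, 1].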
In some embodiments, the Bayesian model may be a pre-trained Bayesian emotion classifier from the SnowNLP package. SnowNLP is a Python class library for processing Chinese text content. It supports a variety of functions including Chinese word segmentation (character-based generative models), part-of-speech tagging, sentiment analysis, text classification (naive Bayes), conversion to pinyin, traditional-to-simplified conversion, keyword extraction, abstract extraction, TF-IDF, tokenization and text similarity, and it supports Python 3. The output of SnowNLP lies between 0 and 1, from the most negative to the most positive value.
The dictionary-based model may include an emotion polarity dictionary (e.g., a positive dictionary and a negative dictionary), an emotion degree dictionary, a stop-word dictionary, and the like, or combinations thereof. The dictionary-based model can be applied to various languages because it can contain various types of dictionaries. In some embodiments, the dictionary-based model may be configured to classify at least two target words associated with emotion into at least two categories representing different types of emotions. Once an input message is fed in, the dictionary-based model may filter the input message to obtain one or more words contained in the target words. The dictionary-based model may then generate an emotional state result for the input message based on the one or more types of emotions corresponding to the one or more words.
In some embodiments, the dictionary-based model may include at least two electronic dictionaries. Exemplary electronic dictionaries may include HowNet, NTUSD, BosonNLP, etc., or combinations thereof. HowNet is an online common-sense knowledge base that exposes relationships between concepts and between the attributes of concepts, connecting Chinese words and their English equivalents. One significant feature of HowNet is that users themselves can generate synonym, antonym and converse relations from the synonym relation rules, the antonym relation list and the converse relation list, rather than having them explicitly encoded on every concept as WordNet does. NTUSD is an emotion dictionary. It provides 11,088 emotion words, including positive words and negative words. NTUSD provides useful polarity information that can be used as a seed to learn the emotion of other words, sentences, and even documents. BosonNLP is an aggregation method for word segmentation and POS tagging comprising three steps: preprocessing, statistical modeling and post-processing. In the preprocessing, the training data are presented in a 5-tag format. The 5 tags {B, C, M, E, S} for word segmentation represent, respectively, the beginning, second character, interior, end, and isolation of a word. In the statistical modeling, second-order linear-chain conditional random fields (CRFs) are used as the backbone algorithm because exact inference can be done in polynomial time. Character-level features and dictionary features are extracted to produce accurate predictions. In the post-processing, three rules are applied to improve the accuracy of the prediction. The rules are as follows: 1. The prediction of the last character of y must end with S or E. 2. If the next character is also a digit, the end of a number is not marked. 3. If the next character is a letter, the end of an English word is not marked.
For example only, if a word among the target words is included in the emotion polarity dictionary, it may be considered positive or negative and may be assigned a base emotional state result of 1 or −1, respectively. If the word is not included in the target words, the word may be considered neutral with a base score of 0. In some embodiments, the emotion degree words may be classified into 7 groups with different degree ranks and weights, as shown in Table 2. In some embodiments, once an input message x is given, the emotional state result may be determined in the following steps. First, processing engine 112 may filter the stop words from x, obtaining Q_x remaining words after this step. Second, for each word x_t (t = 1, …, Q_x), processing engine 112 may determine its individual emotional state result s(x_t) by multiplying its base emotional state result by the degree weights of all its related emotion degree words. Processing engine 112 may then sum all individual emotional state results and obtain Σ_{t=1}^{Q_x} s(x_t) as the unnormalized emotional state result of the input message x. Finally, to remedy the impact of the length of the input message, the processing engine 112 may divide the unnormalized score by Q_x and apply a sigmoid function σ(·). Thus, the emotional state result generated by the dictionary-based model may be defined as

s_dict = σ( (1/Q_x)·Σ_{t=1}^{Q_x} s(x_t) ).
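By way of illustration only, the dictionary-based scoring steps described above may be sketched as follows. The function name, the toy dictionaries, and the pairing rule that lets a degree word scale the emotion word that follows it are assumptions for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dict_sentiment(message_words, stop_words, base_scores, degree_weights):
    """Dictionary-based emotional state result.

    base_scores    -- word -> +1 (positive) or -1 (negative); others 0
    degree_weights -- emotion degree word -> weight multiplier
    """
    # Step 1: filter stop words, keeping Q_x content words.
    words = [w for w in message_words if w not in stop_words]
    if not words:
        return 0.5  # sigmoid(0): neutral
    total, weight = 0.0, 1.0
    for w in words:
        if w in degree_weights:
            # remember the degree weight for the next emotion word
            weight = degree_weights[w]
            continue
        # Step 2: individual result = base result * degree weight
        total += base_scores.get(w, 0) * weight
        weight = 1.0
    # Steps 3-4: normalize by message length and squash with sigmoid
    return sigmoid(total / len(words))
```

An intensified positive phrase thus scores above 0.5 and a negative word below 0.5, matching the [0, 1] range used throughout the application.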
Table 2: emotional degree, weight, and example words of chinese.
Referring back to fig. 10, at 1004, the processing engine 112 may train the SP model based on the labeled corpus to obtain a machine-learned SP model. In some embodiments, for each message exchange pair, processing engine 112 may obtain a predicted emotional state result for the next input message of the message exchange pair by feeding the message exchange pair to the SP model, and obtain the true emotional state result for the next input message based on the labeled corpus. Processing engine 112 may then obtain the machine-learned SP model by adjusting the parameters of the SP model to minimize the difference between the at least two predicted emotional state results and the at least two true emotional state results.
For example only, the processing engine 112 may train the SP model as follows. First, for each of the at least two dialogs, and for each message exchange pair except the last one in the dialog, processing engine 112 may obtain a predicted emotional state result for the next input message of the message exchange pair by feeding the message exchange pair into the SP model. Processing engine 112 may then retrieve the true emotional state result for the next input message from the labeled corpus. To train the SP model, processing engine 112 may further adjust the parameters of the SP model to minimize the difference between the at least two predicted emotional state results and the at least two true emotional state results.
By way of example only, an exemplary architecture of the SP model is shown in FIG. 11. The SP model may be a dual-RNN model with an embedding layer at the bottom and an attention layer and a dense layer at the top. Once fed a message exchange pair <x_n, y_n>, the SP model may generate a predicted emotional state result ŝ_{n+1} for the next input message.
The SP model can be trained based on a labeled corpus annotated by the SA model disclosed above. In each dialog D^(m), for all message exchange pairs except the last one (i.e., n = 1, …, N_m − 1), the processing engine 112 may apply the SA model to the next input message x_{n+1} to obtain its true emotional state result s_{n+1}. The processing engine 112 may also obtain the predicted emotional state result ŝ_{n+1} for the next input message by feeding the pair <x_n, y_n> to the SP model. Based on equation (8) below, the processing engine 112 trains the SP model by minimizing the mean square error (MSE) between ŝ_{n+1} and s_{n+1}:

L_SP(φ) = Σ_{m=1}^{M} Σ_{n=1}^{N_m−1} (ŝ_{n+1} − s_{n+1})^2, (8)

where L_SP(φ) represents the total loss function of the SP model. The machine-learned SP model can be used to construct the SAC model.
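By way of illustration only, the MSE objective of equation (8) may be sketched over flat lists of predicted and annotated scores. The function name is an assumption for illustration:

```python
def sp_loss(predicted, annotated):
    """Total squared error between the SP model's predicted emotional
    state results and the SA-annotated true results, summed over all
    message exchange pairs (except the last of each dialog)."""
    assert len(predicted) == len(annotated)
    return sum((p - s) ** 2 for p, s in zip(predicted, annotated))
```

In training, the parameters φ would be adjusted (e.g., by gradient descent) to drive this quantity toward zero.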
It should be noted that the foregoing is provided for illustrative purposes only and is not intended to limit the scope of the present application. Many variations and modifications may be made to the teachings of the present application by those of ordinary skill in the art in light of the present disclosure. However, variations and modifications may be made without departing from the scope of the present application.
FIG. 12 is a flow diagram of an exemplary process for operating a SAC model, shown in accordance with some embodiments of the present application. Process 1200 may be performed by SAC system 100. For example, process 1200 may be implemented as a set of instructions (e.g., an application program) stored in a storage device (e.g., storage device 160). Processor 210 and/or the modules in fig. 4 may execute a set of instructions and, when executing the instructions, processor 210 and/or the modules may be configured to perform process 1200. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, process 1200, when implemented, may add one or more additional operations not described herein and/or delete one or more operations described herein.
In 1202, the processing engine 112 may receive an input message from an input device. The input message may include an emotion element indicating the level of negative emotion of the user using the input device. As used herein, the level of negative emotion may provide a reference for judging whether the user is in a bad mood. For example, if the user is not satisfied with the service provider, the user may compose a complaint message on his or her mobile phone (user terminal 130) and send the complaint message to the processing engine 112 of the server 110. The complaint message may include such emotion elements. An emotion element may include characteristic words that describe the user's mood (e.g., "dead," "great," "dissatisfied," etc.).
In 1204, the processing engine 112 may apply an emotion soothing chat robot (SAC) model to the input message to generate an emotion soothing response message based on the emotion element. The emotion soothing response message may include a soothing element that reacts to the emotion element of the input message. As used herein, the SAC model may be machine-learned based on the methods and systems disclosed in the present application (e.g., fig. 6 and its description). In some embodiments, upon receiving an input message from an input device, processing engine 112 may invoke a machine-learned SAC model stored in memory 160 to generate the emotion soothing response message based on the emotion element. For example, if the emotion elements contained in the input message concern a complaint about the service, the emotion soothing response message generated by the machine-learned SAC model may include an apology to the user.
In 1206, the processing engine 112 may send the emotion soothing response message to the output device.
It should be noted that the foregoing is provided for illustrative purposes only and is not intended to limit the scope of the present application. Various changes and modifications will occur to those skilled in the art based on the description herein. However, variations and modifications may be made without departing from the scope of the present application.
FIG. 13 is a flow diagram of an exemplary process for operating a SAC model, shown in accordance with some embodiments of the present application. Process 1300 may be performed by SAC system 100. For example, process 1300 may be implemented as a set of instructions (e.g., an application program) stored in a storage device (e.g., storage device 160). Processor 210 and/or the modules in fig. 4 may execute a set of instructions, and when executing the instructions, processor 210 and/or the modules may be configured to perform process 1300. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, process 1300, when implemented, may add one or more additional operations not described herein and/or subtract one or more operations described herein. For example, process 1300 may be implemented on user terminal 130.
In 1302, an input device of the user terminal 130 may send an input message to the processor. The input message may include an emotion element indicating the level of negative emotion of the user. For example, if the user is not satisfied with the service provider, the user may compose a complaint message on his or her mobile phone (user terminal 130) and send the complaint message to the processing engine 112 of the server 110. The complaint message may include such emotion elements. An emotion element may include characteristic words that describe the user's mood (e.g., "dead," "great," "dissatisfied," etc.).
In 1304, an output device of user terminal 130 may receive an emotion soothing response message from the processor. The emotion soothing response message may be generated by applying an emotion soothing chat robot (SAC) model to the input message based on the emotion element. The emotion soothing response message may include a soothing element that reacts to the emotion element of the input message. In some embodiments, the processor receives the input message sent from the input device. The processing engine 112 may invoke the trained SAC model stored in the memory 160 to generate the emotion soothing response message. The trained SAC model may be obtained based on the methods and systems proposed in the present application (e.g., fig. 6 and its description). Processing engine 112 may then send the emotion soothing response message to the output device.
Examples of the present invention
After training the SAC model based on the proposed system and method, tests are performed to evaluate the performance of the trained SP model and SAC model. In this example, the chat robot model is built based on a seq2seq framework, with both the encoder and decoder being 2-layer stacked GRUs, with dense and attention layers combined on top of the seq2seq framework. The SP model has a dual RNN structure, and two RNNs in the SP model are also 2-layer stacked GRUs.
For the base models disclosed above, a pre-trained word2vec model may be used to prepare the embedding layer. The word2vec model may be configured to take a text corpus as input and produce word vectors as output. It first constructs a vocabulary from the training text data and then learns vector representations of the words. The generated word vector file can be used as features in many natural language processing and machine learning applications.
To achieve better embedding quality, the Chinese Wikipedia dataset and the customer service dataset are mixed into one corpus. The word2vec model may be constructed based on Gensim. Gensim is a powerful open-source vector space modeling and topic modeling toolkit implemented in Python. It uses NumPy, SciPy and Cython to improve performance. Gensim is specifically designed to handle large text collections using data streaming and efficient incremental algorithms, which distinguishes it from most other scientific software packages that only target batch and in-memory processing.
A skip-gram model with negative sampling may be used to train the word2vec model, and the output dimension of the embedding vectors is 128. The skip-gram model considers a corpus of words w and their contexts c. The conditional probability of the skip-gram model is denoted p(c | w). Given the corpus text, the goal is to set the parameters θ of p(c | w; θ) to maximize the corpus probability. Negative sampling is a more efficient method of deriving word embeddings. Although negative sampling is based on the skip-gram model, it actually optimizes a different objective.
The Adam algorithm with an initial learning rate of 10^-3 is used as the optimizer to train the base models. The Adam algorithm is an efficient stochastic optimization method that only requires first-order gradients and has little memory requirement. The method computes individual adaptive learning rates for different parameters based on estimates of the first and second moments of the gradients.
Example 1
In a first example, the SP model is evaluated by comparing its performance to other baseline models. The SP model and the baseline model are trained based on a training data set. The parameters of the model are adjusted to achieve optimal performance based on the validation dataset. Exemplary baseline models for comparison to the SP models include the dual RNN-Attn-Char model, the dual RNN model, the MLP model, the RR model, and the LR model.
The structure of the Dual-RNN-Attn-Char model may be the same as that of the SP model. The input to the Dual-RNN-Attn-Char model is characters rather than the words used in the SP model.
The Dual-RNN model is constructed from the SP model by removing the attention layer. Thus, the contribution of the attention layer to improving SP model performance can be demonstrated.
Instead of using a dual-RNN structure, the MLP model is constructed by applying a dense layer for emotional state result prediction.
The RR model, constructed based on the ridge regression algorithm, is from the scikit-learn library. In the RR model, a max-pooling layer is applied on top of the embedded representation of the message exchange pair. The emotional state result may then be predicted based on the ridge regression algorithm. Ridge regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, the least squares estimates are unbiased, but their variances are large, so they may be far from the true values. By adding a degree of bias to the regression estimates, ridge regression can reduce the standard error. Scikit-learn (formerly scikits.learn) is a free machine learning library for the Python programming language. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
The LR model is similar to the RR model. Linear regression was used instead of ridge regression in the LR model.
Table 3 presents the results of the models in mean square error (MSE). As shown in Table 3, the SP model outperforms the other baseline models, indicating its ability to make reasonable predictions of emotional state results.
Table 3: emotional state outcome prediction
Example 2
In the second example, the SAC model is evaluated. A corpus containing the 1,000 client input messages x_n with the most negative emotions is included in the evaluation. The corresponding response messages y_n can also be obtained. Based on the selected corpus, the evaluation of the SAC model proceeds as follows.

Step one, obtain (e.g., based on the SA model) the annotated emotional state results s_n for all input messages x_n of the selected group.

Step two, obtain (e.g., based on the SA model) the corresponding annotated emotional state results s_{n+1} of the next input messages for the selected group.

Step three, based on the message exchange pairs <x_n, y_n> in the selected corpus, obtain the predicted emotional state results ŝ_{n+1}.

Step four, obtain, based on the SP model, the tentative emotional state result ŝ_{n+1} for each message exchange pair <x_n, ŷ_n>, where ŷ_n are the response messages generated by SAC models trained with different λ values. In this example, the λ values are set to λ ∈ {−1, 0, 1, 3, 5, 8, 10}. If the value of λ is 0, the SAC model is equal to the chat robot model.

Step five, standard soothing utterances (SAUs), which are often used in everyday customer service, are introduced in this example. Five SAUs are shown in Table 4. Each SAU is paired with each selected input message x_n and processed by the SP model to generate its predicted emotional state result ŝ_{n+1}.
Table 4: Standard soothing utterances (SAUs) extracted from a customer service knowledge base.
Table 5: predicted emotional state outcome
Table 5 shows the emotion score results. As shown in the table, the 1,000 samples of input messages x_n were annotated by the SA model (emotion score s_n). After each of the 5 SAUs was sent to the client, the next input message from the client was also annotated (emotion score s_{n+1}). The samples of input messages and the response messages of the 5 SAUs were also fed to the SP model to obtain the predicted emotional state results (predicted score ŝ_{n+1}). As shown in the table, when the value of λ is 5, the response messages generated by the SAC model performed better than the SAUs in soothing the client's emotion.
The accuracy and perplexity of the response messages generated by SAC models trained with different λ values were also evaluated, with the results shown in Table 6.
Table 6: accuracy, confusion and emotional state outcome prediction for SAC models trained with different lambda
As shown in Table 6, when the value of λ is 5, the response messages generated by the SAC model may have better performance in emotion soothing. When the value of λ is 0, which means that the SAC model is equal to the chat robot model, the accuracy and perplexity of the response messages may be better than with other λ values.
In this example, SAC models trained with λ = −1 and λ = 5, as well as the chat robot model, generate response messages corresponding to the negative input messages of three customers. The results are shown in Table 7.
Table 7: three examples of client input messages and corresponding response messages generated by different SAC models. snIs a result of the emotional state of each incoming message,
Figure BDA0002797003940000412
is a result of the predicted emotional state of the generated response message.
As shown in Table 7, when λ = -1, the response messages generated by the SAC model may have a negative influence on emotional soothing. When λ = 5, the response messages generated by the SAC model may have a positive influence on emotional soothing.
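One illustrative inference-time use of the trained SP model, an assumption for exposition rather than a procedure stated in the application, is re-ranking candidate replies by the predicted emotional state of the client's next message; all names below are hypothetical:

```python
class StubSPModel:
    """Hypothetical stand-in for the machine-learned SP model."""
    def predict(self, pair):
        _, reply = pair
        # Toy heuristic: apologetic replies predict a better next-turn state.
        return 0.8 if "apologize" in reply.lower() else 0.1

def pick_soothing_response(sp_model, input_message, candidates):
    """Return the candidate reply with the highest predicted emotional
    state result for the client's next message."""
    return max(candidates, key=lambda y: sp_model.predict((input_message, y)))
```

This mirrors the Table 7 comparison: among candidate replies to the same negative input, the one the SP model scores highest is the one expected to soothe.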
Having thus described the basic concepts, it will be apparent to those of ordinary skill in the art having read this application that the foregoing disclosure is to be construed as illustrative only and is not limiting of the application. Various modifications, improvements and adaptations of the present application may occur to those skilled in the art, although they are not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.
Also, this application uses specific language to describe embodiments of the application. For example, "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the application may be combined as appropriate.
Moreover, those of ordinary skill in the art will understand that aspects of the present application may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, articles, or materials, or any new and useful modification thereof. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software. The above hardware or software may be referred to as a "block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may take the form of a computer program product embodied in one or more computer-readable media, with computer-readable program code embodied therein.
A computer readable signal medium may comprise a propagated data signal with computer program code embodied therewith, for example, on baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electro-magnetic, optical, and the like, or any suitable combination. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code on a computer readable signal medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, etc., or any combination of the preceding.
Computer program code required for operation of aspects of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, or Python; conventional procedural programming languages such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, or ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service using, for example, software as a service (SaaS).
Additionally, the order of the process elements and sequences described herein, and the use of numbers, letters, or other designations, are not intended to limit the order of the processes and methods unless otherwise indicated in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the present application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, the claimed subject matter may lie in less than all features of a single embodiment disclosed above.

Claims (29)

1. A system for training an emotion-soothing chat robot model, comprising:
a computer-readable storage medium storing executable instructions for training the emotion-soothing chat robot model; and
at least one processor in communication with the computer-readable storage medium, the at least one processor, when executing the executable instructions, instructing the system to:
obtaining a corpus, wherein the corpus comprises at least two message exchange pairs, wherein at least one of the at least two message exchange pairs comprises an input message and a response message;
applying one or more machine learning processes to the corpus to train a chat robot model to obtain a machine learning chat robot model, wherein the chat robot model generates a first response message upon input of an input message;
applying one or more machine learning processes to the corpus to train an emotion prediction model to obtain a machine-learned emotion prediction model, wherein the emotion prediction model generates an emotional state result upon being input into a message exchange pair; and
applying one or more machine learning processes to the corpus to train the emotion placating chat robot model to obtain a machine-learned emotion placating chat robot model, wherein the emotion placating chat robot model is constructed based on the machine-learned emotion prediction model and the machine-learned chat robot model, wherein upon input of an input message, the emotion placating chat robot model generates a second response message that is determined based at least in part on an emotional state result generated by the machine-learned emotion prediction model.
2. The system of claim 1, wherein:
the chat robot model is constructed based on a sequence-to-sequence model and an attention model; and is
The emotion prediction model is constructed based on a dual RNN model.
3. The system of claim 1, wherein the format of the input message comprises at least one of text, images, sound, and video.
4. The system of claim 1, wherein, to apply the one or more machine learning processes to the corpus to train the emotion prediction model to obtain the machine-learned emotion prediction model, the at least one processor is further configured to cause the system to:
for each input message of the at least two message exchange pairs, generating an emotional state result indicative of an emotional estimate of the input message for the input message to obtain a labeled corpus; and
applying the one or more machine learning processes to the labeled corpus to train the emotion prediction model to obtain the machine-learned emotion prediction model.
5. The system of claim 4, wherein, to generate the emotional state result for the input message, the at least one processor is further configured to cause the system to:
generating the emotional state result using an emotion annotator model, wherein:
the emotion annotator model is a fusion model constructed based on at least two emotion estimation models; and
upon entering the input message, the emotion annotator model generates an emotional state result for the input message.
6. The system of claim 5, wherein the at least two emotion estimation models comprise at least one of a Bayesian model and a dictionary-based model, wherein the dictionary-based model is configured to:
classifying at least two target words related to emotion into at least two categories representing different types of emotions;
filtering the input message to obtain one or more words contained in the target word; and
determining the emotional state result for the input message based on one or more types of emotions corresponding to the one or more words.
7. The system of claim 4, wherein, to apply the one or more machine learning processes to the labeled corpus to train the emotion prediction model to obtain the machine-learned emotion prediction model, the at least one processor is further configured to cause the system to:
for each of the at least two message exchange pairs:
obtaining a predicted emotional state result for a next input message of the message exchange pair by inputting the message exchange pair to the emotion prediction model; and
obtaining a true emotional state result of the next input message based on the labeled corpus; and
obtaining the machine-learned emotion prediction model by adjusting parameters of the emotion prediction model to minimize a difference between the at least two predicted emotional state results and the at least two true emotional state results.
8. The system of claim 1, wherein the second response message comprises a soothing element that reacts to an emotional element of the input message.
9. The system of claim 1, wherein, to apply the one or more machine learning processes to the corpus to train the emotion-soothing chat robot model to obtain the machine-learned emotion-soothing chat robot model, the at least one processor is further configured to cause the system to:
for each of the at least two message exchange pairs:
generating a provisional response message by inputting an input message of the message exchange pair into the machine-learned chat robot model;
generating a temporary predicted emotional state result with respect to a next input message of the message exchange pair by inputting the input message and the provisional response message of the message exchange pair to the machine-learned emotion prediction model;
determining a first difference between the interim response message and a true response message included in the message exchange pair;
determining a second difference between the temporary predicted emotional state outcome and a target emotional state outcome;
determining a combined difference from the first difference and the second difference; and
obtaining the machine-learned emotion-pacifying chat robot model by adjusting parameters of the machine-learned chat robot model to minimize a sum of the at least two combined differences.
10. The system of claim 9, wherein, to determine the combined difference based on the first difference and the second difference, the at least one processor is further configured to cause the system to: combine the first difference and the second difference according to a predetermined ratio to obtain the combined difference.
11. A method for training an emotion-soothing chat robot model, the method implemented on a computing device having at least one processor and at least one storage medium, the method comprising:
obtaining a corpus, wherein the corpus comprises at least two message exchange pairs, wherein at least one of the at least two message exchange pairs comprises an input message and a response message;
applying one or more machine learning processes to the corpus to train a chat robot model to obtain a machine learning chat robot model, wherein the chat robot model generates a first response message upon input of an input message;
applying one or more machine learning processes to the corpus to train an emotion prediction model to obtain a machine-learned emotion prediction model, wherein the machine-learned emotion prediction model generates an emotional state result upon input of a message exchange pair; and
applying one or more machine learning processes to the corpus to train the emotion placating chat robot model to obtain a machine-learned emotion placating chat robot model, wherein the emotion placating chat robot model is constructed based on the machine-learned emotion prediction model and the machine-learned chat robot model, wherein upon input of an input message, the emotion placating chat robot model generates a second response message that is determined based at least in part on an emotional state result generated by the machine-learned emotion prediction model.
12. The method of claim 11, wherein:
the chat robot model is constructed based on a sequence-to-sequence model and an attention model; and is
The emotion prediction model is constructed based on a dual RNN model.
13. The method of claim 11, wherein the format of the input message comprises at least one of text, image, sound, and video.
14. The method of claim 11, wherein applying the one or more machine learning processes to the corpus to train the emotion prediction model to obtain the machine-learned emotion prediction model comprises:
for each input message of the at least two message exchange pairs, generating an emotional state result of the input message indicative of an emotional estimate of the input message to obtain a labeled corpus; and
applying the one or more machine learning processes to the labeled corpus to train the emotion prediction model to obtain the machine-learned emotion prediction model.
15. The method of claim 14, wherein generating the emotional state result for the input message comprises:
generating the emotional state result for the input message using an emotion annotator model, wherein:
the emotion annotator model is a fusion model constructed based on at least two emotion estimation models; and
upon input of the input message, the emotion annotator model generates the emotional state result for the input message.
16. The method of claim 15, wherein the at least two emotion estimation models comprise at least one of a bayesian model and a dictionary-based model, wherein the dictionary-based model is configured to:
classifying at least two target words related to emotions into at least two categories representing different types of emotions;
filtering the input message to obtain one or more words contained in the target word; and
determining the emotional state result for the input message based on one or more types of emotions corresponding to the one or more words.
17. The method of claim 14, wherein the applying the one or more machine learning processes to the labeled corpus to train the emotion prediction model to obtain the machine-learned emotion prediction model comprises:
for each of the at least two message exchange pairs:
obtaining a predicted emotional state result for a next input message of the message exchange pair by inputting the message exchange pair to the emotion prediction model; and
obtaining a true emotional state result of the next input message based on the labeled corpus; and
obtaining the machine-learned emotion prediction model by adjusting parameters of the emotion prediction model to minimize a difference between the at least two predicted emotional state results and the at least two true emotional state results.
18. The method of claim 11, wherein the second response message includes a soothing element that reacts to an emotional element of the input message.
19. The method of claim 11, wherein applying one or more machine learning processes to the corpus to train the emotion-soothing chat robot model to obtain a machine-learned emotion-soothing chat robot model comprises:
for each of the at least two message exchange pairs:
generating a provisional response message by inputting an input message of the message exchange pair into the machine-learned chat robot model;
generating a temporary predicted emotional state result with respect to a next input message of the message exchange pair by inputting the input message and the provisional response message of the message exchange pair to the machine-learned emotion prediction model;
determining a first difference between the interim response message and a true response message included in the message exchange pair;
determining a second difference between the temporary predicted emotional state outcome and a target emotional state outcome; and
determining a combined difference from the first difference and the second difference; and
obtaining the machine-learned emotion-pacifying chat robot model by adjusting parameters of the machine-learned chat robot model to minimize a sum of the at least two combined differences.
20. The method of claim 19, wherein the determining a combined difference based on the first difference and the second difference comprises:
combining the first difference and the second difference according to a predetermined ratio to obtain the combined difference.
21. A non-transitory computer-readable medium comprising at least one set of instructions for training an emotion-soothing chat robot model, wherein, when executed by at least one processor of an electronic terminal, the at least one set of instructions instructs the at least one processor to perform the acts of:
obtaining a corpus, wherein the corpus comprises at least two message exchange pairs, wherein at least one of the at least two message exchange pairs comprises an input message and a response message;
applying one or more machine learning processes to the corpus to train a chat robot model to obtain a machine learning chat robot model, wherein the chat robot model generates a first response message upon input of an input message;
applying one or more machine learning processes to the corpus to train an emotion prediction model to obtain a machine-learned emotion prediction model, wherein the machine-learned emotion prediction model generates an emotional state result upon input of a message exchange pair; and
applying one or more machine learning processes to the corpus to train the emotion placating chat robot model to obtain a machine-learned emotion placating chat robot model, wherein the emotion placating chat robot model is constructed based on the machine-learned emotion prediction model and the machine-learned chat robot model, wherein upon input of an input message, the emotion placating chat robot model generates a second response message that is determined based at least in part on an emotional state result generated by the machine-learned emotion prediction model.
22. A chat robot system, comprising:
a computer-readable storage medium storing executable instructions; and
at least one processor in communication with the computer-readable storage medium, the at least one processor, when executing the executable instructions, being directed to cause the system to:
receiving an input message from an input device, wherein the input message includes an emotive element indicating a negative level of emotion of a user using the input device;
applying an emotional placation chat robot model to the input message based on the emotional elements to generate a response message, wherein the response message includes a placation element that reacts to the emotional elements of the input message; and
sending the response message to an output device.
23. The system of claim 22, wherein the emotion-soothing chat robot model is trained by applying one or more machine learning processes to a corpus, wherein the training the emotion-soothing chat robot model comprises:
obtaining the corpus, wherein the corpus comprises at least two message exchange pairs, wherein at least one of the at least two message exchange pairs comprises an input message and a response message;
applying one or more machine learning processes to the corpus to train a chat robot model to obtain a machine learning chat robot model, wherein upon input of an input message, the chat robot model outputs a first response message;
applying one or more machine learning processes to the corpus to train an emotion prediction model to obtain a machine-learned emotion prediction model, wherein the machine-learned emotion prediction model outputs an emotional state result upon input of a message exchange pair; and
applying one or more machine learning processes to the corpus to train the emotion placating chat robot model to obtain a machine-learned emotion placating chat robot model, wherein the emotion placating chat robot model is constructed based on the machine-learned emotion prediction model and the machine-learned chat robot model, wherein upon input of an input message, the emotion placating chat robot model outputs a second response message determined based at least in part on an emotional state result output by the machine-learned emotion prediction model.
24. A method, comprising:
receiving an input message from an input device, wherein the input message includes an emotive element indicating a negative level of emotion of a user using the input device;
applying an emotion soothing robot model to the input message based on the emotional elements to generate a response message, wherein the response message includes soothing elements that react to the emotional elements of the input message; and
sending the response message to an output device.
25. The method of claim 24, wherein the emotion-soothing chat robot model is trained by applying one or more machine learning processes to a corpus, wherein the training the emotion-soothing chat robot model comprises:
obtaining the corpus, wherein the corpus comprises at least two message exchange pairs, wherein at least one of the at least two message exchange pairs comprises an input message and a response message;
applying one or more machine learning processes to the corpus to train a chat robot model to obtain a machine learning chat robot model, wherein upon input of an input message, the chat robot model outputs a first response message;
applying one or more machine learning processes to the corpus to train an emotion prediction model to obtain a machine-learned emotion prediction model, wherein the machine-learned emotion prediction model outputs an emotional state result upon input of a message exchange pair; and
applying one or more machine learning processes to the corpus to train the emotion placating chat robot model to obtain a machine-learned emotion placating chat robot model, wherein the emotion placating chat robot model is constructed based on the machine-learned emotion prediction model and the machine-learned chat robot model, wherein upon input of an input message, the emotion placating chat robot model outputs a second response message determined based at least in part on an emotional state result output by the machine-learned emotion prediction model.
26. A chat robot system, comprising:
a computer-readable storage medium storing executable instructions; and
at least one processor in communication with the computer-readable storage medium, the at least one processor, when executing the executable instructions, instructing the system to:
sending an input message to a processor, wherein the input message includes an emotive element indicating a negative emotive level of a user; and
receiving a response message from the processor, wherein the response message is generated by applying an emotion soothing chat robot model to the input message based on the emotional element, and wherein the response message includes a soothing element that reacts to the emotional element of the input message.
27. The system of claim 26, wherein the emotion-soothing chat robot model is trained by applying one or more machine learning processes to a corpus, wherein the training the emotion-soothing chat robot model comprises:
obtaining the corpus, wherein the corpus comprises at least two message exchange pairs, wherein at least one of the at least two message exchange pairs comprises an input message and a response message;
applying one or more machine learning processes to the corpus to train a chat robot model to obtain a machine learning chat robot model, wherein upon input of an input message, the chat robot model outputs a first response message;
applying one or more machine learning processes to the corpus to train an emotion prediction model to obtain a machine-learned emotion prediction model, wherein the machine-learned emotion prediction model outputs an emotional state result upon input of a message exchange pair; and
applying one or more machine learning processes to the corpus to train the emotion placating chat robot model to obtain a machine-learned emotion placating chat robot model, wherein the emotion placating chat robot model is constructed based on the machine-learned emotion prediction model and the machine-learned chat robot model, wherein upon input of an input message, the emotion placating chat robot model outputs a second response message determined based at least in part on an emotional state result output by the machine-learned emotion prediction model.
28. A method, comprising:
sending an input message to a processor, wherein the input message includes an emotive element indicating a negative emotive level of a user; and
receiving a response message from the processor, wherein the response message is generated by applying an emotion soothing chat robot model to the input message based on the emotional element, and wherein the response message includes a soothing element that reacts to the emotional element of the input message.
29. The method of claim 28, wherein the emotion-soothing chat robot model is trained by applying one or more machine learning processes to a corpus, wherein the training the emotion-soothing chat robot model comprises:
obtaining the corpus, wherein the corpus comprises at least two message exchange pairs, wherein at least one of the at least two message exchange pairs comprises an input message and a response message;
applying one or more machine learning processes to the corpus to train a chat robot model to obtain a machine learning chat robot model, wherein upon input of an input message, the chat robot model outputs a first response message;
applying one or more machine learning processes to the corpus to train an emotion prediction model to obtain a machine-learned emotion prediction model, wherein the machine-learned emotion prediction model outputs an emotional state result upon input of a message exchange pair; and
applying one or more machine learning processes to the corpus to train the emotion placating chat robot model to obtain a machine-learned emotion placating chat robot model, wherein the emotion placating chat robot model is constructed based on the machine-learned emotion prediction model and the machine-learned chat robot model, wherein upon input of an input message, the emotion placating chat robot model outputs a second response message determined based at least in part on an emotional state result output by the machine-learned emotion prediction model.
CN201880093787.7A 2018-06-02 2018-06-02 System and method for training and using chat robots Pending CN112189192A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/089689 WO2019227505A1 (en) 2018-06-02 2018-06-02 Systems and methods for training and using chatbot

Publications (1)

Publication Number Publication Date
CN112189192A true CN112189192A (en) 2021-01-05

Family

ID=68698682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880093787.7A Pending CN112189192A (en) 2018-06-02 2018-06-02 System and method for training and using chat robots

Country Status (2)

Country Link
CN (1) CN112189192A (en)
WO (1) WO2019227505A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450333A (en) * 2021-06-30 2021-09-28 哈尔滨工业大学 Machine learning-based reinforced concrete column earthquake damage degree evaluation method
CN113643584A (en) * 2021-08-16 2021-11-12 中国人民解放军陆军特色医学中心 Robot for training doctor-patient communication ability and working method thereof
CN114417851A (en) * 2021-12-03 2022-04-29 重庆邮电大学 Emotion analysis method based on keyword weighted information
CN116453027A (en) * 2023-06-12 2023-07-18 深圳市玩瞳科技有限公司 AI identification management method for educational robot

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN111553171B (en) * 2020-04-09 2024-02-06 北京小米松果电子有限公司 Corpus processing method, corpus processing device and storage medium
CN111597458B (en) * 2020-04-15 2023-11-17 北京百度网讯科技有限公司 Scene element extraction method, device, equipment and storage medium
CN112214585B (en) * 2020-09-10 2024-03-12 中国科学院深圳先进技术研究院 Reply message generation method, system, computer device and storage medium
CN113297346B (en) * 2021-06-28 2023-10-31 中国平安人寿保险股份有限公司 Text intention recognition method, device, equipment and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US9117168B2 (en) * 2012-09-28 2015-08-25 Korea Institute Of Industrial Technology Apparatus and method for calculating internal state for artificial emotion
JP7003400B2 (en) * 2016-10-27 2022-01-20 富士フイルムビジネスイノベーション株式会社 Dialogue control system
CN107066568A (en) * 2017-04-06 2017-08-18 竹间智能科技(上海)有限公司 The interactive method and device predicted based on user view
CN107944008A (en) * 2017-12-08 2018-04-20 神思电子技术股份有限公司 A kind of method that Emotion identification is carried out for natural language

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN113450333A (en) * 2021-06-30 2021-09-28 哈尔滨工业大学 Machine learning-based reinforced concrete column earthquake damage degree evaluation method
CN113643584A (en) * 2021-08-16 2021-11-12 中国人民解放军陆军特色医学中心 Robot for training doctor-patient communication ability and working method thereof
CN114417851A (en) * 2021-12-03 2022-04-29 重庆邮电大学 Emotion analysis method based on keyword weighted information
CN116453027A (en) * 2023-06-12 2023-07-18 深圳市玩瞳科技有限公司 AI identification management method for educational robot
CN116453027B (en) * 2023-06-12 2023-08-22 深圳市玩瞳科技有限公司 AI identification management method for educational robot

Also Published As

Publication number Publication date
WO2019227505A1 (en) 2019-12-05

Similar Documents

Publication Publication Date Title
CN112189192A (en) System and method for training and using chat robots
US10592607B2 (en) Iterative alternating neural attention for machine reading
US11501182B2 (en) Method and apparatus for generating model
US20230048218A1 (en) On-Device Projection Neural Networks for Natural Language Understanding
Laurer et al. Less annotating, more classifying: Addressing the data scarcity issue of supervised machine learning with deep transfer learning and BERT-NLI
US11379736B2 (en) Machine comprehension of unstructured text
CN111339255B (en) Target emotion analysis method, model training method, medium, and device
US20170185581A1 (en) Systems and methods for suggesting emoji
CN109992780B (en) Specific target emotion classification method based on deep neural network
JP2023535709A (en) Language representation model system, pre-training method, apparatus, device and medium
US20200159863A1 (en) Memory networks for fine-grain opinion mining
US10915756B2 (en) Method and apparatus for determining (raw) video materials for news
CN115392259B (en) Microblog text sentiment analysis method and system based on adversarial training fused with BERT
CN112349294B (en) Voice processing method and device, computer readable medium and electronic equipment
CN114648032B (en) Training method and device of semantic understanding model and computer equipment
US20230073602A1 (en) System of and method for automatically detecting sarcasm of a batch of text
Greenstein et al. Japanese-to-English machine translation using recurrent neural networks
US20230205994A1 (en) Performing machine learning tasks using instruction-tuned neural networks
Carvalho et al. The importance of context for sentiment analysis in dialogues
CN111914084A (en) Deep learning-based emotion label text generation and evaluation system
CN115292492A (en) Method, device and equipment for training intention classification model and storage medium
CN113886539A (en) Method and device for recommending conversation scripts, customer service equipment and storage medium
Samrat et al. Deep Learning Techniques for Sentiment Analysis: A Comparative Study
Sindhu et al. Aspect based opinion mining leveraging weighted BiGRU and CNN module in parallel
Kamruzzaman et al. An intense investigation on deep and traditional machine learning approaches for better sentiment classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination