WO2021051404A1 - Systems and methods for auxiliary reply - Google Patents

Systems and methods for auxiliary reply Download PDF

Info

Publication number
WO2021051404A1
Authority
WO
WIPO (PCT)
Prior art keywords
reply
message
sample
generation model
preliminary
Prior art date
Application number
PCT/CN2019/107053
Other languages
French (fr)
Inventor
Che Liu
Original Assignee
Beijing Didi Infinity Technology And Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd. filed Critical Beijing Didi Infinity Technology And Development Co., Ltd.
Priority to PCT/CN2019/107053 priority Critical patent/WO2021051404A1/en
Publication of WO2021051404A1 publication Critical patent/WO2021051404A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation

Definitions

  • the present disclosure generally relates to question answering (QA) , and in particular, to systems and methods for auxiliary reply.
  • auxiliary reply in various scenarios (e.g., auxiliary reply for customer service in online shopping) is becoming more and more popular.
  • a system for providing auxiliary reply may perform a textual analysis on the message, identify one or more replies that match the message, and recommend the one or more replies to a respondent (e.g., a customer service) for selection.
  • different respondents may have different language habits, and a specific respondent may wish to receive recommended replies in accordance with his/her language habit. Therefore, it is desirable to provide systems and methods for recommending replies for respondents accurately and efficiently.
  • An aspect of the present disclosure relates to a system for auxiliary reply.
  • the system may include a storage medium to store a set of instructions and a processor communicatively coupled to the storage medium.
  • the system may receive a message from a terminal device; recommend, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model may be associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents; and transmit a target reply from the at least one candidate reply to the terminal device.
  • the trained reply generation model may be determined with a training process.
  • the training process may include obtaining a plurality of samples, wherein each of the plurality of samples corresponds to a respondent of the plurality of respondents and includes a sample message, a sample reply, and a preliminary embedding vector corresponding to the respondent; obtaining a preliminary reply generation model; determining a plurality of sample scores corresponding to the plurality of samples based on the preliminary reply generation model, wherein each of the plurality of sample scores indicates a matching degree between the sample reply and the sample message; determining whether the plurality of sample scores satisfy a preset condition; and designating the preliminary reply generation model as the trained reply generation model in response to determining that the plurality of sample scores satisfy the preset condition.
  • the training process may further include updating the preliminary reply generation model in response to determining that the plurality of sample scores do not satisfy the preset condition and repeating the step of determining whether the plurality of sample scores satisfy the preset condition until the plurality of sample scores satisfy the preset condition.
  • the training process may further include determining a preliminary matrix including a plurality of preliminary embedding vectors corresponding to the plurality of respondents.
  • the training process may further include updating the preliminary matrix in response to determining that the plurality of sample scores do not satisfy the preset condition.
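The iterative training process described above can be sketched as follows. This is an illustrative outline rather than the patent's implementation: the function names, the averaging of sample scores, and the threshold form of the preset condition are all assumptions made for the sketch.

```python
def train_reply_generation_model(samples, score_fn, update_fn,
                                 threshold=0.8, max_iters=100):
    """Iterate: score every sample, test the preset condition, and
    update the preliminary model until the condition is satisfied.

    score_fn(sample): matching degree between the sample reply and the
    sample message under the current (preliminary) model.
    update_fn(): updates the preliminary model's parameters (and, per
    the disclosure, may also update the preliminary embedding matrix).
    """
    for _ in range(max_iters):
        scores = [score_fn(sample) for sample in samples]
        # Preset condition (assumed form): the average matching score
        # reaches a threshold.
        if sum(scores) / len(scores) >= threshold:
            return "trained"      # designate as the trained model
        update_fn()               # otherwise, update and re-score
    return "unconverged"
```

In a real system `score_fn` and `update_fn` would wrap forward and backward passes of the neural matching model; here they are left abstract so the control flow of the training loop stands out.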
  • the embedding vector may be associated with at least one feature of a gender, an age, a working age, a work type, and/or a language habit.
  • the message may include a text message, a voice message, and/or an image message.
  • the message may include a question to the target respondent regarding an online transportation service.
  • the reply generation model may include a Deep Structured Semantic Model (DSSM) and/or a Match-Pyramid model.
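DSSM-style models score a (message, reply) pair by embedding both into a shared semantic space and taking the cosine similarity of the two vectors as the matching degree. A minimal sketch of that final similarity computation (the embedding step itself is omitted):

```python
import math

def matching_degree(message_vec, reply_vec):
    """Cosine similarity between a message embedding and a reply
    embedding, as used by DSSM-style semantic matching models."""
    dot = sum(m * r for m, r in zip(message_vec, reply_vec))
    norm_m = math.sqrt(sum(m * m for m in message_vec))
    norm_r = math.sqrt(sum(r * r for r in reply_vec))
    return dot / (norm_m * norm_r)
```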
  • Another aspect of the present disclosure relates to a method for auxiliary reply. The method may include receiving a message from a terminal device; recommending, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model may be associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents; and transmitting a target reply from the at least one candidate reply to the terminal device.
  • the trained reply generation model may be determined with a training process.
  • the training process may include obtaining a plurality of samples, wherein each of the plurality of samples corresponds to a respondent of the plurality of respondents and includes a sample message, a sample reply, and a preliminary embedding vector corresponding to the respondent; obtaining a preliminary reply generation model; determining a plurality of sample scores corresponding to the plurality of samples based on the preliminary reply generation model, wherein each of the plurality of sample scores indicates a matching degree between the sample reply and the sample message; determining whether the plurality of sample scores satisfy a preset condition; and designating the preliminary reply generation model as the trained reply generation model in response to determining that the plurality of sample scores satisfy the preset condition.
  • the training process may further include updating the preliminary reply generation model in response to determining that the plurality of sample scores do not satisfy the preset condition and repeating the step of determining whether the plurality of sample scores satisfy the preset condition until the plurality of sample scores satisfy the preset condition.
  • the training process may further include determining a preliminary matrix including a plurality of preliminary embedding vectors corresponding to the plurality of respondents.
  • the training process may further include updating the preliminary matrix in response to determining that the plurality of sample scores do not satisfy the preset condition.
  • the embedding vector may be associated with at least one feature of a gender, an age, a working age, a work type, and/or a language habit.
  • the message may include a text message, a voice message, and/or an image message.
  • the message may include a question to the target respondent regarding an online transportation service.
  • the reply generation model may include a Deep Structured Semantic Model (DSSM) and/or a Match-Pyramid model.
  • a further aspect of the present disclosure relates to a system for auxiliary reply.
  • the system may include a receiving module, a recommendation module, and a transmission module.
  • the receiving module may be configured to receive a message from a terminal device.
  • the recommendation module may be configured to recommend, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model may be associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents.
  • the transmission module may be configured to transmit a target reply from the at least one candidate reply to the terminal device.
  • the system may further include a training module.
  • the training module may be configured to obtain a plurality of samples, wherein each of the plurality of samples corresponds to a respondent of the plurality of respondents and includes a sample message, a sample reply, and a preliminary embedding vector corresponding to the respondent; obtain a preliminary reply generation model; determine a plurality of sample scores corresponding to the plurality of samples based on the preliminary reply generation model, wherein each of the plurality of sample scores indicates a matching degree between the sample reply and the sample message; determine whether the plurality of sample scores satisfy a preset condition; and designate the preliminary reply generation model as the trained reply generation model in response to determining that the plurality of sample scores satisfy the preset condition.
  • the training module may be further configured to update the preliminary reply generation model in response to determining that the plurality of sample scores do not satisfy the preset condition and repeat the step of determining whether the plurality of sample scores satisfy the preset condition until the plurality of sample scores satisfy the preset condition.
  • the training module may be further configured to determine a preliminary matrix including a plurality of preliminary embedding vectors corresponding to the plurality of respondents.
  • the training module may be further configured to update the preliminary matrix in response to determining that the plurality of sample scores do not satisfy the preset condition.
  • the embedding vector may be associated with at least one feature of a gender, an age, a working age, a work type, and/or a language habit.
  • the message may include a text message, a voice message, and/or an image message.
  • the message may include a question to the target respondent regarding an online transportation service.
  • the reply generation model may include a Deep Structured Semantic Model (DSSM) and/or a Match-Pyramid model.
  • a still further aspect of the present disclosure relates to a non-transitory computer readable medium including executable instructions.
  • When the executable instructions are executed by at least one processor, the executable instructions may direct the at least one processor to perform a method.
  • the method may include receiving a message from a terminal device; recommending, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model may be associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents; and transmitting a target reply from the at least one candidate reply to the terminal device.
  • FIG. 1 is a schematic diagram illustrating an exemplary auxiliary reply system according to some embodiments of the present disclosure
  • FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary computing device according to some embodiments of the present disclosure
  • FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary mobile device according to some embodiments of the present disclosure
  • FIG. 4 is a block diagram illustrating an exemplary processing engine according to an embodiment of the present disclosure
  • FIG. 5 is a flowchart illustrating an exemplary process for auxiliary reply according to some embodiments of the present disclosure
  • FIG. 6 is a flowchart illustrating an exemplary process for determining a trained reply generation model according to some embodiments of the present disclosure
  • FIG. 7 is a schematic diagram illustrating an exemplary training sample according to some embodiments of the present disclosure.
  • FIG. 8 is a schematic diagram illustrating an exemplary matrix associated with a plurality of embedding vectors corresponding to a plurality of respondents according to some embodiments of the present disclosure.
  • the flowcharts used in the present disclosure illustrate operations that systems implement according to some embodiments of the present disclosure. It is to be expressly understood that the operations of the flowcharts may be implemented out of the order shown; the operations may be implemented in inverted order or simultaneously. Moreover, one or more other operations may be added to the flowcharts, and one or more operations may be removed from the flowcharts.
  • An aspect of the present disclosure relates to systems and methods for auxiliary reply.
  • the systems may receive a message (e.g., a question) from a terminal device and recommend at least one candidate reply (e.g., an answer to the question) for a target respondent (e.g., a customer service) to the message by using a trained reply generation model.
  • the trained reply generation model may be associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents (including the target respondent); thus, the at least one candidate reply determined by using the trained reply generation model meets the language habit of the target respondent.
  • the systems may transmit a target reply from the at least one candidate reply to the terminal device.
  • candidate replies are recommended to respondents with language habits of the respondents taken into consideration, thereby improving the efficiency and accuracy of the auxiliary reply.
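The receive/recommend/transmit flow above can be put together in a toy sketch. The `KeywordModel` here is a hypothetical stand-in for the trained reply generation model: it ranks replies by simple word overlap, whereas a real model would condition its ranking on the respondent's embedding vector rather than ignore it.

```python
class KeywordModel:
    """Toy stand-in for the trained reply generation model: ranks
    candidate replies by word overlap with the message. A real model
    would also condition on `vector` (the respondent's language-habit
    embedding); this sketch accepts it but does not use it."""

    def __init__(self, replies):
        self.replies = replies

    def rank(self, message, vector):
        words = set(message.lower().split())
        overlap = lambda reply: len(words & set(reply.lower().split()))
        return sorted(self.replies, key=overlap, reverse=True)


def auxiliary_reply(message, respondent_id, model, embedding_matrix, top_k=3):
    """Receive a message, recommend candidate replies for the target
    respondent, and pick the target reply to transmit back."""
    vector = embedding_matrix[respondent_id]          # language-habit embedding
    candidates = model.rank(message, vector)[:top_k]  # recommended replies
    target_reply = candidates[0]                      # in practice, selected by the respondent
    return candidates, target_reply
```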
  • FIG. 1 is a schematic diagram illustrating an exemplary auxiliary reply system according to some embodiments of the present disclosure.
  • the auxiliary reply system 100 may be applied in various application scenarios, such as online transportation service, online shopping, online consultation, etc.
  • the auxiliary reply system 100 may include a server 110, a network 120, a terminal device 130, and a storage 140.
  • the server 110 may be a single server or a server group.
  • the server group may be centralized or distributed (e.g., the server 110 may be a distributed system) .
  • the server 110 may be local or remote.
  • the server 110 may access information and/or data stored in the terminal device 130 and/or the storage 140 via the network 120.
  • the server 110 may be directly connected to the terminal device 130 and/or the storage 140 to access stored information and/or data.
  • the server 110 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
  • the server 110 may be implemented on a computing device 200 including one or more components illustrated in FIG. 2 in the present disclosure.
  • the server 110 may include a processing engine 112.
  • the processing engine 112 may process information and/or data associated with auxiliary reply to perform one or more functions described in the present disclosure. For example, the processing engine 112 may recommend, for a target respondent (e.g., a customer service) , at least one candidate reply to a message received from the terminal device 130 by using a trained reply generation model.
  • the processing engine 112 may include one or more processing engines (e.g., single-core processing engine (s) or multi-core processor (s) ) .
  • the processing engine 112 may include a central processing unit (CPU) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a digital signal processor (DSP) , a field programmable gate array (FPGA) , a programmable logic device (PLD) , a controller, a microcontroller unit, a reduced instruction-set computer (RISC) , a microprocessor, or the like, or any combination thereof.
  • the network 120 may facilitate exchange of information and/or data.
  • one or more components (e.g., the server 110, the terminal device 130, or the storage 140) of the auxiliary reply system 100 may transmit information and/or data to other component (s) of the auxiliary reply system 100 via the network 120.
  • the server 110 may transmit a target reply to the terminal device 130 via the network 120.
  • the network 120 may be any type of wired or wireless network, or combination thereof.
  • the network 120 may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public telephone switched network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or the like, or any combination thereof.
  • the network 120 may include one or more network access points.
  • the network 120 may include wired or wireless network access points such as base stations and/or internet exchange points 120-1, 120-2, ..., through which one or more components of the auxiliary reply system 100 may be connected to the network 120 to exchange data and/or information.
  • the terminal device 130 may include a mobile device 130-1, a tablet computer 130-2, a laptop computer 130-3, or the like, or any combination thereof.
  • the mobile device 130-1 may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof.
  • the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof.
  • the wearable device may include a smart bracelet, smart footgear, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof.
  • the smart mobile device may include a smart phone, a personal digital assistance (PDA) , a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof.
  • the virtual reality device and/or the augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof.
  • the virtual reality device and/or the augmented reality device may include a Google Glass™, an Oculus Rift™, a Hololens™, a Gear VR™, etc.
  • the storage 140 may store data and/or instructions. In some embodiments, the storage 140 may store messages obtained from the terminal device 130. In some embodiments, the storage 140 may store data and/or instructions that the server 110 may execute or use to perform exemplary methods described in the present disclosure. In some embodiments, the storage 140 may include a mass storage, a removable storage, a volatile read-and-write memory, a read-only memory (ROM), or the like, or any combination thereof. Exemplary mass storage may include a magnetic disk, an optical disk, a solid-state drive, etc. Exemplary removable storage may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc.
  • Exemplary volatile read-and-write memory may include a random access memory (RAM) .
  • RAM may include a dynamic RAM (DRAM), a double data rate synchronous dynamic RAM (DDR SDRAM), a static RAM (SRAM), a thyristor RAM (T-RAM), a zero-capacitor RAM (Z-RAM), etc.
  • Exemplary ROM may include a mask ROM (MROM) , a programmable ROM (PROM) , an erasable programmable ROM (EPROM) , an electrically-erasable programmable ROM (EEPROM) , a compact disk ROM (CD-ROM) , and a digital versatile disk ROM, etc.
  • the storage 140 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
  • the storage 140 may be connected to the network 120 to communicate with one or more components (e.g., the server 110, and the terminal device 130) of the auxiliary reply system 100.
  • One or more components of the auxiliary reply system 100 may access the data and/or instructions stored in the storage 140 via the network 120.
  • the storage 140 may be directly connected to or communicate with one or more components (e.g., the server 110, the terminal device 130) of the auxiliary reply system 100.
  • the storage 140 may be part of the server 110.
  • when an element of the auxiliary reply system 100 performs, the element may perform through electrical signals and/or electromagnetic signals.
  • the terminal device 130 may operate logic circuits in its processor to process such a task.
  • a processor of the terminal device 130 may generate electrical signals encoding the message.
  • the processor of the terminal device 130 may then send the electrical signals to an output port. If the terminal device 130 communicates with the server 110 via a wired network, the output port may be physically connected to a cable, which may further transmit the electrical signals to an input port of the server 110.
  • the output port of the terminal device 130 may be one or more antennas, which may convert the electrical signals to electromagnetic signals.
  • within an electronic device, such as the terminal device 130 and/or the server 110, when the processor retrieves or saves data from a storage medium (e.g., the storage 140), it may send out electrical signals to a read/write device of the storage medium, which may read or write structured data in the storage medium.
  • the structured data may be transmitted to the processor in the form of electrical signals via a bus of the electronic device.
  • an electrical signal refers to one electrical signal, a series of electrical signals, and/or a plurality of distinct electrical signals.
  • the application scenario illustrated in FIG. 1 is merely provided for the purposes of illustration and is not intended to limit the scope of the present disclosure.
  • multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
  • FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary computing device according to some embodiments of the present disclosure.
  • the server 110 may be implemented on the computing device 200.
  • the processing engine 112 may be implemented on the computing device 200 and configured to perform functions of the processing engine 112 disclosed in this disclosure.
  • the computing device 200 may be used to implement any component of the auxiliary reply system 100 of the present disclosure.
  • the processing engine 112 may be implemented on the computing device 200, via its hardware, software program, firmware, or a combination thereof.
  • the computer functions related to the auxiliary reply system 100 as described herein may be implemented in a distributed manner on a number of similar platforms to distribute the processing load.
  • the computing device 200 may include communication (COM) ports 250 connected to and from a network (e.g., the network 120) connected thereto to facilitate data communications.
  • the computing device 200 may also include a processor (e.g., a processor 220) , in the form of one or more processors (e.g., logic circuits) , for executing program instructions.
  • the processor may include interface circuits and processing circuits therein.
  • the interface circuits may be configured to receive electronic signals from a bus 210, wherein the electronic signals encode structured data and/or instructions for the processing circuits to process.
  • the processing circuits may conduct logic calculations, and then determine a conclusion, a result, and/or an instruction encoded as electronic signals. Then the interface circuits may send out the electronic signals from the processing circuits via the bus 210.
  • the computing device 200 may further include program storage and data storage of different forms, for example, a disk 270, and a read only memory (ROM) 230, or a random access memory (RAM) 240, for storing various data files to be processed and/or transmitted by the computing device 200.
  • the computing device 200 may also include program instructions stored in the ROM 230, the RAM 240, and/or other type of non-transitory storage medium to be executed by the processor 220.
  • the methods and/or processes of the present disclosure may be implemented as the program instructions.
  • the computing device 200 also includes an I/O component 260, supporting input/output between the computing device 200 and other components therein.
  • the computing device 200 may also receive programming and data via network communications.
  • step A and step B may also be performed by two different CPUs and/or processors jointly or separately in the computing device 200 (e.g., the first processor executes step A and the second processor executes step B, or the first and second processors jointly execute steps A and B) .
  • FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary mobile device according to some embodiments of the present disclosure.
  • the terminal device 130 may be implemented on the mobile device 300.
  • the mobile device 300 may include a communication platform 310, a display 320, a graphic processing unit (GPU) 330, a central processing unit (CPU) 340, an I/O 350, a memory 360, and a storage 390.
  • any other suitable component including but not limited to a system bus or a controller (not shown) , may also be included in the mobile device 300.
  • a mobile operating system 370 (e.g., iOS™, Android™, Windows Phone™) and one or more applications 380 may be loaded into the memory 360 from the storage 390 in order to be executed by the CPU 340.
  • the applications 380 may include a browser or any other suitable mobile apps for receiving and rendering information relating to the auxiliary reply system 100.
  • User interactions with the information stream may be achieved via the I/O 350 and provided to one or more components of the auxiliary reply system 100 via the network 120.
  • computer hardware platforms may be used as the hardware platform (s) for one or more of the elements described herein.
  • a computer with user interface elements may be used to implement a personal computer (PC) or any other type of work station or terminal device.
  • a computer may also act as a server if appropriately programmed.
  • FIG. 4 is a block diagram illustrating an exemplary processing engine according to an embodiment of the present disclosure.
  • the processing engine 112 may include a receiving module 402, a recommendation module 404, a transmission module 406, and a training module 408.
  • the receiving module 402 may be configured to receive a message from the terminal device 130 via the network 120.
  • the message may include a conversation associated with a service type, a driving route, a service fee, a payment, an invoice, etc.
  • the message may include a text message, a voice message, an image message, or the like, or a combination thereof.
  • the receiving module 402 may receive the message from the terminal device 130 by using any suitable communication protocol, for example, Hypertext Transfer Protocol (HTTP), Address Resolution Protocol (ARP), Dynamic Host Configuration Protocol (DHCP), File Transfer Protocol (FTP), etc.
  • the recommendation module 404 may be configured to recommend at least one candidate reply for a target respondent to the message by using a trained reply generation model.
  • the trained reply generation model may be associated with a plurality of embedding vectors (also referred to as “trained embedding vectors”) indicating a plurality of language habits corresponding to a plurality of respondents (which include the target respondent).
  • the plurality of embedding vectors may be included in a matrix and each of the plurality of embedding vectors corresponds to a row vector or a column vector of the matrix.
  • the recommendation module 404 may identify a target row vector or a target column vector from the matrix and determine the at least one candidate reply based on the target row vector or the target column vector using the trained reply generation model. More descriptions of the trained reply generation model may be found elsewhere in the present disclosure (e.g., FIG. 6 and the description thereof) .
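The matrix lookup described above amounts to selecting one row (or column) per respondent. A minimal sketch with illustrative values:

```python
# Illustrative matrix: each row is one respondent's embedding vector,
# encoding that respondent's language habit. The values are made up.
embedding_matrix = [
    [0.2, 0.7, 0.1],  # respondent 0
    [0.9, 0.1, 0.5],  # respondent 1
    [0.4, 0.4, 0.8],  # respondent 2
]

def target_row_vector(matrix, respondent_index):
    """Identify the target row vector for the target respondent."""
    return matrix[respondent_index]
```

The selected vector would then be fed to the trained reply generation model together with the incoming message to determine the candidate replies.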
  • the transmission module 406 may be configured to transmit a target reply from the at least one candidate reply to the terminal device 130.
  • the target reply may be manually selected from the at least one candidate reply by the target respondent.
  • the transmission module 406 may select the target reply from the at least one candidate reply based on a predetermined rule.
  • the training module 408 may be configured to determine the trained reply generation model based on a plurality of samples according to a training process.
  • the training module 408 may obtain the plurality of samples based on a plurality of historical sessions associated with the plurality of respondents.
  • each of the plurality of samples may correspond to a respondent of the plurality of respondents and include a sample message, a sample reply, and an embedding vector corresponding to the respondent.
  • the plurality of samples may include a plurality of positive samples and a plurality of negative samples. More descriptions of the training process may be found elsewhere in the present disclosure (e.g., FIG. 6 and the description thereof) .
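One plausible way to build such positive and negative samples from historical sessions (the pairing scheme here is an assumption; the disclosure does not fix it): a positive sample pairs a sample message with the reply its respondent actually gave, while a negative sample pairs the same message with a reply drawn from a different session.

```python
import random

def build_samples(sessions, embeddings, seed=0):
    """Build (message, reply, embedding_vector, label) samples from
    historical sessions; label 1 marks a positive sample, 0 a negative."""
    rng = random.Random(seed)
    samples = []
    all_replies = [session["reply"] for session in sessions]
    for session in sessions:
        vec = embeddings[session["respondent"]]
        # Positive sample: the reply the respondent actually gave.
        samples.append((session["message"], session["reply"], vec, 1))
        # Negative sample: same message, a reply from another session.
        wrong = rng.choice([r for r in all_replies if r != session["reply"]])
        samples.append((session["message"], wrong, vec, 0))
    return samples
```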
  • the modules in the processing engine 112 may be connected to or communicate with each other via a wired connection or a wireless connection.
  • the wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof.
  • the wireless connection may include a Local Area Network (LAN) , a Wide Area Network (WAN) , a Bluetooth, a ZigBee, a Near Field Communication (NFC) , or the like, or any combination thereof.
  • Two or more of the modules may be combined as a single module, and any one of the modules may be divided into two or more units.
  • the receiving module 402 and the transmission module 406 may be combined as a single module which may both receive the message from the terminal device 130 and transmit the target reply to the terminal device 130.
  • the processing engine 112 may include a storage module (not shown) which may be configured to store the message, the at least one candidate reply, the trained reply generation model, the target reply, etc.
  • the training module 408 may be unnecessary and the trained reply generation model may be obtained from a storage device (e.g., the storage 140) disclosed elsewhere in the present disclosure or may be determined by an independent training device in the auxiliary reply system 100.
  • FIG. 5 is a flowchart illustrating an exemplary process for auxiliary reply according to some embodiments of the present disclosure.
  • the process 500 may be executed by the auxiliary reply system 100.
  • the process 500 may be implemented as a set of instructions (e.g., an application) stored in the storage ROM 230 or RAM 240.
  • the processor 220 and/or the modules in FIG. 4 may execute the set of instructions, and when executing the instructions, the processor 220 and/or the modules may be configured to perform the process 500.
  • the operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 500 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of the process 500 as illustrated in FIG. 5 and described below is not intended to be limiting.
  • the processing engine 112 may receive a message from the terminal device 130.
  • the processing engine 112 may receive the message from the terminal device 130 via the network 120.
  • the message may be a conversation associated with a service type, a driving route, a service fee, a payment, an invoice, etc.
  • the message may include a text message, a voice message, an image message, or the like, or a combination thereof.
  • the processing engine 112 may receive the message from the terminal device 130 by using any suitable communication protocol, for example, Hypertext Transfer Protocol (HTTP) , Address Resolution Protocol (ARP) , Dynamic Host Configuration Protocol (DHCP) , File Transfer Protocol (FTP) , etc.
  • the processing engine 112 may analyze and/or process the message. For example, the processing engine 112 may perform a textual analysis on the message to extract semantic information (e.g., a word, a phrase, a sentence, context information) associated with the message. As another example, the processing engine 112 may perform a voice recognition on the voice message. As a further example, the processing engine 112 may perform an image recognition (e.g., an optical character recognition) on the image message.
  • the processing engine 112 (e.g., the recommendation module 404) (e.g., the processing circuits of the processor 220) may recommend, for a target respondent, at least one candidate reply to the message by using a trained reply generation model.
  • the target respondent may be an online customer service, an online consultant, an online assistant, etc.
  • Take the online customer service as an example: it is known that a specific customer service may have his/her personalized language habit; therefore, when receiving a message from a user, the specific customer service may wish to receive recommended replies that meet his/her personalized language habit.
  • the processing engine 112 may recommend at least one candidate reply by using the trained reply generation model which may be associated with a plurality of embedding vectors (also referred to as “trained embedding vectors” ) indicating a plurality of language habits corresponding to a plurality of respondents (which include the target respondent) .
  • the plurality of embedding vectors may be included in a matrix and each of the plurality of embedding vectors corresponds to a row vector or a column vector of the matrix.
  • the processing engine 112 may identify a target row vector or a target column vector from the matrix and determine the at least one candidate reply based on the target row vector or the target column vector using the trained reply generation model.
  • the trained reply generation model may be configured to provide a matching score between the message and each of the at least one candidate reply.
  • the matching score may indicate a matching degree between a message and a reply. The larger the matching score is, the higher the matching degree between the message and the reply may be.
  • the processing engine 112 may obtain the trained reply generation model from the training module 408 or a storage device (e.g., the storage 140) disclosed elsewhere in the present disclosure.
  • the trained reply generation model may be determined based on a plurality of samples, wherein each of the plurality of samples may correspond to a respondent of the plurality of respondents.
  • each of the plurality of samples may include a sample message, a sample reply, and a sample embedding vector corresponding to the respondent.
  • the trained reply generation model may include a Deep Structured Semantic Model (DSSM) , a Match-Pyramid model, etc. More descriptions of the trained reply generation model may be found elsewhere in the present disclosure (e.g., FIG. 6 and the description thereof) .
  • the processing engine 112 may transmit a target reply from the at least one candidate reply to the terminal device 130.
  • the processing engine 112 may transmit the target reply to the terminal device 130 via the network 120.
  • the target reply may be manually selected from the at least one candidate reply by the target respondent.
  • the processing engine 112 may select the target reply from the at least one candidate reply based on a predetermined rule. For example, the processing engine 112 may rank the at least one candidate reply from high to low based on at least one matching score corresponding to the at least one candidate reply. Further, the processing engine 112 may select a candidate reply with a highest matching score as the target reply.
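The ranking-based rule described above may be sketched as follows; the candidate texts and matching scores are made up for illustration.

```python
# Illustrative only: rank candidate replies by matching score (high to low)
# and select the highest-scoring reply as the target reply.

candidates = [
    ("Hello, what can I help you?", 0.21),
    ("The charging rule for the carpooling service is ...", 0.93),
    ("Please provide your order number.", 0.55),
]

def select_target_reply(scored_replies):
    """Sort (reply, matching score) pairs from high to low and return the
    reply with the highest matching score."""
    ranked = sorted(scored_replies, key=lambda pair: pair[1], reverse=True)
    return ranked[0][0]

target = select_target_reply(candidates)
```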
  • the processing engine 112 may store information and/or data (e.g., the at least one candidate reply, the trained reply generation model, the target reply) associated with the message in a storage device (e.g., the storage 140) disclosed elsewhere in the present disclosure.
  • the target respondent can modify and/or amend the target reply so that it can better meet his/her language habit.
  • FIG. 6 is a flowchart illustrating an exemplary process for determining a trained reply generation model according to some embodiments of the present disclosure.
  • the process 600 may be executed by the auxiliary reply system 100.
  • the process 600 may be implemented as a set of instructions (e.g., an application) stored in the storage ROM 230 or RAM 240.
  • the processor 220 and/or the training module 408 may execute the set of instructions, and when executing the instructions, the processor 220 and/or the training module 408 may be configured to perform the process 600.
  • the operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 600 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of the process 600 as illustrated in FIG. 6 and described below is not intended to be limiting.
  • the processing engine 112 may obtain a plurality of samples.
  • the processing engine 112 may obtain the plurality of samples from a storage device (e.g., the storage 140) disclosed elsewhere in the present disclosure.
  • the processing engine 112 may obtain the plurality of samples based on a plurality of historical sessions associated with the plurality of respondents.
  • each of the plurality of samples may correspond to a respondent of the plurality of respondents and include a sample message, a sample reply, and a preliminary embedding vector corresponding to the respondent.
  • as used herein, an "embedding vector" refers to a vector associated with at least one feature of the respondent.
  • the at least one feature of the respondent may include a gender, an age, a working age, a work type, a language habit (which may be expressed as a default value (e.g., 0) in the preliminary embedding vector), or the like, or a combination thereof.
  • the processing engine 112 may determine the preliminary embedding vector corresponding to the respondent based on one-hot encoding. More descriptions of the embedding vector may be found elsewhere in the present disclosure (e.g., FIG. 7, FIG. 8, and the descriptions thereof).
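A minimal sketch of the one-hot construction described above follows; the category sets are small illustrative assumptions (the disclosure does not fix them), and the "language habit" slot starts at the default value 0 before training.

```python
# Hypothetical sketch: build a preliminary embedding vector by one-hot
# encoding each respondent feature and concatenating the results.

GENDERS = ["male", "female"]
AGE_GROUPS = ["16-20", "21-25", "26-30"]
WORK_TYPES = ["express", "premier"]

def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

def preliminary_embedding(gender, age_group, work_type):
    vector = one_hot(gender, GENDERS)
    vector += one_hot(age_group, AGE_GROUPS)
    vector += one_hot(work_type, WORK_TYPES)
    vector.append(0)  # language-habit slot, default 0 before training
    return vector

vec = preliminary_embedding("male", "21-25", "express")
```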
  • the processing engine 112 may determine a sample message vector corresponding to the sample message and a sample reply vector corresponding to the sample reply, both of which are determined in a manner similar to the determination of the embedding vector. Further, the processing engine 112 may determine a combined vector (also referred to as a "modified message vector") by combining the sample message vector with the embedding vector corresponding to the respondent.
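The disclosure does not fix the combination operator; concatenation is one plausible choice, sketched here with toy vectors.

```python
# Sketch under an assumption: form the "modified message vector" by
# appending the respondent's embedding vector to the sample message vector.
# Concatenation is illustrative only; other combination operators are possible.

def combine(message_vector, embedding_vector):
    """Concatenate the sample message vector with the respondent's
    embedding vector to obtain the modified message vector."""
    return list(message_vector) + list(embedding_vector)

modified = combine([0, 1, 1, 0], [1, 0, 1])
```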
  • the plurality of samples may include a plurality of positive samples (which may be labelled as "1", indicating that a matching degree between the sample reply and the sample message is 100%) and a plurality of negative samples (which may be labelled as "0", indicating that the matching degree between the sample reply and the sample message is 0%).
  • for a positive sample, the sample reply refers to a historical reply immediately following the sample message (which is a historical message in the historical session) in a historical session;
  • for a negative sample, the sample reply refers to a historical reply which does not correspond to the sample message (i.e., a historical reply other than the historical reply immediately following the sample message in the historical session, or a historical reply in other historical sessions).
  • a sample including the user’s question ( “How to charge for the ride-sharing service? ” ) and the customer service’s reply ( “The charging rule for the carpooling service is... ” ) immediately following the user’s question in the historical session is a positive sample;
  • a sample including the user’s question ( “How to charge for the ride-sharing service? ” ) and the customer service’s reply ( “Hello, what can I help you? ” ) which doesn’t correspond to the user’s question in the historical session is a negative sample.
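The labelling rule described above may be sketched as follows; the session turns and function name are hypothetical, and the rule shown (reply immediately following a message is positive, any other reply to the same message is negative) follows the description above.

```python
# Illustrative sketch: build labelled samples from one historical session.
# Label 1 = positive sample, label 0 = negative sample.

session = [
    ("user", "How to charge for the ride-sharing service?"),
    ("service", "The charging rule for the carpooling service is ..."),
    ("user", "Thanks!"),
    ("service", "Hello, what can I help you?"),
]

def build_samples(turns):
    samples = []
    replies = [text for role, text in turns if role == "service"]
    for i, (role, text) in enumerate(turns[:-1]):
        if role == "user" and turns[i + 1][0] == "service":
            positive = turns[i + 1][1]
            samples.append((text, positive, 1))       # positive sample
            for reply in replies:
                if reply != positive:
                    samples.append((text, reply, 0))  # negative sample
    return samples

samples = build_samples(session)
```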
  • the processing engine 112 may divide the plurality of samples into a training set and a test set.
  • the training set may be used to train the model and the test set may be used to determine whether the training process has been completed.
  • the processing engine 112 (e.g., the training module 408) (e.g., the processing circuits of the processor 220) may obtain a preliminary reply generation model.
  • the preliminary reply generation model may include one or more preliminary parameters which may be default settings of the auxiliary reply system 100 or may be adjustable under different situations.
  • the preliminary reply generation model may include a preliminary matrix associated with a plurality of preliminary embedding vectors corresponding to the plurality of respondents.
  • for a specific respondent, a corresponding embedding vector refers to a vector associated with at least one feature (e.g., a gender, an age, a working age, a work type, a language habit) of the specific respondent.
  • the preliminary matrix may be a matrix including the plurality of preliminary embedding vectors, each of which corresponds to a row vector or a column vector of the matrix.
  • the processing engine 112 may determine a plurality of sample scores corresponding to the plurality of samples based on the preliminary reply generation model.
  • a sample score may indicate a matching degree between the sample reply and the sample message.
  • the processing engine 112 may match the sample message vector (or the modified message vector) with the sample reply vector and determine the sample score based on the matching result.
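The disclosure leaves the exact matching function to the model (e.g., a DSSM); cosine similarity is one simple stand-in, sketched here with toy vectors.

```python
# Sketch only: score the match between a (modified) message vector and a
# reply vector with cosine similarity. Identical directions score 1.0;
# orthogonal vectors score 0.0.

import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

score = cosine_similarity([1, 0, 1], [1, 0, 1])
```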
  • the processing engine 112 (e.g., the training module 408) (e.g., the processing circuits of the processor 220) may determine whether the plurality of sample scores satisfy a preset condition.
  • the processing engine 112 may determine a first accuracy rate of the preliminary reply generation model corresponding to the training set and a second accuracy rate of the preliminary reply generation model corresponding to the test set. Further, the processing engine 112 may determine whether the first accuracy rate has been stable ("stable" means that a difference between the first accuracy rate in a current iteration and the first accuracy rate in a previous adjacent iteration (or the first accuracy rates in multiple previous iterations) is less than a threshold) and whether the second accuracy rate has reached a maximum value.
  • the first accuracy rate and/or the second accuracy rate may be determined based on one or more parameters (e.g., a proximity degree) associated with the plurality of sample scores and the plurality of labels (i.e., “1” or “0” ) corresponding to the plurality of samples.
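The stability test described above may be sketched as follows; the threshold value and the window of previous iterations are assumptions, not fixed by the disclosure.

```python
# Illustrative sketch: the first accuracy rate is treated as "stable" when
# it differs from the accuracy rates of the last few iterations by less
# than a threshold.

def is_stable(accuracies, threshold=0.001, window=3):
    """Return True if the last `window` accuracy values all differ from
    the current value by less than `threshold`."""
    if len(accuracies) < window + 1:
        return False
    current = accuracies[-1]
    recent = accuracies[-(window + 1):-1]
    return all(abs(current - a) < threshold for a in recent)

stable = is_stable([0.71, 0.85, 0.9001, 0.9002, 0.9002, 0.9003])
```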
  • in response to determining that the first accuracy rate has been stable and the second accuracy rate has reached the maximum value, the processing engine 112 may determine that the plurality of sample scores satisfy the preset condition.
  • otherwise, the processing engine 112 may determine that the plurality of sample scores do not satisfy the preset condition.
  • the processing engine 112 may determine a loss function (e.g., a L2 loss function) of the preliminary reply generation model based on the plurality of sample scores and the plurality of labels and determine a value of the loss function based on the plurality of sample scores. Further, the processing engine 112 may determine whether the value of the loss function is less than a loss threshold.
  • the loss threshold may be a default setting of the auxiliary reply system 100 or may be adjustable under different situations.
  • in response to determining that the value of the loss function is less than the loss threshold, the processing engine 112 may determine that the plurality of sample scores satisfy the preset condition.
  • in response to determining that the value of the loss function is greater than or equal to the loss threshold, the processing engine 112 may determine that the plurality of sample scores do not satisfy the preset condition.
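The L2 loss check described above may be sketched as follows; the loss threshold is an assumed value for illustration.

```python
# Minimal sketch: compute the L2 (squared-error) loss between sample scores
# and their labels ("1" for positive, "0" for negative samples) and compare
# it against a loss threshold.

def l2_loss(scores, labels):
    return sum((s - y) ** 2 for s, y in zip(scores, labels))

def satisfies_preset_condition(scores, labels, loss_threshold=0.1):
    return l2_loss(scores, labels) < loss_threshold

ok = satisfies_preset_condition([0.95, 0.05], [1, 0])
```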
  • the processing engine 112 may determine whether a number count of iterations is larger than a count threshold. In response to determining that the number count of iterations is larger than the count threshold, the processing engine 112 may determine that the plurality of sample scores satisfy the preset condition. In response to determining that the number count of iterations is less than or equal to the count threshold, the processing engine 112 may determine that the plurality of sample scores do not satisfy the preset condition.
  • the processing engine 112 (e.g., the training module 408) (e.g., the processing circuits of the processor 220) may designate the preliminary reply generation model as the trained reply generation model, which means that the training process has been completed.
  • the processing engine 112 may execute the process 600 to return to operation 620 to update the preliminary reply generation model.
  • the processing engine 112 may update the one or more preliminary parameters to produce an updated reply generation model.
  • the processing engine 112 (e.g., the training module 408) (e.g., the processing circuits of the processor 220) may also repeat the step of determining whether the plurality of updated sample scores satisfy the preset condition until the plurality of updated sample scores satisfy the preset condition.
  • the processing engine 112 may designate the updated reply generation model as the trained reply generation model.
  • in response to determining that the plurality of sample scores (or the updated sample scores) do not satisfy the preset condition, the processing engine 112 may update the plurality of preliminary embedding vectors (or a plurality of updated embedding vectors) corresponding to the plurality of respondents (i.e., update the preliminary matrix (or an updated matrix)).
  • a specific embedding vector corresponds to a specific respondent and indicates one or more features (e.g., an age, a gender, a working age, a work type, a language habit) of the specific respondent.
  • the feature "language habit" may be expressed as a default value (e.g., 0).
  • the processing engine 112 may iteratively update the embedding vectors corresponding to the plurality of respondents; after the training process is completed, embedding vectors corresponding to respondents with similar language habits are similar to each other.
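The iterative update of an embedding vector may be sketched as a gradient-descent-style step; the actual update rule depends on the chosen model and optimizer, which the disclosure does not fix, so the learning rate and gradient values below are hypothetical.

```python
# Sketch under assumptions: nudge each component of an embedding vector
# opposite to its loss gradient; repeated over iterations, this pulls
# embeddings of similar respondents closer together.

def update_embedding(vector, gradient, learning_rate=0.1):
    """Apply one gradient-descent-style update step to the embedding."""
    return [v - learning_rate * g for v, g in zip(vector, gradient)]

updated = update_embedding([0.5, -0.2], [1.0, -1.0])
```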
  • the processing engine 112 may identify a respondent with a similar language habit (i.e., a similar embedding vector) and recommend the at least one candidate reply based on an embedding vector corresponding to the respondent for the target respondent.
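The similar-habit lookup described above may be sketched as a nearest-embedding search; cosine similarity and the toy matrix values are illustrative assumptions.

```python
# Hypothetical sketch: after training, respondents with similar language
# habits have nearby embedding vectors, so the most similar embedding
# (excluding the target itself) identifies a respondent with a similar habit.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def most_similar_respondent(target_vector, matrix, target_index):
    """Return the index of the respondent whose embedding is closest
    (by cosine similarity) to the target respondent's embedding."""
    best_index, best_score = None, -2.0
    for i, row in enumerate(matrix):
        if i == target_index:
            continue
        score = cosine(target_vector, row)
        if score > best_score:
            best_index, best_score = i, score
    return best_index

matrix = [[0.9, 0.1], [0.88, 0.12], [-0.5, 0.8]]
similar = most_similar_respondent(matrix[0], matrix, 0)
```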
  • the processing engine 112 may update the trained reply generation model at a certain time interval (e.g., every month, every two months) based on a plurality of newly obtained samples.
  • the positive samples and the negative samples may be determined manually by an operator or by the auxiliary reply system 100 according to a predetermined rule.
  • FIG. 7 is a schematic diagram illustrating an exemplary sample according to some embodiments of the present disclosure.
  • the sample 700 may include a sample message, a sample reply, and an embedding vector corresponding to a respondent.
  • the sample message is a question (e.g., “How to charge for the carpooling service? ” ) from a user in a historical session and the sample reply is a historical reply (e.g., “The charging rule for the carpooling service is ...” ) from a customer service in the historical session.
  • the embedding vector corresponding to the respondent (i.e., the customer service) refers to a vector associated with at least one feature (e.g., a gender, an age, a working age, a work type, a language habit) of the respondent.
  • the at least one feature of the respondent may be encoded as a 128-dimensional vector or a 256-dimensional vector based on one-hot encoding.
  • for example, if the gender of the respondent is "male," the age of the respondent is "21-25," the working age is "1 year," and the work type is "express," the embedding vector corresponding to the respondent may be expressed as "10100010000100."
  • the sample message and the sample reply may also be encoded as the sample message vector and the sample reply vector, respectively, based on one-hot encoding.
  • FIG. 8 is a schematic diagram illustrating an exemplary matrix associated with a plurality of embedding vectors corresponding to a plurality of respondents according to some embodiments of the present disclosure.
  • the matrix may be constituted by the plurality of embedding vectors corresponding to the plurality of respondents, wherein each of the plurality of embedding vectors corresponds to a row or a column of the matrix.
  • the embedding vector is associated with at least one feature (e.g., a gender, an age, a working age, a work type, a language habit) of the respondent.
  • aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or context including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in an implementation combining software and hardware that may all generally be referred to herein as a "block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electro-magnetic, optical, or the like, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python or the like, conventional procedural programming languages, such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a software as a service (SaaS) .

Abstract

A system for auxiliary reply may receive a message from a terminal device. The system may also recommend, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model may be associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents. The system may further transmit a target reply from the at least one candidate reply to the terminal device.

Description

SYSTEMS AND METHODS FOR AUXILIARY REPLY

TECHNICAL FIELD
The present disclosure generally relates to question answering (QA) , and in particular, to systems and methods for auxiliary reply.
BACKGROUND
With the development of Internet technology and artificial intelligence, auxiliary reply in various scenarios (e.g., auxiliary reply for a customer service in online shopping) becomes more and more popular. After receiving a message from a user, a system for providing auxiliary reply may perform a textual analysis on the message, identify one or more replies that match the message, and recommend the one or more replies to a respondent (e.g., a customer service) for selection. However, different respondents may have different language habits and a specific respondent wishes to receive recommended replies in accordance with his/her language habit. Therefore, it is desirable to provide systems and methods for recommending replies for respondents accurately and efficiently.
SUMMARY
An aspect of the present disclosure relates to a system for auxiliary reply. The system may include a storage medium to store a set of instructions and a processor communicatively coupled to the storage medium. The system may receive a message from a terminal device; recommend, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model may be associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents; and transmit a target reply from the at least one candidate reply to the terminal device.
In some embodiments, the trained reply generation model may be determined with a training process. The training process may include obtaining a plurality of samples, wherein each of the plurality of samples corresponds to a respondent of the plurality of respondents and includes a sample message, a sample reply, and a preliminary embedding vector corresponding to the respondent; obtaining a preliminary reply generation model; determining a plurality of sample scores corresponding to the plurality of samples based on the preliminary reply generation model, wherein each of the plurality of sample scores indicates a matching degree between the sample reply and the sample message; determining whether the plurality of sample scores satisfy a preset condition; and designating the preliminary reply generation model as the trained reply generation model in response to determining that the plurality of sample scores satisfy the preset condition.
In some embodiments, the training process may further include updating the preliminary reply generation model in response to determining that the plurality of sample scores do not satisfy the preset condition and repeating the step of determining whether the plurality of sample scores satisfy the preset condition until the plurality of sample scores satisfy the preset condition.
In some embodiments, the training process may further include determining a preliminary matrix including a plurality of preliminary embedding vectors corresponding to the plurality of respondents.
In some embodiments, the training process may further include updating the preliminary matrix in response to determining that the plurality of sample scores do not satisfy the preset condition.
In some embodiments, the embedding vector may be associated with at least one feature of a gender, an age, a working age, a work type, and/or a language habit.
In some embodiments, the message may include a text message, a  voice message, and/or an image message.
In some embodiments, the message may include a question to the target respondent regarding an online transportation service.
In some embodiments, the reply generation model may include a Deep Structured Semantic Model (DSSM) and/or a Match-Pyramid model.
Another aspect of the present disclosure relates to a method implemented on a computing device having at least one processor, at least one storage medium, and a communication platform connected to a network. The method may include receiving a message from a terminal device; recommending, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model may be associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents; and transmitting a target reply from the at least one candidate reply to the terminal device.
In some embodiments, the trained reply generation model may be determined with a training process. The training process may include obtaining a plurality of samples, wherein each of the plurality of samples corresponds to a respondent of the plurality of respondents and includes a sample message, a sample reply, and a preliminary embedding vector corresponding to the respondent; obtaining a preliminary reply generation model; determining a plurality of sample scores corresponding to the plurality of samples based on the preliminary reply generation model, wherein each of the plurality of sample scores indicates a matching degree between the sample reply and the sample message; determining whether the plurality of sample scores satisfy a preset condition; and designating the preliminary reply generation model as the trained reply generation model in response to determining that the plurality of sample scores satisfy the preset condition.
In some embodiments, the training process may further include  updating the preliminary reply generation model in response to determining that the plurality of sample scores do not satisfy the preset condition and repeating the step of determining whether the plurality of sample scores satisfy the preset condition until the plurality of sample scores satisfy the preset condition.
In some embodiments, the training process may further include determining a preliminary matrix including a plurality of preliminary embedding vectors corresponding to the plurality of respondents.
In some embodiments, the training process may further include updating the preliminary matrix in response to determining that the plurality of sample scores do not satisfy the preset condition.
In some embodiments, the embedding vector may be associated with at least one feature of a gender, an age, a working age, a work type, and/or a language habit.
In some embodiments, the message may include a text message, a voice message, and/or an image message.
In some embodiments, the message may include a question to the target respondent regarding an online transportation service.
In some embodiments, the reply generation model may include a Deep Structured Semantic Model (DSSM) and/or a Match-Pyramid model.
A further aspect of the present disclosure relates to a system for auxiliary reply. The system may include a receiving module, a recommendation module, and a transmission module. The receiving module may be configured to receive a message from a terminal device. The recommendation module may be configured to recommend, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model may be associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents. The  transmission module may be configured to transmit a target reply from the at least one candidate reply to the terminal device.
In some embodiments, the system may further include a training module. The training module may be configured to obtain a plurality of samples, wherein each of the plurality of samples corresponds to a respondent of the plurality of respondents and includes a sample message, a sample reply, and a preliminary embedding vector corresponding to the respondent; obtain a preliminary reply generation model; determine a plurality of sample scores corresponding to the plurality of samples based on the preliminary reply generation model, wherein each of the plurality of sample scores indicates a matching degree between the sample reply and the sample message; determine whether the plurality of sample scores satisfy a preset condition; and designate the preliminary reply generation model as the trained reply generation model in response to determining that the plurality of sample scores satisfy the preset condition.
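Merely by way of illustration, the training process described above may be sketched as follows. The toy model, the scoring function, the preset condition (every sample score reaching a fixed threshold), and the update rule below are simplified assumptions for illustration only, not the model architecture (e.g., a DSSM or Match-Pyramid model) contemplated by the disclosure:

```python
# Toy sketch of the training loop: score all samples, check the preset
# condition, and either designate the model as trained or update it and
# repeat. All names and values here are hypothetical.

class ToyReplyModel:
    def __init__(self, weight=0.0):
        self.weight = weight  # stands in for the model parameters

    def score(self, message_vec, reply_vec, embedding_vec):
        # Sample score: a weighted interaction of message, reply, and
        # respondent-embedding features (a stand-in for neural scoring).
        return self.weight * sum(m * r * e for m, r, e in
                                 zip(message_vec, reply_vec, embedding_vec))

    def update(self, lr=0.1):
        # Stands in for a gradient step that improves the sample scores.
        self.weight += lr


def train(model, samples, threshold=1.0, max_iters=100):
    """samples: list of (sample_message_vec, sample_reply_vec, embedding_vec)."""
    for _ in range(max_iters):
        scores = [model.score(m, r, e) for m, r, e in samples]
        if min(scores) >= threshold:  # the preset condition is satisfied
            return model              # designate as the trained model
        model.update()                # otherwise update and repeat
    return model


samples = [([1.0, 2.0], [1.0, 1.0], [1.0, 0.5]),
           ([2.0, 1.0], [1.0, 2.0], [0.5, 1.0])]
trained = train(ToyReplyModel(), samples)
```

The loop mirrors the claim language: the preliminary model is designated as the trained reply generation model once the sample scores satisfy the preset condition; otherwise it is updated and the determination is repeated.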
In some embodiments, the training module may be further configured to update the preliminary reply generation model in response to determining that the plurality of sample scores do not satisfy the preset condition and repeat the step of determining whether the plurality of sample scores satisfy the preset condition until the plurality of sample scores satisfy the preset condition.
In some embodiments, the training module may be further configured to determine a preliminary matrix including a plurality of preliminary embedding vectors corresponding to the plurality of respondents.
In some embodiments, the training module may be further configured to update the preliminary matrix in response to determining that the plurality of sample scores do not satisfy the preset condition.
In some embodiments, the embedding vector may be associated with at least one feature of a gender, an age, a working age, a work type, and/or a language habit.
In some embodiments, the message may include a text message, a voice message, and/or an image message.
In some embodiments, the message may include a question to the target respondent regarding an online transportation service.
In some embodiments, the reply generation model may include a Deep Structured Semantic Model (DSSM) and/or a Match-Pyramid model.
A still further aspect of the present disclosure relates to a non-transitory computer readable medium including executable instructions. When the executable instructions are executed by at least one processor, the executable instructions may direct the at least one processor to perform a method. The method may include receiving a message from a terminal device; recommending, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model may be associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents; and transmitting a target reply from the at least one candidate reply to the terminal device.
Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations set forth in the detailed examples discussed below.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure is further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
FIG. 1 is a schematic diagram illustrating an exemplary auxiliary reply system according to some embodiments of the present disclosure;
FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary computing device according to some embodiments of the present disclosure;
FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary mobile device according to some embodiments of the present disclosure;
FIG. 4 is a block diagram illustrating an exemplary processing engine according to an embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating an exemplary process for auxiliary reply according to some embodiments of the present disclosure;
FIG. 6 is a flowchart illustrating an exemplary process for determining a trained reply generation model according to some embodiments of the present disclosure;
FIG. 7 is a schematic diagram illustrating an exemplary training sample according to some embodiments of the present disclosure; and
FIG. 8 is a schematic diagram illustrating an exemplary matrix associated with a plurality of embedding vectors corresponding to a plurality of respondents according to some embodiments of the present disclosure.
DETAILED DESCRIPTION
The following description is presented to enable any person skilled in the art to make and use the present disclosure, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and  the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms “a,” “an,” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprise,” “comprises,” and/or “comprising,” “include,” “includes,” and/or “including,” when used in this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
These and other features, and characteristics of the present disclosure, as well as the methods of operations and functions of the related elements of structure and the combination of parts and economies of manufacture, may become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form part of this disclosure. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the present disclosure. It is understood that the drawings are not to scale.
The flowcharts used in the present disclosure illustrate operations that systems implement according to some embodiments of the present disclosure. It is to be expressly understood that the operations of the flowcharts may be implemented out of order. Conversely, the operations may be implemented in inverted order or simultaneously. Moreover, one or more other operations may be added to the flowcharts, and one or more operations may be removed from the flowcharts.
An aspect of the present disclosure relates to systems and methods for auxiliary reply. The systems may receive a message (e.g., a question) from a terminal device and recommend, for a target respondent (e.g., a customer service), at least one candidate reply (e.g., an answer to the question) to the message by using a trained reply generation model. The trained reply generation model may be associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents (including the target respondent); thus, the at least one candidate reply determined by using the trained reply generation model meets the language habit of the target respondent. Further, the systems may transmit a target reply from the at least one candidate reply to the terminal device. According to the systems and methods of the present disclosure, candidate replies are recommended to respondents with the language habits of the respondents taken into consideration, thereby improving the efficiency and accuracy of the auxiliary reply.
FIG. 1 is a schematic diagram illustrating an exemplary auxiliary reply system according to some embodiments of the present disclosure. In some embodiments, the auxiliary reply system 100 may be applied in various application scenarios, such as online transportation service, online shopping, online consultation, etc. In some embodiments, the auxiliary reply system 100 may include a server 110, a network 120, a terminal device 130, and a storage 140.
In some embodiments, the server 110 may be a single server or a server group. The server group may be centralized or distributed (e.g., the server 110 may be a distributed system) . In some embodiments, the server 110 may be local or remote. For example, the server 110 may access information and/or data stored in the terminal device 130 and/or the storage 140 via the network 120. As another example, the server 110 may be  directly connected to the terminal device 130 and/or the storage 140 to access stored information and/or data. In some embodiments, the server 110 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof. In some embodiments, the server 110 may be implemented on a computing device 200 including one or more components illustrated in FIG. 2 in the present disclosure.
In some embodiments, the server 110 may include a processing engine 112. The processing engine 112 may process information and/or data associated with auxiliary reply to perform one or more functions described in the present disclosure. For example, the processing engine 112 may recommend, for a target respondent (e.g., a customer service) , at least one candidate reply to a message received from the terminal device 130 by using a trained reply generation model. In some embodiments, the processing engine 112 may include one or more processing engines (e.g., single-core processing engine (s) or multi-core processor (s) ) . Merely by way of example, the processing engine 112 may include a central processing unit (CPU) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a digital signal processor (DSP) , a field programmable gate array (FPGA) , a programmable logic device (PLD) , a controller, a microcontroller unit, a reduced instruction-set computer (RISC) , a microprocessor, or the like, or any combination thereof.
The network 120 may facilitate exchange of information and/or data. In some embodiments, one or more components (e.g., the server 110, the terminal device 130, or the storage 140) of the auxiliary reply system 100 may transmit information and/or data to other component(s) of the auxiliary reply system 100 via the network 120. For example, the server 110 may transmit a target reply to the terminal device 130 via the network 120. In some embodiments, the network 120 may be any type of wired or wireless network, or a combination thereof. Merely by way of example, the network 120 may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or the like, or any combination thereof. In some embodiments, the network 120 may include one or more network access points. For example, the network 120 may include wired or wireless network access points such as base stations and/or internet exchange points 120-1, 120-2, …, through which one or more components of the auxiliary reply system 100 may be connected to the network 120 to exchange data and/or information.
In some embodiments, the terminal device 130 may include a mobile device 130-1, a tablet computer 130-2, a laptop computer 130-3, or the like, or any combination thereof. In some embodiments, the mobile device 130-1 may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, a smart footgear, a smart glass, a smart helmet, a smart watch, a smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smart phone, a personal digital assistance (PDA) , a gaming device, a navigation device, a  point of sale (POS) device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, a virtual reality glass, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof. For example, the virtual reality device and/or the augmented reality device may include a Google Glass TM, an Oculus Rift TM, a Hololens TM, a Gear VR TM, etc.
The storage 140 may store data and/or instructions. In some embodiments, the storage 140 may store messages obtained from the terminal device 130. In some embodiments, the storage 140 may store data and/or instructions that the server 110 may execute or use to perform exemplary methods described in the present disclosure. In some embodiments, the storage 140 may include a mass storage, a removable storage, a volatile read-and-write memory, a read-only memory (ROM), or the like, or any combination thereof. Exemplary mass storage may include a magnetic disk, an optical disk, a solid-state drive, etc. Exemplary removable storage may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc. Exemplary volatile read-and-write memory may include a random access memory (RAM). Exemplary RAM may include a dynamic RAM (DRAM), a double data rate synchronous dynamic RAM (DDR SDRAM), a static RAM (SRAM), a thyristor RAM (T-RAM), a zero-capacitor RAM (Z-RAM), etc. Exemplary ROM may include a mask ROM (MROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a compact disk ROM (CD-ROM), a digital versatile disk ROM, etc. In some embodiments, the storage 140 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
In some embodiments, the storage 140 may be connected to the network 120 to communicate with one or more components (e.g., the server 110, and the terminal device 130) of the auxiliary reply system 100. One or more components of the auxiliary reply system 100 may access the data and/or instructions stored in the storage 140 via the network 120. In some embodiments, the storage 140 may be directly connected to or communicate with one or more components (e.g., the server 110, the terminal device 130) of the auxiliary reply system 100. In some embodiments, the storage 140 may be part of the server 110.
One of ordinary skill in the art would understand that when an element (or component) of the auxiliary reply system 100 performs an operation, the element may perform the operation through electrical signals and/or electromagnetic signals. For example, when the terminal device 130 processes a task, such as sending a message or receiving a target reply, the terminal device 130 may operate logic circuits in its processor to process such a task. When the terminal device 130 sends out a message to the server 110, a processor of the terminal device 130 may generate electrical signals encoding the message. The processor of the terminal device 130 may then send the electrical signals to an output port. If the terminal device 130 communicates with the server 110 via a wired network, the output port may be physically connected to a cable, which may further transmit the electrical signals to an input port of the server 110. If the terminal device 130 communicates with the server 110 via a wireless network, the output port of the terminal device 130 may be one or more antennas, which may convert the electrical signals to electromagnetic signals. Within an electronic device, such as the terminal device 130 and/or the server 110, when a processor thereof processes an instruction, sends out an instruction, and/or performs an action, the instruction and/or action is conducted via electrical signals. For example, when the processor retrieves or saves data from a storage medium (e.g., the storage 140), it may send out electrical signals to a read/write device of the storage medium, which may read or write structured data in the storage medium. The structured data may be transmitted to the processor in the form of electrical signals via a bus of the electronic device. Here, an electrical signal may refer to one electrical signal, a series of electrical signals, and/or a plurality of distinct electrical signals.
It should be noted that the application scenario illustrated in FIG. 1 is merely provided for the purposes of illustration, and is not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary computing device according to some embodiments of the present disclosure. In some embodiments, the server 110 may be implemented on the computing device 200. For example, the processing engine 112 may be implemented on the computing device 200 and configured to perform functions of the processing engine 112 disclosed in this disclosure.
The computing device 200 may be used to implement any component of the auxiliary reply system 100 of the present disclosure. For example, the processing engine 112 may be implemented on the computing device 200, via its hardware, software program, firmware, or a combination thereof. Although only one such computer is shown for convenience, the computer functions related to the auxiliary reply system 100 as described herein may be implemented in a distributed manner on a number of similar platforms to distribute the processing load.
The computing device 200, for example, may include communication (COM) ports 250 connected to and from a network (e.g., the network 120)  connected thereto to facilitate data communications. The computing device 200 may also include a processor (e.g., a processor 220) , in the form of one or more processors (e.g., logic circuits) , for executing program instructions. For example, the processor may include interface circuits and processing circuits therein. The interface circuits may be configured to receive electronic signals from a bus 210, wherein the electronic signals encode structured data and/or instructions for the processing circuits to process. The processing circuits may conduct logic calculations, and then determine a conclusion, a result, and/or an instruction encoded as electronic signals. Then the interface circuits may send out the electronic signals from the processing circuits via the bus 210.
The computing device 200 may further include program storage and data storage of different forms, for example, a disk 270, and a read only memory (ROM) 230, or a random access memory (RAM) 240, for storing various data files to be processed and/or transmitted by the computing device 200. The computing device 200 may also include program instructions stored in the ROM 230, the RAM 240, and/or other type of non-transitory storage medium to be executed by the processor 220. The methods and/or processes of the present disclosure may be implemented as the program instructions. The computing device 200 also includes an I/O component 260, supporting input/output between the computing device 200 and other components therein. The computing device 200 may also receive programming and data via network communications.
Merely for illustration, only one processor is described in FIG. 2. Multiple processors are also contemplated, thus operations and/or method steps performed by one processor as described in the present disclosure may also be jointly or separately performed by the multiple processors. For example, if in the present disclosure the processor of the computing device 200 executes both step A and step B, it should be understood that step A and  step B may also be performed by two different CPUs and/or processors jointly or separately in the computing device 200 (e.g., the first processor executes step A and the second processor executes step B, or the first and second processors jointly execute steps A and B) .
FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary mobile device according to some embodiments of the present disclosure. In some embodiments, the terminal device 130 may be implemented on the mobile device 300.
As illustrated in FIG. 3, the mobile device 300 may include a communication platform 310, a display 320, a graphic processing unit (GPU) 330, a central processing unit (CPU) 340, an I/O 350, a memory 360, and a storage 390. In some embodiments, any other suitable component, including but not limited to a system bus or a controller (not shown) , may also be included in the mobile device 300. In some embodiments, a mobile operating system 370 (e.g., iOS TM, Android TM, Windows Phone TM) and one or more applications 380 may be loaded into the memory 360 from the storage 390 in order to be executed by the CPU 340. The applications 380 may include a browser or any other suitable mobile apps for receiving and rendering information relating to the auxiliary reply system 100. User interactions with the information stream may be achieved via the I/O 350 and provided to one or more components of the auxiliary reply system 100 via the network 120.
To implement various modules, units, and their functionalities described in the present disclosure, computer hardware platforms may be used as the hardware platform (s) for one or more of the elements described herein. A computer with user interface elements may be used to implement a personal computer (PC) or any other type of work station or terminal device. A computer may also act as a server if appropriately programmed.
FIG. 4 is a block diagram illustrating an exemplary processing engine according to an embodiment of the present disclosure. The processing engine 112 may include a receiving module 402, a recommendation module 404, a transmission module 406, and a training module 408.
The receiving module 402 may be configured to receive a message from the terminal device 130 via the network 120. Taking an online taxi hailing scenario as an example, the message may include a conversation associated with a service type, a driving route, a service fee, a payment, an invoice, etc. In some embodiments, the message may include a text message, a voice message, an image message, or the like, or a combination thereof. In some embodiments, the receiving module 402 may receive the message from the terminal device 130 by using any suitable communication protocol, for example, Hypertext Transfer Protocol (HTTP), Address Resolution Protocol (ARP), Dynamic Host Configuration Protocol (DHCP), File Transfer Protocol (FTP), etc.
The recommendation module 404 may be configured to recommend, for a target respondent, at least one candidate reply to the message by using a trained reply generation model. In some embodiments, the trained reply generation model may be associated with a plurality of embedding vectors (also referred to as “trained embedding vectors”) indicating a plurality of language habits corresponding to a plurality of respondents (which include the target respondent). In some embodiments, the plurality of embedding vectors may be included in a matrix and each of the plurality of embedding vectors corresponds to a row vector or a column vector of the matrix. For the target respondent, the recommendation module 404 may identify a target row vector or a target column vector from the matrix and determine the at least one candidate reply based on the target row vector or the target column vector using the trained reply generation model. More descriptions of the trained reply generation model may be found elsewhere in the present disclosure (e.g., FIG. 6 and the description thereof).
The transmission module 406 may be configured to transmit a target reply from the at least one candidate reply to the terminal device 130. In some embodiments, the target reply may be manually selected from the at least one candidate reply by the target respondent. In some embodiments, the transmission module 406 may select the target reply from the at least one candidate reply based on a predetermined rule.
The training module 408 may be configured to determine the trained reply generation model based on a plurality of samples according to a training process. In some embodiments, the training module 408 may obtain the plurality of samples based on a plurality of historical sessions associated with the plurality of respondents. In some embodiments, each of the plurality of samples may correspond to a respondent of the plurality of respondents and include a sample message, a sample reply, and an embedding vector corresponding to the respondent. In some embodiments, the plurality of samples may include a plurality of positive samples and a plurality of negative samples. More descriptions of the training process may be found elsewhere in the present disclosure (e.g., FIG. 6 and the description thereof).
The modules in the processing engine 112 may be connected to or communicate with each other via a wired connection or a wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof. The wireless connection may include a Local Area Network (LAN), a Wide Area Network (WAN), a Bluetooth, a ZigBee, a Near Field Communication (NFC), or the like, or any combination thereof. Two or more of the modules may be combined into a single module, and any one of the modules may be divided into two or more units. For example, the receiving module 402 and the transmission module 406 may be combined into a single module which may both receive the message from the terminal device 130 and transmit the target reply to the terminal device 130. As another example, the processing engine 112 may include a storage module (not shown) which may be configured to store the message, the at least one candidate reply, the trained reply generation model, the target reply, etc. As a further example, the training module 408 may be unnecessary and the trained reply generation model may be obtained from a storage device (e.g., the storage 140) disclosed elsewhere in the present disclosure or may be determined by an independent training device of the auxiliary reply system 100.
FIG. 5 is a flowchart illustrating an exemplary process for auxiliary reply according to some embodiments of the present disclosure. The process 500 may be executed by the auxiliary reply system 100. For example, the process 500 may be implemented as a set of instructions (e.g., an application) stored in the ROM 230 or RAM 240. The processor 220 and/or the modules in FIG. 4 may execute the set of instructions, and when executing the instructions, the processor 220 and/or the modules may be configured to perform the process 500. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 500 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of the process 500 as illustrated in FIG. 5 and described below is not intended to be limiting.
In 510, the processing engine 112 (e.g., the receiving module 402) (e.g., the interface circuits of the processor 220) may receive a message from the terminal device 130. The processing engine 112 may receive the message from the terminal device 130 via the network 120. Taking an online taxi hailing scenario as an example, the message may be a conversation associated with a service type, a driving route, a service fee, a payment, an invoice, etc.
In some embodiments, the message may include a text message, a voice message, an image message, or the like, or a combination thereof. In some embodiments, the processing engine 112 may receive the message from the terminal device 130 by using any suitable communication protocol, for example, Hypertext Transfer Protocol (HTTP) , Address Resolution Protocol (ARP) , Dynamic Host Configuration Protocol (DHCP) , File Transfer Protocol (FTP) , etc.
In some embodiments, after receiving the message, the processing engine 112 may analyze and/or process the message. For example, the processing engine 112 may perform a textual analysis on the message to extract semantic information (e.g., a word, a phrase, a sentence, context information) associated with the message. As another example, the processing engine 112 may perform a voice recognition on the voice message. As a further example, the processing engine 112 may perform an image recognition (e.g., an optical character recognition) on the image message.
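Merely by way of illustration, the textual analysis on a text message could be as simple as tokenizing the message and filtering out stop words to obtain keyword candidates; the regular expression and the stop-word list below are illustrative assumptions, not part of the disclosure:

```python
import re

# Hypothetical stop-word list for a toy textual-analysis step.
STOP_WORDS = {"a", "an", "the", "is", "i", "my", "to", "how", "do", "can"}

def extract_keywords(message):
    # Tokenize the lowercased message and keep non-stop-word tokens as
    # candidate semantic information (words/phrases) for downstream matching.
    tokens = re.findall(r"[a-z']+", message.lower())
    return [t for t in tokens if t not in STOP_WORDS]

keywords = extract_keywords("How do I get an invoice?")
```

A production system would instead use a full NLP pipeline (and speech or optical character recognition for voice and image messages, as described above).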
In 520, the processing engine 112 (e.g., the recommendation module 404) (e.g., the processing circuits of the processor 220) may recommend, for a target respondent, at least one candidate reply to the message by using a trained reply generation model.
As used herein, the target respondent may be an online customer service, an online consultant, an online assistant, etc. Taking the online customer service as an example, a specific customer service may have his/her personalized language habit; therefore, when receiving a message from a user, the specific customer service may wish to receive recommended replies that meet his/her personalized language habit. Accordingly, for the target respondent, the processing engine 112 may recommend at least one candidate reply by using the trained reply generation model, which may be associated with a plurality of embedding vectors (also referred to as “trained embedding vectors”) indicating a plurality of language habits corresponding to a plurality of respondents (which include the target respondent).
In some embodiments, the plurality of embedding vectors may be included in a matrix and each of the plurality of embedding vectors corresponds to a row vector or a column vector of the matrix. For the target respondent, the processing engine 112 may identify a target row vector or a target column vector from the matrix and determine the at least one candidate reply based on the target row vector or the target column vector using the trained reply generation model.
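Merely by way of illustration, identifying a target row vector from such a matrix may be sketched as follows; the respondent identifiers, the identifier-to-row mapping, and the vector values below are all hypothetical:

```python
# Each respondent corresponds to one row of the embedding matrix.
embedding_matrix = [
    [0.2, -0.1, 0.7],   # respondent "cs_001"
    [0.5, 0.3, -0.2],   # respondent "cs_002" (the target respondent)
    [-0.4, 0.6, 0.1],   # respondent "cs_003"
]
row_index = {"cs_001": 0, "cs_002": 1, "cs_003": 2}

def target_embedding(respondent_id):
    # Identify the target row vector for the given respondent; this
    # vector is then fed to the trained reply generation model together
    # with the message to determine the candidate replies.
    return embedding_matrix[row_index[respondent_id]]
```

A column-vector layout would work symmetrically, with the lookup indexing columns instead of rows.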
In some embodiments, the trained reply generation model may be configured to provide a matching score between the message and each of the at least one candidate reply. In some embodiments, the matching score may indicate a matching degree between a message and a reply. The larger the matching score is, the higher the matching degree between the message and the reply may be.
In some embodiments, the processing engine 112 may obtain the trained reply generation model from the training module 408 or a storage device (e.g., the storage 140) disclosed elsewhere in the present disclosure. In some embodiments, the trained reply generation model may be determined based on a plurality of samples, wherein each of the plurality of samples may correspond to a respondent of the plurality of respondents. In some embodiments, each of the plurality of samples may include a sample message, a sample reply, and a sample embedding vector corresponding to the respondent. In some embodiments, the trained reply generation model may include a Deep Structured Semantic Model (DSSM) , a Match-Pyramid model, etc. More descriptions of the trained reply generation model may be  found elsewhere in the present disclosure (e.g., FIG. 6 and the description thereof) .
In 530, the processing engine 112 (e.g., the transmission module 406) (e.g., the interface circuits of the processor 220) may transmit a target reply from the at least one candidate reply to the terminal device 130. The processing engine 112 may transmit the target reply to the terminal device 130 via the network 120.
In some embodiments, the target reply may be manually selected from the at least one candidate reply by the target respondent. In some embodiments, the processing engine 112 may select the target reply from the at least one candidate reply based on a predetermined rule. For example, the processing engine 112 may rank the at least one candidate reply from high to low based on at least one matching score corresponding to the at least one candidate reply. Further, the processing engine 112 may select a candidate reply with a highest matching score as the target reply.
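For illustration only, ranking the at least one candidate reply by matching score and selecting the target reply may be sketched as follows (the candidate replies and matching scores are hypothetical):

```python
# Hypothetical candidate replies paired with matching scores from the trained model.
candidates = [("Reply X", 0.62), ("Reply Y", 0.91), ("Reply Z", 0.45)]

# Rank the candidate replies from high to low by matching score, then select
# the candidate reply with the highest matching score as the target reply.
ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
target_reply = ranked[0][0]
print(target_reply)  # "Reply Y"
```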
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, one or more other optional operations (e.g., a storing operation) may be added elsewhere in the process 500. In the storing operation, the processing engine 112 may store information and/or data (e.g., the at least one candidate reply, the trained reply generation model, the target reply) associated with the message in a storage device (e.g., the storage 140) disclosed elsewhere in the present disclosure. As another example, before the target reply is transmitted to the terminal device 130, the target respondent can modify and/or amend the target reply so that it can better meet his/her language habit.
FIG. 6 is a flowchart illustrating an exemplary process for determining a trained reply generation model according to some embodiments of the present disclosure. The process 600 may be executed by the auxiliary reply system 100. For example, the process 600 may be implemented as a set of instructions (e.g., an application) stored in the storage ROM 230 or RAM 240. The processor 220 and/or the training module 408 may execute the set of instructions, and when executing the instructions, the processor 220 and/or the training module 408 may be configured to perform the process 600. The operations of the illustrated process/method presented below are intended to be illustrative. In some embodiments, the process 600 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order in which the operations of the process 600 are illustrated in FIG. 6 and described below is not intended to be limiting.
In 610, the processing engine 112 (e.g., the training module 408) (e.g., the interface circuits or the processing circuits of the processor 220) may obtain a plurality of samples. The processing engine 112 may obtain the plurality of samples from a storage device (e.g., the storage 140) disclosed elsewhere in the present disclosure.
In some embodiments, the processing engine 112 may obtain the plurality of samples based on a plurality of historical sessions associated with the plurality of respondents. In some embodiments, each of the plurality of samples may correspond to a respondent of the plurality of respondents and include a sample message, a sample reply, and a preliminary embedding vector corresponding to the respondent. As used herein, the "embedding vector" refers to a vector associated with at least one feature of the respondent. The at least one feature of the respondent may include a gender, an age, a working age, a work type, a language habit (which may be expressed as a default value (e.g., 0) in the preliminary embedding vector), or the like, or a combination thereof. In some embodiments, the processing engine 112 may determine the preliminary embedding vector corresponding to the respondent based on one-hot encoding. More descriptions of the embedding vector may be found elsewhere in the present disclosure (e.g., FIG. 7, FIG. 8, and the descriptions thereof).
In some embodiments, for each of the plurality of samples, the processing engine 112 may determine a sample message vector corresponding to the sample message and a sample reply vector corresponding to the sample reply, both of which are determined in a manner similar to the determination of the embedding vector. Further, the processing engine 112 may determine a combined vector (also referred to as a "modified message vector") by combining the sample message vector with the embedding vector corresponding to the respondent.
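For illustration only, one possible way of combining the sample message vector with the embedding vector is concatenation; the disclosure does not limit the combination to this operation. A hypothetical sketch:

```python
import numpy as np

# Hypothetical one-hot sample message vector and preliminary embedding vector.
sample_message_vector = np.array([1, 0, 0, 1])
respondent_embedding = np.array([1, 0, 1, 0, 0, 0])

# Combine the sample message vector with the embedding vector by concatenation
# to obtain the modified message vector.
modified_message_vector = np.concatenate([sample_message_vector, respondent_embedding])
print(modified_message_vector.tolist())  # [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
```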
In some embodiments, the plurality of samples may include a plurality of positive samples (which may be labelled as "1," indicating that a matching degree between the sample reply and the sample message is 100%) and a plurality of negative samples (which may be labelled as "0," indicating that the matching degree between the sample reply and the sample message is 0%). For a positive sample, the sample reply refers to a historical reply immediately following the sample message (which is a historical message in the historical session) in a historical session; for a negative sample, the sample reply refers to a historical reply that does not correspond to the sample message (i.e., a historical reply other than the historical reply immediately following the sample message in the historical session, or a historical reply in another historical session).
For example, take a historical session illustrated below as an example,
User: Hello!
Customer service: Hello, what can I help you?
User: How to charge for the carpooling service?
Customer service: The charging rule for the carpooling service is …
As illustrated, a sample including the user's question ("How to charge for the carpooling service?") and the customer service's reply ("The charging rule for the carpooling service is...") immediately following the user's question in the historical session is a positive sample; a sample including the user's question ("How to charge for the carpooling service?") and the customer service's reply ("Hello, what can I help you?"), which does not correspond to the user's question in the historical session, is a negative sample.
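For illustration only, constructing positive and negative samples from a historical session may be sketched as follows (the session structure and variable names are hypothetical):

```python
# A historical session represented as alternating (speaker, utterance) turns.
session = [
    ("user", "Hello!"),
    ("service", "Hello, what can I help you?"),
    ("user", "How to charge for the carpooling service?"),
    ("service", "The charging rule for the carpooling service is ..."),
]

positive, negative = [], []
for i, (speaker, text) in enumerate(session[:-1]):
    nxt_speaker, nxt_text = session[i + 1]
    if speaker == "user" and nxt_speaker == "service":
        # The reply immediately following the message forms a positive sample.
        positive.append((text, nxt_text, 1))
        # Any other service reply paired with the same message forms a negative sample.
        for j, (sp, other) in enumerate(session):
            if sp == "service" and j != i + 1:
                negative.append((text, other, 0))

print(len(positive), len(negative))  # 2 2
```

In practice, negative replies may also be drawn from other historical sessions, as described above.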
In some embodiments, the processing engine 112 may divide the plurality of samples into a training set and a test set. The training set may be used to train the model and the test set may be used to determine whether the training process has been completed.
In 620, the processing engine 112 (e.g., the training module 408) (e.g., the processing circuits of the processor 220) may obtain a preliminary reply generation model. The preliminary reply generation model may include one or more preliminary parameters which may be default settings of the auxiliary reply system 100 or may be adjustable under different situations.
In some embodiments, the preliminary reply generation model may include a preliminary matrix associated with a plurality of preliminary embedding vectors corresponding to the plurality of respondents. As described above, for a specific respondent, a corresponding embedding vector refers to a vector associated with at least one feature (e.g., a gender, an age, a working age, a work type, a language habit) of the specific respondent. Further, the preliminary matrix may be a matrix including the plurality of preliminary embedding vectors, each of which corresponds to a row vector or a column vector of the matrix.
In 630, the processing engine 112 (e.g., the training module 408) (e.g., the processing circuits of the processor 220) may determine a plurality  of sample scores corresponding to the plurality of samples based on the preliminary reply generation model. As used herein, a sample score may indicate a matching degree between the sample reply and the sample message. In some embodiments, the processing engine 112 may match the message vector (or the modified message vector) with the reply vector and determine the sample score based on the matching result.
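For illustration only, a sample score indicating a matching degree may be computed, e.g., as a cosine similarity between the modified message vector and the sample reply vector mapped into [0, 1]; the disclosure does not limit the score to this form. A hypothetical sketch:

```python
import numpy as np

def sample_score(message_vec, reply_vec):
    """Matching degree as cosine similarity, mapped from [-1, 1] into [0, 1]."""
    cos = np.dot(message_vec, reply_vec) / (
        np.linalg.norm(message_vec) * np.linalg.norm(reply_vec)
    )
    return (cos + 1) / 2

m = np.array([1.0, 0.0, 1.0])  # hypothetical modified message vector
r = np.array([1.0, 0.0, 1.0])  # hypothetical sample reply vector
print(sample_score(m, r))  # identical vectors -> 1.0
```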
In 640, the processing engine 112 (e.g., the training module 408) (e.g., the processing circuits of the processor 220) may determine whether the plurality of sample scores satisfy a preset condition.
For example, the processing engine 112 may determine a first accuracy rate of the preliminary reply generation model corresponding to the training set and a second accuracy rate of the preliminary reply generation model corresponding to the test set. Further, the processing engine 112 may determine whether the first accuracy rate has become stable ("stable" refers to a situation in which a difference between a first accuracy rate in a current iteration and a first accuracy rate in a previous adjacent iteration (or first accuracy rates in multiple previous iterations) is less than a threshold) and whether the second accuracy rate has reached a maximum value. As used herein, the first accuracy rate and/or the second accuracy rate may be determined based on one or more parameters (e.g., a proximity degree) associated with the plurality of sample scores and the plurality of labels (i.e., "1" or "0") corresponding to the plurality of samples. In response to determining that the first accuracy rate has become stable and the second accuracy rate has reached the maximum value, the processing engine 112 may determine that the plurality of sample scores satisfy the preset condition. In response to determining that the first accuracy rate is unstable (e.g., rising) and the second accuracy rate has not reached the maximum value, the processing engine 112 may determine that the plurality of sample scores do not satisfy the preset condition.
As another example, the processing engine 112 may determine a loss function (e.g., an L2 loss function) of the preliminary reply generation model based on the plurality of sample scores and the plurality of labels and determine a value of the loss function based on the plurality of sample scores. Further, the processing engine 112 may determine whether the value of the loss function is less than a loss threshold. The loss threshold may be a default setting of the auxiliary reply system 100 or may be adjustable under different situations. In response to determining that the value of the loss function is less than the loss threshold, the processing engine 112 may determine that the plurality of sample scores satisfy the preset condition. In response to determining that the value of the loss function is greater than or equal to the loss threshold, the processing engine 112 may determine that the plurality of sample scores do not satisfy the preset condition.
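For illustration only, evaluating an L2 loss over the plurality of sample scores and comparing it with a loss threshold may be sketched as follows (the scores, labels, and threshold values are hypothetical):

```python
import numpy as np

def l2_loss(scores, labels):
    """Value of an L2 loss over the plurality of sample scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.sum((scores - labels) ** 2))

sample_scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 0, 1, 0]
loss_threshold = 0.5  # hypothetical default setting

value = l2_loss(sample_scores, labels)
satisfied = value < loss_threshold  # preset condition satisfied when the loss is small
print(round(value, 2), satisfied)
```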
As a further example, the processing engine 112 may determine whether a number count of iterations is larger than a count threshold. In response to determining that the number count of iterations is larger than the count threshold, the processing engine 112 may determine that the plurality of sample scores satisfy the preset condition. In response to determining that the number count of iterations is less than or equal to the count threshold, the processing engine 112 may determine that the plurality of sample scores do not satisfy the preset condition.
In 650, in response to determining that the plurality of sample scores satisfy the preset condition, the processing engine 112 (e.g., the training module 408) (e.g., the processing circuits of the processor 220) may designate the preliminary reply generation model as the trained reply generation model, which means that the training process has been completed.
On the other hand, in response to determining that the plurality of sample scores do not satisfy the preset condition, the processing engine 112 (e.g., the training module 408) (e.g., the processing circuits of the processor 220) may return to operation 620 to update the preliminary reply generation model. For example, the processing engine 112 may update the one or more preliminary parameters to produce an updated reply generation model. Further, the processing engine 112 (e.g., the training module 408) (e.g., the processing circuits of the processor 220) may repeat the step of determining whether a plurality of updated sample scores satisfy the preset condition until the plurality of updated sample scores satisfy the preset condition. In response to determining that the plurality of updated sample scores under the updated reply generation model satisfy the preset condition, the processing engine 112 may designate the updated reply generation model as the trained reply generation model.
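For illustration only, the iterative loop of operations 620-650 may be sketched as follows; the model, preset condition, and update rule are placeholders, and the toy demonstration is not part of the disclosure:

```python
def train(samples, model, preset_condition, update, max_iters=100):
    """Score the samples, test the preset condition, and update the model otherwise."""
    for _ in range(max_iters):
        scores = [model(msg, reply) for msg, reply, _label in samples]
        if preset_condition(scores, samples):
            return model  # designated as the trained reply generation model
        model = update(model)
    return model

# Toy demonstration: the "model" is a constant score w; the preset condition
# holds when every score is within 0.25 of its label; updating nudges w upward.
samples = [("How to charge?", "The charging rule is ...", 1)]
make_model = lambda w: (lambda msg, reply: w)
condition = lambda scores, smpls: all(
    abs(s - lab) < 0.25 for s, (_m, _r, lab) in zip(scores, smpls))
trained = train(samples, make_model(0.5), condition, lambda m: make_model(m("", "") + 0.1))
print(round(trained("q", "a"), 1))  # 0.8
```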
In some embodiments, in response to determining that the plurality of sample scores (or the updated sample scores) do not satisfy the preset condition, the processing engine 112 (e.g., the training module 408) (e.g., the processing circuits of the processor 220) may update the plurality of preliminary embedding vectors (or a plurality of updated embedding vectors) corresponding to the plurality of respondents (i.e., update the preliminary matrix (or an updated matrix) ) .
As described elsewhere in the present disclosure, a specific embedding vector corresponds to a specific respondent and indicates one or more features (e.g., an age, a gender, a working age, a work type, a language habit) of the specific respondent. In the preliminary embedding vector, the feature "language habit" may be expressed as a default value (e.g., 0). During the training process, the processing engine 112 may iteratively update the embedding vectors corresponding to the plurality of respondents; after the training process is completed, embedding vectors corresponding to respondents with similar language habits are similar to each other.
In some embodiments, as described in connection with FIG. 5, if the  target respondent has not replied to a specific message in the historical sessions, the processing engine 112 may identify a respondent with a similar language habit (i.e., a similar embedding vector) and recommend the at least one candidate reply based on an embedding vector corresponding to the respondent for the target respondent.
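For illustration only, identifying a respondent with a similar language habit (i.e., a similar embedding vector) may be sketched as a nearest-neighbor search under cosine similarity; the matrix values and respondent identifiers are hypothetical:

```python
import numpy as np

def most_similar_respondent(target_vec, matrix, ids, target_id):
    """Find the respondent whose embedding vector is closest (cosine) to the target's."""
    best, best_sim = None, -2.0
    for rid, row in zip(ids, matrix):
        if rid == target_id:
            continue  # skip the target respondent itself
        sim = np.dot(target_vec, row) / (
            np.linalg.norm(target_vec) * np.linalg.norm(row))
        if sim > best_sim:
            best, best_sim = rid, sim
    return best

matrix = np.array([[0.12, -0.40, 0.33], [0.10, -0.38, 0.31], [-0.55, 0.20, 0.07]])
ids = ["A", "B", "C"]
print(most_similar_respondent(matrix[0], matrix, ids, "A"))  # "B"
```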
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, the processing engine 112 may update the trained reply generation model at a certain time interval (e.g., per month, per two months) based on a plurality of newly obtained samples. As another example, the positive samples and the negative samples may be determined manually by an operator or by the auxiliary reply system 100 according to a predetermined rule.
FIG. 7 is a schematic diagram illustrating an exemplary sample according to some embodiments of the present disclosure. As described in connection with operation 610, the sample 700 may include a sample message, a sample reply, and an embedding vector corresponding to a respondent.
As illustrated in FIG. 7, the sample message is a question (e.g., "How to charge for the carpooling service?") from a user in a historical session and the sample reply is a historical reply (e.g., "The charging rule for the carpooling service is …") from a customer service in the historical session. Further, the embedding vector corresponding to the respondent (i.e., the customer service) refers to a vector associated with at least one feature (e.g., a gender, an age, a working age, a work type, a language habit) of the respondent. In some embodiments, the at least one feature of the respondent may be encoded as a 128-dimensional vector or a 256-dimensional vector based on one-hot encoding. For example, assuming that the gender of the respondent is "male," the age of the respondent is "21-25," the working age is "1 year," and the work type is "express," the embedding vector corresponding to the respondent may be expressed as "10100010000100." In some embodiments, the sample message and the sample reply may also be encoded as the sample message vector and the sample reply vector, respectively, based on one-hot encoding.
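For illustration only, the one-hot encoding described above may be sketched as follows; the feature vocabularies mirror those described in connection with FIG. 8, and the function names are hypothetical:

```python
# Feature vocabularies described in connection with FIG. 8.
GENDER = ["male", "female"]
AGE = ["21-25", "26-30", "31-35", "36-40"]
WORKING_AGE = ["1 year", "2-3 years", "4-5 years", "5-10 years", "more than 10 years"]
WORK_TYPE = ["express", "taxi", "carpool"]

def one_hot(value, vocab):
    """One-hot encode a feature value over its vocabulary."""
    return ["1" if value == v else "0" for v in vocab]

def preliminary_embedding(gender, age, working_age, work_type):
    """Concatenate the one-hot codes of each feature into one embedding string."""
    bits = (one_hot(gender, GENDER) + one_hot(age, AGE)
            + one_hot(working_age, WORKING_AGE) + one_hot(work_type, WORK_TYPE))
    return "".join(bits)

print(preliminary_embedding("male", "21-25", "1 year", "express"))
# "10100010000100" -- matching the example above
```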
FIG. 8 is a schematic diagram illustrating an exemplary matrix associated with a plurality of embedding vectors corresponding to a plurality of respondents according to some embodiments of the present disclosure. As described in connection with FIG. 5 and FIG. 6, the matrix may be constituted by the plurality of embedding vectors corresponding to the plurality of respondents, wherein each of the plurality of embedding vectors corresponds to a row or a column of the matrix.
As described elsewhere in the present disclosure, the embedding vector is associated with at least one feature (e.g., a gender, an age, a working age, a work type, a language habit) of the respondent. Taking one-hot encoding as an example, for "gender," "10" and "01" refer to "male" and "female" respectively; for "age," "1000," "0100," "0010," and "0001" refer to "21-25," "26-30," "31-35," and "36-40" respectively; for "working age," "10000," "01000," "00100," "00010," and "00001" refer to "1 year," "2-3 years," "4-5 years," "5-10 years," and "more than 10 years" respectively; for "work type," "100," "010," and "001" refer to "express," "taxi," and "carpool" respectively. Further, for "language habit," vector elements may be expressed as a default value in the preliminary embedding vector and may be iteratively updated during the training process.
Having thus described the basic concepts, it may be rather apparent to those skilled in the art after reading this detailed disclosure that the  foregoing detailed disclosure is intended to be presented by way of example only and is not limiting. Various alterations, improvements, and modifications may occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested by this disclosure, and are within the spirit and scope of the exemplary embodiments of this disclosure.
Moreover, certain terminology has been used to describe embodiments of the present disclosure. For example, the terms "one embodiment," "an embodiment," and/or "some embodiments" mean that a particular feature, structure or characteristic described in connection with some embodiments is included in at least one embodiment of the present disclosure. Therefore, it is emphasized and should be appreciated that two or more references to "some embodiments," "one embodiment," or "an alternative embodiment" in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the present disclosure.
Further, it will be appreciated by one skilled in the art that aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented as entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of software and hardware implementations, all of which may generally be referred to herein as a "block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electro-magnetic, optical, or the like, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python or the like, conventional procedural programming languages, such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a software as a service (SaaS).
Furthermore, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations, therefore, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. Although the above disclosure discusses through various examples what is currently considered to be a variety of useful embodiments of the disclosure, it is to be understood that such detail is solely for that purpose, and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software-only solution, e.g., an installation on an existing server or mobile device.
Similarly, it should be appreciated that in the foregoing description of embodiments of the present disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the various embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, claimed subject matter may lie in less than all features of a single foregoing disclosed embodiment.

Claims (28)

  1. A system for auxiliary reply, comprising:
    a storage medium to store a set of instructions; and
    a processor, communicatively coupled to the storage medium, to execute the set of instructions to:
    receive a message from a terminal device;
    recommend, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model is associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents; and
    transmit a target reply from the at least one candidate reply to the terminal device.
  2. The system of claim 1, wherein the trained reply generation model is determined with a training process, the training process comprising:
    obtaining a plurality of samples, wherein each of the plurality of samples corresponds to a respondent of the plurality of respondents and comprises a sample message, a sample reply, and a preliminary embedding vector corresponding to the respondent;
    obtaining a preliminary reply generation model;
    determining a plurality of sample scores corresponding to the plurality of samples based on the preliminary reply generation model, wherein each of the plurality of sample scores indicates a matching degree between the sample reply and the sample message;
    determining whether the plurality of sample scores satisfy a preset condition; and
    designating the preliminary reply generation model as the trained reply generation model in response to determining that the plurality of sample  scores satisfy the preset condition.
  3. The system of claim 2, the training process further comprising:
    updating the preliminary reply generation model in response to determining that the plurality of sample scores do not satisfy the preset condition; and
    repeating the step of determining whether the plurality of sample scores satisfy the preset condition until the plurality of sample scores satisfy the preset condition.
  4. The system of claim 2 or claim 3, further comprising:
    determining a preliminary matrix comprising a plurality of preliminary embedding vectors corresponding to the plurality of respondents.
  5. The system of claim 4, the training process further comprising:
    updating the preliminary matrix in response to determining that the plurality of sample scores do not satisfy the preset condition.
  6. The system of any of claims 1-5, wherein the embedding vector is associated with at least one feature of a gender, an age, a working age, a work type, or a language habit.
  7. The system of any of claims 1-6, wherein the message comprises at least one of a text message, a voice message, or an image message.
  8. The system of any of claims 1-7, wherein the message comprises a question to the target respondent regarding an online transportation service.
  9. The system of any of claims 1-8, wherein the reply generation model  comprises a Deep Structured Semantic Model (DSSM) or a Match-Pyramid model.
  10. A method implemented on a computing device having at least one processor, at least one storage medium, and a communication platform connected to a network, the method comprising:
    receiving a message from a terminal device;
    recommending, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model is associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents; and
    transmitting a target reply from the at least one candidate reply to the terminal device.
  11. The method of claim 10, wherein the trained reply generation model is determined with a training process, the training process comprising:
    obtaining a plurality of samples, wherein each of the plurality of samples corresponds to a respondent of the plurality of respondents and comprises a sample message, a sample reply, and a preliminary embedding vector corresponding to the respondent;
    obtaining a preliminary reply generation model;
    determining a plurality of sample scores corresponding to the plurality of samples based on the preliminary reply generation model, wherein each of the plurality of sample scores indicates a matching degree between the sample reply and the sample message;
    determining whether the plurality of sample scores satisfy a preset condition; and
    designating the preliminary reply generation model as the trained reply  generation model in response to determining that the plurality of sample scores satisfy the preset condition.
  12. The method of claim 11, the training process further comprising:
    updating the preliminary reply generation model in response to determining that the plurality of sample scores do not satisfy the preset condition; and
    repeating the step of determining whether the plurality of sample scores satisfy the preset condition until the plurality of sample scores satisfy the preset condition.
  13. The method of claim 11 or claim 12, further comprising:
    determining a preliminary matrix comprising a plurality of preliminary embedding vectors corresponding to the plurality of respondents.
  14. The method of claim 13, the training process further comprising:
    updating the preliminary matrix in response to determining that the plurality of sample scores do not satisfy the preset condition.
  15. The method of any of claims 10-14, wherein the embedding vector is associated with at least one feature of a gender, an age, a working age, a work type, or a language habit.
  16. The method of any of claims 10-15, wherein the message comprises at least one of a text message, a voice message, or an image message.
  17. The method of any of claims 10-16, wherein the message comprises a question to the target respondent regarding an online transportation service.
  18. The method of any of claims 10-17, wherein the reply generation model comprises a Deep Structured Semantic Model (DSSM) or a Match-Pyramid model.
  19. A system, comprising:
    a receiving module configured to receive a message from a terminal device;
    a recommendation module configured to recommend, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model is associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents; and
    a transmission module configured to transmit a target reply from the at least one candidate reply to the terminal device.
  20. The system of claim 19, wherein the system further includes a training module configured to:
    obtain a plurality of samples, wherein each of the plurality of samples corresponds to a respondent of the plurality of respondents and comprises a sample message, a sample reply, and a preliminary embedding vector corresponding to the respondent;
    obtain a preliminary reply generation model;
    determine a plurality of sample scores corresponding to the plurality of samples based on the preliminary reply generation model, wherein each of the plurality of sample scores indicates a matching degree between the sample reply and the sample message;
    determine whether the plurality of sample scores satisfy a preset condition; and
    designate the preliminary reply generation model as the trained reply generation model in response to determining that the plurality of sample scores satisfy the preset condition.
  21. The system of claim 20, wherein the training module is further configured to:
    update the preliminary reply generation model in response to determining that the plurality of sample scores do not satisfy the preset condition; and
    repeat the step of determining whether the plurality of sample scores satisfy the preset condition until the plurality of sample scores satisfy the preset condition.
  22. The system of claim 20 or claim 21, wherein the training module is further configured to:
    determine a preliminary matrix comprising a plurality of preliminary embedding vectors corresponding to the plurality of respondents.
  23. The system of claim 22, wherein the training module is further configured to:
    update the preliminary matrix in response to determining that the plurality of sample scores do not satisfy the preset condition.
  24. The system of any of claims 19-23, wherein the embedding vector is associated with at least one feature of a gender, an age, a working age, a work type, or a language habit.
  25. The system of any of claims 19-24, wherein the message comprises at least one of a text message, a voice message, or an image message.
  26. The system of any of claims 19-25, wherein the message comprises a question to the target respondent regarding an online transportation service.
  27. The system of any of claims 19-26, wherein the reply generation model comprises a Deep Structured Semantic Model (DSSM) or a Match-Pyramid model.
  28. A non-transitory computer readable medium, comprising executable instructions that, when executed by at least one processor, direct the at least one processor to perform a method, the method comprising:
    receiving a message from a terminal device;
    recommending, for a target respondent, at least one candidate reply to the message by using a trained reply generation model, wherein the trained reply generation model is associated with a plurality of embedding vectors indicating a plurality of language habits corresponding to a plurality of respondents; and
    transmitting a target reply from the at least one candidate reply to the terminal device.
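The inference flow of claims 19 and 28 (receive a message, recommend candidate replies using the trained model conditioned on the target respondent's embedding vector, transmit the selected target reply) might look like the sketch below. All names here are illustrative assumptions, and the word-overlap score is a stand-in for the trained reply generation model.

```python
def score_fn(message, reply, embedding):
    # Toy matching degree: word overlap plus an embedding bias term.
    overlap = len(set(message.lower().split()) & set(reply.lower().split()))
    return overlap + 0.01 * sum(embedding)


def rank_candidates(message, candidates, embedding, score_fn, top_k=3):
    """Return the top_k candidate replies by matching degree."""
    scored = sorted(
        candidates,
        key=lambda reply: score_fn(message, reply, embedding),
        reverse=True,
    )
    return scored[:top_k]


# Embedding vector for the target respondent, encoding features such as
# work type or language habit (claim 24).
embedding = [0.2, -0.1, 0.05, 0.3]

candidates = [
    "Your driver will arrive in 5 minutes.",
    "Please share your pickup location.",
    "Your driver is waiting at the pickup point.",
]

# Recommend the at least one candidate reply for the received message.
top = rank_candidates("Where is my driver now?", candidates,
                      embedding, score_fn, top_k=2)

# The target respondent selects one candidate; the system transmits it.
target_reply = top[0]
```

Conditioning the ranking on a per-respondent embedding is what lets two respondents with different language habits receive differently ordered suggestions for the same incoming message.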
PCT/CN2019/107053 2019-09-20 2019-09-20 Systems and methods for auxiliary reply WO2021051404A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/107053 WO2021051404A1 (en) 2019-09-20 2019-09-20 Systems and methods for auxiliary reply

Publications (1)

Publication Number Publication Date
WO2021051404A1 true WO2021051404A1 (en) 2021-03-25

Family

ID=74883095

Country Status (1)

Country Link
WO (1) WO2021051404A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378583A (en) * 2021-07-15 2021-09-10 北京小米移动软件有限公司 Dialogue reply method and device, dialogue model training method and device, and storage medium
CN113420137A (en) * 2021-06-29 2021-09-21 山东新一代信息产业技术研究院有限公司 Method, device and medium for implementing intelligent question-answering system based on end-to-end framework

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070038619A1 (en) * 2005-08-10 2007-02-15 Norton Gray S Methods and apparatus to help users of a natural language system formulate queries
CN103268315A (en) * 2012-12-31 2013-08-28 威盛电子股份有限公司 Natural language conversation method and system
CN109427334A (en) * 2017-09-01 2019-03-05 王阅 Man-machine interaction method and system based on artificial intelligence
CN110019736A (en) * 2017-12-29 2019-07-16 北京京东尚科信息技术有限公司 Question and answer matching process, system, equipment and storage medium based on language model
CN110110040A (en) * 2019-03-15 2019-08-09 深圳壹账通智能科技有限公司 Language exchange method, device, computer equipment and storage medium
CN110136719A (en) * 2018-02-02 2019-08-16 上海流利说信息技术有限公司 Method, apparatus and system for realizing intelligent voice dialog
JP2019139000A (en) * 2018-02-08 2019-08-22 日本電信電話株式会社 Target speech estimation model learning device, target speech determination device, target speech estimation model learning method, target speech determination method and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19946057

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19946057

Country of ref document: EP

Kind code of ref document: A1