WO2020042164A1 - Artificial intelligence systems and methods based on hierarchical clustering - Google Patents


Info

Publication number: WO2020042164A1
Application number: PCT/CN2018/103626
Authority: WIPO (PCT)
Prior art keywords: terms, artificial intelligence, inquiries, customer inquiries, frequently used
Other languages: French (fr)
Inventors: Junhong LIU, Peng Wang, Kangkang WU, Jie Wang
Original Assignee: Beijing Didi Infinity Technology And Development Co., Ltd.
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd.
Priority to CN201880043921.2A (CN111373395A)
Priority to PCT/CN2018/103626
Publication of WO2020042164A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/01Customer relationship services
    • G06Q30/015Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
    • G06Q30/016After-sales

Definitions

  • the present disclosure relates to artificial intelligence (AI) systems and methods for managing customer inquiries, and more particularly to, AI systems and methods for intelligently classifying customer inquiries based on hierarchical clustering.
  • High-quality customer service is important to virtually all types of businesses, including businesses that sell products and provide services. Customer service is typically labor intensive, and thus requires a large team of representatives to meet the bandwidth requirement. Automated or partially automated customer service systems have been implemented to reduce human capital cost while increasing service bandwidth and speed. For example, an automated customer service system can process multiple customer inquiries simultaneously so that customers do not need to wait in line.
  • Certain automated customer service systems can communicate with customers intelligently in question and answer (QA) sessions, such as to understand customer inquiries and provide responses to address the inquiries. For an intelligent customer service system to effectively handle QA sessions, the system has to first accurately determine what topic a question is related to. For example, a question “when should I expect to receive my order?” is related to an existing topic of “shipping status.”
  • In existing systems, the topics are typically synthesized manually based on a large amount of QA data. The process is labor intensive, inefficient, and prone to errors. For example, different people have different subjective understandings of a customer inquiry and thus may make different determinations as to the related topics. As another example, a manually synthesized topic may be ambiguous, e.g., the topic may map to two different scenarios where different answers should be provided. For instance, “how to change password” in a banking context may include two possible scenarios that require different answers: (1) how to change a login password and (2) how to change a cash advance password.
  • Classification methods, such as k-means clustering, have been applied to cluster customer inquiries in order to synthesize topics. However, customer inquiries in the same context (e.g., ride-hail service) are usually similar semantically and not easily distinguishable from each other. Therefore, applying simple k-means clustering may not be able to “separate” such inquiries in the clustering space.
  • Embodiments of the disclosure address the above problems by providing artificial intelligence systems and methods for intelligently learning customer inquiries based on hierarchical clustering.
  • Embodiments of the disclosure provide an artificial intelligence system for classifying customer inquiries.
  • the system includes a communication interface configured to receive a plurality of historical customer inquiries.
  • the system further includes a processor.
  • the processor is configured to segment the historical customer inquiries into a plurality of terms and determine a group of frequently used terms among the terms.
  • the processor is further configured to filter the historical customer inquiries using the group of frequently used terms.
  • the processor is also configured to determine a plurality of representative topics by classifying the filtered historical customer inquiries, wherein the classification applies a hierarchical clustering method.
  • the system also includes a storage configured to store the frequently used terms and the representative topics.
  • Embodiments of the disclosure also provide an artificial intelligence method for classifying customer inquiries.
  • the method includes receiving a plurality of historical customer inquiries and segmenting, by a processor, the historical customer inquiries into a plurality of terms.
  • the method further includes determining, by the processor, a group of frequently used terms among the terms and filtering, by the processor, the historical customer inquiries using the group of frequently used terms.
  • the method also includes determining, by the processor, a plurality of representative topics by classifying the filtered historical customer inquiries, wherein the classification applies a hierarchical clustering method.
  • the method further includes storing the frequently used terms and the representative topics in a storage.
  • Embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, causes the processor to perform an artificial intelligence method for classifying customer inquiries.
  • the method includes receiving a plurality of historical customer inquiries and segmenting the historical customer inquiries into a plurality of terms.
  • the method further includes determining a group of frequently used terms among the terms and filtering the historical customer inquiries using the group of frequently used terms.
  • the method also includes determining a plurality of representative topics by classifying the filtered historical customer inquiries, wherein the classification applies a hierarchical clustering method.
  • FIG. 1 illustrates a schematic diagram of an exemplary AI system for classifying customer inquiries, according to embodiments of the disclosure.
  • FIG. 2 illustrates a flow diagram of an exemplary AI method for classifying customer inquiries, according to embodiments of the disclosure.
  • FIG. 3 illustrates a flowchart of an exemplary AI method for classifying customer inquiries, according to embodiments of the disclosure.
  • FIG. 4 illustrates a flowchart of an exemplary method for interpreting a new customer inquiry, according to embodiments of the disclosure.
  • An online hailing platform can receive a rideshare service request from a passenger and then route the service request to at least one transportation service provider (e.g., a taxi driver, a private car owner, or the like) .
  • the driver and the passenger may each communicate via an application installed on a terminal device such as a mobile phone.
  • the application may display various information within a display region on the terminal device. For example, on the passenger terminal, the application may display driver and/or vehicle information, trip information, trip cost, and a navigation map, etc. On the driver terminal, the application may display passenger information, trip information, trip cost, and a navigation map, etc.
  • the passenger and the driver can access customer service through the application installed on their respective terminal devices, to make various inquiries. Users (passenger and/or driver) may also access customer service on the platform provider’s website. For example, a passenger/driver may forget their ride hailing account logins and would like to reset their passwords. Additionally, a driver may have questions regarding payments for providing the transportation service. A passenger may inquire regarding an item lost on a service vehicle.
  • customer service related to an online hailing platform is described in this disclosure, it is contemplated that the disclosed systems and methods can be adapted by a person of ordinary skill in the art to customer services in other contexts, such as banking, e-commerce, social media, insurance, etc.
  • FIG. 1 illustrates a block diagram of an exemplary AI system 100 for classifying customer inquiries, according to embodiments of the disclosure.
  • AI system 100 may receive Q&A data 103 from one or more terminal devices 110.
  • Terminal device 110 may be a mobile phone, a wearable device, a desktop computer, a laptop, a PDA, etc.
  • AI system 100 may be implemented as a part of an online hailing service application.
  • terminal device 110 may be a device used by a driver ( “a driver terminal” ) or a passenger ( “a passenger terminal” ) .
  • AI system 100 may filter Q&A data 103 to obtain customer inquiries relevant to a particular context and then synthesize the customer inquiries to obtain a plurality of topics.
  • As used here, a “topic” is a description of a category of customer inquiries.
  • a topic may be “change password, ” “lost item, ” “apply coupon, ” “missing rewards, ” etc.
  • a topic may be contained in various customer inquiries that use different words and phrases, different sentence structures, and different grammar.
  • AI system 100 may cluster customer inquiries semantically, and then determine a common topic for each cluster of inquiries.
  • AI system 100 may determine automated answers related to the classified topics. Therefore, when a new customer inquiry is received, AI system 100 may determine the topic that the inquiry is most relevant to, and provide the automated answer in response to the inquiry.
  • AI system 100 may include a communication interface 102, a processor 104, a memory 106, and a storage 108.
  • AI system 100 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) ) , or separate devices with dedicated functions.
  • one or more components of AI system 100 may be located in a cloud, or may be alternatively in a single location (such as inside a vehicle or a mobile device) or distributed locations.
  • Components of AI system 100 may be in an integrated device, or distributed at different locations but communicate with each other through a network (not shown) .
  • Communication interface 102 may send data to and receive data from components such as terminal devices 110 via communication cables, a Wireless Local Area Network (WLAN) , a Wide Area Network (WAN) , wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth TM ) , or other communication methods.
  • communication interface 102 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection.
  • communication interface 102 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links can also be implemented by communication interface 102.
  • communication interface 102 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • communication interface 102 may receive data such as historical Q&A data 103 from terminal devices 110. Alternatively, customer inquiries may be transmitted from terminal devices 110 to a central repository first, and communication interface 102 may receive Q&A data 103 from the central repository. Communication interface 102 may further provide the received data to memory 106 and/or storage 108 for storage or to processor 104 for processing.
  • Processor 104 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 104 may be configured as a separate processor module dedicated to customer service or more particularly customer inquiry processing. Alternatively, processor 104 may be configured as a shared processor module for performing other functions unrelated to customer inquiry processing.
  • processor 104 may include multiple modules, such as a data cleaning unit 120, a pre-processing unit 122, a frequent terms mining unit 124, an embedding training unit 126, a hierarchical clustering unit 128, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 104 designed for use with other components or software units implemented by processor 104 through executing at least part of a program.
  • the program may be stored on a computer-readable medium, and when executed by processor 104, it may perform one or more functions.
  • FIG. 1 shows units 120-128 all within one processor 104, it is contemplated that these units may be distributed among multiple processors located closely or remotely with each other.
  • processor 104 may be configured to synthesize the customer inquiries to obtain a plurality of topics.
  • FIG. 2 illustrates a flow diagram of an exemplary AI method 200 for classifying customer inquiries, according to embodiments of the disclosure. Modules 120-128 and method 200 will be described together.
  • Q&A data 103 may be customer service conversation data that include customer inquiries and service representative answers.
  • the conversations may be relevant to a particular context, such as transportation service orders.
  • The customer inquiries in Q&A data 103 may include, e.g., inquiries for number of orders 202, inquiries for missing order information 204, inquiries for order prices 206, inquiries for order compensation 208, and inquiries for order status 210.
  • inquiries for number of orders 202 may include passenger inquiries about the number of orders he has placed, or driver inquiries about the number of orders he completed, during a time period, such as a month.
  • Inquiries for missing order information 204 may include passenger inquiries regarding orders he placed but could not find in the application.
  • Inquiries for order prices 206 may include passenger inquiries about how much the order will cost him or driver inquiries about how much he will earn from the order.
  • Inquiries for order compensation 208 may include driver inquiries about how much compensation he can get from the rideshare service platform due to, e.g., promotions.
  • Inquiries for order status 210 may include passenger inquiries or driver inquiries about the status of a placed order.
  • In some embodiments, data cleaning unit 120 may be configured to perform customer inquiry recall process 212 of method 200 on Q&A data 103.
  • Customer inquiry recall process 212 is also known as a “data cleaning” process.
  • Data cleaning unit 120 may sample Q&A data 103 received from terminal devices 110 to obtain sample conversations (e.g., questions and answers) .
  • the sampling may be performed randomly or according to certain criteria such as related to certain passengers/drivers, certain origins/destinations, certain time periods, certain geographic regions, etc.
  • Data cleaning unit 120 may further define a plurality of keywords based on the sample conversations, such as “order, ” “transportation service, ” “status, ” etc.
  • the keywords may be commonly used terms within a particular context.
  • Using the defined keywords, data cleaning unit 120 may filter Q&A data 103 to remove customer representative answers and customer inquiries unrelated to transportation service orders (e.g., greetings, complaints, personal identifications, etc.).
  • data cleaning unit 120 may apply additional filters such as sentence length of the customer inquiry.
  • the remaining data include only customer inquiries related to transportation orders. This process is known as customer inquiry recall.
  • pre-processing unit 122 may segment each customer inquiry into several terms. For example, “I really have no way to update my login password” can be segmented into terms “I, ” “really, ” “have no way, ” “to update, ” and “my login password. ”
  • pre-processing unit 122 may identify non-informational term (s) among the segmented terms, and remove them from the customer inquiry. Consistent with the disclosure, a non-informational term is a term that does not carry substantive meaning. For example, in the exemplary customer inquiry above, the term “really” may be removed as a non-informational term.
  • pre-processing unit 122 may also identify synonymous terms among the segmented terms and replace them with a predetermined term.
  • In some embodiments, the synonymous terms may be identified as terms synonymous to (e.g., having the same or similar semantic meaning as) the predetermined term.
  • the synonymous terms may be identified as synonymous to each other.
  • the predetermined term may be identical to one of the identified synonymous terms, or a different term. For example, one customer inquiry may be “I have no way to update my login password, ” and another customer inquiry may be “I can’ t change my login password. ” Within the context, the terms “have no way to” and “can’ t” are synonymous terms.
  • Frequent terms mining unit 124 may be configured to perform frequent term process 216.
  • frequent terms mining unit 124 may use a frequent-pattern tree (FP-tree) to determine terms frequently used in the customer inquiries.
  • An FP-tree is a compact structure that stores quantitative information about frequent patterns in a database.
  • Frequent terms mining unit 124 may use the pre-processed customer inquiries as input (e.g., a transaction database) to construct an FP-tree. Using data mining algorithms such as FP-growth algorithm, frequent terms mining unit 124 may obtain a set of frequently used terms.
  • Frequent terms mining unit 124 may further filter the customer inquiries using the obtained frequently used terms, similar to customer inquiry recall process 212.
  • The filtering (or recall) process further refines the data so that the remaining customer inquiries are all relevant to one or more particular contexts.
  • Embedding training unit 126 may be configured to train word embeddings.
  • In Natural Language Processing (NLP), words are often mapped into vectors that contain numeric values so that machines can understand them.
  • Word embedding is a type of mapping that allows words with similar meaning to have similar representations.
  • Word embedding may be determined using various machine learning methods, such as Word2Vec and FastText.
  • embedding training unit 126 may use training samples, e.g., historical customer inquiries, obtained from database 210 to train a FastText network 230.
  • the trained embeddings may be stored in database 210 or memory 106/storage 108, e.g., as lookup tables. As a result, the embedding of a word can be looked up.
  • Hierarchical clustering unit 128 may be configured to apply hierarchical clustering to the customer inquiries obtained by frequent terms mining unit 124. After pre-processing process 214 and frequent term mining process 216, the customer inquiries are essentially collections of terms, where each term has its corresponding embedding. Hierarchical clustering unit 128 may look up the embedding trained by embedding training unit 126 of each term. For each customer inquiry (or sentence) , hierarchical clustering unit 128 may determine an overall embedding representation based on the term embeddings. For example, the overall embedding representation may be determined as an average embedding among the term embeddings of the customer inquiry. Accordingly, each customer inquiry may correspond to an embedding representation.
  • Hierarchical clustering unit 128 may further perform a clustering process 218.
  • hierarchical clustering unit 128 may input the overall embedding representations of the customer inquiries into a hierarchical cluster for clustering.
  • the hierarchical cluster may be an Agglomerative Nesting (AGNES) cluster. It is contemplated that other types of clusters may be used.
  • The AGNES algorithm constructs a hierarchy of clusters. At first, each embedding is treated as a small cluster by itself. Clusters may be merged until the distances among clusters meet a predetermined requirement. At each stage, the two nearest clusters may be combined to form one larger cluster.
  • Hierarchical clustering unit 128 may determine a topic for each cluster 240 that remains at the end of the iterations.
  • Memory 106 and storage 108 may include any appropriate type of mass storage provided to store any type of information that processor 104 may need to operate.
  • Memory 106 and storage 108 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM.
  • Memory 106 and/or storage 108 may be configured to store one or more computer programs that may be executed by processor 104 to perform the customer inquiry classification functions disclosed herein.
  • For example, memory 106 and/or storage 108 may be configured to store program(s) that may be executed by processor 104 to classify customer inquiries using the hierarchical clustering methods disclosed herein.
  • Memory 106 and/or storage 108 may be further configured to store information and data used by processor 104.
  • memory 106 and/or storage 108 may be configured to store the various types of data (e.g., Q&A data 103, etc. ) .
  • Memory 106 and/or storage 108 may also store intermediate data such as the customer inquiries recalled by data cleaning unit 120, pre-processed data generated by pre-processing unit 122, filtered data by frequent terms mining unit 124, embeddings trained by embedding training unit 126, and clustering results including the topics obtained by hierarchical clustering unit 128, etc.
  • non-informational terms and synonymous terms may be pre-recorded in tables and saved in memory 106 or storage 108.
  • the table may be a public table that applies to multiple contexts or a private table that only applies to a specific context.
  • Memory 106 and/or storage 108 may additionally store various learning models including their model parameters.
  • the various types of data may be stored permanently, removed periodically, or disregarded immediately after each frame of data is processed.
  • FIG. 3 illustrates a flowchart of an exemplary AI method 300 for classifying customer inquiries, according to embodiments of the disclosure.
  • method 300 may be implemented by AI system 100 that includes, among other things, processor 104.
  • method 300 is not limited to that exemplary embodiment.
  • Method 300 may include steps S302-S322 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3.
  • AI system 100 may receive Q&A data 103 including historical customer inquiries, e.g., via communication interface 102.
  • For description purposes only, historical customer inquiries in the context of a user (e.g., a passenger or a driver) who lost his password will be used to describe method 300.
  • Customer inquiries related to this exemplary context can include several topics, e.g., the user lost his “withdraw/payment password” or the user lost his “login password. ”
  • a “withdraw/payment password” is used when the user is making a payment or withdrawing cash from a financial account, e.g., DiDi TM wallet.
  • a “login password” is used to access an account, e.g., DiDi TM application. Therefore, although they fall under the same general topic of lost password, they should be classified as different topics and responded with different solutions.
  • Steps S304-S308 may be part of customer inquiry recall process 212 performed by data cleaning unit 120 of processor 104.
  • In step S304, data cleaning unit 120 may select sample customer inquiries from the received Q&A data. In some embodiments, sample inquiries may be selected randomly.
  • data cleaning unit 120 may determine one or more keywords from the sample customer inquiries. For example, such keywords may be “password, ” “PIN, ” “login, ” “withdraw, ” “payment, ” “account, ” “forget, ” “change, ” and “update, ” etc.
  • data cleaning unit 120 may recall a subset of historical customer inquiries from the received historical customer inquiries using the keywords.
  • In some embodiments, data cleaning unit 120 may use the keywords as a filter to obtain the subset of inquiries. For example, the following customer inquiries may be among the subset recalled in step S308: “I forgot my PIN for cash withdraw” and “It always tells me that my login password is incorrect.”
  • Steps S310-S314 may be part of pre-processing process 214 performed by pre-processing unit 122.
  • pre-processing unit 122 may segment each historical customer inquiry in the subset into multiple terms.
  • Various existing word segmentation methods may be used. Inquiries in word-based languages, such as English, Spanish, French, German, etc., may be segmented using different methods from inquiries in character-based languages, such as Chinese, Japanese, Korean, etc. For example, “I forgot my PIN for cash withdraw” can be segmented as [I, forgot, my PIN, for, cash withdraw] , and “My cash withdraw password is lost” may be segmented as [my, cash withdraw, password, is, lost] .
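As a rough illustration of the two segmentation paths above, the sketch below uses a simple regular-expression tokenizer for a word-based language and the jieba package for a character-based language; neither tool is named in the patent, and multi-word terms such as “my PIN” would need an extra phrase-matching pass that is omitted here.

```python
# Illustrative segmentation sketch (pip install jieba); tools are assumptions.
import re
import jieba

def segment(text, lang="en"):
    if lang == "zh":
        return [t for t in jieba.lcut(text) if t.strip()]   # character-based language
    return re.findall(r"[A-Za-z']+", text.lower())           # word-based language

print(segment("I forgot my PIN for cash withdraw"))
print(segment("我的提现密码丢失了", lang="zh"))
```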
  • pre-processing unit 122 may identify and remove non-informational terms.
  • In some embodiments, non-informational terms may be defined by public and/or private non-informational term tables stored in memory 106/storage 108.
  • Pre-processing unit 122 may look for any non-informational term in the tables in a customer inquiry and remove it if detected. For example, words/terms such as “for, ” “is, ” “but the problem is, ” “there is, ” “it says, ” etc. may be removed as non-informational terms.
  • pre-processing unit 122 may identify synonymous terms among the customer inquiries and replace them with a predetermined term.
  • the predetermined term may be among the synonymous terms or a separate term.
  • In some embodiments, synonymous terms may be defined by public and/or private synonymous term tables stored in memory 106/storage 108.
  • Pre-processing unit 122 may look up the synonymous terms from the tables. For example, in the customer inquiries recalled above, “password” and “PIN” may be synonymous terms, and “lost” and “forgot” may be synonymous terms.
  • Steps S316-S318 may be part of frequent term mining process 216 performed by frequent terms mining unit 124.
  • In step S316, frequent terms mining unit 124 may determine one or more terms frequently used among the customer inquiries pre-processed by pre-processing unit 122.
  • the frequently-used terms may be obtained using an FP-tree.
  • the frequently-used terms may include ⁇ password, can’ t, change, forget/forgot, login ⁇ .
  • frequent terms mining unit 124 may filter the originally received historical customer inquiries using the frequently-used terms determined in step S316.
  • frequent terms mining unit 124 may further combine or otherwise consolidate the customer inquiries that include the frequently-used terms.
  • Steps S320-S322 may be part of clustering process 218 performed by hierarchical clustering unit 128.
  • hierarchical clustering unit 128 may compute embedding representations of the filtered historical customer inquiries.
  • hierarchical clustering unit 128 may retrieve word embeddings trained by embedding training unit 126.
  • the word embeddings may be trained using a neural network, e.g., a FastText network 230, and saved in a look-up table stored in memory 106/storage 108.
  • hierarchical clustering unit 128 may retrieve the word embeddings by looking up the look-up table by terms.
  • Hierarchical clustering unit 128 may determine an overall embedding representation for each customer inquiry based on the word embeddings for the terms in the customer inquiry. For example, the overall embedding may be an average of the word embeddings.
  • Hierarchical clustering unit 128 may apply a hierarchical cluster to the embedding representations of respective customer inquiries determined in step S320.
  • For example, an AGNES hierarchical cluster may be applied.
  • The AGNES clustering method uses a hierarchy of clusters. For example, in the beginning, hierarchical clustering unit 128 may treat each embedding as a small cluster by itself, and then iteratively merge the smaller clusters into larger clusters until the distances among clusters meet a predetermined requirement. Hierarchical clustering unit 128 may therefore determine clusters 240, each corresponding to a topic.
  • Hierarchical clustering unit 128 may determine a representative inquiry for each topic from the customer inquiries belonging to the corresponding cluster. The remaining customer inquiries in that cluster become synonymous inquiries.
  • Table 1 shows the results of clustering, the representative inquiries, and synonymous inquiries.
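The patent does not spell out how the representative inquiry is chosen from each cluster. One common choice, sketched below, is to pick the inquiry whose embedding lies closest to the cluster centroid; the centroid rule and the helper names are assumptions, not the patent's specified method.

```python
# Illustrative centroid rule for selecting a representative inquiry per cluster.
import numpy as np

def split_clusters(embeddings, labels, texts):
    """Return {cluster label: {"representative": text, "synonymous": [texts]}}."""
    result = {}
    for label in set(labels):
        idx = [i for i, l in enumerate(labels) if l == label]
        centroid = np.mean([embeddings[i] for i in idx], axis=0)
        rep = min(idx, key=lambda i: np.linalg.norm(embeddings[i] - centroid))
        result[label] = {
            "representative": texts[rep],
            "synonymous": [texts[i] for i in idx if i != rep],
        }
    return result
```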
  • FIG. 4 illustrates a flowchart of an exemplary method 400 for interpreting a new customer inquiry, according to embodiments of the disclosure.
  • Method 400 classifies the new customer inquiry into a topic and provides an automated answer to the customer based on the topic.
  • Method 400 may be implemented by processor 104 or a separate processor not shown in FIG. 1.
  • Method 400 may include steps S402-S408 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 4.
  • AI system 100 may receive a new customer inquiry.
  • the new customer inquiry may be made on terminal device 110 and received by communication interface 102 of AI system 100.
  • the new customer inquiry may be “I want to update my login password but I don’ t see a link for doing that. ”
  • AI system 100 may segment the new customer inquiry into multiple terms.
  • same or similar segmentation techniques may be used as that of step S310.
  • the exemplary customer inquiry above can be segmented as [I, want to, update, my login password, but, I, don’ t, see, a link, for doing that] .
  • AI system 100 may additionally perform the pre-processing steps, such as to remove non-informational terms and replace synonymous terms, to the segmented inquiry, such as in steps S312-314.
  • the segmented inquiry may become [change, login password, no link] after those additional pre-processing steps.
  • AI system 100 may determine a topic for the new customer inquiry among the representative topics, based on the segmented terms.
  • the classification may use a neural network, such as one based on calculation of embeddings.
  • AI system 100 may provide information automatically to the user in response to the new customer inquiry based on the topic.
  • Various solutions, instructions, or guidance for the representative topics may be pre-determined and stored in memory 106/storage 108. Accordingly, AI system 100 may retrieve those solutions, instructions, or guidance based on the topic the new inquiry belongs to, and provide them as answers to the user.
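As a rough illustration of the topic matching and answer retrieval described above, the sketch below maps a new inquiry's embedding to the nearest stored representative topic by cosine similarity and returns the pre-stored answer; the similarity measure, dictionary layout, and toy topics/answers are assumptions rather than the patent's specified implementation.

```python
# Illustrative embedding-based topic matching and answer lookup.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def answer_inquiry(new_embedding, topic_embeddings, topic_answers):
    """topic_embeddings / topic_answers map each representative topic to its
    stored embedding and pre-determined answer, respectively."""
    topic = max(topic_embeddings, key=lambda t: cosine(new_embedding, topic_embeddings[t]))
    return topic, topic_answers[topic]

# Toy topics and canned answers, invented for illustration only.
topics = {"change login password": np.array([0.9, 0.1]), "lost item": np.array([0.1, 0.9])}
answers = {"change login password": "Open Settings > Account > Reset password.",
           "lost item": "Contact your driver through the trip record page."}
print(answer_inquiry(np.array([0.8, 0.2]), topics, answers))
```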
  • AI system 100 may form an answer based on the topic on the fly, and provide it to the user. The information may be provided to the user on terminal device 110.
  • the computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices.
  • the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed.
  • the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.

Abstract

Artificial intelligence systems and methods for classifying customer inquiries are provided. The system (100) includes a communication interface (102) configured to receive a plurality of historical customer inquiries. The system (100) further includes a processor (104). The processor (104) is configured to segment the historical customer inquiries into a plurality of terms and determine a group of frequently used terms among the terms. The processor (104) is further configured to filter the historical customer inquiries using the group of frequently used terms. The processor (104) is also configured to determine a plurality of representative topics by classifying the filtered historical customer inquiries, wherein the classification applies a hierarchical clustering method. The system (100) also includes a storage (108) configured to store the frequently used terms and the representative topics.

Description

ARTIFICIAL INTELLIGENCE SYSTEMS AND METHODS BASED ON HIERARCHICAL CLUSTERING

TECHNICAL FIELD
The present disclosure relates to artificial intelligence (AI) systems and methods for managing customer inquiries, and more particularly to, AI systems and methods for intelligently classifying customer inquiries based on hierarchical clustering.
BACKGROUND
High-quality customer service is important to virtually all types of businesses, including businesses that sell products and provide services. Customer service is typically labor intensive, and thus requires a large team of representatives to meet the bandwidth requirement. Automated or partially automated customer service systems have been implemented to reduce human capital cost while increasing service bandwidth and speed. For example, an automated customer service system can process multiple customer inquiries simultaneously so that customers do not need to wait in line.
Certain automated customer service systems can communicate with customers intelligently in question and answer (QA) sessions, such as to understand customer inquiries and provide responses to address the inquiries. For an intelligent customer service system to effectively handle QA sessions, the system has to, first, accurately determine what topic the question is related to. For example, a question “when should I expect to receive my order? ” is related to an existing topic of “shipping status. ”
In existing systems, the topics are typically synthesized manually based on a large amount of QA data. The process is labor intensive, inefficient, and prone to errors. For example, different people have different subjective understanding of a customer inquiry and thus may make different determinations as to the related topics. As another example, a manually synthesized topic may be ambiguous, e.g., the topic may map to two different scenarios where different answers should be provided. For instance, “how to change password” in a banking context may include two possible scenarios that require different answers: (1) how to change login password and (2) how to change cash advance password.
Classification methods, such as k-means clustering, have been applied to cluster customer inquiries in order to synthesize topics. However, customer inquiries in the same context (e.g., ride-hail service) are usually similar semantically, and not easily distinguishable from each other. Therefore, applying a simple k-means clustering may not be able to “separate” such inquiries in the clustering space.
Embodiments of the disclosure address the above problems by providing artificial intelligence systems and methods for intelligently learning customer inquiries based on hierarchical clustering.
SUMMARY
Embodiments of the disclosure provide an artificial intelligence system for classifying customer inquiries. The system includes a communication interface configured to receive a plurality of historical customer inquiries. The system further includes a processor. The processor is configured to segment the historical customer inquiries into a plurality of terms and determine a group of frequently used terms among the terms. The processor is further configured to filter the historical customer inquiries using the group of frequently used terms. The processor is also configured to determine a plurality of representative topics by classifying the filtered historical customer inquiries, wherein the classification applies a hierarchical clustering method. The system also includes a storage configured to store the frequently used terms and the representative topics.
Embodiments of the disclosure also provide an artificial intelligence method for classifying customer inquiries. The method includes receiving a plurality of historical customer inquiries and segmenting, by a processor, the historical customer inquiries into a plurality of terms. The method further includes determining, by the processor, a group of frequently used terms among the terms and filtering, by the processor, the historical customer inquiries using the group of frequently used terms. The method also includes determining, by the processor, a plurality of representative topics by classifying the filtered historical customer inquiries, wherein the classification applies a hierarchical clustering method. The method further includes storing the frequently used terms and the representative topics in a storage.
Embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, causes the processor to perform an artificial intelligence method for classifying customer inquiries. The method includes receiving a plurality of historical customer inquiries and segmenting the historical customer inquiries into a plurality of terms. The method further includes determining a group of frequently used terms among the terms and filtering the historical customer inquiries using the group of frequently used terms. The method also includes  determining a plurality of representative topics by classifying the filtered historical customer inquiries, wherein the classification applies a hierarchical clustering method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a schematic diagram of an exemplary AI system for classifying customer inquiries, according to embodiments of the disclosure.
FIG. 2 illustrates a flow diagram of an exemplary AI method for classifying customer inquiries, according to embodiments of the disclosure.
FIG. 3 illustrates a flowchart of an exemplary AI method for classifying customer inquiries, according to embodiments of the disclosure.
FIG. 4 illustrates a flowchart of an exemplary method for interpreting a new customer inquiry, according to embodiments of the disclosure.
DETAILED DESCRIPTION
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
For explanation purposes, embodiments may be described in the context of an online hailing platform (e.g., DiDi™ online). An online hailing platform can receive a rideshare service request from a passenger and then route the service request to at least one transportation service provider (e.g., a taxi driver, a private car owner, or the like). The driver and the passenger may each communicate via an application installed on a terminal device such as a mobile phone. The application may display various information within a display region on the terminal device. For example, on the passenger terminal, the application may display driver and/or vehicle information, trip information, trip cost, and a navigation map, etc. On the driver terminal, the application may display passenger information, trip information, trip cost, and a navigation map, etc.
 The passenger and the driver can access customer service through the application installed on their respective terminal devices, to make various inquiries. Users (passenger and/or driver) may also access customer service on the platform provider’s website. For example, a passenger/driver may forget their ride hailing account logins and would like to  reset their passwords. Additionally, a driver may have questions regarding payments for providing the transportation service. A passenger may inquire regarding an item lost on a service vehicle.
 Although customer service related to an online hailing platform is described in this disclosure, it is contemplated that the disclosed systems and methods can be adapted by a person of ordinary skill in the art to customer services in other contexts, such as banking, e-commerce, social media, insurance, etc.
FIG. 1 illustrates a block diagram of an exemplary AI system 100 for classifying customer inquiries, according to embodiments of the disclosure. Consistent with the present disclosure, AI system 100 may receive Q&A data 103 from one or more terminal devices 110. Terminal device 110 may be a mobile phone, a wearable device, a desktop computer, a laptop, a PDA, etc. In some embodiments, AI system 100 may be implemented as a part of an online hailing service application. In such a context, terminal device 110 may be a device used by a driver (“a driver terminal”) or a passenger (“a passenger terminal”).
 AI system 100 may filter Q&A data 103 to obtain customer inquiries relevant to a particular context and then synthesize the customer inquiries to obtain a plurality of topics. As used here, a “topic” is a description of a category of customer inquiries. For example, a topic may be “change password, ” “lost item, ” “apply coupon, ” “missing rewards, ” etc. A topic may be contained in various customer inquiries that use different words and phrases, different sentence structures, and different grammar. To synthesize a topic from customer inquiries, AI system 100 may cluster customer inquiries semantically, and then determine a common topic for each cluster of inquiries. AI system 100 may determine automated answers related to the classified topics. Therefore, when a new customer inquiry is received, AI system 100 may determine the topic that the inquiry is most relevant to, and provide the automated answer in response to the inquiry.
 In some embodiments, as shown in FIG. 1, AI system 100 may include a communication interface 102, a processor 104, a memory 106, and a storage 108. In some embodiments, AI system 100 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) ) , or separate devices with dedicated functions. In some embodiments, one or more components of AI system 100 may be located in a cloud, or may be alternatively in a single location (such as inside a vehicle or a mobile device) or distributed locations. Components of AI system 100 may be in an integrated  device, or distributed at different locations but communicate with each other through a network (not shown) .
Communication interface 102 may send data to and receive data from components such as terminal devices 110 via communication cables, a Wireless Local Area Network (WLAN) , a Wide Area Network (WAN) , wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth TM) , or other communication methods. In some embodiments, communication interface 102 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection. As another example, communication interface 102 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented by communication interface 102. In such an implementation, communication interface 102 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Consistent with some embodiments, communication interface 102 may receive data such as historical Q&A data 103 from terminal devices 110. Alternatively, customer inquiries may be transmitted from terminal devices 110 to a central repository first, and communication interface 102 may receive Q&A data 103 from the central repository. Communication interface 102 may further provide the received data to memory 106 and/or storage 108 for storage or to processor 104 for processing.
Processor 104 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 104 may be configured as a separate processor module dedicated to customer service or more particularly customer inquiry processing. Alternatively, processor 104 may be configured as a shared processor module for performing other functions unrelated to customer inquiry processing.
As shown in FIG. 1, processor 104 may include multiple modules, such as a data cleaning unit 120, a pre-processing unit 122, a frequent terms mining unit 124, an embedding training unit 126, a hierarchical clustering unit 128, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 104 designed for use with other components or software units implemented by processor 104 through executing at least part of a program. The program may be stored on a computer-readable medium, and when executed by processor 104, it may perform one or more functions. Although FIG. 1 shows units 120-128 all within one  processor 104, it is contemplated that these units may be distributed among multiple processors located closely or remotely with each other.
Among other things, processor 104 may be configured to synthesize the customer inquiries to obtain a plurality of topics. FIG. 2 illustrates a flow diagram of an exemplary AI method 200 for classifying customer inquiries, according to embodiments of the disclosure. Modules 120-128 and method 200 will be described together.
Q&A data 103 may be customer service conversation data that include customer inquiries and service representative answers. For example, the conversations may be relevant to a particular context, such as transportation service orders. The customer inquiries in Q&A data 103 may include, e.g., inquiries for number of orders 202, inquiries for missing order information 204, inquiries for order prices 206, inquiries for order compensation 208, and inquiries for order status 210. For example, inquiries for number of orders 202 may include passenger inquiries about the number of orders he has placed, or driver inquiries about the number of orders he completed, during a time period, such as a month. Inquiries for missing order information 204 may include passenger inquiries regarding orders he placed but could not find in the application. Inquiries for order prices 206 may include passenger inquiries about how much the order will cost him or driver inquiries about how much he will earn from the order. Inquiries for order compensation 208 may include driver inquiries about how much compensation he can get from the rideshare service platform due to, e.g., promotions. Inquiries for order status 210 may include passenger inquiries or driver inquiries about the status of a placed order.
In some embodiments, data cleaning unit 120 may be configured to perform customer inquiry recall process 212 of method 200 on Q&A data 103. Customer inquiry recall process 212 is also known as a “data cleaning” process. Data cleaning unit 120 may sample Q&A data 103 received from terminal devices 110 to obtain sample conversations (e.g., questions and answers). In some embodiments, the sampling may be performed randomly or according to certain criteria such as related to certain passengers/drivers, certain origins/destinations, certain time periods, certain geographic regions, etc.
Data cleaning unit 120 may further define a plurality of keywords based on the sample conversations, such as “order,” “transportation service,” “status,” etc. The keywords may be commonly used terms within a particular context. Using these defined keywords, data cleaning unit 120 may filter Q&A data 103 to remove customer representative answers and customer inquiries unrelated to transportation service orders (e.g., greetings, complaints, personal identifications, etc.). In some embodiments, in addition to the keywords, data cleaning unit 120 may apply additional filters such as sentence length of the customer inquiry. As a result, after the keyword filtering on Q&A data 103, the remaining data include only customer inquiries related to transportation orders. This process is known as customer inquiry recall.
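A minimal sketch of this keyword-based recall step is shown below; the record layout, keyword list, and sentence-length bounds are illustrative assumptions, not values taken from the patent.

```python
# Illustrative keyword-based recall (data cleaning); all values are toy examples.
def recall_inquiries(qa_records, keywords, min_words=3, max_words=60):
    recalled = []
    for record in qa_records:
        if record.get("speaker") != "customer":
            continue  # drop service-representative answers
        text = record["text"].lower()
        if not any(kw in text for kw in keywords):
            continue  # drop inquiries unrelated to the chosen context
        if not (min_words <= len(text.split()) <= max_words):
            continue  # optional additional filter on sentence length
        recalled.append(record["text"])
    return recalled

sample = [
    {"speaker": "customer", "text": "What is the status of my transportation service order?"},
    {"speaker": "agent", "text": "Let me check that for you."},
    {"speaker": "customer", "text": "Hello there!"},
]
print(recall_inquiries(sample, keywords=["order", "transportation service", "status"]))
```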
The recalled inquiries may be provided to pre-processing unit 122 where a pre-processing process 214 is performed. Pre-processing process 214 may include several sub-processes, such as segmenting a customer inquiry into multiple terms, removing non-informational terms, and replacing synonymous terms in the sentence with a predetermined term. In some embodiments, pre-processing unit 122 may segment each customer inquiry into several terms. For example, “I really have no way to update my login password” can be segmented into terms “I, ” “really, ” “have no way, ” “to update, ” and “my login password. ” 
Because customer service Q&A conversations are usually informal, the customer inquiries may contain typos, grammatical mistakes, inaccurate expressions, or non-informational words. After the segmentation, pre-processing unit 122 may identify non-informational term (s) among the segmented terms, and remove them from the customer inquiry. Consistent with the disclosure, a non-informational term is a term that does not carry substantive meaning. For example, in the exemplary customer inquiry above, the term “really” may be removed as a non-informational term.
In some embodiments, pre-processing unit 122 may also identify synonymous terms among the segmented terms and replace them with a predetermined term. In some embodiments, the synonymous terms may be identified as terms synonymous to (e.g., having the same or similar semantic meaning as) the predetermined term. In some other embodiments, the synonymous terms may be identified as synonymous to each other. The predetermined term may be identical to one of the identified synonymous terms, or a different term. For example, one customer inquiry may be “I have no way to update my login password,” and another customer inquiry may be “I can’t change my login password.” Within the context, the terms “have no way to” and “can’t” are synonymous terms. The term “have no way to” may be replaced with “can’t,” or both terms may be replaced with a predetermined term “fail to.” Similarly, terms “update” and “change” may also be considered as synonymous terms. “Update” may be replaced with “change.”
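The following sketch illustrates pre-processing process 214 on a single inquiry, assuming small hand-made non-informational and synonym tables; a production system would load such tables from memory 106/storage 108 and use a real word segmenter.

```python
# Illustrative pre-processing of one inquiry; tables are toy examples and a naive
# whitespace segmenter stands in for a real word-segmentation tool.
NON_INFORMATIONAL = {"i", "my", "really", "for", "is", "there", "it", "says"}
SYNONYMS = {"update": "change", "pin": "password", "forgot": "lost"}

def preprocess(inquiry):
    terms = inquiry.lower().strip(".?!").split()
    terms = [SYNONYMS.get(t, t) for t in terms]                # replace synonymous terms
    return [t for t in terms if t not in NON_INFORMATIONAL]    # drop non-informational terms

print(preprocess("I really have no way to update my login password"))
# -> ['have', 'no', 'way', 'to', 'change', 'login', 'password']
# (multi-word synonyms such as "have no way to" -> "can't" would need
#  phrase-level matching, which is omitted here)
```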
Frequent terms mining unit 124 may be configured to perform frequent term process 216. In some embodiments, frequent terms mining unit 124 may use a frequent-pattern tree (FP-tree) to determine terms frequently used in the customer inquiries. An FP-tree is a compact structure that stores quantitative information about frequent patterns in a  database. Frequent terms mining unit 124 may use the pre-processed customer inquiries as input (e.g., a transaction database) to construct an FP-tree. Using data mining algorithms such as FP-growth algorithm, frequent terms mining unit 124 may obtain a set of frequently used terms.
Frequent terms mining unit 124 may further filter the customer inquiries using the obtained frequently used terms, similar to customer inquiry recall process 212. The filtering (or recall) process further refines the data so that the remaining customer inquiries are all relevant to one or more particular contexts.
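A sketch of the frequent-term mining step is shown below, assuming the third-party mlxtend library for FP-growth; the toy transactions and the support threshold are illustrative, not values from the patent.

```python
# Illustrative FP-growth mining with mlxtend (pip install mlxtend pandas).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [                       # each pre-processed inquiry is one "transaction" of terms
    ["cant", "change", "login", "password"],
    ["lost", "cash", "withdrawal", "password"],
    ["forgot", "login", "password"],
    ["cant", "login", "password", "incorrect"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)
frequent = fpgrowth(onehot, min_support=0.5, use_colnames=True)
frequent_terms = {term for itemset in frequent["itemsets"] for term in itemset}
print(frequent_terms)                  # e.g. {'password', 'login', 'cant'}

# The original inquiries can then be recalled (filtered) again with these terms.
recalled = [t for t in transactions if frequent_terms & set(t)]
```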
Embedding training unit 126 may be configured to train word embeddings. In Natural Language Processing (NLP), words are often mapped into vectors that contain numeric values so that machines can understand them. Word embedding is a type of mapping that allows words with similar meaning to have similar representations. Word embeddings may be determined using various machine learning methods, such as Word2Vec and FastText. For example, embedding training unit 126 may use training samples, e.g., historical customer inquiries, obtained from database 210 to train a FastText network 230. The trained embeddings may be stored in database 210 or memory 106/storage 108, e.g., as lookup tables. As a result, the embedding of a word can be looked up.
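A sketch of the embedding-training step is given below, assuming the gensim library's FastText implementation; the corpus and hyperparameters are placeholders rather than the patent's settings.

```python
# Illustrative FastText training with gensim (pip install gensim).
from gensim.models import FastText

corpus = [
    ["cant", "change", "login", "password"],
    ["lost", "cash", "withdrawal", "password"],
    ["forgot", "login", "password"],
]

model = FastText(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=20)
print(model.wv["password"].shape)      # look up the embedding of a single term -> (50,)
# model.wv can be saved and later used as the lookup table mentioned above.
```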
 Hierarchical clustering unit 128 may be configured to apply hierarchical clustering to the customer inquiries obtained by frequent terms mining unit 124. After pre-processing process 214 and frequent term mining process 216, the customer inquiries are essentially collections of terms, where each term has its corresponding embedding. Hierarchical clustering unit 128 may look up the embedding trained by embedding training unit 126 of each term. For each customer inquiry (or sentence) , hierarchical clustering unit 128 may determine an overall embedding representation based on the term embeddings. For example, the overall embedding representation may be determined as an average embedding among the term embeddings of the customer inquiry. Accordingly, each customer inquiry may correspond to an embedding representation.
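A minimal sketch of turning per-term embeddings into one representation per inquiry by averaging is shown below, assuming the trained embeddings are available as a term-to-vector lookup table (the toy table is invented).

```python
# Illustrative averaging of term embeddings into one vector per inquiry.
import numpy as np

embedding_table = {
    "change": np.array([0.1, 0.4, 0.0]),
    "login": np.array([0.3, 0.2, 0.5]),
    "password": np.array([0.2, 0.1, 0.6]),
}

def overall_embedding(terms, table):
    vectors = [table[t] for t in terms if t in table]
    dim = len(next(iter(table.values())))
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

print(overall_embedding(["change", "login", "password"], embedding_table))
```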
Hierarchical clustering unit 128 may further perform a clustering process 218. In some embodiments, hierarchical clustering unit 128 may input the overall embedding representations of the customer inquiries into a hierarchical cluster for clustering. In some embodiments, the hierarchical cluster may be an Agglomerative Nesting (AGNES) cluster. It is contemplated that other types of clusters may be used. The AGNES-algorithm constructs a hierarchy of clusters. At first, each embedding is treated as a small cluster by itself. Clusters may be merged until the distances among clusters meet a predetermined requirement. At  each stage the two nearest clusters may be combined to form one larger cluster. Hierarchical clustering unit 128 may determine a topic for each cluster 240 that remains at the end of the iterations.
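The sketch below shows an AGNES-style agglomerative step using scikit-learn's AgglomerativeClustering with a distance threshold instead of a fixed cluster count; the mock embeddings and the threshold value are illustrative stand-ins for the averaged FastText representations and the patent's “predetermined requirement.”

```python
# Illustrative AGNES-style clustering with scikit-learn (pip install scikit-learn).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

inquiries = [
    "cant change login password",
    "forgot my login password",
    "lost my cash withdrawal password",
    "how to reset payment password",
]
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(inquiries), 50))   # placeholder embedding representations

# Merge clusters bottom-up until inter-cluster distance exceeds the threshold,
# instead of fixing the number of clusters in advance.
agnes = AgglomerativeClustering(n_clusters=None, distance_threshold=9.5, linkage="average")
labels = agnes.fit_predict(embeddings)

for label, text in zip(labels, inquiries):
    print(label, text)
```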
Memory 106 and storage 108 may include any appropriate type of mass storage provided to store any type of information that processor 104 may need to operate. Memory 106 and storage 108 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM. Memory 106 and/or storage 108 may be configured to store one or more computer programs that may be executed by processor 104 to perform the customer inquiry classification functions disclosed herein. For example, memory 106 and/or storage 108 may be configured to store program(s) that may be executed by processor 104 to classify customer inquiries using the hierarchical clustering methods disclosed herein.
Memory 106 and/or storage 108 may be further configured to store information and data used by processor 104. For instance, memory 106 and/or storage 108 may be configured to store the various types of data (e.g., Q&A data 103, etc.). Memory 106 and/or storage 108 may also store intermediate data such as the customer inquiries recalled by data cleaning unit 120, pre-processed data generated by pre-processing unit 122, filtered data generated by frequent terms mining unit 124, embeddings trained by embedding training unit 126, and clustering results including the topics obtained by hierarchical clustering unit 128, etc. In some embodiments, non-informational terms and synonymous terms may be pre-recorded in tables and saved in memory 106 or storage 108. For example, a table may be a public table that applies to multiple contexts or a private table that only applies to a specific context. Memory 106 and/or storage 108 may additionally store various learning models including their model parameters. The various types of data may be stored permanently, removed periodically, or disregarded immediately after each batch of data is processed.
FIG. 3 illustrates a flowchart of an exemplary AI method 300 for classifying customer inquiries, according to embodiments of the disclosure. In some embodiments, method 300 may be implemented by AI system 100 that includes, among other things, processor 104. However, method 300 is not limited to that exemplary embodiment. Method 300 may include steps S302-S322 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3.
In step S302, AI system 100 may receive Q&A data 103 including historical customer inquiries, e.g., via communication interface 102. For description purposes only, historical customer inquiries in the context of a user (e.g., a passenger or a driver) losing his or her password will be used to describe method 300. Customer inquiries related to this exemplary context can include several topics, e.g., the user lost his “withdraw/payment password” or the user lost his “login password.” A “withdraw/payment password” is used when the user is making a payment or withdrawing cash from a financial account, e.g., a DiDi™ wallet. On the other hand, a “login password” is used to access an account, e.g., the DiDi™ application. Therefore, although they fall under the same general topic of lost passwords, they should be classified as different topics and responded to with different solutions.
Steps S304-S308 may be part of customer inquiry recall process 212 performed by data cleaning unit 120 of processor 104. In step S304, data cleaning unit 120 may select sample customer inquiries from the received Q&A data. In some embodiments, sample inquiries may be selected randomly. In step S306, data cleaning unit 120 may determine one or more keywords from the sample customer inquiries. For example, such keywords may be “password,” “PIN,” “login,” “withdraw,” “payment,” “account,” “forget,” “change,” and “update,” etc.
In step S308, data cleaning unit 120 may recall a subset of historical customer inquiries from the received historical customer inquiries using the keywords. In some embodiments, data cleaning unit 120 may use the keywords as a filter to obtain the subset of inquiries (an illustrative filtering sketch follows the example inquiries below). For example, the following customer inquiries may be among the subset recalled in step S308:
I forgot my PIN for cash withdraw.
It always tells me that my login password is incorrect.
I don’t see the “I forget my password” link.
But the problem is there is no guide for finding lost password.
I can’t set my payment PIN.
How do I change my previous password if I forget it?
I followed the “I forget my password” link but it says the verification code is incorrect.
My cash withdraw password is lost.
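A minimal sketch of the keyword-based recall of steps S304-S308 follows; the keyword set and function name are illustrative only.

```python
# Illustrative sketch: recall only those historical inquiries that contain
# at least one of the determined keywords.
KEYWORDS = {"password", "pin", "login", "withdraw", "payment",
            "account", "forget", "change", "update"}

def recall(inquiries):
    return [q for q in inquiries
            if any(keyword in q.lower() for keyword in KEYWORDS)]

# recall(["I forgot my PIN for cash withdraw.", "Where is my driver?"])
# -> ["I forgot my PIN for cash withdraw."]
```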
Steps S310-S314 may be part of pre-processing process 214 performed by pre-processing unit 122. In step S310, pre-processing unit 122 may segment each historical customer inquiry in the subset into multiple terms. Various existing word segmentation methods may be used. Inquiries in word-based languages, such as English, Spanish, French, German, etc., may be segmented using different methods from inquiries in character-based languages, such as Chinese, Japanese, Korean, etc. For example, “I forgot my PIN for cash withdraw” can be segmented as [I, forgot, my PIN, for, cash withdraw], and “My cash withdraw password is lost” may be segmented as [my, cash withdraw, password, is, lost].
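By way of illustration, a naive segmentation sketch for word-based languages follows; a dictionary-based segmenter such as the jieba package (assumed available) could be used for character-based languages. Note that the segmentation contemplated above may keep multi-word terms such as “cash withdraw” together, which this naive split does not.

```python
# Illustrative sketch: naive segmentation of an English inquiry into terms.
import re

def segment_english(inquiry):
    return re.findall(r"[a-zA-Z']+", inquiry.lower())

# segment_english("I forgot my PIN for cash withdraw")
# -> ['i', 'forgot', 'my', 'pin', 'for', 'cash', 'withdraw']

# For a character-based language such as Chinese, something like:
#   import jieba
#   terms = jieba.lcut(inquiry)
```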
In step S312, pre-processing unit 122 may identify and remove non-informational terms. In some embodiments, non-informational terms may be defined by public and/or private non-informational term tables stored in memory 106/storage 108. Pre-processing unit 122 may look for any non-informational term from the tables in a customer inquiry and remove it if detected. For example, words/terms such as “for,” “is,” “but the problem is,” “there is,” “it says,” etc. may be removed as non-informational terms.
In step S314, pre-processing unit 122 may identify synonymous terms among the customer inquiries and replace them with a predetermined term. The predetermined term may be among the synonymous terms or a separate term. In some embodiments, synonymous terms may be defined by public and/or private synonymous term tables stored in memory 106/storage 108. Pre-processing unit 122 may look up the synonymous terms from the tables. For example, in the customer inquiries recalled above, “password” and “PIN” may be synonymous terms, and “lost” and “forgot” may be synonymous terms.
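The following sketch illustrates steps S312-S314 together, using small in-memory stand-ins for the public/private tables described above; the table contents are illustrative only.

```python
# Illustrative sketch: removing non-informational terms and normalizing
# synonymous terms using pre-recorded tables.
NON_INFORMATIONAL = {"i", "my", "for", "is", "it", "the", "but", "there"}
SYNONYMS = {"pin": "password", "forgot": "lost", "forget": "lost"}

def preprocess(terms):
    kept = [t for t in terms if t not in NON_INFORMATIONAL]
    return [SYNONYMS.get(t, t) for t in kept]

# preprocess(['i', 'forgot', 'my', 'pin', 'for', 'cash', 'withdraw'])
# -> ['lost', 'password', 'cash', 'withdraw']
```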
Steps S316-S318 may be part of frequent term mining process 216 performed by frequent terms mining unit 124. In step S316, frequent terms mining unit 124 may determine one or more terms frequently used among the customer inquiries pre-processed by pre-processing unit 122. In some embodiments, the frequently-used terms may be obtained using an FP-tree. In one example, the frequently-used terms may include {password, can’t, change, forget/forgot, login}. In step S318, frequent terms mining unit 124 may filter the originally received historical customer inquiries using the frequently-used terms determined in step S316. In some embodiments, frequent terms mining unit 124 may further combine or otherwise consolidate the customer inquiries that include the frequently-used terms.
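As an illustration of FP-tree-based mining, the sketch below uses the FP-growth implementation in the mlxtend package (assumed available); the transactions and support threshold are examples only.

```python
# Illustrative sketch: mining frequently used terms with FP-growth, treating
# each pre-processed inquiry as a transaction of terms.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [
    ["lost", "password", "cash", "withdraw"],
    ["login", "password", "incorrect"],
    ["change", "password", "no", "link"],
]

encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(transactions).transform(transactions),
                      columns=encoder.columns_)
frequent_terms = fpgrowth(onehot, min_support=0.5, use_colnames=True)
# "password" appears in all three transactions and is reported as frequent.
```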
Steps S320-S322 may be part of clustering process 218 performed by hierarchical clustering unit 128. In step S320, hierarchical clustering unit 128 may compute embedding representations of the filtered historical customer inquiries. In some embodiments, hierarchical clustering unit 128 may retrieve word embeddings trained by embedding training unit 126. For example, the word embeddings may be trained using a neural network, e.g., a FastText network 230, and saved in a look-up table stored in memory 106/storage 108. In some embodiments, hierarchical clustering unit 128 may retrieve the word embeddings by looking up the terms in the look-up table. In some embodiments, hierarchical clustering unit 128 may determine an overall embedding representation for each customer inquiry based on the word embeddings for the terms in the customer inquiry. For example, the overall embedding may be an average of the word embeddings.
In step S322, hierarchical clustering unit 128 may apply a hierarchical cluster to the embedding representations of respective customer inquiries determined in step S320. In some embodiments, an AGNES hierarchical cluster may be applied. The AGNES clustering method constructs a hierarchy of clusters. For example, in the beginning, hierarchical clustering unit 128 may treat each embedding as a small cluster by itself, and then iteratively merge the smaller clusters into larger clusters until the distances among clusters meet a predetermined requirement. Hierarchical clustering unit 128 may therefore determine clusters 240 each corresponding to a topic.
In some embodiments, hierarchical clustering unit 128 may determine a representative inquiry for each topic from the customer inquiries belonging to the corresponding cluster. The remaining customer inquiries in that cluster become synonymous inquiries. In one example, Table 1 shows the results of clustering, the representative inquiries, and the synonymous inquiries.
[Table 1 (provided as an image in the original publication): for each resulting cluster/topic, the representative inquiry and the synonymous inquiries grouped under it.]
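One possible way to pick the representative inquiry of each cluster, shown for illustration only, is to take the inquiry whose embedding lies closest to the cluster centroid; the disclosure does not prescribe this particular rule.

```python
# Illustrative sketch: choosing a representative inquiry per cluster as the
# inquiry closest to the cluster centroid; the rest become synonymous inquiries.
import numpy as np

def representatives(embeddings, labels, inquiries):
    """embeddings: (n, d) array; labels: (n,) array of cluster ids; inquiries: list of n strings."""
    reps = {}
    for label in set(labels.tolist()):
        idx = np.flatnonzero(labels == label)
        centroid = embeddings[idx].mean(axis=0)
        distances = np.linalg.norm(embeddings[idx] - centroid, axis=1)
        reps[label] = inquiries[idx[np.argmin(distances)]]
    return reps
```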
FIG. 4 illustrates a flowchart of an exemplary method 400 for interpreting a new customer inquiry, according to embodiments of the disclosure. Method 400 classifies the  new customer inquiry into a topic and provides an automated answer to the customer based on the topic. Method 400 may be implemented by processor 104 or a separate processor not shown in FIG. 1. Method 400 may include steps S402-S408 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 4.
In step S402, AI system 100 may receive a new customer inquiry. For example, the new customer inquiry may be made on terminal device 110 and received by communication interface 102 of AI system 100. For instance, the new customer inquiry may be “I want to update my login password but I don’t see a link for doing that.”
In step S404, AI system 100 may segment the new customer inquiry into multiple terms. In some embodiments, the same or similar segmentation techniques may be used as those of step S310. For example, the exemplary customer inquiry above can be segmented as [I, want to, update, my login password, but, I, don’t, see, a link, for doing that]. In some embodiments, AI system 100 may additionally apply the pre-processing steps, such as removing non-informational terms and replacing synonymous terms, to the segmented inquiry, as in steps S312-S314. For example, the segmented inquiry may become [change, login password, no link] after those additional pre-processing steps.
In step S406, AI system 100 may determine a topic for the new customer inquiry among the representative topics, based on the segmented terms. In some embodiments, the classification may use a neural network, such as one that operates on the calculated embeddings. In step S408, AI system 100 may provide information automatically to the user in response to the new customer inquiry based on the topic. In some embodiments, various solutions, instructions, or guidance for the representative topics may be pre-determined and stored in memory 106/storage 108. Accordingly, AI system 100 may retrieve those solutions, instructions, or guidance based on the topic the new inquiry belongs to, and provide them as answers to the user. In some embodiments, AI system 100 may form an answer based on the topic on the fly, and provide it to the user. The information may be provided to the user on terminal device 110.
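For illustration, the sketch below classifies a new inquiry with a simple nearest-centroid rule over the same embeddings, rather than a trained neural network; both are consistent with the approach described above, and all names are illustrative.

```python
# Illustrative sketch: assign the new inquiry to the representative topic whose
# cluster centroid is most similar (by cosine similarity) to the inquiry's
# average term embedding.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(new_terms, embedding_lookup, topic_centroids):
    """topic_centroids: dict mapping topic name -> centroid embedding."""
    vectors = [embedding_lookup[t] for t in new_terms if t in embedding_lookup]
    if not vectors:
        return None
    query = np.mean(vectors, axis=0)
    return max(topic_centroids,
               key=lambda topic: cosine(query, topic_centroids[topic]))

# classify(["change", "login password", "no link"], lookup, centroids)
# might return, e.g., "lost login password", after which the pre-stored
# solution for that topic is sent to the user.
```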
Another aspect of the disclosure is directed to a non-transitory computer-readable medium storing instructions which, when executed, cause one or more processors to perform the methods, as discussed above. The computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices. For example, the  computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed. In some embodiments, the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed system and related methods. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed system and related methods.
It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.

Claims (20)

  1. An artificial intelligence system for classifying customer inquiries, comprising:
    a communication interface configured to receive data including a plurality of historical customer inquiries;
    a processor, configured to:
    segment the historical customer inquiries into a plurality of terms;
    determine a group of frequently used terms among the terms;
    filter the historical customer inquiries using the group of frequently used terms; and
    determine a plurality of representative topics by classifying the filtered historical customer inquiries, wherein the classification applies a hierarchical clustering method; and
    a storage configured to store the frequently used terms and the representative topics.
  2. The artificial intelligence system of claim 1, wherein the processor is further configured to:
    determine at least one keyword based on the historical customer inquiries; and
    identify a subset of the historical customer inquiries using the at least one keyword,
    wherein the historical customer inquiries being segmented are within the identified subset.
  3. The artificial intelligence system of claim 1, wherein the processor is further configured to:
    remove non-informational terms from the plurality of terms before determining the group of frequently used terms.
  4. The artificial intelligence system of claim 1, wherein the processor is further configured to:
    replace synonymous terms among the plurality of terms with a predetermined term before determining the group of frequently used terms.
  5. The artificial intelligence system of claim 1, wherein the group of frequently used terms are determined using an FP-tree model.
  6. The artificial intelligence system of claim 1, wherein the processor is further configured to:
    determine an embedding vector for each term in each filtered historical customer inquiry; and
    determine an embedding representation for the filtered historical customer inquiry based on the embedding vectors.
  7. The artificial intelligence system of claim 6, wherein the processor is further configured to apply the hierarchical clustering method to the embedding representations corresponding to the filtered historical customer inquiries.
  8. The artificial intelligence system of claim 1, wherein the hierarchical clustering method is an AGNES method.
  9. The artificial intelligence system of claim 1, wherein the communication interface is further configured to receive a new customer inquiry, and wherein the processor is further configured to:
    determine a topic for the new customer inquiry among the representative topics; and
    provide information in response to the new customer inquiry based on the topic.
  10. An artificial intelligence method for classifying customer inquiries, comprising:
    receiving data including a plurality of historical customer inquiries;
    segmenting, by a processor, the historical customer inquiries into a plurality of terms;
    determining, by the processor, a group of frequently used terms among the terms;
    filtering, by the processor, the historical customer inquiries using the group of frequently used terms;
    determining, by the processor, a plurality of representative topics by classifying the filtered historical customer inquiries, wherein the classification applies a hierarchical clustering method; and
    storing the frequently used terms and the representative topics in a storage.
  11. The artificial intelligence method of claim 10, further comprising:
    determining at least one keyword based on the historical customer inquiries; and
    identifying a subset of the historical customer inquiries using the at least one keyword,
    wherein the historical customer inquiries being segmented are within the identified subset.
  12. The artificial intelligence method of claim 10, further comprising:
    removing non-informational terms from the plurality of terms before determining the group of frequently used terms.
  13. The artificial intelligence method of claim 10, further comprising:
    replacing synonymous terms among the plurality of terms with a predetermined term before determining the group of frequently used terms.
  14. The artificial intelligence method of claim 10, wherein determining the group of frequently used terms uses an FP-tree model.
  15. The artificial intelligence method of claim 10, further comprising:
    determining an embedding vector for each term in each filtered historical customer inquiry; and
    determining an embedding representation for the filtered historical customer inquiry based on the embedding vectors.
  16. The artificial intelligence method of claim 15, further comprising applying the hierarchical clustering method to the embedding representations corresponding to the filtered historical customer inquiries.
  17. The artificial intelligence method of claim 10, wherein the hierarchical clustering method is an AGNES method.
  18. The artificial intelligence method of claim 10, further comprising:
    receiving a new customer inquiry;
    determining a topic for the new customer inquiry among the representative topics; and
    providing information in response to the new customer inquiry based on the topic.
  19. A non-transitory computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs an artificial intelligence method for classifying customer inquiries, comprising:
    receiving data including a plurality of historical customer inquiries;
    segmenting the historical customer inquiries into a plurality of terms;
    determining a group of frequently used terms among the terms;
    filtering the historical customer inquiries using the group of frequently used terms; and
    determining a plurality of representative topics by classifying the filtered historical customer inquiries, wherein the classification applies a hierarchical clustering method.
  20. The non-transitory computer-readable medium of claim 19, wherein the artificial intelligence method further comprises, before determining the group of frequently used terms,
    removing non-informational terms from the plurality of terms; and
    replacing synonymous terms among the plurality of terms with a predetermined term before determining the group of frequently used terms.




Legal Events

121 (Ep): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18931289; Country of ref document: EP; Kind code of ref document: A1.
NENP: Non-entry into the national phase. Ref country code: DE.
122 (Ep): PCT application non-entry in European phase. Ref document number: 18931289; Country of ref document: EP; Kind code of ref document: A1.