WO2020010930A1

WO2020010930A1 - Method for detecting ambiguity of customer service robot knowledge base, storage medium, and computer device

Info

Publication number: WO2020010930A1
Application number: PCT/CN2019/088473
Authority: WO
Inventors: 欧泽彬; 徐易楠; 潘晟锋; 刘云峰; 吴悦; 胡晓; 汶林丁
Original assignee: 深圳追一科技有限公司
Priority date: 2018-07-09
Filing date: 2019-05-27
Publication date: 2020-01-16

Abstract

The present application relates to a method for detecting ambiguity of a customer service robot knowledge base, wherein the method comprises: constructing a knowledge base, wherein the knowledge base is divided according to FAQ, and each FAQ is provided with at least one similar question, and each FAQ is a category; dividing the knowledge base into a test set and a training set for a deep learning model; training the deep learning model on the training set, and using the learned deep learning model to carry out ambiguity detection; updating the knowledge base according to an ambiguity detection result; and repeating the above-mentioned steps until the learning effect does not improve any more. According to the present application, updating the knowledge base according to the ambiguity detection result, and repeating the training steps until the learning effect reaches an expected standard can assist manually finding and correcting ambiguity in the knowledge base, and obtaining a knowledge base with the ambiguity eliminated, and data is extracted from the knowledge base with ambiguity eliminated as the training set and the test set of the deep learning model, so as to further improve the learning effect of the deep learning model.

Description

Customer service robot knowledge base ambiguity detection method, storage medium and computer equipment

Cross-reference to related applications

This application claims priority to Chinese patent application No. 201810749561.X, filed with the Chinese Patent Office on July 9, 2018 and entitled "A Method for Automatically Building Customer Service Knowledge Base Based on Manual Customer Service Logs", and to Chinese patent application No. 201810801678.8, filed on July 19, 2018 and entitled "Customer Service Robot Knowledge Base Ambiguity Detection Method", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of artificial intelligence technology, and in particular to a method, a storage medium, and a computer device for detecting ambiguity in a knowledge base of a customer service robot.

Background technique

With the growth in the number of Internet users, the service pressure on companies' customer service departments keeps increasing. Since most users encounter recurring questions, these repeated questions can often be answered with a fixed template. To reduce the labor cost of the customer service center, a customer service robot can be introduced that uses a program to determine the type of the user's question. If the question is a Frequently Asked Question (FAQ), a standard answer is given directly; otherwise, the conversation is transferred to a human agent for handling.

In related technologies, customer service robots use machine learning techniques to identify user intents, casting intent recognition as a question classification problem. Each FAQ corresponds to a category, and each category has one or more similar questions. All FAQs and their corresponding similar questions form the robot's knowledge base.

The effect of a machine learning model often depends on the training data selected from the knowledge base; in particular, the labeling accuracy of the training data has a large impact on the model effect. However, because time and human effort are limited, the knowledge base often contains many ambiguities, such as questions assigned to the wrong category or categories whose semantics overlap. These ambiguities cause the model to learn incorrect knowledge and thus degrade its accuracy. Moreover, because machine learning training requires a large amount of labeled data, it is impossible to rely solely on humans to find and handle these ambiguities. Therefore, how to perform ambiguity detection on the knowledge base and assist in manually eliminating its ambiguities has become an urgent problem for those skilled in the art.

Summary of the invention

According to various embodiments provided in the present application, a method for detecting ambiguity in a knowledge base of a customer service robot, a storage medium, and a computer device are provided.

An ambiguity detection method for a customer service robot knowledge base includes:

Constructing a knowledge base, wherein the knowledge base is divided according to FAQs, each FAQ is provided with at least one similar question, and each FAQ is a category;

Dividing the knowledge base into a test set and a training set for a deep learning model;

Training a deep learning model on the training set, and performing ambiguity detection using the learned deep learning model;

Updating the knowledge base based on the ambiguity detection results; and

Repeating the above steps until the learning effect no longer improves, so that a disambiguated knowledge base is obtained.

A method for automatically constructing a customer service knowledge base based on manual customer service logs includes:

Preprocessing the manual customer service log data;

Establishing an expression model based on the processed manual customer service log data;

Obtaining question expression information of the user questions to be sorted through the expression model;

Aggregating the question expression information to obtain user question clusters; and

Sorting the user question clusters to obtain a knowledge base.

One or more non-transitory computer-readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:

Updating the knowledge base based on the ambiguity detection results; and

One or more non-transitory computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:

Preprocessing the manual customer service log data;

Performing clustering processing on the question expression information;

Aggregating similar user questions into the same category, and sorting them to obtain a knowledge base.

A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:

Updating the knowledge base based on the ambiguity detection results; and

Preprocessing the manual customer service log data;

Performing clustering processing on the question expression information;

Aggregating similar user questions into the same category, and sorting them to obtain a knowledge base.

Details of one or more embodiments of the present application are set forth in the accompanying drawings and description below. Other features, objects, and advantages of the application will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to explain the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show merely some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is an application environment diagram of a method for detecting ambiguity in a customer service robot knowledge base in an embodiment;

FIG. 2 is a schematic diagram of an internal structure of a computer device in an embodiment;

FIG. 3 is a schematic flowchart of a method for detecting ambiguity in a customer service robot knowledge base in an embodiment;

FIG. 4 is a schematic flowchart of a method for detecting ambiguity in a customer service robot knowledge base in another embodiment;

FIG. 5 is a schematic flowchart of constructing a customer service knowledge base in an embodiment;

FIG. 6 is a schematic diagram of the working principle of constructing a customer service knowledge base in an embodiment.

Detailed description

In order to make the purpose, technical solution, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the application, and are not used to limit the application.

FIG. 1 is an application environment diagram of a method for detecting ambiguity in a knowledge base of a customer service robot in an embodiment. Referring to FIG. 1, the application environment includes a terminal 110 and a server 120, and the terminal 110 can communicate with the server 120 through a network. The server 120 constructs a knowledge base, which is divided according to FAQs; each FAQ is provided with at least one similar question, and each FAQ is a category. The server 120 divides the knowledge base into a test set and a training set for a deep learning model. The server 120 trains a deep learning model on the training set, uses the learned deep learning model to perform ambiguity detection, and updates the knowledge base according to the ambiguity detection result. The server 120 repeats the above steps until the learning effect no longer improves, so that a disambiguated knowledge base is obtained. The server 120 may obtain, from the terminal 110, data input by a user for constructing the knowledge base. The terminal 110 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device. The server 120 may be implemented as an independent server or as a server cluster composed of multiple servers.

FIG. 2 is a schematic diagram of an internal structure of a computer device in an embodiment. The computer device may specifically be the server 120 or the terminal 110 shown in FIG. 1. Referring to FIG. 2, the computer device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and computer-readable instructions which, when executed by the processor, cause the processor to execute a method for detecting ambiguity in a knowledge base of a customer service robot. The internal memory of the computer device may also store computer-readable instructions which, when executed by the processor, cause the processor to execute a method for detecting ambiguity in a customer service robot knowledge base. The network interface of the computer device is used to communicate through the network, for example, to obtain the data used to build the knowledge base.

FIG. 3 is a schematic flowchart of a method for detecting ambiguity in a knowledge base of a customer service robot according to an embodiment of the present application. This embodiment is described mainly using the example of the method being applied to the computer device 120 described above.

As shown in FIG. 3, the method in this embodiment includes:

S11: Construct a knowledge base. The knowledge base is divided according to FAQs. Each FAQ is provided with at least one similar question, and each FAQ is a category.

The knowledge base is an industry-oriented application developed on the basis of large-scale knowledge processing. It is applicable to technical fields such as large-scale knowledge processing, natural language understanding, knowledge management, automatic question answering systems, and reasoning. Intelligent customer service not only provides enterprises with fine-grained knowledge management technology, but also establishes a fast and effective natural-language-based means of communication between enterprises and their many users. Taking the customer service robot knowledge base of an e-commerce enterprise as an example, the knowledge base contains multiple FAQs, such as "return process" and "refund process". Taking the "return process" as an example, the FAQ may contain the following similar questions: "How do I return the goods I bought yesterday?" and "I want to return the goods, what should I do?"

S12: Divide the knowledge base into a test set and a training set of a deep learning model.

N FAQs in which ambiguity is to be detected are selected from the knowledge base as N categories. For each FAQ, a preset number of similar questions are randomly selected as the test data of the category, and the remaining similar questions are used as the training data of the category. The test data of all categories constitute the test set, and the training data of all categories constitute the training set.

For example, suppose the knowledge base contains 10 FAQs, each of which contains 20 similar questions. A preset number of similar questions, for example 3, is randomly selected from each category of the knowledge base as test data for the deep learning model. The test set then contains 30 similar questions, and the remaining 170 similar questions form the training set of the deep learning model.
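The per-category split described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `split_knowledge_base`, the dictionary layout of the knowledge base, and the fixed random seed are all assumptions made for the example.

```python
import random

def split_knowledge_base(knowledge_base, test_per_category, seed=0):
    """Split {faq: [similar questions]} into (train, test) lists of (question, faq).

    Hypothetical helper: for every FAQ (category), a preset number of similar
    questions is drawn at random as test data; the rest become training data.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    train, test = [], []
    for faq, questions in knowledge_base.items():
        shuffled = list(questions)
        rng.shuffle(shuffled)
        test += [(q, faq) for q in shuffled[:test_per_category]]
        train += [(q, faq) for q in shuffled[test_per_category:]]
    return train, test

# Toy knowledge base with placeholder text: 10 FAQs, 20 similar questions each.
kb = {f"faq_{c}": [f"faq_{c} question {i}" for i in range(20)] for c in range(10)}
train_set, test_set = split_knowledge_base(kb, test_per_category=3)
print(len(train_set), len(test_set))  # 170 30
```

With 3 test questions per category, the sizes match the 30/170 split of the worked example.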

It should be noted that the number of categories included in the knowledge base involved in this application and the number of similar questions in each category are not limited to the examples involved in the embodiments, and are not repeated here.

S13: Train a deep learning model on the training set, and perform ambiguity detection using the learned deep learning model.

The deep learning model includes a feature extractor and a shallow classifier.

The ambiguity detection includes: category ambiguity detection, labeling error detection, and labeling ambiguity detection;

The ambiguities include:

Category ambiguity: the meanings of two categories are very similar. For example, category 1 is "order problem" and category 2 is "product change cancellation problem"; the semantics of category 1 and category 2 overlap, because category 1 basically covers category 2;

Labeling ambiguity: a question can be labeled with multiple categories at the same time. For example, category 1 is "product return problem" and category 2 is "product price problem"; the question "this thing is too expensive, I want to return it" is ambiguous because it carries the meaning of both categories;

Labeling error: a question is assigned to the wrong category. For example, category 1 is "product return problem" and category 2 is "product price problem"; if the question "I don't want it anymore" is labeled as category 2, a labeling error occurs.

The ambiguity detection is performed on the test set or the training set.

Performing ambiguity detection by using the learned deep learning model includes:

Using the feature extractor in the deep learning model to detect ambiguity;

Using the shallow classifier in the deep learning model to detect ambiguity.

Detecting ambiguity by using the feature extractor in the deep learning model includes:

Using the feature extractor in the deep learning model to convert the similar questions in a data set into feature vectors, where the data set includes the training set and/or the test set;

Combining the feature vectors corresponding to the questions into question feature vector pairs (x, y), where the question corresponding to feature vector x and the question corresponding to feature vector y come from different categories;

Calculating the vector similarity cos(x, y) of each question feature vector pair, where

$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert};$$

Sorting all question feature vector pairs by vector similarity from high to low, selecting the top-ranked question feature vector pairs, and judging from the vector similarity whether the top-ranked pairs are ambiguous.

Judging from the vector similarity whether the top-ranked question feature vector pairs are ambiguous includes:

Determining whether there are labeling ambiguities or labeling errors: extracting a first preset number, for example 30, of the question feature vector pairs with the highest similarity, and manually checking whether the corresponding question pairs have labeling ambiguities or labeling errors;

Determining whether there is category ambiguity: for the first preset number of question feature vector pairs, counting how many times each corresponding category pair occurs, sorting the category pairs by count from high to low, taking a second preset number, for example 20, of the top category pairs, and manually checking them for category ambiguity.
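The feature-extractor-based screening above can be sketched in a few lines. The sketch assumes the feature vectors have already been produced by some extractor; the sample questions, the three-dimensional vectors, and the function names are invented for illustration, and the small preset numbers stand in for values such as 30 and 20.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def rank_cross_category_pairs(samples, top_n):
    """samples: list of (question, category, feature_vector).

    Returns the top_n most similar question pairs drawn from different
    categories, as (similarity, question_1, category_1, question_2, category_2).
    """
    pairs = []
    for (q1, c1, v1), (q2, c2, v2) in combinations(samples, 2):
        if c1 != c2:  # only pairs from different categories are of interest
            pairs.append((cosine(v1, v2), q1, c1, q2, c2))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs[:top_n]

def rank_category_pairs(top_pairs, top_m):
    """Count how often each category pair occurs among the top question pairs."""
    counts = Counter(tuple(sorted((c1, c2))) for _, _, c1, _, c2 in top_pairs)
    return counts.most_common(top_m)

# Made-up samples; in practice the vectors come from the trained feature extractor.
samples = [
    ("How do I return this item?", "return", [0.9, 0.1, 0.0]),
    ("This is too expensive, I want to return it", "price", [0.8, 0.3, 0.1]),
    ("Why did the price go up?", "price", [0.1, 0.9, 0.2]),
    ("Where is my parcel?", "shipping", [0.0, 0.1, 0.9]),
]
top_pairs = rank_cross_category_pairs(samples, top_n=2)   # candidates for manual review
category_pairs = rank_category_pairs(top_pairs, top_m=2)  # candidate ambiguous category pairs
```

Both lists are only candidates: as in the text, a person still decides whether each pair is a labeling ambiguity, a labeling error, or a category ambiguity.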

Detecting ambiguity by using the shallow classifier of the deep learning model includes:

Counting the classification results of the deep learning model to form a confusion matrix, where each row i of the confusion matrix corresponds to a labeled category, each column j corresponds to a category predicted by the deep learning model, the element x_ij is the number of questions labeled as category i and predicted by the model as category j, and the element x_ji is the number of questions labeled as category j and predicted by the model as category i;

Calculating the number of samples labeled as category i in the data set, where the number of samples in category i is

$$\sum_{k} x_{ik},$$

where k is any category;

Calculating the number of samples labeled as category j in the data set, where the number of samples in category j is

$$\sum_{k} x_{jk},$$

where k is any category;

Calculating the proportion P_ij of the samples labeled as category i in the data set that the deep learning model predicts as category j, and the proportion P_ji of the samples labeled as category j that the deep learning model predicts as category i. The formulas for calculating P_ij and P_ji are:

$$P_{ij} = \frac{x_{ij}}{\sum_{k} x_{ik}}, \qquad P_{ji} = \frac{x_{ji}}{\sum_{k} x_{jk}},$$

where category i and category j are different categories, and the data set includes the training set and/or the test set;

Calculating the degree of confusion of the category pair (category i, category j), which is the harmonic mean S_ij of P_ij and P_ji, where

$$S_{ij} = \frac{2 P_{ij} P_{ji}}{P_{ij} + P_{ji}};$$

Determining whether there is ambiguity between category i and category j according to the degree of confusion.

Judging whether there is ambiguity between category i and category j according to the degree of confusion includes:

Sorting the calculated degrees of confusion;

Extracting a third preset number, for example 5, of the category pairs with the highest degree of confusion, and manually checking them for category ambiguity.
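As a rough sketch of the confusion-degree ranking described above, the following code computes P_ij and P_ji from a confusion matrix and takes their harmonic mean. The function name and the 3×3 matrix are made up for the example; treating a pair with no mutual confusion as having degree 0 is an assumption added to avoid division by zero.

```python
def confusion_degrees(matrix):
    """matrix[i][j]: number of questions labeled as category i and predicted as j.

    Returns (S_ij, i, j) for every category pair i < j, sorted from most to
    least confused, where S_ij is the harmonic mean of P_ij and P_ji.
    """
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]  # samples labeled as each category
    degrees = []
    for i in range(n):
        for j in range(i + 1, n):
            p_ij = matrix[i][j] / row_totals[i] if row_totals[i] else 0.0
            p_ji = matrix[j][i] / row_totals[j] if row_totals[j] else 0.0
            # Assumption: define the degree as 0 when both proportions are 0.
            s_ij = 2 * p_ij * p_ji / (p_ij + p_ji) if p_ij + p_ji else 0.0
            degrees.append((s_ij, i, j))
    degrees.sort(reverse=True)
    return degrees

# Made-up confusion matrix for three categories.
matrix = [
    [8, 2, 0],
    [4, 6, 0],
    [0, 1, 9],
]
ranked = confusion_degrees(matrix)
top_candidates = ranked[:1]  # e.g. the "third preset number" of pairs to review
```

Here categories 0 and 1 are confused in both directions (P_01 = 0.2, P_10 = 0.4), so they rank first; the harmonic mean stays at 0 for pairs that are confused in one direction only.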

Detecting ambiguity by using the shallow classifier of the deep learning model further includes: finding the data for which the category labeled in the data set is inconsistent with the category predicted by the deep learning model, and manually checking whether there are labeling errors. The data set includes the training set and/or the test set.
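The mismatch check just described amounts to a simple filter. The sketch below assumes the model's predictions are already available as a list; the function name and the sample data are hypothetical.

```python
def find_mismatches(questions, labels, predictions):
    """Return (question, labeled_category, predicted_category) where the two differ."""
    return [(q, y, p) for q, y, p in zip(questions, labels, predictions) if y != p]

# Made-up data: the second question is labeled "price" but predicted "return".
questions = ["How do I return this?", "I don't want it anymore", "Why so expensive?"]
labels = ["return", "price", "price"]
predictions = ["return", "return", "price"]
suspects = find_mismatches(questions, labels, predictions)
```

Each entry in `suspects` is only a candidate labeling error and would still be reviewed manually.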

S14: Update the knowledge base according to the ambiguity detection result, including:

Manually rewriting and re-labeling the questions detected as ambiguous, and deleting their original labels;

Re-integrating the categories detected as ambiguous, reassigning their similar questions, and deleting the original ambiguous categories.

S15: Repeat the above steps until the learning effect no longer improves, so that a disambiguated knowledge base is obtained.

The learning effect is the agreement rate between the model's predictions and the categories actually labeled for the questions in the test set. The agreement rate is, for example, the prediction accuracy, that is, the number of questions whose prediction matches the label divided by the total number of questions. The learning effect is considered to no longer improve when, for example, the prediction accuracy improves by less than 0.5%.

When the model's learning effect no longer improves, the degradation of model performance caused by ambiguity in the knowledge base has been eliminated, and the knowledge base can be used to train the model and deploy it to a production environment.
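The stopping rule can be sketched as a loop around two callbacks. Everything named here is an assumption for illustration: `train_and_evaluate` stands for steps S12 and S13 plus evaluation, `review_and_update` stands for the manual update in S14, `max_rounds` is an added safety cap, and the 0.5% threshold is read as an absolute gain of 0.005 in accuracy.

```python
def accuracy(predictions, labels):
    """Share of questions whose predicted category equals the labeled one."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)

def refine_until_converged(train_and_evaluate, review_and_update,
                           min_gain=0.005, max_rounds=20):
    """Repeat train -> detect -> update until accuracy gains fall below min_gain.

    train_and_evaluate(): hypothetical callback returning test-set accuracy.
    review_and_update():  hypothetical callback standing in for the manual
                          knowledge base update.
    """
    history = []
    for _ in range(max_rounds):
        acc = train_and_evaluate()
        converged = bool(history) and acc - history[-1] < min_gain
        history.append(acc)
        if converged:
            break
        review_and_update()
    return history

# Simulated accuracies standing in for real training runs.
simulated = iter([0.80, 0.86, 0.89, 0.892])
updates = []
history = refine_until_converged(lambda: next(simulated), lambda: updates.append(1))
```

In this simulation the fourth round improves accuracy by only 0.002, so the loop stops after three knowledge base updates.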

In this embodiment, the knowledge base is updated according to the ambiguity detection result, and the training steps are repeated until the learning effect reaches the expected standard. This assists in manually discovering and correcting ambiguities in the knowledge base and yields a disambiguated knowledge base. Data extracted from the disambiguated knowledge base is then used as the training set and test set of the deep learning model, further improving its learning effect.

FIG. 4 is a schematic flowchart of a method for detecting ambiguity in a knowledge base of a customer service robot in another embodiment of the present application.

As shown in FIG. 4, training a deep learning model on the training set includes steps S21 to S25 described below.

The concept of deep learning originates from research on artificial neural networks; a multi-layer perceptron with multiple hidden layers is one kind of deep learning structure. Deep learning combines low-level features to form more abstract high-level representations (attribute categories or features) in order to discover distributed feature representations of data. The deep learning model includes a feature extractor and a shallow classifier.

S21: input the question in the training set as an input part into the deep learning model;

S22: using a feature extractor in the deep learning model to convert a question in an input part into a feature vector;

The feature extractor is, for example, a recurrent neural network. This model sequentially reads each word in the question and outputs a feature vector of a fixed dimension. It should be noted that the feature extractor is not limited to the exemplified recurrent neural network, and any method that can transform a question into a feature vector of a fixed dimension can be used as a feature extractor.
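To make the idea of a recurrent feature extractor concrete, here is a deliberately tiny sketch in plain Python. It is not the model used in the application: the class name, the dimensions, the whitespace tokenization, and the random (untrained) weights are all assumptions chosen only to show how reading words one at a time yields a fixed-dimension vector.

```python
import math
import random

class TinyRecurrentExtractor:
    """Untrained Elman-style recurrent network: one tanh state update per word."""

    def __init__(self, vocab, embed_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.index = {word: i for i, word in enumerate(vocab)}
        self.embed = rand(len(vocab) + 1, embed_dim)  # last row: unknown words
        self.w_in = rand(hidden_dim, embed_dim)       # input-to-hidden weights
        self.w_rec = rand(hidden_dim, hidden_dim)     # hidden-to-hidden weights
        self.hidden_dim = hidden_dim

    def encode(self, question):
        """Read the question word by word and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for word in question.lower().split():
            e = self.embed[self.index.get(word, len(self.index))]
            h = [math.tanh(sum(a * b for a, b in zip(self.w_in[r], e)) +
                           sum(a * b for a, b in zip(self.w_rec[r], h)))
                 for r in range(self.hidden_dim)]
        return h

extractor = TinyRecurrentExtractor(["how", "to", "apply", "for", "a", "refund"])
short_vec = extractor.encode("refund")
long_vec = extractor.encode("how to apply for a refund please")
```

Both questions map to vectors of the same length regardless of how many words they contain, which is the property the shallow classifier relies on. In a real system the weights would be learned jointly with the classifier.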

S23: Use a shallow classifier in the deep learning model to calculate a prediction result according to the feature vector, where the prediction result is a category corresponding to a question in the input part;

The shallow classifier is, for example, a linear classifier. The classifier reads a feature vector of a fixed dimension and calculates the linear combination of the vector elements to obtain the score of each category, and takes the category with the highest score as the prediction result. It should be noted that the shallow classifier is not limited to the linear classifier exemplified. Any method that can convert feature vectors of a fixed dimension into a score for each category can be used as the shallow classifier.
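A linear classifier of the kind described above can be written in a few lines. The weights, biases, feature vector, and category names below are invented for illustration; in practice they would come from training.

```python
def linear_scores(weights, biases, features):
    """One score per category: a linear combination of the feature elements."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def predict(weights, biases, features, categories):
    """Return the category with the highest score."""
    scores = linear_scores(weights, biases, features)
    best = max(range(len(scores)), key=scores.__getitem__)
    return categories[best]

# Made-up parameters for two categories and a 2-dimensional feature vector.
categories = ["return process", "refund process"]
weights = [[1.0, -0.5], [-0.2, 0.8]]
biases = [0.0, 0.1]
predicted = predict(weights, biases, [0.9, 0.2], categories)
```

For this input the scores are 0.8 and 0.08, so the first category is chosen.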

S24: Optimize the deep learning model by using an optimizer, so as to minimize the average difference between the actual categories labeled in the training set and the prediction results of the deep learning model;

The average difference is, for example, a loss function. The loss function is, for example, cross-entropy.

The optimizer is, for example, gradient descent. Gradient descent is an iterative method and one of the most commonly used methods for solving the model parameters of a machine learning algorithm, which is an unconstrained optimization problem. To find the minimum of the loss function, gradient descent iterates step by step until it obtains the minimized loss function and the corresponding model parameter values.
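As an illustration of minimizing a cross-entropy loss by gradient descent, the sketch below trains a small softmax linear classifier on fixed feature vectors. It is a simplification: only the classifier is trained (a full model would also update the feature extractor), and the learning rate, epoch count, and toy data are assumptions.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_softmax_classifier(features, labels, n_classes, lr=0.5, epochs=200):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    dim, n = len(features[0]), len(features)
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    losses = []
    for _ in range(epochs):
        grad_W = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        loss = 0.0
        for x, y in zip(features, labels):
            scores = [sum(w * xi for w, xi in zip(W[c], x)) + b[c]
                      for c in range(n_classes)]
            probs = softmax(scores)
            loss -= math.log(probs[y])  # cross-entropy for the labeled class
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)  # dLoss/dScore_c
                grad_b[c] += err
                for d in range(dim):
                    grad_W[c][d] += err * x[d]
        for c in range(n_classes):  # step against the averaged gradient
            b[c] -= lr * grad_b[c] / n
            for d in range(dim):
                W[c][d] -= lr * grad_W[c][d] / n
        losses.append(loss / n)
    return W, b, losses

# Made-up, linearly separable feature vectors for two categories.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
W, b, losses = train_softmax_classifier(features, labels, n_classes=2)
```

With all parameters starting at zero, the first recorded loss is ln 2 (both classes equally likely), and it shrinks as the iterations proceed.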

S25: Use the test set to evaluate the trained model: calculate the agreement rate between the model's predictions and the actual categories labeled in the test set, and use it as the evaluation of the model's learning effect. The agreement rate is, for example, the prediction accuracy, that is, the number of questions whose prediction matches the label divided by the total number of questions.

In this embodiment, the deep learning model is trained on the FAQs in the training set, and an optimizer continuously optimizes the model during training, which iteratively improves the learning effect of the deep learning model and thereby the accuracy of ambiguity detection.

In one embodiment, a method for detecting ambiguity in a customer service robot knowledge base may include the following steps:

S102: The server obtains the manual customer service log data from the terminal, and preprocesses the manual customer service log data.

S104: Establish an expression model according to the processed manual customer service log data.

S106: Obtain question expression information of the user questions to be sorted through the expression model.

S108: Aggregate the question expression information to obtain user question clusters.

S110: Sort the user question clusters into FAQs to obtain a knowledge base.

S112: Divide the obtained knowledge base into a test set and a training set for the deep learning model.

S114: Train a deep learning model on the training set, and use the learned deep learning model to perform ambiguity detection.

S116: Update the knowledge base according to the ambiguity detection result.

S118: Repeat the above steps S112 to S116 until the learning effect no longer improves, so that a disambiguated knowledge base is obtained.

In this embodiment, the expression model is trained in advance on the manual customer service log data and is used to obtain the question expression information of the user questions to be sorted. The question expression information is aggregated into user question clusters, and the clusters are sorted into FAQs to obtain a knowledge base, which reduces the investment of human resources. At the same time, extracting from a large volume of manual logs lowers the demands on the professional level of customer service staff during knowledge base construction and reduces the difficulty of construction. A deep learning model is then trained on the training set drawn from the knowledge base and used to perform ambiguity detection on the knowledge base; the knowledge base is updated, and a disambiguated knowledge base is finally obtained.

FIG. 5 is a schematic flowchart of a method for automatically constructing a customer service knowledge base based on a manual customer service log provided in an embodiment of the present application.

As shown in FIG. 5, the method in this embodiment may include:

S1: Preprocessing the manual customer service log data;

Further, the manual customer service log data includes:

A question from the user and the corresponding customer service response; and,

All questions and corresponding customer service responses during the user's entire conversation.

The preprocessing of the manual customer service log data includes:

Using machine learning algorithms or natural language processing algorithms to process the manual customer service log data, so as to remove user questions and responses that are unrelated to the business content.

S2: Establish an expression model based on the processed manual customer service log data;

Further, the expression model is obtained by using a training algorithm to train on the processed manual customer service log data.

The training algorithm includes: a machine learning algorithm (such as a machine translation algorithm) or a search algorithm.

S3: obtaining question expression information of a user question to be collated through the expression model;

Specifically, the question expression information includes: a vector representation of a sentence and / or a text feature representation.

S4: Perform aggregation processing on the question expression information to obtain user question clusters;

Further, performing aggregation processing on the question expression information includes:

The clustering algorithm or synonym integration is used to process the question expression information.

Specifically, the clustering algorithm is, for example, the K-Means clustering algorithm or one of its improved variants.
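For readers unfamiliar with K-Means, a bare-bones version is sketched below. It is an illustrative plain-Python implementation, not the one used in the application: the random initialization, the fixed iteration count, and the two-dimensional toy vectors standing in for question expression information are all assumptions.

```python
import random

def kmeans(vectors, k, iterations=20, seed=0):
    """Basic K-Means: returns (cluster index per vector, list of centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # k distinct starting points
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        for idx, v in enumerate(vectors):
            assignments[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Toy question vectors forming two well-separated groups.
vectors = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
assignments, centroids = kmeans(vectors, k=2)
```

Each resulting cluster corresponds to one user question cluster that would then be handed to customer service staff for sorting into an FAQ. Changing `k` changes how coarse or fine the resulting categories are.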

S5: Sort the user question clusters to obtain a knowledge base.

In the above process, the main purpose of the expression model is to learn a mapping from user questions to customer service responses. The expression model is trained on multiple pairs of user questions and corresponding customer service responses from the manual customer service log data; the training algorithm may be a machine learning algorithm, a search scheme, or another algorithm.

After the expression model has been trained, the user questions that need to be organized into the knowledge base are input into the expression model to obtain the expression information of this batch of user questions. The question expression information may take the form of vectors or text features, although it can be understood that its form is not limited to these. The question expression information is then clustered so that similar user questions are aggregated into one category, and the clusters are sorted into FAQs to obtain a knowledge base.

The word vector mentioned above refers to the words or phrases in the user's question, and the text features refer to their parts of speech and syntactic relations such as subject-predicate and verb-object structures. For example, when the user asks "How to apply for a refund?", the question can be segmented into the words "how", "apply", and "refund". Their text features are: "how" (pronoun), "apply" (verb), and "refund" (verb); "how" and "apply" form an adverbial structure, and "apply" and "refund" form a verb-object structure. Based on these word vectors and text features, synonym integration or a clustering algorithm can be used to group similar user questions of the same type, that is, to obtain user question clusters. Finally, the user question clusters are recommended to human customer service staff, who sort them into FAQs to obtain the knowledge base.

FIG. 6 is a working principle diagram of a method for constructing a customer service knowledge base in an embodiment of the present application.

It can be seen from FIG. 6 that the expression model is trained on manual customer service log data (including user questions and customer service answers), and the user questions to be sorted are input into the trained expression model to obtain their question expression information. The question expression information is then clustered, aggregating similar user questions into the same category to obtain user question clusters. Finally, the user question clusters are recommended to human customer service staff, who classify them into FAQs to obtain the knowledge base.

To facilitate understanding, this embodiment is described using a machine translation algorithm as the training algorithm of the expression model and the K-Means clustering algorithm as the clustering algorithm, but the implementation of this solution is not limited to this form. The input of the expression model is manual customer service log data (for example, a single user question and the corresponding customer service response, or all the questions and corresponding responses in the user's entire conversation). The following description takes a single user question and the corresponding customer service response as an example; the question is analyzed to obtain the part-of-speech information and named entity information of the user question. In this solution, word vectors are used as the expression form of the user's question, although text features can also be used. The processing may be as follows:

First, clean the manual customer service log data by removing user questions and answers that have little to do with the business (such as "hello" and "thank you"; the filter should be based on the business situation). This can be done with machine learning algorithms (such as language model scoring) or natural language processing algorithms (such as template matching or syntax analysis);

Then, use the cleaned manual customer service log data to train the expression model; the main purpose is to learn a mapping from user questions to customer service answers;

Then, obtain the question expression information of the user questions to be sorted through the expression model, such as the word vector of the question;

Then, apply a clustering algorithm or synonym replacement and integration to the question expression information to obtain user question clusters;

Finally, the user question clusters are recommended to customer service staff for FAQ sorting to obtain the corresponding FAQ knowledge base.
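The five steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of this embodiment: a bag-of-words vector stands in for the trained expression model (the machine translation training on log data is not reproduced), the `SMALL_TALK` pattern is a hypothetical template-matching filter, K-Means is initialised with the first k vectors so that the result is deterministic, and all function names and the example log are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical small-talk filter; a real deployment would tune this per business.
SMALL_TALK = re.compile(r"^(hello|hi|thanks|thank you)[.!]?$", re.IGNORECASE)


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def clean(log):
    """Step 1: drop (question, answer) pairs whose question is pure small talk."""
    return [(q, a) for q, a in log if not SMALL_TALK.match(q.strip())]


def embed(question, vocab):
    """Steps 2-3 stand-in: a unit-length bag-of-words vector, NOT a trained model."""
    counts = Counter(tokenize(question))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Step 4: plain K-Means, initialised with the first k vectors for determinism."""
    k = min(k, len(vectors))
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: squared_distance(v, centers[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_clusters(log, k):
    """Step 5 input: question clusters to hand to customer service staff."""
    questions = [q for q, _ in clean(log)]
    vocab = sorted({w for q in questions for w in tokenize(q)})
    labels = kmeans([embed(q, vocab) for q in questions], k)
    clusters = {}
    for q, label in zip(questions, labels):
        clusters.setdefault(label, []).append(q)
    return list(clusters.values())


# Invented example log of (user question, customer service answer) pairs.
EXAMPLE_LOG = [
    ("Hello", "Hi, how can I help you?"),
    ("How do I apply for a refund?", "Open the order page and choose Refund."),
    ("How do I open a credit card?", "Apply in the app under Cards."),
    ("Can I apply for a refund online?", "Yes, from the order page."),
    ("What is the credit card annual fee?", "It depends on the card type."),
    ("Thank you", "You are welcome."),
]

clusters = build_clusters(EXAMPLE_LOG, k=2)
```

With this toy log, the greeting and the thanks are filtered out, and the remaining questions fall into a refund cluster and a credit card cluster, which staff would then turn into FAQs.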

It can be understood that the training method of the expression model described in this embodiment is not limited to this form; deep learning algorithms, machine learning models, or search technology solutions can be used. Its input form is likewise not limited to user questions and manual customer service responses, and can be determined according to the actual situation of the customer service robot. For example, if the business focuses on emotions, some inputs can be constructed based on whether the user's question contains emotional words and which ones.
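As a hedged illustration of the emotion example, the sketch below derives two extra inputs from a question: whether it contains any emotional word, and which ones. The lexicon `EMOTION_WORDS` and the function name `emotion_features` are hypothetical and are not defined in this application.

```python
import re

# Hypothetical emotion lexicon; a real system would use a business-specific list.
EMOTION_WORDS = {"angry", "annoyed", "disappointed", "happy", "upset"}


def emotion_features(question):
    """Return whether the question contains emotional words, and which ones."""
    tokens = set(re.findall(r"[a-z']+", question.lower()))
    found = sorted(tokens & EMOTION_WORDS)
    return {"has_emotion": bool(found), "emotion_words": found}


features = emotion_features("I am very upset and angry about this charge!")
```

Such features could be appended to the question and response inputs of the expression model.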

The method described in this embodiment is not limited to a particular granularity of knowledge base collation. The granularity can be determined according to the actual situation of the business, that is, a coarse-grained or fine-grained FAQ division can be designed according to the specific needs of the business. For example, if there is no existing customer service knowledge base for a certain service, the division is mainly reflected in the number of clustering categories. Suppose the service is a bank's card service, and the questions asked by users mainly concern debit cards, credit cards, and so on. A coarse-grained build would produce two categories. The credit card category can itself contain a lot of content, such as card opening and annual fees; for a finer division, the number of clustering categories can be increased.

Customer service is usually a major channel through which companies obtain user feedback and resolve users' questions about products. The traditional customer service business is mainly handled by professional manual customer service personnel, so a company's investment in customer service grows rapidly as the customer service volume increases, and this cost cannot be ignored.

To solve this problem, a more advanced solution is to introduce intelligent customer service robots, which can significantly reduce the manual customer service workload and save considerable customer service cost. Customer service robots have clear advantages. First, they improve the user experience by providing unified, intelligent self-service support for online customer service, new media customer service, and so on, which reduces the difficulty and complexity of resolving user issues. Second, they improve service efficiency, shorten the time needed for consultation and processing, relieve the pressure on traditional manual customer service, and save service costs. Third, they quickly collect user demand and behavior data to support iterative product optimization.

Although customer service robots have the above advantages, one problem must be considered: how to extract hot topics with high frequency and clear intent from the manual customer service logs and organize them into several types of standard questions (FAQs, Frequently Asked Questions). For each FAQ, a professional business person configures the standard answer. Then, for a future user question, technical means are used to determine whether the question falls into any existing FAQ; if it does, the pre-configured answer is returned to the user, so that the user's question is resolved efficiently.

When switching from traditional manual customer service directly to intelligent customer service robots, the more common approach on the market is for senior customer service personnel to summarize the questions frequently asked by users to form a knowledge base. This approach relies heavily on the ability of senior customer service staff to understand and summarize the overall business situation. For a business, however, a large number of user logs have usually accumulated, and these logs contain most of the information needed for the knowledge base.

At present, most knowledge base construction methods use machine learning algorithms (such as topic model algorithms like LSA and LDA, and deep learning algorithms such as Seq2Seq) or natural language processing algorithms (such as rule matching or template matching) to aggregate user questions. Each cluster is then manually screened and summarized into FAQ standard questions, so as to build an intelligent customer service knowledge base. However, the existing methods for constructing an intelligent customer service knowledge base require considerable manual intervention and a large manual investment, and the quality of the constructed knowledge base is greatly affected by the business level of the manual customer service staff.

This application adopts the above technical solution: an expression model is trained on manual customer service log data, the expression model is used to obtain the question expression information of the user questions to be collated, the question expression information is aggregated to obtain user question clusters, and finally the user question clusters are sorted to obtain a knowledge base. This method makes full use of the information contained in the existing manual customer service log data, and can quickly and iteratively optimize the robot customer service knowledge base using massive manual customer service log data, reducing the dependence of knowledge base construction on the business level of manual staff and reducing the difficulty of construction.

It should be understood that the steps in the embodiments of the present application are not necessarily performed sequentially in the order indicated by the step numbers. Unless explicitly stated in this document, the execution order of these steps is not strictly limited, and they can be performed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time, but may be performed at different times; nor are they necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

One or more non-transitory computer-readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the customer service robot knowledge base ambiguity detection method provided in any one of the embodiments of the present application.

One or more non-transitory computer-readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the method for automatically constructing a customer service knowledge base based on manual customer service logs provided in any one of the embodiments of the present application.

A computer device includes a memory and one or more processors. Computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the one or more processors, the one or more processors implement the steps of the customer service robot knowledge base ambiguity detection method provided in any one of the embodiments of the present application.

A computer device includes a memory and one or more processors. Computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the one or more processors, the one or more processors implement the steps of the method for automatically constructing a customer service knowledge base based on manual customer service logs provided in any one of the embodiments of the present application.

It can be understood that the same or similar parts in the above embodiments can be referred to each other. For the content that is not described in detail in some embodiments, refer to the same or similar content in other embodiments.

Any process or method description in a flowchart or otherwise described herein can be understood as representing a module, fragment, or portion of code that includes one or more executable instructions for implementing a particular logical function or step of a process. The scope of the preferred embodiments of the present application includes additional implementations, in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application pertain.

It should be understood that each part of the application may be implemented by hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it may be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits, application specific integrated circuits with suitable combinational logic gate circuits, programmable gate arrays (PGA), field programmable gate arrays (FPGA), etc.

A person of ordinary skill in the art can understand that all or part of the steps of the methods in the foregoing embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and when executed, the program performs one or a combination of the steps of the method embodiments.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist separately physically, or two or more units may be integrated into one module. The above integrated modules can be implemented in the form of hardware or software functional modules. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The aforementioned storage medium may be a read-only memory, a magnetic disk, or an optical disk.

In the description of this specification, a description with reference to the terms “one embodiment”, “some embodiments”, “examples”, “specific examples”, or “some examples” and the like means that specific features, structures, materials, or characteristics described in conjunction with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, the schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although the embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limitations on the present application. Those skilled in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims

A method for detecting ambiguity in a customer service robot knowledge base, comprising:

Constructing a knowledge base, which is divided by FAQ, each FAQ being provided with at least one similar question and each FAQ being a category;

Dividing the knowledge base into a test set and a training set for a deep learning model;

Training a deep learning model on the training set, and performing ambiguity detection using the learned deep learning model;

Updating the knowledge base based on the ambiguity detection results; and

Repeating the above steps until the learning effect no longer improves, to obtain a disambiguated knowledge base.
The method according to claim 1, wherein dividing the knowledge base into a test set and a training set for a deep learning model comprises: randomly extracting a preset number of the similar questions corresponding to each FAQ as test data of the category corresponding to that FAQ, and using the remaining similar questions as training data of that category; the test data of all categories constitute the test set, and the training data of all categories constitute the training set.
The method according to claim 1, wherein the deep learning model comprises a feature extractor and a shallow classifier, and training the deep learning model on the training set comprises:

Inputting the question in the training set as an input part into the deep learning model;

Use a feature extractor in the deep learning model to convert the question in the input part into a feature vector;

Using a shallow classifier in the deep learning model to calculate a prediction result according to the feature vector, where the prediction result is a category corresponding to a question in the input part;

Using an optimizer to optimize the training model, minimizing the average difference between the actual categories marked in the training set and the prediction results of the deep learning model; and

Evaluating the trained model using the test set, and calculating the rate of agreement between the model's prediction results and the actual category labels of the questions in the test set as the evaluation of the model's learning effect.
The method according to claim 1, wherein the ambiguity detection comprises category ambiguity detection, labeling error detection, and labeling ambiguity detection, and performing ambiguity detection using the learned deep learning model comprises:

Use feature extractors in deep learning models to detect ambiguity; and

Use shallow classifiers in deep learning models to detect ambiguity.
The method according to claim 4, wherein detecting the ambiguity by using a feature extractor in a deep learning model comprises:

Using a feature extractor in the deep learning model to convert similar questions in a data set into feature vectors, where the data set includes a training set or / and a test set;

Combining the feature vectors corresponding to the question into a question feature vector pair (x, y), where the question corresponding to the feature vector x and the question corresponding to the feature vector y are from different categories;

Calculating the vector similarity $\cos(x, y)$ of each feature vector pair, where
$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert};$$
and

Sorting all question feature vector pairs by vector similarity from high to low, selecting the top-ranked question feature vector pairs, and judging whether ambiguity exists according to the top-ranked question feature vector pairs.
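For illustration only, and not as part of the claims, the cross-category similarity ranking can be sketched in Python as follows. The feature vectors here are hand-written placeholders rather than outputs of a trained feature extractor, and the function names and example data are hypothetical.

```python
import math
from itertools import combinations


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def rank_cross_category_pairs(items, top_n):
    """items: list of (question, category, feature_vector).

    Returns the top_n (similarity, question_x, question_y) tuples whose
    questions come from different categories, most similar first.
    """
    pairs = [
        (cosine(vx, vy), qx, qy)
        for (qx, cx, vx), (qy, cy, vy) in combinations(items, 2)
        if cx != cy
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs[:top_n]


# Placeholder vectors; a real system would take them from the feature extractor.
ITEMS = [
    ("How do I get a refund?", "refund", [0.9, 0.1, 0.0]),
    ("How do I return an item?", "returns", [0.8, 0.2, 0.1]),
    ("What is the annual fee?", "fees", [0.0, 0.1, 0.9]),
]

top_pairs = rank_cross_category_pairs(ITEMS, top_n=1)
```

Here the refund and returns questions rank first, which marks them as a candidate pair for manual review.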
The method according to claim 5, wherein judging whether ambiguity exists according to the top-ranked question feature vector pairs comprises:

Judging whether there are labeling ambiguities or labeling errors: extracting a first preset number of the top-ranked question feature vector pairs by similarity, and manually checking whether the corresponding question pairs have labeling ambiguities or labeling errors; and

Judging whether there is category ambiguity: for the first preset number of question feature vector pairs, counting the number of occurrences of each corresponding category pair, sorting the category pairs by count from high to low, and taking the top second preset number of category pairs to check for category ambiguity.
The method according to claim 4, wherein detecting the ambiguity by using a shallow classifier of a deep learning model comprises:

Counting the classification results of the deep learning model to form a confusion matrix, where each row $i$ of the confusion matrix corresponds to a labeled category, each column $j$ corresponds to a category predicted by the deep learning model, the element $x_{ij}$ is the number of questions labeled as category $i$ and predicted by the model as category $j$, and the element $x_{ji}$ is the number of questions labeled as category $j$ and predicted by the model as category $i$;

Calculating the number of samples labeled as category $i$ in the data set, where the number of samples in category $i$ is $\sum_k x_{ik}$ and $k$ is any category;

Calculating the number of samples labeled as category $j$ in the data set, where the number of samples in category $j$ is $\sum_k x_{jk}$ and $k$ is any category;

Calculating the proportion $P_{ij}$ of the samples labeled as category $i$ that the deep learning model predicts as category $j$, and the proportion $P_{ji}$ of the samples labeled as category $j$ that the deep learning model predicts as category $i$, where
$$P_{ij} = \frac{x_{ij}}{\sum_k x_{ik}}, \qquad P_{ji} = \frac{x_{ji}}{\sum_k x_{jk}},$$
category $i$ and category $j$ are different categories, and the data set includes a training set and/or a test set;

Calculating the degree of confusion of the category pair (category $i$, category $j$), which is the harmonic mean $S_{ij}$ of $P_{ij}$ and $P_{ji}$, where
$$S_{ij} = \frac{2 P_{ij} P_{ji}}{P_{ij} + P_{ji}};$$
and

Determining whether there is ambiguity between category $i$ and category $j$ according to the degree of confusion.
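For illustration only, and not as part of the claims, the degree-of-confusion computation can be sketched in Python as follows. The function name `confusion_degrees` and the example matrix are hypothetical; the harmonic mean is taken to be 0 when both proportions are 0, which is an assumption made here to avoid division by zero.

```python
def confusion_degrees(matrix):
    """matrix[i][j]: number of questions labeled category i and predicted category j.

    Returns ((i, j), S_ij) for every pair i < j, sorted from most to least confused.
    """
    n = len(matrix)
    totals = [sum(row) for row in matrix]  # samples labeled as each category
    scores = {}
    for i in range(n):
        for j in range(i + 1, n):
            p_ij = matrix[i][j] / totals[i] if totals[i] else 0.0
            p_ji = matrix[j][i] / totals[j] if totals[j] else 0.0
            # Harmonic mean of the two proportions (0.0 if both are 0: assumption).
            s_ij = 2 * p_ij * p_ji / (p_ij + p_ji) if p_ij + p_ji else 0.0
            scores[(i, j)] = s_ij
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented 3-category confusion matrix (rows: labeled, columns: predicted).
MATRIX = [
    [8, 2, 0],
    [4, 6, 0],
    [0, 1, 9],
]

ranked = confusion_degrees(MATRIX)
```

For this matrix, $P_{01} = 0.2$ and $P_{10} = 0.4$, so categories 0 and 1 have the highest degree of confusion, $2 \cdot 0.2 \cdot 0.4 / 0.6 \approx 0.267$. Because the harmonic mean is small whenever either proportion is small, a pair scores high only when the confusion runs in both directions.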
The method according to claim 7, wherein determining whether there is an ambiguity between category i and category j according to the degree of confusion comprises:

Sorting the calculated degrees of confusion; and

Extracting a third preset number of category pairs with the highest degree of confusion, and manually checking whether category ambiguity exists.
The method according to claim 7, wherein detecting ambiguity by using the shallow classifier of the deep learning model further comprises: finding data for which the actual category labeled in the data set is inconsistent with the category predicted by the deep learning model, and manually checking whether the data has a labeling error, where the data set includes a training set and/or a test set.
The method according to claim 1, wherein updating the knowledge base according to the ambiguity detection result comprises:

Manually rewriting and re-annotating the detected ambiguous questions, and deleting the original annotations; and

Re-summarizing the detected ambiguous categories and reassigning their similar questions, and deleting the original ambiguous categories.
The method according to claim 1, wherein constructing a knowledge base comprises:

Preprocessing the manual customer service log data;

Establish an expression model based on the processed manual customer service log data;

Obtaining question expression information of a user question to be sorted through the expression model;

Aggregate the question expression information to obtain a user question cluster; and

The user question sentence clusters are sorted to obtain a knowledge base.
A method for automatically constructing a customer service knowledge base based on manual customer service logs, comprising:

Preprocessing the manual customer service log data;

Establish an expression model based on the processed manual customer service log data;

Obtaining question expression information of a user question to be sorted through the expression model;

Aggregate the question expression information to obtain a user question cluster; and

The user question sentence clusters are sorted to obtain a knowledge base.
The method according to claim 1, wherein the manual customer service log data comprises:

A question from the user and the corresponding customer service response; and,

All questions and corresponding customer service responses during the user's entire conversation.
The method according to claim 13, wherein preprocessing the manual customer service log data comprises:

Using machine learning algorithms or natural language processing algorithms to process the manual customer service log data to remove user questions and responses that are not related to business content.
The method according to claim 15, wherein the expression model is obtained by training on the processed manual customer service log data using a training algorithm.
The method according to claim 15, wherein the training algorithm comprises:

Machine learning algorithms or search algorithms.
The method according to any one of claims 13 to 16, wherein performing aggregation processing on the question expression information comprises:

The clustering algorithm or synonym integration is used to process the question expression information.
The method according to claim 17, wherein the clustering algorithm is a K-Means clustering algorithm and related improved algorithms.
The method according to any one of claims 12 to 16, wherein the question expression information comprises: a vector representation of a sentence and / or a text feature representation.
One or more non-transitory computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:

Constructing a knowledge base, which is divided by FAQ, each FAQ being provided with at least one similar question and each FAQ being a category;

Dividing the knowledge base into a test set and a training set for a deep learning model;

Training a deep learning model on the training set, and performing ambiguity detection using the learned deep learning model;

Updating the knowledge base based on the ambiguity detection results; and

Repeating the above steps until the learning effect no longer improves, to obtain a disambiguated knowledge base.
One or more non-transitory computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:

Preprocessing the manual customer service log data;

Establish an expression model based on the processed manual customer service log data;

Obtaining question expression information of a user question to be sorted through the expression model;

Performing clustering processing on the question expression information;

The similar user questions are aggregated into the same category, and classified into a knowledge base.
A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:

Constructing a knowledge base, which is divided by FAQ, each FAQ being provided with at least one similar question and each FAQ being a category;

Dividing the knowledge base into a test set and a training set for a deep learning model;

Training a deep learning model on the training set, and performing ambiguity detection using the learned deep learning model;

Updating the knowledge base according to the ambiguity detection result; and

Repeating the above steps until the learning effect no longer improves, to obtain a disambiguated knowledge base.
A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:

Preprocessing the manual customer service log data;

Establish an expression model based on the processed manual customer service log data;

Obtaining question expression information of a user question to be sorted through the expression model;

Performing clustering processing on the question expression information;

The similar user questions are aggregated into the same category, and classified into a knowledge base.