CN113779236B

CN113779236B - Method and device for problem classification based on artificial intelligence

Info

Publication number: CN113779236B
Application number: CN202110919635.1A
Authority: CN
Inventors: 齐维维
Original assignee: Zhejiang Yiwugou E Commerce Co ltd
Current assignee: Zhejiang Yiwugou E Commerce Co ltd
Priority date: 2021-08-11
Filing date: 2021-08-11
Publication date: 2022-12-27
Anticipated expiration: 2041-08-11
Also published as: CN113779236A

Abstract

The application discloses a problem classification method based on artificial intelligence, which comprises the following steps: the cloud server acquires a problem sample reported by the intelligent customer service; extracting the problem sample characteristics, inputting the problem sample characteristics into a multilayer convolutional neural network to obtain a global characteristic vector sample, and defining the global characteristic vector sample as a training set; and testing the training set by using the Gaussian kernel support vector model to obtain the problem classification after testing.

Description

Method and device for problem classification based on artificial intelligence

Technical Field

The application relates to the technical field of electronic commerce, in particular to a problem classification method and device based on artificial intelligence.

Background

At present, intelligent customer service is widely used in the field of electronic commerce to answer the various questions raised by users, and mainstream automatic question-answering technology aims to give a concise and accurate answer to a given question. Obtaining the correct classification of a question is therefore the key to whether the question can be answered accurately.

In the prior art, a classifier is constructed by a supervised learning method and used for question classification. However, when dealing with the many complicated and irregular ways in which users phrase questions, such a classifier can only predict and match questions at a relatively coarse level of classification, so the question classification is not accurate enough and the user experience is poor.

Disclosure of Invention

The embodiment of the application provides a problem classification method and device based on artificial intelligence, which are used for solving the problem of poor problem classification accuracy in the prior art.

The embodiment of the invention provides a problem classification method based on artificial intelligence, which comprises the following steps:

the cloud server obtains problem samples reported by the intelligent customer service;

extracting the problem sample characteristics, inputting the problem sample characteristics into a multilayer convolutional neural network to obtain a global characteristic vector sample, and defining the global characteristic vector sample as a training set;

and testing the training set by using a Gaussian kernel support vector model to obtain the tested problem classification.

Optionally, before testing the training set by using the Gaussian kernel support vector model, the method further includes:

determining a hyper-parameter value range of the Gaussian kernel support vector model, wherein the hyper-parameters comprise a kernel function parameter and a penalty factor;

and obtaining the classification accuracy under different hyper-parameter values by using multiple binary classifiers.

Optionally, the method further comprises:

optimizing the value range of the hyper-parameter by using a good lattice point method, comprising the following steps:

setting the number of trials n and the number of factors s, and calculating the positive integer set $H_{n,s}$:

$$H_{n,s} = \{h : h < n,\ \gcd(n, h) = 1\};$$

in $H_{n,s}$, letting $H = \{H_1, H_2, \ldots, H_n\}$, where $H_i = \{h_i, h_i^2, \ldots, h_i^s\} \pmod n$, and generating the $n \times s$ U-type designs $U^k = \{u^k_{i,j}\}$, where

$$u^k_{i,j} = i \cdot k^{\,j-1} \pmod n,\quad j = 1, \ldots, s;$$

among all $U^k$, selecting the design with the minimum deviation value as the uniform design $U^k_x$.

Optionally, obtaining the classification accuracy under different hyper-parameter values by using multiple binary classifiers includes:

constructing a binary-classifier support vector machine model between every two classes of samples in the training set;

and classifying the sample data with the binary-classifier support vector machine models according to a preset hyperplane, and calculating the classification accuracy.

Optionally, the method further comprises:

and setting the experiment times, and performing a plurality of sequential experiments to obtain the classification accuracy under the current different values of the hyper-parameters in the training set.

Optionally, inputting the problem sample feature into a multi-layer convolutional neural network to obtain a global feature vector sample, including:

generating a first feature vector sample based on the problem sample feature;

carrying out high-dimensional mapping on the first feature vector sample to obtain a second feature vector sample;

inputting the second feature vector sample into a plurality of convolutional layers and pooling layers of a multilayer convolutional neural network to obtain a first fusion feature sample;

respectively performing feature fusion on the feature vector samples of the intermediate hidden layer output by the convolution operation and the pooling operation through weighting calculation to obtain second fusion feature vector samples;

and inputting the first fusion characteristic sample and the second fusion characteristic vector sample to a full-connection layer in the multilayer convolutional neural network to obtain a global characteristic vector sample.

Optionally, before the cloud server obtains the problem sample reported by the intelligent customer service, the method further includes:

and cleaning data of the problems reported by the intelligent customer service, carrying out K-means clustering on the cleaned data to obtain clustering problems, and selecting a plurality of problem samples from the clustering problems.

Optionally, the method further comprises:

performing semantic analysis on the reported problems, and extracting the emotional characteristics of the user;

and identifying the emotion of the user based on the emotional characteristics of the user, and responding to the emotion of the user.

Optionally, responding to the emotion of the user includes:

and replacing the intelligent customer service with a manual customer service.

The embodiment of the invention also provides a device which comprises a memory and a processor, wherein the memory is stored with computer executable instructions, and the processor realizes the method when running the computer executable instructions on the memory.

According to the method provided by the embodiment of the invention, the problems are classified through the Gaussian kernel support vector machine, the accurate classification result is obtained, the optimized hyper-parameters are obtained through hyper-parameter optimization and a test design method, and the problem classification accuracy is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below.

FIG. 1 is a schematic flow diagram of artificial intelligence based question classification in one embodiment;

FIG. 2 is a logic diagram of a convolutional neural network in one embodiment;

FIG. 3 is a schematic illustration of artificial intelligence based question classification in another embodiment;

FIG. 4 is a diagram of the hardware components of an apparatus according to one embodiment.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".

FIG. 1 is a flow chart of artificial intelligence based problem classification according to an embodiment of the present invention, as shown in FIG. 1, the method includes:

S101, a cloud server obtains problem samples reported by intelligent customer service;

In the embodiment of the invention, the intelligent customer service is an intelligent program that automatically answers users' questions, which greatly reduces the load on manual customer service. The intelligent customer service is started when the user logs in to the e-commerce website; it can analyze matters raised by the user such as commodities, payment and complaints, respond accurately, and reorder the questions and gather them into a database on the cloud side.

Before the problem samples reported by the intelligent customer service are obtained, the cloud server collects all problem information in advance, and cleans and preliminarily classifies it. Since users' problems are varied, and the same type of problem can be expressed in many ways, the cloud server cleans the problem information in advance, retaining one of each group of similar problems and deleting invalid problems (those irrelevant to e-commerce). Optionally, the cloud server preliminarily classifies the problems by using a k-means method. It should be noted that this preliminary classification is only a relatively rough and simple one; a more accurate result can then be obtained by performing refined classification on that basis.

In the embodiment of the invention, the method for preliminarily classifying the problems by using the k-means method comprises the following steps:

firstly, the problem information is normalized to form a standardized problem data cluster;

secondly, data points are randomly selected from the standardized problem data cluster as cluster centers, and a plurality of clusters are formed around these cluster centers;

and finally, a central point is selected in each cluster as the new cluster center and the clusters are divided again; these steps are repeated until the clusters no longer change.

Each cluster group is a roughly divided problem category, and the cloud server can select a plurality of problem samples from each roughly divided problem category to continue to perform type precise division.
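The three k-means steps above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: it assumes the problem information has already been converted into numeric vectors, uses min-max scaling as the normalization operation, and takes the mean of each cluster as its central point. The function names `normalize` and `kmeans` and the parameters `seed` and `max_iter` are hypothetical.

```python
import math
import random


def normalize(points):
    """Min-max scale every dimension to [0, 1] (assumed form of normalization)."""
    dims = list(zip(*points))
    lows = [min(d) for d in dims]
    spans = [(max(d) - min(d)) or 1.0 for d in dims]  # avoid division by zero
    return [tuple((v - lo) / sp for v, lo, sp in zip(p, lows, spans))
            for p in points]


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (cluster index per point, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # random data points as initial centers
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:  # clusters no longer change
            break
        assignment = new_assignment
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centers
```

Calling `kmeans(normalize(vectors), k)` returns one cluster index per problem vector; a few samples can then be drawn from each index group for the refined classification.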

S102, extracting the problem sample characteristics, inputting the problem sample characteristics into a multilayer convolutional neural network to obtain a global characteristic vector sample, and defining the global characteristic vector sample as a training set;

In the embodiment of the invention, the features of the problem samples are acquired by a deep learning method. In an actual operating environment, hundreds of thousands or even millions of problem samples can be collected; after the problem samples are cleaned and irrelevant samples are removed, problem samples with relatively strong correlation are obtained. The cloud server may obtain different problem samples from the e-commerce website, for example, through an intelligent customer service question-and-answer system, a web crawler, or the like.

The cloud server extracts the features of the problem sample and generates a first feature vector sample based on them; in particular, the first feature vector sample may be extracted with the word-vector tool word2vec. The word2vec tool is commonly used, so it is not described further in the embodiments of the present invention.

After the cloud server extracts the first feature vector sample, high-dimensional mapping is performed on the first feature vector sample to obtain a second feature vector sample; for example, the ConvLSTM algorithm or the lightNet algorithm is used to perform spatio-temporal correlation analysis and map the first feature vector sample to a higher dimension, so that the resulting sample carries richer information.

After the second feature vector sample is obtained, the cloud server inputs the second feature vector sample into a plurality of convolutional layers and pooling layers of the multilayer convolutional neural network to obtain a first fusion feature sample. It should be noted that, since the questions input by users are often brief and irregular, a convolution operation using a Gabor kernel function is required to extract local features. Illustratively, embodiments of the present invention may employ a Gabor convolutional neural network, in which the Gabor kernel function is denoted $g_{u,v}(x, y)$.

Here $g_{u,v}(x, y)$ is the Gabor kernel function, x and y are the coordinates of the center point of the kernel, i is the imaginary unit, and k is the ratio of the amplitude of the Gaussian kernel;

v is the wavelength of the Gabor filter, u is the direction of the Gabor kernel, and K is the total number of directions;

and a further parameter represents the height of the filter.

After the first fusion feature sample is obtained, the cloud server performs feature fusion, through weighted calculation, on the feature vector samples of the intermediate hidden layers output by the convolution operations and the pooling operations, to obtain a second fusion feature vector sample. In other words, although the convolutional neural network can implement feature fusion by stacking convolution operations and pooling operations, part of the information contained in the intermediate hidden layers is discarded by pooling; in order to make full use of the feature information, the lost information is fused back in afterwards.

And finally, the cloud server inputs the first fusion characteristic sample and the second fusion characteristic vector sample to a full connection layer in the multilayer convolutional neural network to obtain a global characteristic vector sample.

Fig. 2 is a schematic diagram of the above multilayer convolutional neural network. As shown in Fig. 2, after a problem sample is input into a plurality of convolutional layers and pooling layers, feature fusion is performed through weighted calculation to obtain a second fusion feature vector sample, which is finally input into the fully connected layer to obtain a global feature vector sample.
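The fusion step can be illustrated with a small sketch. The text does not specify how the weighting is computed, so the example below assumes that the intermediate hidden-layer features have already been brought to a common length, that they are combined by a normalized weighted sum, and that the fully connected layer is a plain affine map. The names `weighted_fusion`, `fully_connected` and `global_feature` are hypothetical.

```python
def weighted_fusion(hidden_features, weights):
    """Normalized weighted sum of equally sized hidden-layer feature vectors."""
    total = sum(weights)
    dim = len(hidden_features[0])
    return [
        sum(w * h[d] for w, h in zip(weights, hidden_features)) / total
        for d in range(dim)
    ]


def fully_connected(inputs, weight_rows, biases):
    """Affine layer: one output per weight row (no activation, for brevity)."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weight_rows, biases)
    ]


def global_feature(first_fused, hidden_features, fusion_weights,
                   fc_weights, fc_biases):
    """Concatenate the first and second fusion features and apply the FC layer."""
    second_fused = weighted_fusion(hidden_features, fusion_weights)
    return fully_connected(first_fused + second_fused, fc_weights, fc_biases)
```

In a trained network the fusion weights and the fully connected weights would be learned; here they are passed in as plain lists to keep the data flow visible.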

S103, testing the training set by using the Gaussian kernel support vector model to obtain the tested problem classification.

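In the embodiment of the invention, the principle of the support vector machine is to find an optimal classification hyperplane that meets the classification requirement, so that the hyperplane maximizes the margin on both sides of it while ensuring classification accuracy, namely

$$\min_{w,b}\ \frac{1}{2}\lVert w \rVert^2$$

$$\text{s.t.}\quad y_i(w^T x_i + b) \ge 1,\quad i = 1, 2, \ldots, m.$$

This is the basic model of the support vector machine. Since this is a convex quadratic programming problem, the dual problem can be obtained using the Lagrange multiplier method:

$$\max_{\alpha}\ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j$$

$$\text{s.t.}\quad \sum_{i=1}^{m} \alpha_i y_i = 0,\quad \alpha_i \ge 0,\quad i = 1, 2, \ldots, m.$$

After $\alpha$ ($\alpha \ge 0$) is solved, w and b are obtained, and the model corresponding to the optimal classification hyperplane is

$$f(x) = w^T x + b = \sum_{i=1}^{m} \alpha_i y_i x_i^T x + b.$$

In a real-world task, there may not be a hyperplane in the original sample space that correctly partitions the two classes of samples; the samples are therefore mapped from the original space to a higher-dimensional feature space in which they are linearly separable. Let $\Phi(x)$ represent the feature vector after x is mapped; then the model corresponding to the optimal classification hyperplane is represented as

$$f(x) = w^T \Phi(x) + b.$$

Similarly, we have

$$\min_{w,b}\ \frac{1}{2}\lVert w \rVert^2$$

$$\text{s.t.}\quad y_i(w^T \Phi(x_i) + b) \ge 1,\quad i = 1, 2, \ldots, m.$$

The dual problem is

$$\max_{\alpha}\ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \Phi(x_i)^T \Phi(x_j)$$

$$\text{s.t.}\quad \sum_{i=1}^{m} \alpha_i y_i = 0,\quad \alpha_i \ge 0,\quad i = 1, 2, \ldots, m.$$

To simplify the calculation, a kernel function is introduced:

$$\kappa(x_i, x_j) = \Phi(x_i)^T \Phi(x_j).$$

The dual problem is rewritten as

$$\max_{\alpha}\ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j)$$

$$\text{s.t.}\quad \sum_{i=1}^{m} \alpha_i y_i = 0,\quad \alpha_i \ge 0,\quad i = 1, 2, \ldots, m.$$

The partition hyperplane with the largest margin is represented as

$$f(x) = \sum_{i=1}^{m} \alpha_i y_i \kappa(x, x_i) + b.$$

Commonly used kernel functions include linear kernel functions, polynomial kernel functions, radial basis kernel functions, and Sigmoid kernel functions. Among them, the radial basis kernel function, that is, the Gaussian kernel

$$\kappa(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right),$$

where $\sigma > 0$ is the kernel function parameter, is the most widely used.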
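To make the kernelized model concrete, the following sketch evaluates a Gaussian kernel and the resulting decision function for multipliers that are assumed to have been solved already. It uses the common parameterization exp(-||x - z||^2 / (2 sigma^2)); the symbol `sigma` and the function names `gaussian_kernel`, `decision_function` and `predict` are assumptions made for illustration, and no solver for the dual problem is included.

```python
import math


def gaussian_kernel(x, z, sigma):
    """Gaussian (RBF) kernel between two equally sized vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def decision_function(x, support_vectors, labels, alphas, b, sigma):
    """f(x) = sum_i alpha_i * y_i * kernel(x, x_i) + b."""
    return sum(
        alpha * y * gaussian_kernel(x, sv, sigma)
        for alpha, y, sv in zip(alphas, labels, support_vectors)
    ) + b


def predict(x, support_vectors, labels, alphas, b, sigma):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = decision_function(x, support_vectors, labels, alphas, b, sigma)
    return 1 if score >= 0 else -1
```

Only samples with a non-zero multiplier contribute to the sum, which is why the model needs to keep just the support vectors after training.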

However, in real-world tasks it is often difficult to determine a suitable kernel function that makes the training samples linearly separable in the feature space. Taking a step back, even if a kernel function is found that makes the training set linearly separable in the feature space, it is difficult to be sure that this seemingly linearly separable result is not due to overfitting. One way to alleviate this problem is to allow the support vector machine to make errors on some samples.

Thus, the optimization objective can be written as

$$\min_{w,b}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{m} \ell_{0/1}\bigl(y_i(w^T \Phi(x_i) + b) - 1\bigr),$$

where $C > 0$ is a constant and $\ell_{0/1}$ is the "0/1 loss function", which takes the value 1 when its argument is negative and 0 otherwise.

The penalty factor C is a compromise between minimizing the training error and maximizing the classification margin, and it limits each Lagrange multiplier $\alpha_i$ to the range $[0, C]$; it is therefore also referred to as an upper bound parameter. A low upper bound reduces the penalty the algorithm applies to misclassification by the learning machine, and a high upper bound strengthens it. Using too small an upper bound gives the algorithm good generalization ability but poor classification accuracy; using too large an upper bound gives accurate classification results on the training samples, but the algorithm is highly likely to over-learn, and the computation time for solving the support vector machine model also increases significantly.
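The role of the penalty factor C can be seen by evaluating the soft-margin objective directly. The sketch below is illustrative only: it assumes a linear model (the mapping is taken as the identity) so that w can be written out explicitly, and the names `zero_one_loss` and `soft_margin_objective` are hypothetical.

```python
def zero_one_loss(z):
    """0/1 loss: 1 when the margin constraint is violated (z < 0), else 0."""
    return 1 if z < 0 else 0


def soft_margin_objective(w, b, samples, labels, C):
    """0.5 * ||w||^2 + C * (number of samples violating y * f(x) >= 1)."""
    regularizer = 0.5 * sum(v * v for v in w)
    violations = sum(
        zero_one_loss(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) - 1)
        for x, y in zip(samples, labels)
    )
    return regularizer + C * violations
```

With one violating sample, raising C from 0.1 to 10 raises the objective by 9.9, so a large C pushes the optimizer to fit that sample even at the cost of a narrower margin.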

In addition, in S103, before testing the training set by using the Gaussian kernel support vector model, the embodiment of the present invention further includes the following:

Compared with a linear kernel function, the Gaussian kernel function maps samples into an infinite-dimensional space, yields more diverse decision boundaries, suits a wide variety of sample conditions, and is robust to noise in the data, so it performs well and is widely used.

Wherein, the value range of the hyper-parameter can be optimized by using a good lattice point method, which specifically comprises the following steps:

setting the number of trials n (exemplarily, n = 100) and the number of factors s, and calculating the positive integer set $H_{n,s}$, where

$$H_{n,s} = \{h : h < n,\ \gcd(n, h) = 1\};$$

in $H_{n,s}$, letting $H = \{H_1, H_2, \ldots, H_n\}$, where $H_i = \{h_i, h_i^2, \ldots, h_i^s\} \pmod n$, and generating the $n \times s$ U-type designs $U^k = \{u^k_{i,j}\}$, where

$$u^k_{i,j} = i \cdot k^{\,j-1} \pmod n,\quad j = 1, \ldots, s.$$
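The good lattice point construction can be sketched in a few lines. The example enumerates the generators k that are coprime with n, builds the design with entries i * k^(j-1) mod n, and keeps the candidate with the smallest discrepancy. Two points are assumptions rather than statements of the embodiment: a residue of 0 is written as level n (the usual convention for levels 1..n), and the "deviation value" is measured with the centered L2 discrepancy, since the text does not name a specific measure. All function names are hypothetical.

```python
from math import gcd


def coprime_generators(n):
    """All h < n with gcd(n, h) = 1."""
    return [h for h in range(1, n) if gcd(n, h) == 1]


def u_design(n, s, k):
    """n x s design with u[i][j] = i * k^(j-1) mod n, levels taken in 1..n."""
    design = []
    for i in range(1, n + 1):
        row = []
        for j in range(1, s + 1):
            level = (i * pow(k, j - 1, n)) % n
            row.append(level if level != 0 else n)  # residue 0 means level n
        design.append(row)
    return design


def centered_l2_discrepancy_sq(design):
    """Squared centered L2 discrepancy of a design scaled into the unit cube."""
    n, s = len(design), len(design[0])
    x = [[(u - 0.5) / n for u in row] for row in design]
    single = 0.0
    for xi in x:
        prod = 1.0
        for v in xi:
            d = abs(v - 0.5)
            prod *= 1 + 0.5 * d - 0.5 * d * d
        single += prod
    pairs = 0.0
    for xi in x:
        for xk in x:
            prod = 1.0
            for a, b in zip(xi, xk):
                prod *= (1 + 0.5 * abs(a - 0.5) + 0.5 * abs(b - 0.5)
                         - 0.5 * abs(a - b))
            pairs += prod
    return (13.0 / 12.0) ** s - 2.0 * single / n + pairs / n ** 2


def best_design(n, s):
    """Return (discrepancy, k, design) for the lowest-discrepancy generator k."""
    candidates = []
    for k in coprime_generators(n):
        design = u_design(n, s, k)
        candidates.append((centered_l2_discrepancy_sq(design), k, design))
    return min(candidates, key=lambda c: c[0])
```

For two hyper-parameters (s = 2), each row of the selected design gives one pair of levels, which can then be scaled into the value ranges of the kernel function parameter and the penalty factor.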

In the embodiment of the present invention, multiple binary classifiers are used to obtain the classification accuracy under different values of the hyper-parameters, which may specifically be:

constructing a binary-classifier support vector machine model between every two classes in the training set; for example, a one-against-one classification method may be used to train multiple binary classifiers. Illustratively, if the training set has x classes, a binary-classifier SVM model is constructed between each pair of classes.

The binary-classifier support vector machine models classify the sample data according to their preset hyperplanes, and the classification accuracy is calculated. In particular, a voting strategy may be employed, in which each binary classifier classifies the data according to its hyperplane. Taking a two-class problem with classes a and b as an example (a represents payment-related, b represents payment-unrelated), if the sample is classified as class a, the vote count of class a is increased by 1, and if the sample is classified as class b, the vote count of class b is increased by 1.
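The voting strategy can be sketched as follows. The example assumes the pairwise classifiers have already been trained and are supplied as a dictionary that maps each class pair to a callable returning the winning class; this interface, and the names `one_vs_one_predict` and `accuracy`, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """Let every pairwise classifier vote and return the class with most votes."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = pairwise[(a, b)](x)   # each classifier returns either a or b
        votes[winner] += 1
    return votes.most_common(1)[0][0]  # ties fall to the first class counted


def accuracy(samples, labels, classes, pairwise):
    """Fraction of samples whose voted class equals the true label."""
    correct = sum(
        one_vs_one_predict(x, classes, pairwise) == y
        for x, y in zip(samples, labels)
    )
    return correct / len(samples)
```

For x classes this requires x(x-1)/2 pairwise models, and the accuracy returned here is the quantity that would be compared across hyper-parameter settings.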

It should be noted that, in the embodiment of the present invention, the number of experiments may be set to perform a plurality of sequential tests, and a sequential optimization method is used to obtain the classification accuracy under different current values of the hyper-parameters in the training set, so that the speed of optimization can be greatly increased.
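One way to read the sequential tests is as a search that repeatedly evaluates a uniform design and then narrows the hyper-parameter ranges around the best point found so far. The text does not describe the narrowing rule, so the contraction by a fixed `shrink` factor below is an assumption, as are the names `levels_to_values` and `sequential_search`. The `evaluate` callback stands for "train the classifiers with these hyper-parameter values and return the classification accuracy".

```python
def levels_to_values(design, n, ranges):
    """Map integer levels 1..n in each column onto that column's (lo, hi) range."""
    return [
        tuple(lo + (u - 0.5) / n * (hi - lo) for u, (lo, hi) in zip(row, ranges))
        for row in design
    ]


def sequential_search(design, n, ranges, evaluate, rounds=3, shrink=0.5):
    """Evaluate the design, shrink the ranges around the best point, repeat."""
    best_point, best_score = None, float("-inf")
    for _ in range(rounds):
        for point in levels_to_values(design, n, ranges):
            score = evaluate(point)
            if score > best_score:
                best_point, best_score = point, score
        # Contract every range around the best point, staying inside the old one.
        ranges = [
            (max(lo, c - shrink * (hi - lo) / 2),
             min(hi, c + shrink * (hi - lo) / 2))
            for c, (lo, hi) in zip(best_point, ranges)
        ]
    return best_point, best_score
```

Because each round reuses the same small design, the number of model trainings is `rounds` times the number of design rows, rather than the size of a full grid over both hyper-parameters.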

In addition, in the embodiment of the invention, semantic analysis can be performed on the reported problems, and the emotional characteristics of the user can be extracted; for example, when the user sends voice, the Mel-frequency cepstral coefficients of the voice data can be extracted and used to train a continuous hidden Markov recognition model, which outputs emotional states or characteristics, typical emotional states being happy, sad, angry and the like;

the emotion of the user is then identified based on the emotional characteristics of the user, and a response to the emotion of the user is made. For example, if the user is angry, which may be because the intelligent customer service failed to provide timely service, the intelligent customer service may be replaced with a manual customer service. Alternatively, another intelligent customer service may be assigned to serve the user again.

Fig. 3 is a flowchart of problem classification based on artificial intelligence in an embodiment of the present invention, and in conjunction with the technical description of the above embodiment, fig. 3 may include the following steps:

S301, the cloud server acquires the problems reported by the intelligent customer service;

S302, cleaning and preliminarily classifying the problems; for example, k-means may be used for preliminary clustering;

S303, constructing problem samples;

S304, inputting the problem samples into a multilayer convolutional neural network to obtain a global feature training set; the specific steps are described in the previous embodiment and are not repeated here;

S305, constructing a Gaussian kernel support vector model and optimizing the hyper-parameters of the model; optimizing the hyper-parameters comprises generating a uniform design table with minimum deviation through the good lattice point method, and training binary-classifier support vector machine models by using a one-against-one pairwise classification method;

S306, inputting the training set into the constructed Gaussian kernel support vector model for problem classification.

According to the method provided by the embodiment of the invention, the problems are classified through the Gaussian kernel support vector machine, the accurate classification result is obtained, the optimized hyper-parameters are obtained through the hyper-parameter optimization and the test design method, and the problem classification accuracy is improved.

The embodiment of the present invention further includes an apparatus, which is characterized by comprising a memory and a processor, wherein the memory stores computer executable instructions, and the processor implements the method when executing the computer executable instructions on the memory.

Embodiments of the present invention also provide a computer-readable storage medium, on which computer-executable instructions are stored, where the computer-executable instructions are used to execute the method in the foregoing embodiments.

FIG. 4 is a diagram of the hardware components of an apparatus according to one embodiment. It will be appreciated that FIG. 4 only shows a simplified design of the apparatus. In practical applications, the apparatus may also include other necessary elements, including but not limited to any number of input/output systems, processors, controllers, memories, etc., and all apparatuses that can implement the problem classification method of the embodiments of the present application are within the protection scope of the present application.

The memory includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and is used for storing instructions and data.

The input system is for inputting data and/or signals and the output system is for outputting data and/or signals. The output system and the input system may be separate devices or may be an integral device.

The processor may include one or more processors, for example, one or more Central Processing Units (CPUs), and in the case of one CPU, the CPU may be a single-core CPU or a multi-core CPU. The processor may also include one or more special purpose processors, which may include GPUs, FPGAs, etc., for accelerated processing.

The memory is used to store program codes and data for the apparatus.

The processor is used for calling the program codes and data in the memory and executing the steps in the method embodiment. Specifically, reference may be made to the description of the method embodiment, which is not repeated herein.

In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the division of the unit is only one logical function division, and other division may be implemented in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, systems or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable system. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic medium such as a floppy disk, a hard disk, or a magnetic tape, an optical medium such as a Digital Versatile Disc (DVD), or a semiconductor medium such as a Solid State Disk (SSD).

The above is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for artificial intelligence based problem classification, comprising:

the cloud server obtains problem samples reported by the intelligent customer service; the method comprises the following steps: the cloud server collects all the problem information; the cloud server cleans the problem information, one of a plurality of similar problems is reserved, and the invalid problem is deleted; the cloud server performs normalization operation on the problem information to form a standardized problem data cluster; setting random data as a cluster center in a standardized problem data cluster, and forming a plurality of clusters by taking the cluster center as the center; selecting a central point in each cluster as a new cluster center, and dividing the clusters again, and repeating the steps until the last cluster is not changed; each cluster is a roughly divided problem category, and the cloud server selects a plurality of problem samples from each roughly divided problem category and continues to perform type precise division;

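mapping the samples from the original space to a higher-dimensional feature space such that the samples are linearly separable within the feature space; letting $\Phi(x)$ represent the feature vector after x is mapped, the model corresponding to the optimal classification hyperplane is represented as: $f(x) = w^T \Phi(x) + b$;

obtaining a basic model of a support vector machine:

$$\min_{w,b}\ \frac{1}{2}\lVert w \rVert^2$$

$$\text{s.t.}\quad y_i(w^T \Phi(x_i) + b) \ge 1,\quad i = 1, 2, \ldots, m;$$

the dual problem is obtained by the Lagrange multiplier method:

$$\max_{\alpha}\ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \Phi(x_i)^T \Phi(x_j)$$

$$\text{s.t.}\quad \sum_{i=1}^{m} \alpha_i y_i = 0,\quad \alpha_i \ge 0,\quad i = 1, 2, \ldots, m;$$

introducing the kernel function

$$\kappa(x_i, x_j) = \Phi(x_i)^T \Phi(x_j);$$

the dual problem is rewritten as:

$$\max_{\alpha}\ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j)$$

$$\text{s.t.}\quad \sum_{i=1}^{m} \alpha_i y_i = 0,\quad \alpha_i \ge 0,\quad i = 1, 2, \ldots, m;$$

the partition hyperplane with the largest margin is then expressed as:

$$f(x) = \sum_{i=1}^{m} \alpha_i y_i \kappa(x, x_i) + b;$$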

using multiple binary classifiers to obtain the classification accuracy under different hyper-parameter values, comprising the following steps: constructing a binary-classifier support vector machine model between every two classes of samples in the training set; the binary-classifier support vector machine models classify the sample data by a voting strategy according to a preset hyperplane, and the classification accuracy is calculated;

testing the training set by using a Gaussian kernel support vector model to obtain the problem classification after testing;

the method for optimizing the value range of the hyperparameter of the Gaussian kernel support vector model by using the good lattice point method comprises the following steps:

$$H_{n,s} = \{h : h < n,\ \gcd(n, h) = 1\};$$

in $H_{n,s}$, letting $H = \{H_1, H_2, \ldots, H_n\}$, where $H_i = \{h_i, h_i^2, \ldots, h_i^s\} \pmod n$, and generating the $n \times s$ U-type designs $U^k = \{u^k_{i,j}\}$, where

$$u^k_{i,j} = i \cdot k^{\,j-1} \pmod n,\quad j = 1, \ldots, s;$$

2. The method of claim 1, wherein prior to said testing the training set using the gaussian kernel support vector model, the method further comprises:

determining a value range of a hyper-parameter of the Gaussian kernel support vector model, wherein the hyper-parameter comprises a kernel function parameter and a penalty factor;

3. The method of claim 1, further comprising:

4. The method of claim 1, wherein inputting the problem sample features into a multi-layer convolutional neural network to obtain global feature vector samples comprises:

generating a first feature vector sample based on the problem sample features;

respectively performing feature fusion on the feature vector samples of the middle hidden layer output by the convolution operation and the pooling operation through weighting calculation to obtain second fusion feature vector samples;

and inputting the first fusion feature sample and the second fusion feature vector sample to a full connection layer in the multilayer convolutional neural network to obtain a global feature vector sample.

5. The method of claim 1, wherein before the cloud server obtains a sample of problems reported by smart customer services, the method further comprises:

6. The method of claim 5, further comprising:

7. The method of claim 6, wherein said engaging a user emotional response comprises:

and replacing the intelligent customer service with a manual customer service.