CN110827797B - Voice response event classification processing method and device

Info

Publication number
CN110827797B
CN110827797B (application CN201911074562.XA)
Authority
CN
China
Prior art keywords
user
voice response
characteristic data
response event
follow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911074562.XA
Other languages
Chinese (zh)
Other versions
CN110827797A (en)
Inventor
姜盛乾
付先凯
孙宇晨
许涵博
谭楚婧
张佳人
段冉
张瑜筱丹
姜红娟
董俊男
管金凤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN201911074562.XA
Publication of CN110827797A
Application granted
Publication of CN110827797B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063: Training
    • G10L 15/08: Speech classification or search
    • G10L 15/26: Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for classifying and processing voice response events, and relates to the field of voice recognition. The method comprises the following steps: determining user service scene characteristic data corresponding to a user voice response event; identifying the current emotion characteristic data of the user according to the voice response content of the user; determining user portrait characteristic data according to access data of the user voice response event; and inputting the user service scene characteristic data, the user current emotion characteristic data and the user portrait characteristic data into a follow-up category classification model, and determining a follow-up category corresponding to the voice response event so as to follow up the voice response event according to the follow-up category. The method and the device can effectively classify voice response events, thereby improving event processing efficiency.

Description

Voice response event classification processing method and device
Technical Field
The present disclosure relates to the field of speech recognition, and in particular, to a method and an apparatus for classifying and processing a speech response event.
Background
After a user accesses the intelligent voice response robot, the robot creates an event in the system and answers the user's question, and the event is closed after the call ends. In practice, however, the user may not be satisfied with the robot's response, so the event created by the intelligent voice response robot may still need to be followed up manually.
In the related art, the intelligent voice response robot does not effectively classify the events it creates, so a user may need to access the robot multiple times for the same issue; as a result, event processing efficiency is poor and the user experience suffers.
Disclosure of Invention
The technical problem to be solved by the present disclosure is to provide a method and an apparatus for processing voice response events in a classified manner, which can improve the efficiency of event processing.
According to an aspect of the present disclosure, a method for classifying and processing a voice response event is provided, including: determining user service scene characteristic data corresponding to a user voice response event; identifying the current emotion characteristic data of the user according to the voice response content of the user; determining user portrait characteristic data according to access data of the user voice response event; and inputting the user service scene characteristic data, the user current emotion characteristic data and the user portrait characteristic data into a follow-up category classification model, and determining a follow-up category corresponding to the voice response event so as to follow up the voice response event according to the follow-up category.
In some embodiments, user satisfaction characteristic data of a user for a voice response event is obtained; and inputting the user service scene characteristic data, the user current emotion characteristic data, the user portrait characteristic data and the user satisfaction degree characteristic data into a follow-up category classification model, and determining a follow-up category corresponding to the voice response event.
In some embodiments, determining the user service scenario characteristic data corresponding to the user voice response event comprises: according to the voice response content of the user, identifying the standard value of each level of service scene corresponding to the voice response event of the user, or responding to each level of service scene selected by the user in the voice response event of the user and determining the standard value corresponding to each level of service scene; determining a service scene score corresponding to a user voice response event according to the standard value of each level of service scene; and determining the characteristic data of the user service scene according to the service scene score.
In some embodiments, identifying the current emotional characteristic data of the user based on the user voice response content comprises: converting the voice response content of the user into text sentences; performing word segmentation processing on the text sentence to obtain a word segmentation sequence; vectorizing the word sequence to obtain a plurality of word vectors; obtaining a sentence vector according to the word vector; and inputting the sentence vector into the emotion classification model to obtain the current emotion characteristic data of the user.
In some embodiments, a sample sentence vector corresponding to the sample user voice response content is obtained; labeling the emotion characteristic data corresponding to the sample sentence vectors to generate an emotion labeling file; and training the emotion classification model based on the sample sentence vectors and the emotion marking files.
In some embodiments, determining user representation feature data based on the access data of the user voice response event comprises: determining static information data and dynamic information data of a user according to access data of a user voice response event; determining a basic label of a user according to the static information data of the user; determining an attribute tag of a user according to dynamic information data of the user; and determining the user portrait feature data according to the basic label and the attribute label of the user.
In some embodiments, sample service scene feature data, sample emotion feature data, sample portrait feature data and sample satisfaction feature data corresponding to a sample user voice response event are obtained; marking corresponding sample follow-up categories of sample service scene characteristic data, sample emotion characteristic data, sample portrait characteristic data and sample satisfaction degree characteristic data to generate follow-up category marking files; and training a follow-up category classification model based on the sample service scene characteristic data, the sample emotion characteristic data, the sample portrait characteristic data, the sample satisfaction degree characteristic data and the follow-up category labeling file.
According to another aspect of the present disclosure, a device for classifying and processing a voice response event is further provided, including: the scene characteristic data determining unit is configured to determine user service scene characteristic data corresponding to the user voice response event; the emotion characteristic data determination unit is configured to identify current emotion characteristic data of the user according to the voice response content of the user; the portrait characteristic data determining unit is configured to determine user portrait characteristic data according to access data of a user voice response event; and the event follow-up category determining unit is configured to input the user service scene feature data, the user current emotion feature data and the user portrait feature data into a follow-up category classification model, determine a follow-up category corresponding to the voice response event, and follow up the voice response event according to the follow-up category.
In some embodiments, the satisfaction characteristic data determination unit is configured to obtain user satisfaction characteristic data of a user to a voice response event; the event follow-up category determining unit is further configured to input the user service scene feature data, the user current emotion feature data, the user portrait feature data and the user satisfaction feature data into a follow-up category classification model, and determine a follow-up category corresponding to the voice response event.
According to another aspect of the present disclosure, a device for classifying and processing a voice response event is further provided, including: a memory; and a processor coupled to the memory, the processor configured to perform the voice response event classification processing method as described above based on instructions stored in the memory.
According to another aspect of the present disclosure, a computer-readable storage medium is also provided, on which computer program instructions are stored, which when executed by a processor implement the above-mentioned voice response event classification processing method.
Compared with the prior art, in the embodiment of the disclosure, the follow-up category corresponding to the voice response event is determined according to the user service scene characteristic data, the user current emotion characteristic data and the user portrait characteristic data corresponding to the voice response event, so that the voice response event can be followed up according to the follow-up category, the voice response event can be effectively classified, and the event processing efficiency is improved.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
fig. 1 is a flow diagram of some embodiments of a voice response event classification processing method of the present disclosure.
Fig. 2 is a flowchart illustrating a voice response event classification processing method according to another embodiment of the disclosure.
Fig. 3 is a flowchart illustrating a voice response event classification processing method according to another embodiment of the disclosure.
Fig. 4 is a flow diagram of some embodiments of the present disclosure of identifying user current emotional characteristic data.
Fig. 5 is a schematic structural diagram of some embodiments of a device for classifying and processing a voice response event according to the present disclosure.
Fig. 6 is a schematic structural diagram of another embodiment of a device for classifying and processing a voice response event according to the present disclosure.
Fig. 7 is a schematic structural diagram of another embodiment of a device for classifying and processing a voice response event according to the present disclosure.
Fig. 8 is a schematic structural diagram of another embodiment of a device for classifying and processing a voice response event according to the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
Fig. 1 is a flow diagram of some embodiments of a voice response event classification processing method of the present disclosure.
In step 110, the user service scenario characteristic data corresponding to the user voice response event is determined.
In some embodiments, after the voice response robot completes the intelligent call answering, an event is created for the user's question, and the service category and content corresponding to the event are recorded. A complexity evaluation is then performed on the service category corresponding to the event to obtain the user service scene characteristic data.
In some embodiments, the standard value of each level of service scene corresponding to the user voice response event is identified according to the user voice response content, the service scene score corresponding to the user voice response event is determined according to the standard value of each level of service scene, and the user service scene feature data, such as the service category complexity value, is determined according to the service scene score.
For example, the service is classified according to descriptors spoken by the user. The events have three levels of classification in total: the first-level classification is A_i, where i denotes the i-th category of the first-level classification; the second-level classification is B_j, where j denotes the j-th category of the second-level classification; and the third-level classification is C_k, where k denotes the k-th category of the third-level classification. A scene is thus denoted A_i-B_j-C_k, such as pre-sale goods consultation - inventory consultation - restocking time consultation. The first-level classification A_i, the second-level classification B_j and the third-level classification C_k each have their own standard value in the interval (0, 1), for example 0.1, 0.3, 0.5, 0.7 or 0.9: the closer the identified service scene is to the actual scene classification, the lower the standard value, and the farther it is from the actual scene classification, the higher the standard value. The standard values of the respective service scene levels are then multiplied to obtain the service scene score, i.e., the score is A_i × B_j × C_k; the higher the score, the larger the service category complexity value.
Those skilled in the art will appreciate that assigning a lower standard value to a scene closer to the actual scene classification is only an example. It can also be set the other way around, so that the closer the identified service scene is to the actual scene classification, the higher the standard value, and the farther it is, the lower the standard value; in that case, the lower the obtained service scene score, the larger the service category complexity value.
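As a concrete illustration of the scoring described above, the following Python sketch multiplies the per-level standard values; it is not taken from the patent, and the lookup tables, the default value of 0.9 and the function name are illustrative assumptions.

```python
# Illustrative sketch only: computing a service scene complexity value from the
# per-level standard values described above. Tables and names are hypothetical.
LEVEL1_STANDARD = {"pre-sale goods consultation": 0.3}   # A_i values, assumed
LEVEL2_STANDARD = {"inventory consultation": 0.5}        # B_j values, assumed
LEVEL3_STANDARD = {"restocking time consultation": 0.7}  # C_k values, assumed

def service_scene_complexity(level1: str, level2: str, level3: str) -> float:
    """Multiply the standard values of the three recognized scene levels.

    Under the convention in the text (closer to the actual classification
    means a lower standard value), a higher product means a more complex scene.
    """
    a_i = LEVEL1_STANDARD.get(level1, 0.9)  # default: far from any known scene
    b_j = LEVEL2_STANDARD.get(level2, 0.9)
    c_k = LEVEL3_STANDARD.get(level3, 0.9)
    return a_i * b_j * c_k

x1 = service_scene_complexity("pre-sale goods consultation",
                              "inventory consultation",
                              "restocking time consultation")
print(x1)  # 0.3 * 0.5 * 0.7 = 0.105
```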
In other embodiments, in response to each level of service scenario selected by a user in a user voice response event, determining a standard value corresponding to each level of service scenario, determining a service scenario score corresponding to the user voice response event according to the standard value of each level of service scenario, and determining user service scenario feature data according to the service scenario score.
For example, after the user accesses the voice service, a first-layer service scenario is prompted to the user, after the user selects the first-layer service scenario, a second-layer service scenario is prompted to the user, and so on until the user selects the last-layer service scenario.
In step 120, the current emotional characteristic data of the user is identified according to the voice response content of the user.
In some embodiments, the user voice response content is converted to text statements; performing word segmentation processing on the text sentence to obtain a word segmentation sequence; vectorizing the word sequence to obtain a plurality of word vectors; obtaining a sentence vector according to the word vector; and inputting the sentence vector into the emotion classification model to obtain the current emotion characteristic data of the user.
At step 130, user profile feature data is determined based on the access data of the user voice response event.
In some embodiments, the static information data and the dynamic information data of the user are determined according to the access data of the voice response event of the user; determining a basic label of a user according to the static information data of the user; determining an attribute tag of a user according to dynamic information data of the user; and determining the user portrait feature data according to the basic label and the attribute label of the user.
In some embodiments, if the voice response robot is applied to a shopping scenario, a database is built from various types of data in the shopping platform. For example, the shopping platform builds a database from all of the static and dynamic information across the platform. The static information in the database includes the consumption grade, average consumption period, user gender, age, region and the like; the dynamic information includes behavior information such as browsing, purchasing, inquiring, commenting, liking and adding items to the shopping cart on the platform.
According to the incoming phone number corresponding to the user voice response event, or a phone number entered by the user, the user's registration information on the shopping platform is looked up; the registration information is matched against the static information data in the database, and basic labels are generated automatically. Basic labels are, for example: born in the 1990s, VIP user, and so on.
User attribute tags are constructed from the user's browsing, purchasing, inquiring and other data in the shopping platform. For example, an attribute tag of the user includes a commodity category and the user's operation weight for that category, such as a purchase weight W_a. The purchase weight reflects the user's preference for a certain type of merchandise, and its range is, for example, [0, 100]. Operations such as browsing, purchasing, inquiring and commenting on the platform each contribute a corresponding purchase weight score. For example, for the lipstick category, the preference degree is increased by 1 every time the user browses an item in the category, and by 5 every time the user purchases an item in the category.
The attribute tag may decay with the passage of time. In some embodiments, a decay function is set for the attribute tag: after each operation in the platform, the tag weight is updated as W_weight = W_a × e^(-z(t - t_s)), where z is the decay rate and t - t_s is the difference between the current time and the time of the operation.
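For illustration only, the decay formula above can be computed as in the following sketch; the time unit (days) and the example parameter values are assumptions.

```python
import math
import time
from typing import Optional

def decayed_purchase_weight(w_a: float, z: float, t_s: float,
                            t: Optional[float] = None) -> float:
    """Decay an attribute-tag purchase weight W_a over time.

    Implements W = W_a * e^(-z * (t - t_s)) from the text above, where z is
    the decay rate and t - t_s is the time elapsed since the last operation.
    The time unit (days) is an assumption.
    """
    if t is None:
        t = time.time() / 86400.0  # "now", expressed in days (assumed unit)
    return w_a * math.exp(-z * (t - t_s))

# Example: a purchase weight of 40 decays over 30 days with decay rate z = 0.05.
print(decayed_purchase_weight(w_a=40.0, z=0.05, t_s=0.0, t=30.0))  # ~8.93
```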
In some embodiments, the user portrait feature data are matched in the database using the user's basic labels and attribute tags. The matching model may be, for example, cluster analysis, a Bayesian probability algorithm, or the like. In some embodiments, the user portrait feature data are divided into, for example, a reputation score, a consumption capability index, platform membership, and the like.
In step 140, the user service scene feature data, the user current emotion feature data and the user portrait feature data are input into the follow-up category classification model, and a follow-up category corresponding to the voice response event is determined, so that the voice response event is followed up according to the follow-up category.
In some embodiments, sample service scene feature data, sample emotion feature data and sample portrait feature data corresponding to sample user voice response events may be obtained; the sample follow-up categories corresponding to the sample service scene feature data, the sample emotion feature data and the sample portrait feature data are labeled to generate a follow-up category labeling file; and the follow-up category classification model is trained based on the sample service scene feature data, the sample emotion feature data, the sample portrait feature data and the follow-up category labeling file.
The follow-up categories include, for example, a first follow-up category, a second follow-up category, a third follow-up category and a fourth follow-up category. The first follow-up category is, for example, an event-closing category; the second, an intelligent customer service follow-up category; the third, a manual customer service follow-up category; and the fourth, a user care intervention category. It will be understood by those skilled in the art that the four categories are used here only as an example, and the follow-up categories may also be divided into more categories according to the actual application.
In the embodiment, the follow-up type corresponding to the voice response event is determined according to the user service scene characteristic data, the user current emotion characteristic data and the user portrait characteristic data corresponding to the voice response event of the user, so that the voice response event can be followed up according to the follow-up type, the voice response event can be effectively classified, and the event processing efficiency is improved.
Fig. 2 is a flowchart illustrating a voice response event classification processing method according to another embodiment of the disclosure.
In step 210, the user service scenario characteristic data corresponding to the user voice response event is determined.
For example, the user service scenario feature data comprise a service category complexity value x1, with x1 ∈ [0, 1], where 0 represents a simple service scenario and 1 represents a complex service scenario; x1 reflects the complexity of the scene.
In step 220, the current emotional characteristic data of the user is identified according to the voice response content of the user.
The user emotion characteristic data comprise a user emotion index x2, with x2 ∈ [0, 1], where 0 represents pleasure and 1 represents anger; x2 reflects the degree of the user's anger.
At step 230, user profile feature data is determined based on the access data of the user voice response event.
The user portrait feature data comprise a user reputation score x3, a user consumption capability index x4, and a platform membership flag x5. Here x3 ∈ [1, 8] reflects the user's reputation score and is generated from the user's reputation on the platform, where 1 represents the lowest reputation and 8 the highest. x4 reflects the user's consumption capability and is generated from the amount the user consumed over a recent period (e.g., one year), with x4 ∈ [0, 100000], where the value represents the user's consumption amount. x5 indicates whether the user is a platform member and is a 0-1 variable, with 1 representing that the user is a platform member.
At step 240, user satisfaction characteristic data of the user for the voice response event is obtained.
In some embodiments, user satisfaction may be classified, for example, as very satisfactory, general, unsatisfactory and very unsatisfactory, with different types of satisfaction being assigned different values. The user satisfaction comprises a user satisfaction score x6, with x6 ∈ [1, 5], representing the user's evaluation of the service.
In step 250, the user service scene feature data, the user current emotion feature data, the user portrait feature data and the user satisfaction feature data are input into a follow-up category classification model, and a follow-up category corresponding to the voice response event is determined.
In this embodiment, the intelligent event classification task is formulated as a multi-class classification problem, and incoming calls are distinguished according to the acquired features, namely the 6-dimensional feature vector <x1, x2, x3, x4, x5, x6>. The features are input into one or more classification methods (classifiers) to obtain a four-class result, represented for example as y, with y ∈ {1, 2, 3, 4}, denoting event closing, intelligent customer service follow-up, manual customer service follow-up and user care intervention respectively; the voice response event is then followed up according to the result, improving the user experience.
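As an illustration of this formulation, the sketch below assembles the 6-dimensional feature vector and maps the four-class output to a follow-up action; the function names and the scikit-learn-style classifier object are assumptions, not part of the patent.

```python
# Illustrative sketch only: building <x1..x6> and mapping y in {1, 2, 3, 4}
# to a follow-up action as described in the text.
FOLLOW_UP_ACTIONS = {
    1: "close event",
    2: "intelligent customer service follow-up",
    3: "manual customer service follow-up",
    4: "user care intervention",
}

def build_feature_vector(x1, x2, x3, x4, x5, x6):
    """<x1..x6>: scene complexity, emotion index, reputation score,
    consumption amount, platform membership flag, satisfaction score."""
    return [x1, x2, x3, x4, x5, x6]

def follow_up_action(model, features):
    """Predict the follow-up category and return the corresponding action."""
    y = int(model.predict([features])[0])  # assumes a fitted classifier
    return FOLLOW_UP_ACTIONS[y]
```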
In some embodiments, sample service scene feature data, sample emotion feature data, sample portrait feature data and sample satisfaction feature data corresponding to a sample user voice response event are obtained; marking corresponding sample follow-up categories of sample service scene characteristic data, sample emotion characteristic data, sample portrait characteristic data and sample satisfaction degree characteristic data to generate follow-up category marking files; and training a follow-up category classification model based on the sample service scene characteristic data, the sample emotion characteristic data, the sample portrait characteristic data, the sample satisfaction degree characteristic data and the follow-up category labeling file.
In this embodiment, any effective supervised learning classification method may be used, for example K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Bayesian methods, kernel-based Support Vector Machines (SVM) with a Gaussian kernel (RBF kernel), Neural Networks, and the like.
The specific training process is shown in fig. 3, for example.
In step 310, a labeled training data set is prepared. A training sample is, for example, <1 | 0.1, 0.4, 3.2, 20245, 1, 4>, i.e., the follow-up label followed by the six features.
In step 320, the training data are normalized, yielding for example <1 | 0.1, 0.4, 0.314, 0.205, 1, 0.8>.
In step 330, a model is constructed using support vector classification with a Gaussian kernel, and model training is performed to obtain an SVM classification model with parameters C = 1.27 and g = 3.55.
In step 340, user service scene feature data, user current emotion feature data, user portrait feature data and user satisfaction feature data are obtained, for example <0.5, 0.7, 3.1, 9001, 0, 1>.
In step 350, the feature data is predicted using the classification model to obtain a follow-up category.
At step 360, the event is correspondingly followed according to the follow-up category.
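The following is a minimal sketch of steps 310 to 360, assuming scikit-learn as a stand-in classifier library; the two-sample data set is purely illustrative, and a MinMaxScaler stands in for the normalization step (the example values in step 320 suggest scaling each feature by its fixed range instead).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Step 310: labeled training data, rows of <x1..x6>, labels y in {1, 2, 3, 4}.
# Only two toy samples are shown; a real data set would contain many more.
X_train = np.array([[0.1, 0.4, 3.2, 20245, 1, 4],
                    [0.8, 0.9, 1.5,   300, 0, 1]], dtype=float)
y_train = np.array([1, 4])

# Step 320: normalization (a stand-in for the fixed-range scaling in the text).
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

# Step 330: support vector classification with a Gaussian (RBF) kernel.
clf = SVC(C=1.27, gamma=3.55, kernel="rbf")
clf.fit(X_scaled, y_train)

# Steps 340-360: score a new event and obtain its follow-up category.
x_new = np.array([[0.5, 0.7, 3.1, 9001, 0, 1]], dtype=float)
print(clf.predict(scaler.transform(x_new)))  # e.g. array([1]) or array([4])
```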
In this embodiment, sufficient training data can be obtained by labeling enough hotline access event data; an effective event-processing classifier is then obtained by selecting and training a classification model. This solves the intelligent event classification task, allows abnormal users to be identified for continued follow-up, and improves the user experience.
Fig. 4 is a flow diagram of some embodiments of the present disclosure of identifying user current emotional characteristic data.
In step 410, the user voice response content is converted into text sentences.
When a user calls in, the user's speech during the hotline conversation is first converted into original text sentences in real time using mature speech recognition technology. For example, the collected customer recording data are converted into text by ASR (Automatic Speech Recognition) technology.
In step 420, the word segmentation is performed on the text sentence to obtain a word segmentation sequence. For example, word segmentation is performed on a text sentence based on a dictionary matching or statistical method to obtain a word segmentation sequence, and meanwhile, new words and ambiguous words in the word segmentation sequence are removed.
In step 430, vectorization processing is performed on the word segmentation sequence to obtain a plurality of word vectors. For example, a word-to-vector method is used to vectorize the word segmentation sequence and map each word of the sequence into a vector of fixed length.
Word-to-vector algorithms include the CBOW (Continuous Bag-of-Words) and Skip-gram models. Because the corpus of the customer service system is very large, in some embodiments the corpus is trained in an unsupervised manner using Google's open-source tool Word2Vec based on the Skip-gram model, so as to obtain word vector representations of the text. This compresses the data scale while capturing context information, and the high dimensionality of the word vectors accommodates words whose meanings diverge in multiple directions, ensuring the stability of the model.
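For illustration, the following sketch trains Skip-gram word vectors with the jieba segmenter and gensim's Word2Vec (version 4 or later) as stand-ins for the tools mentioned above; the two example utterances and all hyperparameters are assumptions.

```python
import jieba
from gensim.models import Word2Vec

corpus = ["这个口红什么时候补货", "我要查询订单物流"]  # illustrative hotline utterances
tokenized = [jieba.lcut(sentence) for sentence in corpus]

# sg=1 selects the Skip-gram architecture described in the text.
model = Word2Vec(sentences=tokenized, vector_size=100, window=5,
                 min_count=1, sg=1, epochs=20)

word_vectors = {w: model.wv[w] for w in model.wv.index_to_key}
print(len(word_vectors), "words embedded into 100-dimensional vectors")
```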
In some embodiments, each word is represented at the input layer by one-hot encoding, i.e., every word is represented as an N-dimensional vector, where N is the total number of words in the vocabulary. In this vector, the dimension corresponding to the word is set to 1 and all other dimensions are 0. The value of the output layer vector is computed from the hidden layer vector (of dimension K) and the K × N weight matrix connecting the hidden layer and the output layer. The output layer is also an N-dimensional vector, each dimension corresponding to a word in the vocabulary. Finally, applying the Softmax activation function to the output layer vector gives the generation probability of each word.
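A toy NumPy rendering of that forward pass, with made-up dimensions and random weights, might look as follows; it is only meant to make the one-hot input, K-dimensional hidden layer and softmax output concrete.

```python
import numpy as np

N, K = 5, 3                              # vocabulary size, hidden size (toy values)
W_in = np.random.rand(N, K)              # input-to-hidden weights (N x K)
W_out = np.random.rand(K, N)             # hidden-to-output weights (K x N)

x = np.zeros(N)
x[2] = 1.0                               # one-hot vector for the word at index 2
h = x @ W_in                             # hidden layer vector (K-dimensional)
scores = h @ W_out                       # output layer vector (N-dimensional)
probs = np.exp(scores) / np.exp(scores).sum()   # softmax: generation probabilities
print(probs)
```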
At step 440, a sentence vector is derived from the word vector. For example, a matrix corresponding to the sentence is obtained based on the result of word vectorization.
In some embodiments, a weighted average is taken over the word vectors of all words in the sentence, and the weight of each word vector can be expressed as a / (a + p(w)), where a is a parameter and p(w) is the frequency of the word w. The vector values are then modified using PCA (Principal Component Analysis)/SVD (Singular Value Decomposition), and a vector representation of the sentence, i.e., a sentence vector, is output.
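A sketch of this sentence-vector step is given below, using the weight a / (a + p(w)) as reconstructed above and scikit-learn's PCA for the principal-component correction; the value of a, the toy vocabulary and the helper names are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def sentence_vector(words, word_vecs, word_freq, a=1e-3):
    """Weighted average of word vectors with weight a / (a + p(w))."""
    vecs = np.array([word_vecs[w] * (a / (a + word_freq.get(w, 1e-6)))
                     for w in words if w in word_vecs])
    return vecs.mean(axis=0)

def remove_first_component(sentence_vecs):
    """Subtract the projection onto the first principal component (the PCA/SVD step)."""
    pca = PCA(n_components=1)
    pca.fit(sentence_vecs)
    u = pca.components_[0]
    return sentence_vecs - np.outer(sentence_vecs @ u, u)

# Toy usage with random 4-dimensional "word vectors".
rng = np.random.default_rng(0)
word_vecs = {w: rng.normal(size=4) for w in ["补货", "时间", "订单"]}
word_freq = {"补货": 0.001, "时间": 0.01, "订单": 0.005}
sents = np.stack([sentence_vector(["补货", "时间"], word_vecs, word_freq),
                  sentence_vector(["订单", "时间"], word_vecs, word_freq)])
print(remove_first_component(sents))
```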
In step 450, the sentence vector is input into the emotion classification model to obtain the current emotion feature data of the user. The emotion classification model is, for example, an LSTM (Long Short-Term Memory) neural network model.
In some embodiments, a sample sentence vector corresponding to the sample user voice response content is obtained; marking the emotion state index corresponding to the sample sentence vector to generate an emotion marking file; and training the emotion classification model based on the sample sentence vectors and the emotion marking files.
For example, the emotion classification model may be trained in advance. The words and sentences are first preprocessed, for example by removing irrelevant information such as punctuation and slang contained in the text, and then labeled. During labeling, three annotators label the data at different times, with positive emotion labeled as 1 and negative emotion labeled as 0; if the three labels for a piece of text differ, the label given by the majority of annotators is taken as the final result, which ensures the accuracy of the labeling. Chinese word segmentation is performed on the experimental data using a Chinese word-segmentation engine. Finally, the data are randomly split into a test set and a training set at a ratio of 2:8; the training set is used to train the LSTM model, and the test set is used to test the classification effect of the model. To prevent the model from overfitting, a Dropout random optimization factor is added to the LSTM network, so that the model fits well on complex comment sentences and its applicability is improved. The final prediction outputs a value between 0 and 1.
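For illustration, a minimal Keras/TensorFlow version of such an emotion classifier might look as follows; note that, following common practice, this sketch feeds word-index sequences through an Embedding layer and an LSTM with Dropout, whereas the text above describes feeding precomputed sentence vectors. The data, layer sizes and training settings are assumptions.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Illustrative data: 1000 "sentences", each a sequence of 50 word-vector indices.
X = np.random.randint(0, 5000, size=(1000, 50))
y = np.random.randint(0, 2, size=(1000,))          # 1 = positive, 0 = negative
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # 2:8 split

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=5000, output_dim=128),
    tf.keras.layers.LSTM(64, dropout=0.2),          # Dropout guards against overfitting
    tf.keras.layers.Dense(1, activation="sigmoid"), # outputs a value between 0 and 1
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=3, batch_size=32,
          validation_data=(X_test, y_test))
```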
In the embodiment, the current emotion characteristic data of the user can be obtained by processing the voice response content of the user, so that the prediction result is more accurate when the follow-up category corresponding to the voice response event is predicted subsequently.
Fig. 5 is a schematic structural diagram of some embodiments of a device for classifying and processing a voice response event according to the present disclosure. The apparatus is provided, for example, in an intelligent answering robot, and includes a scene characteristic data determining unit 510, an emotion characteristic data determining unit 520, a portrait characteristic data determining unit 530, and an event follow-up category determining unit 540.
The scene characteristic data determination unit 510 is configured to determine user service scene characteristic data corresponding to the user voice response event.
In some embodiments, the standard value of each level of service scene corresponding to the user voice response event is identified according to the user voice response content, the service scene score corresponding to the user voice response event is determined according to the standard value of each level of service scene, and the user service scene feature data, such as the service category complexity value, is determined according to the service scene score.
In other embodiments, in response to each level of service scenario selected by a user in a user voice response event, determining a standard value corresponding to each level of service scenario, determining a service scenario score corresponding to the user voice response event according to the standard value of each level of service scenario, and determining user service scenario feature data according to the service scenario score.
The emotional characteristic data determination unit 520 is configured to identify current emotional characteristic data of the user according to the user voice response content.
In some embodiments, the user voice response content is converted to text statements; performing word segmentation processing on the text sentence to obtain a word segmentation sequence; vectorizing the word sequence to obtain a plurality of word vectors; obtaining a sentence vector according to the word vector; and inputting the sentence vector into the emotion classification model to obtain the current emotion characteristic data of the user.
The representation feature data determination unit 530 is configured to determine user representation feature data based on access data of a user voice response event.
In some embodiments, the static information data and the dynamic information data of the user are determined according to the access data of the voice response event of the user; determining a basic label of a user according to the static information data of the user; determining an attribute tag of a user according to dynamic information data of the user; and determining the user portrait feature data according to the basic label and the attribute label of the user.
The event follow-up category determining unit 540 is configured to input the user service scene feature data, the user current emotion feature data and the user portrait feature data into the follow-up category classification model, and determine a follow-up category corresponding to the voice response event, so as to follow up the voice response event according to the follow-up category.
In the embodiment, the follow-up type corresponding to the voice response event is determined according to the user service scene characteristic data, the user current emotion characteristic data and the user portrait characteristic data corresponding to the voice response event of the user, so that the voice response event can be followed up according to the follow-up type, the voice response event can be effectively classified, and the event processing efficiency is improved.
Fig. 6 is a schematic structural diagram of another embodiment of a device for classifying and processing a voice response event according to the present disclosure. The apparatus includes a satisfaction characteristic data determination unit 550 in addition to the scene characteristic data determination unit 510, emotion characteristic data determination unit 520, portrait characteristic data determination unit 530, and event follow-up category determination unit 540 described above.
The satisfaction profile determination unit 550 is configured to obtain user satisfaction profile of the user with respect to the voice response event.
The event follow-up category determining unit 540 is further configured to input the user service scene feature data, the user current emotion feature data, the user portrait feature data and the user satisfaction feature data into the follow-up category classification model, and determine a follow-up category corresponding to the voice response event.
In the embodiment, the user service scene characteristic data, the user current emotion characteristic data, the user portrait characteristic data and the user satisfaction degree characteristic data are input into the follow-up category classification model, and the follow-up category corresponding to the voice response event is determined, so that the follow-up category is divided more accurately, the event follow-up efficiency is improved, and the user experience is improved.
Fig. 7 is a schematic structural diagram of another embodiment of a device for classifying and processing a voice response event according to the present disclosure. The apparatus comprises a memory 710 and a processor 720, wherein: the memory 710 may be a magnetic disk, flash memory, or any other non-volatile storage medium. The memory is used to store instructions in the embodiments corresponding to fig. 1-4. Processor 720, coupled to memory 710, may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 720 is configured to execute instructions stored in the memory.
In some embodiments, as also shown in fig. 8, the apparatus 800 includes a memory 810 and a processor 820. The processor 820 is coupled to the memory 810 by a BUS 830. The device 800 may also be coupled to an external storage device 850 via a storage interface 840 for facilitating retrieval of external data, and may also be coupled to a network or another computer system (not shown) via a network interface 880, which will not be described in detail herein.
In this embodiment, the event processing efficiency can be improved by storing the data instructions in the memory and processing the data instructions by the processor.
In other embodiments, a computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the steps of the method in the embodiments corresponding to fig. 1-4. As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Thus far, the present disclosure has been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (11)

1. A voice response event classification processing method comprises the following steps:
determining user service scene characteristic data corresponding to a user voice response event, wherein the user service scene characteristic data comprises a service class complexity value;
identifying the current emotion characteristic data of the user according to the voice response content of the user;
determining user portrait characteristic data according to access data of a user voice response event;
and inputting the user service scene characteristic data, the user current emotion characteristic data and the user portrait characteristic data into a follow-up category classification model, and determining a follow-up category corresponding to the voice response event so as to follow up the voice response event according to the follow-up category.
2. The voice response event classification processing method according to claim 1, further comprising:
acquiring user satisfaction characteristic data of the user on the voice response event;
and inputting the user service scene characteristic data, the user current emotion characteristic data, the user portrait characteristic data and the user satisfaction degree characteristic data into a follow-up category classification model, and determining a follow-up category corresponding to the voice response event.
3. The method for classifying and processing the voice response event according to claim 1 or 2, wherein the step of determining the user service scenario characteristic data corresponding to the user voice response event comprises the steps of:
according to the voice response content of the user, identifying the standard value of each level of service scene corresponding to the voice response event of the user, or responding to each level of service scene selected by the user in the voice response event of the user, and determining the standard value corresponding to each level of service scene;
determining a service scene score corresponding to the user voice response event according to the standard value of each level of service scene;
and determining the user service scene characteristic data according to the service scene score.
4. The voice response event classification processing method according to claim 1 or 2, wherein identifying the current emotional characteristic data of the user according to the voice response content of the user comprises:
converting the user voice response content into text sentences;
performing word segmentation processing on the text sentence to obtain a word segmentation sequence;
vectorizing the word segmentation sequence to obtain a plurality of word vectors;
obtaining a sentence vector according to the word vector;
and inputting the sentence vector into a sentiment classification model to obtain the current sentiment feature data of the user.
5. The voice response event classification processing method according to claim 4, further comprising:
obtaining a sample sentence vector corresponding to the voice response content of the sample user;
labeling the emotion characteristic data corresponding to the sample sentence vector to generate an emotion labeling file;
and training the emotion classification model based on the sample sentence vectors and the emotion marking files.
6. The method for classifying and processing the voice response event according to claim 1 or 2, wherein the step of determining the user portrait characteristic data according to the access data of the user voice response event comprises the steps of:
determining static information data and dynamic information data of a user according to access data of a user voice response event;
determining a basic label of a user according to static information data of the user;
determining an attribute tag of a user according to dynamic information data of the user;
and determining the user portrait feature data according to the basic label and the attribute label of the user.
7. The voice response event classification processing method according to claim 2, further comprising:
acquiring sample service scene characteristic data, sample emotion characteristic data, sample portrait characteristic data and sample satisfaction characteristic data corresponding to a sample user voice response event;
labeling the sample follow-up categories corresponding to the sample service scene characteristic data, the sample emotion characteristic data, the sample portrait characteristic data and the sample satisfaction degree characteristic data to generate a follow-up category labeling file;
and training the follow-up category classification model based on the sample service scene characteristic data, the sample emotion characteristic data, the sample portrait characteristic data, the sample satisfaction degree characteristic data and the follow-up category labeling file.
8. A speech response event classification processing apparatus comprising:
the scene feature data determining unit is configured to determine user service scene feature data corresponding to a user voice response event, wherein the user service scene feature data comprise a service class complexity value;
the emotion characteristic data determination unit is configured to identify current emotion characteristic data of the user according to the voice response content of the user;
the portrait characteristic data determining unit is configured to determine user portrait characteristic data according to access data of a user voice response event;
and the event follow-up category determining unit is configured to input the user service scene feature data, the user current emotion feature data and the user portrait feature data into a follow-up category classification model, determine a follow-up category corresponding to the voice response event, and follow up the voice response event according to the follow-up category.
9. The speech response event classification processing apparatus according to claim 8, further comprising:
a satisfaction characteristic data determination unit configured to acquire user satisfaction characteristic data of a user on the voice response event;
the event follow-up category determining unit is further configured to input the user service scene feature data, the user current emotion feature data, the user portrait feature data and the user satisfaction feature data into a follow-up category classification model, and determine a follow-up category corresponding to the voice response event.
10. A speech response event classification processing apparatus comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of voice response event classification processing of any of claims 1 to 7 based on instructions stored in the memory.
11. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the voice response event classification processing method of any one of claims 1 to 7.
Application CN201911074562.XA, priority date 2019-11-06, filing date 2019-11-06: Voice response event classification processing method and device (granted as CN110827797B, status: Active)

Priority Applications (1)

Application Number: CN201911074562.XA (CN110827797B) · Priority Date: 2019-11-06 · Filing Date: 2019-11-06 · Title: Voice response event classification processing method and device

Applications Claiming Priority (1)

Application Number: CN201911074562.XA (CN110827797B) · Priority Date: 2019-11-06 · Filing Date: 2019-11-06 · Title: Voice response event classification processing method and device

Publications (2)

Publication Number Publication Date
CN110827797A CN110827797A (en) 2020-02-21
CN110827797B (en) 2022-04-12

Family

ID=69552825

Family Applications (1)

Application Number: CN201911074562.XA (Active, granted as CN110827797B) · Priority Date: 2019-11-06 · Filing Date: 2019-11-06 · Title: Voice response event classification processing method and device

Country Status (1)

Country Link
CN (1) CN110827797B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985209B (en) * 2020-03-31 2024-03-29 北京来也网络科技有限公司 Text sentence recognition method, device and equipment combining RPA and AI and storage medium
CN111553171B (en) * 2020-04-09 2024-02-06 北京小米松果电子有限公司 Corpus processing method, corpus processing device and storage medium
CN112185389B (en) * 2020-09-22 2024-06-18 北京小米松果电子有限公司 Voice generation method, device, storage medium and electronic equipment
CN113269523A (en) * 2021-05-21 2021-08-17 国网新源控股有限公司 Intelligent auditing method and device based on portrait, storage medium and electronic equipment
CN114357171A (en) * 2022-01-04 2022-04-15 中国建设银行股份有限公司 Emergency event processing method and device, storage medium and electronic equipment

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015106644A1 (en) * 2014-01-20 2015-07-23 Tencent Technology (Shenzhen) Company Limited Method and system for providing recommendations during a chat session
JP2015138147A (en) * 2014-01-22 2015-07-30 シャープ株式会社 Server, interactive device, interactive system, interactive method and interactive program
CN105654950A (en) * 2016-01-28 2016-06-08 百度在线网络技术(北京)有限公司 Self-adaptive voice feedback method and device
CN106021531A (en) * 2016-05-25 2016-10-12 北京云知声信息技术有限公司 Method, system and device for book inquiry through voice
CN106294774A (en) * 2016-08-11 2017-01-04 北京光年无限科技有限公司 User individual data processing method based on dialogue service and device
CN106649561A (en) * 2016-11-10 2017-05-10 复旦大学 Intelligent question-answering system for tax consultation service
CN108833140A (en) * 2018-05-24 2018-11-16 携程旅游信息技术(上海)有限公司 Interactive voice response configures system, method, electronic equipment and storage medium
CN109347986A (en) * 2018-12-04 2019-02-15 北京羽扇智信息科技有限公司 A kind of voice messaging method for pushing, device, electronic equipment and storage medium
CN109597883A (en) * 2018-12-20 2019-04-09 福州瑞芯微电子股份有限公司 A kind of speech recognition equipment and method based on video acquisition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Classifying the speech response of the brain using Gaussian hidden Markov model (HMM) with independent component analysis (ICA); Jongin Kim; 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2013-09-26; full text *
Design and implementation of a task-oriented dialogue system based on deep learning; Zhao Mingxing; China Master's Theses Full-text Database; 2019-08-15 (No. 8); I138-1304 *

Also Published As

Publication number Publication date
CN110827797A (en) 2020-02-21

Similar Documents

Publication Publication Date Title
CN110827797B (en) Voice response event classification processing method and device
CN110827129B (en) Commodity recommendation method and device
CN109871446B (en) Refusing method in intention recognition, electronic device and storage medium
CN109101537B (en) Multi-turn dialogue data classification method and device based on deep learning and electronic equipment
CN107705066B (en) Information input method and electronic equipment during commodity warehousing
CN106570708B (en) Management method and system of intelligent customer service knowledge base
CN112632385A (en) Course recommendation method and device, computer equipment and medium
CN106570496A (en) Emotion recognition method and device and intelligent interaction method and device
KR20200007969A (en) Information processing methods, terminals, and computer storage media
CN111125354A (en) Text classification method and device
CN111274371B (en) Intelligent man-machine conversation method and equipment based on knowledge graph
CN113094578A (en) Deep learning-based content recommendation method, device, equipment and storage medium
US20200210776A1 (en) Question answering method, terminal, and non-transitory computer readable storage medium
CN110879938A (en) Text emotion classification method, device, equipment and storage medium
CN111797622B (en) Method and device for generating attribute information
CN110704586A (en) Information processing method and system
CN112131345B (en) Text quality recognition method, device, equipment and storage medium
CN109460462A (en) A kind of Chinese Similar Problems generation System and method for
CN112579666A (en) Intelligent question-answering system and method and related equipment
CN112100377A (en) Text classification method and device, computer equipment and storage medium
CN117608650A (en) Business flow chart generation method, processing device and storage medium
CN115329176A (en) Search request processing method and device, computer equipment and storage medium
CN114817478A (en) Text-based question and answer method and device, computer equipment and storage medium
CN111126038A (en) Information acquisition model generation method and device and information acquisition method and device
WO2023173541A1 (en) Text-based emotion recognition method and apparatus, device, and storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant