CN105810205A - Speech processing method and device - Google Patents

Speech processing method and device

Info

Publication number
CN105810205A
Authority
CN
China
Prior art keywords
hot line
speech data
speech
line speech
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410838240.9A
Other languages
Chinese (zh)
Inventor
王朝民
冯俊兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201410838240.9A priority Critical patent/CN105810205A/en
Publication of CN105810205A publication Critical patent/CN105810205A/en
Pending legal-status Critical Current


Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention provides a speech processing method and device. The speech processing method comprises the steps of processing hotline speech data to obtain text information of the hotline speech data and the user speech information contained in the hotline speech data; and obtaining an evaluation result for the hotline speech data according to the text information and the user speech information. In this scheme, user satisfaction with customer service is evaluated by combining the text information and the speech information in the speech data, so that customer-service recordings can be evaluated more comprehensively and in finer detail, the scale of satisfaction evaluation is enlarged, labor costs are reduced, and better technical support is provided for customer-service hotline operation.

Description

Speech processing method and device
Technical field
The present invention relates to the field of speech recognition and speech processing technology, and in particular to a speech processing method and device.
Background art
With the development of network, communication and computer technology, enterprises are becoming electronic, remote, virtualized and networked, and online enterprises are emerging in large numbers. Communication between customers and enterprises has likewise evolved from face-to-face consultation and negotiation to remote exchanges based on networks and telephones. In this context, the telephone-based customer service center (call center) has become an important channel of interaction between an enterprise and its users. Every day a customer service center handles a large volume of telephone voice services and deals with diverse customer needs, including pre-sales consultation, purchases, after-sales service, complaints and so on. During a telephone service, the agent must respond appropriately to callers in different emotional states. The customer service center represents the image of the enterprise, and its service quality directly affects user loyalty, so improving service quality and thereby customer satisfaction and loyalty has become an important public-relations goal for enterprises. In addition, resolving customer demands accurately to improve service efficiency, managing the customer service team reasonably, and evaluating employee service quality accurately have all become directions that customer service centers must continually explore and study.
In the prior art, hotline recording data are "translated" into text by a speech recognition engine, and natural language understanding is then applied to the text to obtain a satisfaction-related analysis, from which the user satisfaction of the recorded call is judged. However, because the existing customer-service-hotline satisfaction evaluation method processes only the text produced by the speech recognition engine to obtain satisfaction information, the satisfaction information originally carried in the speech itself is lost, many cases of user dissatisfaction are not fully reflected, and the accuracy of the satisfaction evaluation suffers.
Known technologies:
1) Speech recognition: speech recognition, also called automatic speech recognition (Automatic Speech Recognition, ASR), aims to convert the lexical content of human speech into computer-readable input such as key presses, binary codes or character strings. It differs from speaker identification and speaker verification, which attempt to identify or confirm the speaker rather than the lexical content of the speech.
2) Speech parameter analysis: methods of speech analysis include direct waveform analysis and analysis after parametric extraction, the latter being the current mainstream. Popular speech parameters include parameterizations of the vocal-tract spectrum, such as linear prediction coefficients (LPC, Linear Predictive Coding), Mel-frequency cepstral coefficients (MFCC, Mel Frequency Cepstrum Coefficient) and line spectral pairs (LSP); voice-source parameters such as the fundamental frequency and aperiodic components; higher-level linguistic content obtained by further analysis, such as initial/final analysis, tone and intonation; and deeper information such as emotion and speaker identity. Analyzing these parameters gives a more complete picture of the information carried by speech and allows it to be processed more directly and effectively.
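For illustration only (the embodiment does not prescribe a particular toolkit), a minimal Python sketch of this kind of parametric analysis, assuming the open-source librosa library and a hypothetical recording file call.wav, could look as follows:

import librosa
import numpy as np

# Load a hypothetical hotline recording at 16 kHz (file name is illustrative).
y, sr = librosa.load("call.wav", sr=16000)

# Vocal-tract spectral envelope: 13 MFCCs per short-time frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Voice-source parameter: fundamental frequency (F0) contour via the YIN estimator.
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)

# Long-time statistics over the whole utterance, usable as long-time features.
features = {
    "mfcc_mean": mfcc.mean(axis=1),
    "mfcc_std": mfcc.std(axis=1),
    "f0_mean": float(np.mean(f0)),
    "f0_range": float(np.max(f0) - np.min(f0)),
}
print({k: np.round(v, 2) for k, v in features.items()})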
3) Speech emotion recognition: natural human speech carries rich information, including semantic information, speaker information and emotion information. Speech emotion recognition extracts emotion-related features from the speech signal and identifies the speaker's emotional state by building classification models; commonly used models and methods include Gaussian mixture models, principal component analysis and support vector machines.
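As an illustration of the model-based classification mentioned above (not a prescribed implementation), the following Python sketch trains one Gaussian mixture model per emotion class with scikit-learn and classifies an utterance by maximum average log-likelihood; the feature matrices and emotion labels are random placeholders standing in for real frame-level features:

import numpy as np
from sklearn.mixture import GaussianMixture

def train_emotion_gmms(features_by_emotion, n_components=8):
    # features_by_emotion: dict mapping emotion label -> (n_frames, n_dims) array.
    models = {}
    for emotion, feats in features_by_emotion.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        models[emotion] = gmm.fit(feats)
    return models

def classify_emotion(models, utterance_feats):
    # Pick the emotion whose GMM gives the highest average log-likelihood.
    scores = {emotion: model.score(utterance_feats) for emotion, model in models.items()}
    return max(scores, key=scores.get)

# Toy usage with random data standing in for real MFCC frames.
rng = np.random.default_rng(0)
training_data = {"neutral": rng.normal(0.0, 1.0, (200, 13)),
                 "angry": rng.normal(2.0, 1.0, (200, 13))}
models = train_emotion_gmms(training_data)
print(classify_emotion(models, rng.normal(2.0, 1.0, (50, 13))))  # most likely "angry"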
Summary of the invention
The technical problem to be solved by the present invention is to provide a speech processing method and device, in order to solve the problem that the existing customer-service-hotline satisfaction evaluation method obtains user satisfaction information only from the text corresponding to the hotline recording data, thereby losing the satisfaction information originally carried in the speech, failing to fully reflect many cases of user dissatisfaction and reducing the accuracy of the satisfaction evaluation.
In order to solve the above technical problem, an embodiment of the present invention provides a speech processing method, comprising:
processing hotline speech data to obtain text information of the hotline speech data and the user speech information contained in the hotline speech data;
obtaining an evaluation result for the hotline speech data according to the text information and the user speech information.
Further, the step of processing the hotline speech data to obtain the text information of the hotline speech data and the user speech information contained in the hotline speech data comprises:
performing speech recognition on the hotline speech data to obtain the text information of the hotline speech data;
performing segmentation on the hotline speech data to obtain the user speech information contained in the hotline speech data.
Further, the step of performing speech recognition on the hotline speech data to obtain the text information of the hotline speech data comprises:
performing speech recognition on the hotline speech data by a speech recognition engine to obtain the text information of the hotline speech data.
Further, the step of performing segmentation on the hotline speech data to obtain the user speech information contained in the hotline speech data comprises:
performing speech segmentation on the hotline speech data by a speech segmentation algorithm to obtain the user speech information contained in the hotline speech data.
Further, the step of obtaining the evaluation result of the hotline speech data according to the text information and the user speech information comprises:
extracting text features related to user satisfaction and emotion from the text information;
extracting abstract features related to speech emotion and customer satisfaction evaluation from the user speech information as combined speech features, the abstract features including temporal features, frequency-domain features, short-time features, long-time features and dialogue features;
combining the text features and the combined speech features to obtain the evaluation result of the hotline speech data.
Further, the step of combining the text features and the combined speech features to obtain the evaluation result of the hotline speech data comprises:
determining, according to preset rules, the weight of each feature word in the text features and in the combined speech features, combining the weights, and taking the combined result as the evaluation result of the hotline speech data.
Further, if the hotline speech data includes annotated satisfaction data, then after the hotline speech data is processed to obtain the text information of the hotline speech data and the user speech information contained in it, the evaluation result of the hotline speech data is obtained according to the annotated satisfaction data together with the text information and the user speech information.
An embodiment of the present invention further provides a speech processing apparatus, comprising:
a first acquisition module, configured to process hotline speech data to obtain text information of the hotline speech data and the user speech information contained in the hotline speech data;
a second acquisition module, configured to obtain an evaluation result for the hotline speech data according to the text information and the user speech information.
Further, the first acquisition module comprises:
a speech recognition unit, configured to perform speech recognition on the hotline speech data to obtain the text information of the hotline speech data;
a speech segmentation unit, configured to perform segmentation on the hotline speech data to obtain the user speech information contained in the hotline speech data.
Further, the second acquisition module comprises:
a text feature extraction unit, configured to extract text features related to user satisfaction and emotion from the text information;
a speech feature extraction unit, configured to extract abstract features related to speech emotion and customer satisfaction evaluation from the user speech information as combined speech features, the abstract features including temporal features, frequency-domain features, short-time features, long-time features and dialogue features;
a processing unit, configured to combine the text features and the combined speech features to obtain the evaluation result of the hotline speech data.
The beneficial effects of the present invention are as follows:
In the above scheme, user satisfaction with customer service is evaluated by combining the text information and the speech information in the speech data, so that customer-service recordings can be evaluated more comprehensively and in finer detail, the scale of satisfaction evaluation is enlarged, labor costs are saved, and better technical support is provided for customer-service hotline operation.
Brief description of the drawings
Fig. 1 is an overall flowchart of the speech processing method according to an embodiment of the present invention;
Fig. 2 is a detailed flowchart of step 20 in Fig. 1;
Fig. 3 is a module diagram of the speech processing apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the second acquisition module according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of the satisfaction evaluation implementation.
Detailed description of the invention
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described below with reference to the accompanying drawings and specific embodiments.
The present invention addresses the problem that the existing customer-service-hotline satisfaction evaluation method processes only the text produced by a speech recognition engine to obtain user satisfaction information, losing the satisfaction information originally carried in the speech, failing to fully reflect many cases of user dissatisfaction and reducing the accuracy of the satisfaction evaluation, and provides a speech processing method and device.
As shown in Fig. 1, an embodiment of the present invention provides a speech processing method, comprising:
Step 10: processing hotline speech data to obtain text information of the hotline speech data and the user speech information contained in the hotline speech data;
Step 20: obtaining an evaluation result for the hotline speech data according to the text information and the user speech information.
In the above scheme, the text information and the speech information in the hotline speech are extracted and processed jointly to obtain the user satisfaction evaluation result. This method can evaluate the user satisfaction of customer-service recordings more comprehensively and in finer detail, enlarge the scale of satisfaction evaluation, save labor costs, and provide better technical support for customer-service hotline operation.
In another embodiment of the present invention, step 10 comprises:
Step 11: performing speech recognition on the hotline speech data to obtain the text information of the hotline speech data;
Step 12: performing segmentation on the hotline speech data to obtain the user speech information contained in the hotline speech data.
It should be noted that step 11 mainly uses a speech recognition engine to convert the hotline speech data into the corresponding recognized text, which contains only the spoken content of the hotline speech data, while step 12 mainly uses speech segmentation techniques to separate the speech of the two parties in the dialogue, obtaining the set of user speech segments, the set of agent speech segments and the related dialogue information. After step 10 is completed, the set of user speech segments with its corresponding spoken content and the set of agent speech segments with its corresponding spoken content are available.
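By way of illustration (the patent does not fix a segmentation technique), the following Python sketch performs a crude version of step 12 under the assumption, common in hotline systems but not stated in the patent, that the recording is stereo with the agent on channel 0 and the user on channel 1; channel separation plus a simple energy-based voice-activity check then yields per-speaker segments. The file name and threshold are placeholders, and the soundfile library is assumed:

import numpy as np
import soundfile as sf

audio, sr = sf.read("hotline_call.wav")       # assumed stereo: shape (n_samples, 2)
agent_channel, user_channel = audio[:, 0], audio[:, 1]

def active_segments(signal, sr, frame_ms=30, threshold=1e-3):
    # Return (start_s, end_s) spans whose frame energy exceeds the threshold.
    frame = int(sr * frame_ms / 1000)
    spans, start = [], None
    for i in range(0, len(signal) - frame, frame):
        energetic = np.mean(signal[i:i + frame] ** 2) > threshold
        if energetic and start is None:
            start = i
        elif not energetic and start is not None:
            spans.append((start / sr, i / sr))
            start = None
    if start is not None:
        spans.append((start / sr, len(signal) / sr))
    return spans

user_segments = active_segments(user_channel, sr)    # the user speech of step 12
agent_segments = active_segments(agent_channel, sr)
print(f"user speaks in {len(user_segments)} segments, agent in {len(agent_segments)}")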
The information obtained in step 10 is the basis on which the hotline speech data are subsequently evaluated. The following steps perform that evaluation using the information acquired in step 10. As shown in Fig. 2, in a further embodiment of the present invention, step 20 is specifically:
Step 21: extracting text features related to user satisfaction and emotion from the text information;
Step 22: extracting abstract features related to speech emotion and customer satisfaction evaluation from the user speech information as combined speech features, the abstract features including temporal features, frequency-domain features, short-time features, long-time features and dialogue features;
Step 23: combining the text features and the combined speech features to obtain the evaluation result of the hotline speech data.
Optionally, step 21 is implemented as follows: text features related to user satisfaction and emotion, including key words and combination features, are extracted by natural language processing techniques, the key words and combination features being defined according to the actual situation. Step 22 is implemented as follows: abstract features of the speech signal, including temporal and frequency-domain features, are extracted by techniques such as correlation processing, the Fourier transform and homomorphic filtering; by analysis span they comprise short-time features (with the speech frame as the unit of analysis), long-time features (with the phrase or sentence as the unit) and dialogue features (with the dialogue context as the unit). The present invention selects for analysis the features related to speech emotion and customer satisfaction evaluation.
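As a hedged illustration of the text side of step 21 (the actual key-word lexicon is defined according to the practical situation and is not given in the patent), the following Python sketch counts illustrative satisfaction-related key words and simple word combinations in a recognized Chinese transcript:

# Illustrative placeholder lexicons; the patent's real key words are chosen per application.
NEGATIVE_WORDS = ["不满意", "投诉", "太慢", "生气"]      # "dissatisfied", "complaint", "too slow", "angry"
POSITIVE_WORDS = ["谢谢", "解决了"]                      # "thanks", "solved"
COMBINATIONS = [("还", "没有"), ("再", "投诉")]          # e.g. "still ... not", "complain ... again"

def text_features(transcript: str) -> dict:
    features = {
        "neg_count": sum(transcript.count(word) for word in NEGATIVE_WORDS),
        "pos_count": sum(transcript.count(word) for word in POSITIVE_WORDS),
    }
    features["combo_count"] = sum(
        1 for first, second in COMBINATIONS if first in transcript and second in transcript
    )
    return features

print(text_features("我已经投诉过了，到现在还没有解决，很不满意"))
# {'neg_count': 2, 'pos_count': 0, 'combo_count': 1}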
After the text features and the combined speech features have been obtained, they are analyzed and processed together to produce the evaluation result corresponding to the hotline speech data. Therefore, in a further embodiment of the present invention, step 23 is implemented as follows: the weight of each feature word in the text features and in the combined speech features is determined according to preset rules, the weights are combined, and the combined result is taken as the evaluation result of the hotline speech data.
The implementation of step 23 is illustrated below (a possible rule is: the stronger the dissatisfaction expressed by a particular word, the larger the weight of its corresponding text information; the weight of the speech information is judged by the same rule), using hotline speech that contains only one feature word:
Example 1: suppose the preset feature word to be analyzed is "dissatisfied". When the word "dissatisfied" is detected in the hotline speech data, because this word carries strong emotion, the weight of the corresponding text information is 10; analysis of the speech corresponding to the word "dissatisfied" indicates that the user is not in fact dissatisfied, so the weight of the speech information for this preset feature word is 1. The text weight and the speech weight are then added (or subtracted), and the result of the operation is taken as the evaluation result of the hotline speech data; taking addition as the example in the present invention, the evaluation result for the feature word "dissatisfied" is 11.
Example 2: suppose the preset feature word to be analyzed is "call". When the word "call" is detected in the hotline speech data, because this word is neutral and carries no emotion, the weight of the corresponding text information is 0; analysis of the speech corresponding to the word "call" yields "the call response is very slow", spoken with noticeable user emotion, so the weight of the speech information for this preset feature word is 3. Adding the text weight and the speech weight gives 3, so the evaluation result corresponding to the feature word "call" in this hotline speech data is 3.
In practical applications, the evaluation of each hotline recording usually selects several feature words for an overall evaluation. The evaluation result of each feature word is obtained first, and the results are then combined (for example, the emotional proportion of each feature word can be obtained first, each feature word's weight multiplied by its corresponding proportion, and the products summed) to obtain the comprehensive evaluation result of the hotline speech data.
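For concreteness, the weighting described above can be sketched in Python as follows; the weights reproduce Examples 1 and 2, while the emotion proportions are assumed values used only for illustration:

def word_score(text_weight, voice_weight):
    # Superpose the text-side and speech-side weights for one feature word (Examples 1 and 2 use addition).
    return text_weight + voice_weight

def overall_score(word_scores, proportions):
    # Weight each feature word's score by its emotion proportion and sum the products.
    return sum(word_scores[word] * proportions[word] for word in word_scores)

# Example 1: "dissatisfied" -> text weight 10, speech weight 1, score 11.
# Example 2: "call"         -> text weight 0,  speech weight 3, score 3.
scores = {"dissatisfied": word_score(10, 1), "call": word_score(0, 3)}
proportions = {"dissatisfied": 0.7, "call": 0.3}   # assumed emotion proportions
print(scores)                                      # {'dissatisfied': 11, 'call': 3}
print(overall_score(scores, proportions))          # 11*0.7 + 3*0.3 = 8.6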
The above describes the case in which the hotline speech data contain no annotated satisfaction data (annotated satisfaction data being the evaluation actually given by the user after the call ends). In another embodiment of the present invention, if the hotline speech data include annotated satisfaction data, then after the hotline speech data are processed to obtain the text information and the user speech information contained in them, the evaluation result of the hotline speech data is obtained according to the annotated satisfaction data together with the text information and the user speech information.
It should be noted that when annotated satisfaction data are available, the evaluation result can be obtained directly from them; alternatively, the text information and the user speech information can first be analyzed by the method above to obtain a rough evaluation result, and a combined computation is then performed in which the annotated satisfaction data account for 80% and the rough evaluation result for 20%, finally yielding the evaluation result of the hotline speech data.
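The 80%/20% combination can be written out directly; the scores below are made-up numbers used only to show the arithmetic:

def combine_with_annotation(annotated_score, rough_score):
    # Annotated satisfaction data carry 80% of the weight, the rough automatic result 20%.
    return 0.8 * annotated_score + 0.2 * rough_score

print(combine_with_annotation(annotated_score=9.0, rough_score=6.0))   # 0.8*9 + 0.2*6 = 8.4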
It should be noted that using the above method to obtain the customer-service satisfaction evaluation result of hotline speech data allows customer-service recordings to be evaluated more comprehensively and in finer detail, enlarges the scale of satisfaction evaluation, saves labor costs, and provides better technical support for customer-service hotline operation.
As shown in Fig. 3, an embodiment of the present invention also provides a speech processing apparatus, comprising:
a first acquisition module 100, configured to process hotline speech data to obtain text information of the hotline speech data and the user speech information contained in the hotline speech data;
a second acquisition module 200, configured to obtain an evaluation result for the hotline speech data according to the text information and the user speech information.
Optionally, the first acquisition module 100 comprises:
a speech recognition unit, configured to perform speech recognition on the hotline speech data to obtain the text information of the hotline speech data;
a speech segmentation unit, configured to perform segmentation on the hotline speech data to obtain the user speech information contained in the hotline speech data.
Optionally, as shown in Fig. 4, the second acquisition module 200 comprises:
a text feature extraction unit 201, configured to extract text features related to user satisfaction and emotion from the text information;
a speech feature extraction unit 202, configured to extract abstract features related to speech emotion and customer satisfaction evaluation from the user speech information as combined speech features, the abstract features including temporal features, frequency-domain features, short-time features, long-time features and dialogue features;
a processing unit 203, configured to combine the text features and the combined speech features to obtain the evaluation result of the hotline speech data.
It should be noted that this apparatus embodiment corresponds to the method described above; all implementations of the method are applicable to this apparatus embodiment and achieve the same technical effects.
The actual implementation of the present invention proceeds as follows. First comes the training stage of the evaluation model: through repeated training an evaluation model with good performance is obtained, and this model is then used to evaluate existing hotline recording data. As shown in Fig. 5, in the training stage, hotline recording data containing satisfaction annotations are first subjected to speaker segmentation to obtain the speech data and to speech recognition to obtain the text data; speech features are then extracted from the speech data and text features from the text data; finally, the evaluation model is trained on the speech features and the text features according to the method above, so that user satisfaction features are analyzed comprehensively. In the evaluation stage, once the optimal evaluation model has been obtained, it recognizes or classifies a single hotline recording to determine its user satisfaction confidence, from which a user satisfaction score is given automatically, yielding the satisfaction evaluation result.
In a practical application, the training side uses the recognition engine provided by iFLYTEK to transcribe a single 10086 customer service recording into text and then extracts 13-dimensional (i.e. 13 kinds of) satisfaction-related text features through text analysis and text feature extraction; 45-dimensional (i.e. 45 kinds of) satisfaction-related speech features in 10 classes are extracted from the hotline recording using various open-source speech tools; and an SVM (Support Vector Machine) classifier is trained by combining the 13-dimensional text features, the 45-dimensional speech features and the existing satisfaction annotation data.
On the recognition side, for a new hotline customer service recording the 13-dimensional text features are extracted by the speech recognition engine and the text feature extraction tool, and the required 45-dimensional speech features are extracted by the speech tools; the features are then passed to the SVM classifier to obtain the classification result, namely the satisfaction evaluation result, completing the automatic satisfaction evaluation.
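A hedged Python sketch of this training and recognition flow, assuming scikit-learn and with random arrays standing in for the real 13-dimensional text features and 45-dimensional speech features, could look as follows:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_calls = 500
text_feats = rng.normal(size=(n_calls, 13))      # placeholder 13-dim text features
speech_feats = rng.normal(size=(n_calls, 45))    # placeholder 45-dim speech features
X = np.hstack([text_feats, speech_feats])        # combined 58-dim feature vectors
y = rng.integers(0, 2, size=n_calls)             # placeholder labels: 0 = dissatisfied, 1 = satisfied

# Training stage: fit an SVM classifier on the combined features and annotation labels.
classifier = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
classifier.fit(X, y)

# Evaluation stage: score one new recording and read off a satisfaction confidence.
new_call = np.hstack([rng.normal(size=13), rng.normal(size=45)]).reshape(1, -1)
confidence = classifier.predict_proba(new_call)[0, 1]
print(f"predicted satisfaction confidence: {confidence:.2f}")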
The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications also fall within the protection scope of the present invention.

Claims (10)

1. A speech processing method, characterized by comprising:
processing hotline speech data to obtain text information of the hotline speech data and the user speech information contained in the hotline speech data;
obtaining an evaluation result for the hotline speech data according to the text information and the user speech information.
2. The speech processing method according to claim 1, characterized in that the step of processing the hotline speech data to obtain the text information of the hotline speech data and the user speech information contained in the hotline speech data comprises:
performing speech recognition on the hotline speech data to obtain the text information of the hotline speech data;
performing segmentation on the hotline speech data to obtain the user speech information contained in the hotline speech data.
3. The speech processing method according to claim 2, characterized in that the step of performing speech recognition on the hotline speech data to obtain the text information of the hotline speech data comprises:
performing speech recognition on the hotline speech data by a speech recognition engine to obtain the text information of the hotline speech data.
4. The speech processing method according to claim 2, characterized in that the step of performing segmentation on the hotline speech data to obtain the user speech information contained in the hotline speech data comprises:
performing speech segmentation on the hotline speech data by a speech segmentation algorithm to obtain the user speech information contained in the hotline speech data.
5. The speech processing method according to claim 1, characterized in that the step of obtaining the evaluation result of the hotline speech data according to the text information and the user speech information comprises:
extracting text features related to user satisfaction and emotion from the text information;
extracting abstract features related to speech emotion and customer satisfaction evaluation from the user speech information as combined speech features, the abstract features including temporal features, frequency-domain features, short-time features, long-time features and dialogue features;
combining the text features and the combined speech features to obtain the evaluation result of the hotline speech data.
6. The speech processing method according to claim 5, characterized in that the step of combining the text features and the combined speech features to obtain the evaluation result of the hotline speech data comprises:
determining, according to preset rules, the weight of each feature word in the text features and in the combined speech features, combining the weights, and taking the combined result as the evaluation result of the hotline speech data.
7. The speech processing method according to claim 1, characterized in that, if the hotline speech data includes annotated satisfaction data, then after the hotline speech data is processed to obtain the text information of the hotline speech data and the user speech information contained in it, the evaluation result of the hotline speech data is obtained according to the annotated satisfaction data together with the text information and the user speech information.
8. A speech processing apparatus, characterized by comprising:
a first acquisition module, configured to process hotline speech data to obtain text information of the hotline speech data and the user speech information contained in the hotline speech data;
a second acquisition module, configured to obtain an evaluation result for the hotline speech data according to the text information and the user speech information.
9. The speech processing apparatus according to claim 8, characterized in that the first acquisition module comprises:
a speech recognition unit, configured to perform speech recognition on the hotline speech data to obtain the text information of the hotline speech data;
a speech segmentation unit, configured to perform segmentation on the hotline speech data to obtain the user speech information contained in the hotline speech data.
10. The speech processing apparatus according to claim 8, characterized in that the second acquisition module comprises:
a text feature extraction unit, configured to extract text features related to user satisfaction and emotion from the text information;
a speech feature extraction unit, configured to extract abstract features related to speech emotion and customer satisfaction evaluation from the user speech information as combined speech features, the abstract features including temporal features, frequency-domain features, short-time features, long-time features and dialogue features;
a processing unit, configured to combine the text features and the combined speech features to obtain the evaluation result of the hotline speech data.
CN201410838240.9A 2014-12-29 2014-12-29 Speech processing method and device Pending CN105810205A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410838240.9A CN105810205A (en) 2014-12-29 2014-12-29 Speech processing method and device


Publications (1)

Publication Number Publication Date
CN105810205A true CN105810205A (en) 2016-07-27

Family

ID=56979889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410838240.9A Pending CN105810205A (en) 2014-12-29 2014-12-29 Speech processing method and device

Country Status (1)

Country Link
CN (1) CN105810205A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101364408A (en) * 2008-10-07 2009-02-11 西安成峰科技有限公司 Sound image combined monitoring method and system
CN102623009A (en) * 2012-03-02 2012-08-01 安徽科大讯飞信息技术股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN102655003A (en) * 2012-03-21 2012-09-05 北京航空航天大学 Method for recognizing emotion points of Chinese pronunciation based on sound-track modulating signals MFCC (Mel Frequency Cepstrum Coefficient)
CN104050965A (en) * 2013-09-02 2014-09-17 广东外语外贸大学 English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof
CN103811009A (en) * 2014-03-13 2014-05-21 华东理工大学 Smart phone customer service system based on speech analysis

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133165A1 (en) * 2016-02-01 2017-08-10 百度在线网络技术(北京)有限公司 Method, apparatus and device for automatic evaluation of satisfaction and computer storage medium
CN108021565A (en) * 2016-11-01 2018-05-11 中国移动通信有限公司研究院 A kind of analysis method and device of the user satisfaction based on linguistic level
CN108021565B (en) * 2016-11-01 2021-09-10 中国移动通信有限公司研究院 User satisfaction analysis method and device based on conversation
CN107293309A (en) * 2017-05-19 2017-10-24 四川新网银行股份有限公司 A kind of method that lifting public sentiment monitoring efficiency is analyzed based on customer anger
CN107731219B (en) * 2017-09-06 2021-07-20 百度在线网络技术(北京)有限公司 Speech synthesis processing method, device and equipment
CN107731219A (en) * 2017-09-06 2018-02-23 百度在线网络技术(北京)有限公司 Phonetic synthesis processing method, device and equipment
CN107767038A (en) * 2017-10-01 2018-03-06 上海量科电子科技有限公司 voice-based payment evaluation method, client and system
CN108735200A (en) * 2018-06-27 2018-11-02 北京灵伴即时智能科技有限公司 A kind of speaker's automatic marking method
CN108962281A (en) * 2018-08-15 2018-12-07 三星电子(中国)研发中心 A kind of evaluation of language expression and householder method and device
CN108962281B (en) * 2018-08-15 2021-05-07 三星电子(中国)研发中心 Language expression evaluation and auxiliary method and device
CN109064789A (en) * 2018-08-17 2018-12-21 重庆第二师范学院 A kind of adjoint cerebral palsy speaks with a lisp supplementary controlled system and method, assistor
CN110062117A (en) * 2019-04-08 2019-07-26 商客通尚景科技(上海)股份有限公司 A kind of sonic detection and method for early warning
CN110782916A (en) * 2019-09-30 2020-02-11 北京淇瑀信息科技有限公司 Multi-modal complaint recognition method, device and system
CN110782916B (en) * 2019-09-30 2023-09-05 北京淇瑀信息科技有限公司 Multi-mode complaint identification method, device and system


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160727