CN111489743A - Operation management analysis system based on intelligent voice technology - Google Patents
- Publication number
- CN111489743A (application number CN201910082514.9A)
- Authority
- CN
- China
- Prior art keywords
- voice
- file
- analysis
- recording
- text
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/16—Speech classification or search using artificial neural networks
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges; H04M3/42—Systems providing special services or facilities to subscribers; H04M3/50—Centralised arrangements for answering calls; centralised arrangements for recording messages
- H04M3/5166—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing, in combination with interactive voice response systems or voice portals, e.g. as front-ends
- H04M3/53—Centralised arrangements for recording incoming messages, i.e. mailbox systems
Abstract
The invention provides an operation management analysis system based on intelligent voice technology, comprising: a recording acquisition unit, used for downloading recording data files from the telephone recording platform, splicing and converting the files, and generating complete voice files; a scene segmentation unit, used for performing scene segmentation or speaker segmentation on the voice file; a voice transcription unit, used for recognizing the voice file with an intelligent speech recognition engine and transcribing it into text content; a data analysis unit, used for analyzing the text content and the voice file with a neural network model and outputting an analysis report; a database unit, used for storing the voice file, the text content, and the analysis report; and a content indexing unit, used for retrieving the data stored in the database according to an index command. By intelligently analyzing the call center's large volume of recording data files, the invention discovers problems and defects in the service process and grasps user appeals in time, thereby improving user satisfaction.
Description
Technical Field
The invention relates to the technical field of information analysis, in particular to an operation management analysis system based on an intelligent voice technology.
Background
With the development of mobile communication technology, the customer service call center plays a crucial role as a bridge between an operation platform and its users. In recent years, driven by global user demand, national strategic guidance, and enterprise competition, the intelligent voice technology industry has grown rapidly and continuously, and has been applied in depth to fields such as the mobile internet, smart home, automotive electronics, financial payment, online education, and healthcare. Propelled by massive data and deep learning, intelligent voice technologies such as speech recognition, speech synthesis, and voiceprint recognition are maturing day by day and entering the practical stage.
The State Grid 95598 call center serves as an important bridge between the State Grid corporation and its users. Chinese speech recognition trained with the internationally mainstream DNN (deep neural network) + HMM (hidden Markov model) method can adapt to application environments across different ages, regions, user groups, channels, terminals, and noise conditions. By additionally performing customized model training on the massive speech and text corpora accumulated by the 95598 call center, a speech transcription and analysis platform with high availability and a high recognition rate is established, greatly remedying the unclear and inaccurate speech recognition and transcription of the prior art and reducing the speech recognition error rate.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide an operation management analysis system based on intelligent voice technology. The system establishes an intelligent speech recognition model with high availability and a high recognition rate from the large speech and text corpora accumulated by the State Grid 95598 call center, and continuously trains and optimizes the model with deep learning so as to keep improving recognition accuracy and applicability to the power service industry. Based on this model, the large volume of recording data generated by the call center is transcribed and analyzed in time, so that defects in the service process are discovered promptly, user demands are grasped, and service quality is improved.
The technical scheme adopted by the invention is an operation management analysis system based on intelligent voice technology, which comprises the following units:
the recording acquisition unit is used for downloading a recording data file from the telephone recording platform, splicing and converting the file and generating a complete voice file;
the scene segmentation unit is used for carrying out scene segmentation or speaker segmentation on the voice file;
the voice transcription unit is used for recognizing the voice file based on an intelligent voice recognition engine and transcribing the voice file into text content;
the data analysis unit is used for analyzing the text content and the voice file based on a neural network model and outputting an analysis report;
the database unit is used for storing the voice file, the text content and the analysis report;
and the content indexing unit is used for retrieving the data stored in the database according to the indexing command.
Wherein, the recording acquisition unit includes:
the recording downloading module is connected with the telephone recording platform and is used for receiving the recording data file segments transmitted periodically by the platform;
the splicing transcoding module is used for splicing the recording data file segments, decompressing the spliced recording data file, and converting it into a complete, recognizable voice file;
and the transcription scheduling module is used for calling the corresponding voice file according to the transcription command and sending the voice file to the voice transcription unit.
Wherein the data analysis unit includes:
the audio analysis module comprises a silence interval detection module, a speech speed detection module and an emotion detection module and is used for respectively carrying out silence interval analysis, speech speed analysis and emotion analysis on the voice file;
and the text analysis module is used for analyzing the text content.
In a further improvement, the system also comprises a recording distribution module, used for transmitting the voice file to the voice transcription unit, the data analysis unit, and the database unit respectively.
Wherein the intelligent speech recognition engine comprises an acoustic model and a language model, corresponding respectively to computing the probability of audio features extracted from the voice file mapping to syllables, and of syllables mapping to text;
the language model is modeled by adopting an N-Gram model;
the acoustic model is modeled by adopting a deep neural network and a hidden Markov model.
Wherein the content indexing unit includes:
the data storage module is used for storing the text content and the associated data generated in the transcription process;
the data query module is used for querying and aggregating the voice file and the associated data according to preset query conditions and displaying the result;
the word bank is internally provided with a self-defined word segmentation word bank, and can accurately segment the electric power related sentences;
the text word segmentation processing module is used for carrying out word segmentation marking on the text content based on the word segmentation result and extracting each word segmentation to generate structured text data;
and the text clustering module is used for performing text clustering processing and deep cross analysis on the structured text data generated in a time period to obtain text related clustering information in the time period.
In a further improvement, the system also comprises a standardized sentence vector model library containing a plurality of standardized sentence vector models;
the standardized sentence vector model is obtained by carrying out sentence vector similarity calculation on sentence samples in a corpus based on a neural network model and carrying out standardized training on sentence vectors meeting a similarity threshold.
In a further improvement, the speech transcription unit further includes a normalization processing module, configured to perform sentence segmentation on the text content, calculate a sentence vector of each sentence, and select a normalization sentence vector model corresponding to each sentence from the normalization sentence vector model library to perform normalization training on each sentence, so as to output a corresponding normalization sentence.
Drawings
Fig. 1 is a block diagram of an operation management analysis system based on intelligent voice technology.
Detailed Description
The main objective of the invention is to provide an operation management analysis system based on intelligent voice technology. It establishes an intelligent speech recognition model with high availability and a high recognition rate from the massive speech and text corpora accumulated by the State Grid 95598 call center, continuously trains and optimizes the model with deep learning to improve recognition accuracy and applicability to the power service industry, and, based on this model, transcribes and analyzes in time the large volume of recording data generated by the call center, thereby promptly discovering defects in the service process, grasping user demands, and improving service quality.
By adopting Chinese speech recognition trained with the internationally mainstream DNN (deep neural network) + HMM (hidden Markov model) method, the system can adapt to application environments of different ages, regions, user groups, channels, terminals, and noise conditions; customized model training on the massive speech and text corpora accumulated by the State Grid 95598 call center then yields a speech transcription platform with high availability and a high recognition rate;
the core technology of the voice transcription platform is an intelligent voice recognition technology, the intelligent voice recognition technology adopts a latest generation recognition algorithm, a decoder core and an advanced acoustic model and language model training method, and the intelligent voice recognition technology mainly comprises three important components: training a voice recognition model, processing front-end voice and processing rear-end recognition;
1. speech recognition model training
The speech recognition model is usually composed of two parts, an acoustic model and a language model, corresponding respectively to computing the probability of features extracted from the speech signal mapping to syllables, and of syllables mapping to words.
At present, DNN (deep neural network) + HMM (hidden Markov model) is generally adopted as the modeling method for the acoustic model; compared with the GMM (Gaussian mixture model) + HMM method of the previous generation, it reduces the speech recognition error rate by 30%, the fastest progress made in speech recognition technology in the last 20 years. For the language model, a statistical modeling method is usually adopted, namely an N-Gram model, which corresponds to an (N-1)th-order Markov chain. Its basic idea is to slide a window of size N over the content of the text, forming a sequence of fragments of length N; each fragment is called a gram. The occurrence frequency of all grams is counted and filtered against a preset threshold to form a key-gram list, i.e. the vector feature space of the text, in which each gram is one feature vector dimension;
the algorithm has the advantages of strong fault tolerance and language independence, is universal for Chinese, English and Chinese, does not need to be processed in linguistics, is a common language model in large-vocabulary continuous speech recognition, is simple and effective, and is widely used.
To adapt to application environments of different ages, regions, user groups, channels, terminals, and noise conditions, training requires large amounts of speech and text corpora, which effectively improves the recognition rate. With the rapid development of the internet and the popularization of mobile terminals such as mobile phones, large amounts of text and speech corpora can now be obtained from many channels, providing rich resources for training the language and acoustic models within a speech recognition model and making general large-scale language and acoustic models possible.
2. Front-end speech processing
Front-end speech processing refers to preprocessing such as detecting and denoising the speaker's speech by using a signal processing method so as to obtain the speech most suitable for the recognition engine to process. The main functions include:
(1) endpoint detection
Endpoint detection analyzes the input audio stream, distinguishes the speech periods from the non-speech periods in the signal, and accurately determines the starting point of the speech signal. After endpoint detection, subsequent processing can be applied to the speech signal only, which plays an important role in improving model accuracy and recognition accuracy.
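The endpoint detection described above can be sketched with a simple frame-energy comparison. Real engines use far more robust statistics; the frame length and threshold below are illustrative assumptions.

```python
# Energy-based endpoint detection sketch: split the signal into frames,
# compare each frame's mean energy to a threshold, and report the first
# and last speech frames.

def frame_energies(samples, frame_len):
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return [sum(x * x for x in f) / len(f) for f in frames if f]

def detect_endpoints(samples, frame_len=4, threshold=0.01):
    """Return (start_frame, end_frame) of speech, or None if all silence."""
    energies = frame_energies(samples, frame_len)
    speech = [i for i, e in enumerate(energies) if e > threshold]
    if not speech:
        return None
    return speech[0], speech[-1]

# Silence, then four higher-amplitude "speech" samples, then silence.
signal = [0.0] * 8 + [0.5, -0.4, 0.6, -0.5] + [0.0] * 8
print(detect_endpoints(signal))  # → (2, 2)
```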
(2) Noise cancellation
In practical applications, background noise is a real challenge for speech recognition applications, and even if a speaker is in a quiet office environment, it is difficult to avoid certain noise during a telephone voice call. A good speech recognition engine needs to have efficient noise cancellation capabilities to accommodate the user's requirements for use in a wide variety of environments.
(3) Feature extraction
The features commonly used at present include MFCC (Mel-Frequency Cepstral Coefficients) and PLP (Perceptual Linear Prediction), etc.
3. Backend recognition processing
Back-end recognition processing is the process of recognizing (also called "decoding") the extracted feature vectors with the trained acoustic model and language model to obtain text information. The acoustic model mainly corresponds to computing the probability of speech features mapping to syllables (or phonemes), and the language model to computing the probability of syllables mapping to words. The most important part is the decoder: it scores the original speech features against the acoustic model and the language model and, on that basis, finds the optimal word sequence path; the text corresponding to that path is the final recognition result.
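The decoder's combination of acoustic and language model scores can be illustrated with a toy Viterbi search over a two-step lattice. Every candidate word and score below is invented for illustration and does not reflect the patent's actual models.

```python
# Toy decoder sketch: each frame step proposes candidate words with
# acoustic log-scores, a bigram language model scores transitions, and
# dynamic programming (Viterbi) keeps the best-scoring path per word.

def viterbi(candidates, lm):
    """candidates: list of {word: acoustic_log_score} per step.
    lm: dict mapping (prev_word, word) -> language-model log score;
    unseen transitions get a flat -5.0 backoff penalty (an assumption)."""
    best = {w: (s, [w]) for w, s in candidates[0].items()}
    for step in candidates[1:]:
        new_best = {}
        for w, ac in step.items():
            scored = [
                (score + lm.get((prev, w), -5.0) + ac, path + [w])
                for prev, (score, path) in best.items()
            ]
            new_best[w] = max(scored)
        best = new_best
    return max(best.values())  # (total score, best word sequence)

candidates = [
    {"power": -0.1, "powder": -0.9},
    {"outage": -0.3, "outrage": -0.2},
]
lm = {("power", "outage"): -0.5, ("power", "outrage"): -3.0}
score, path = viterbi(candidates, lm)
print(path)  # → ['power', 'outage']
```

The language model rescues "outage" even though "outrage" scored slightly better acoustically, which is exactly the interplay the decoder exploits.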
Early decoders based on syntax-tree structures were complex in design, and under those technical conditions decoder speedup had already hit a bottleneck. Most current mainstream speech recognition decoders instead adopt a decoding network based on weighted finite-state transducers (WFSTs), which can compile the language model, the lexicon, and the acoustic context-dependent phone set into one large decoding network. This greatly improves decoding speed and allows the decoding process to be separated from its knowledge sources.
Based on the massive speech and text corpora accumulated by the State Grid 95598 call center and the intelligent speech recognition model obtained by customized DNN (deep neural network) + HMM (hidden Markov model) training, a preferred embodiment of the invention provides an operation management analysis system. For integration, the system connects directly to the 95598 telephone recording platform, the 95598 business management system, and the quality inspection management module, and connects indirectly to the 95598 business support system through the quality inspection management module.
The overall architecture of the system strives for a clear division into layers, and the technologies adopted for its core components balance advancement with maturity and stability. High availability requirements are fully considered across the system's interfaces, services, and engines: interfaces adopt an active/standby mode in principle, services and engines use load balancing, there is no single point of failure, and the failure of a few nodes does not interrupt service or block traffic.
The system deploys corresponding components in both the southern and northern branch centers. The southern branch center interfaces with its local telephone recording platform to obtain recording data nearby; when a recording is retrieved across centers, the binary voice stream is transmitted over the network between south and north.
The northern branch center likewise interfaces with its local recording platform to acquire and transcribe recordings. In addition, because the users of voice analysis are mainly near the northern branch center, and because data must be summarized, quality control text processing and voice content analysis for the whole customer service center are centralized in the northern branch center, where the content retrieval service and the database also need to be deployed.
Data management is realized by centrally storing the text data according to actual conditions, and the various storage tiers are planned against currently expected traffic so as to meet the processing demand of the maximum voice volume during the summer peak period.
The present system is described in detail below with reference to fig. 1, and the present operation management analysis system includes:
the recording acquisition unit 100 specifically comprises a recording downloading module, which is connected to the telephone recording platform and receives the recording data file segments transmitted by the telephone recording platform at regular time; the splicing transcoding module is used for splicing the recording data file segments, decompressing the spliced recording data file and converting the recording data file into a complete voice file which can be identified; the transcription scheduling module is used for calling the corresponding voice file to send according to the transcription command;
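The splicing and transcoding performed by the recording acquisition unit can be sketched using only the Python standard library, under the simplifying assumption that the segments are already uncompressed WAV files with identical parameters; decompressing proprietary recording formats is outside this sketch.

```python
# Sketch of the splicing step: append the audio frames of each recording
# segment, in order, into one complete voice file.
import wave

def splice_segments(segment_paths, out_path):
    """Concatenate WAV segments (same rate/width/channels) into out_path."""
    with wave.open(out_path, "wb") as out:
        for i, path in enumerate(segment_paths):
            with wave.open(path, "rb") as seg:
                if i == 0:
                    # Copy sample rate, sample width, and channel count
                    # from the first segment; wave fixes the frame count
                    # in the header when the file is closed.
                    out.setparams(seg.getparams())
                out.writeframes(seg.readframes(seg.getnframes()))
```

Called with the segment paths received from the recording platform, this yields the single recognizable voice file handed to the transcription scheduler.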
the recording distribution module 200 is configured to transmit the voice files respectively;
the voice transcription unit 300 is used for recognizing the voice file based on an intelligent voice recognition engine and transcribing the voice file into text content;
the intelligent speech recognition engine comprises an acoustic model and a language model, and the acoustic model and the language model respectively correspond to the calculation of the probability from the audio features extracted from the speech file to the syllables and the calculation of the probability from the syllables to the characters;
the language model is modeled by adopting an N-Gram model;
the acoustic model is modeled by adopting a deep neural network and a hidden Markov model;
A scene division unit 400, configured to perform scene division or speaker division on the voice file;
In order to save cost, current call centers usually use single-channel recording, that is, the user and the customer service agent are recorded simultaneously into the same channel. However, the agent's recording and the user's recording generally need to be analyzed separately: the agent's recording is mainly used to evaluate the agent's service capability, while the user's recording contains potential user demand information, competitor information, and the like, of obvious business value. The function that separates them is commonly referred to as "speaker separation", also known as "scene segmentation";
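The turn-taking structure that speaker separation recovers can be illustrated with a deliberately naive sketch: cut the single channel at silent gaps and alternate the speaker label. Production systems use voiceprint-based diarization instead; the energies, threshold, and alternating-label assumption below are all illustrative.

```python
# Naive single-channel "scene segmentation" sketch: group consecutive
# above-threshold frames into speech segments, then alternate the
# speaker label between agent and user.

def split_turns(frame_energies, threshold=0.01):
    """Group consecutive above-threshold frames into (start, end) segments."""
    segments, current = [], []
    for i, e in enumerate(frame_energies):
        if e > threshold:
            current.append(i)
        elif current:
            segments.append((current[0], current[-1]))
            current = []
    if current:
        segments.append((current[0], current[-1]))
    return segments

def label_speakers(segments, first="agent"):
    order = [first, "user" if first == "agent" else "agent"]
    return [(seg, order[i % 2]) for i, seg in enumerate(segments)]

energies = [0.5, 0.6, 0.0, 0.0, 0.4, 0.0, 0.7, 0.7]
print(label_speakers(split_turns(energies)))
# → [((0, 1), 'agent'), ((4, 4), 'user'), ((6, 7), 'agent')]
```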
the data analysis unit 500 analyzes the text content and the voice file based on the neural network model, and outputs an analysis report, which includes an audio analysis module and a text analysis module;
the audio analysis module comprises a silence interval detection module, a speech speed detection module and an emotion detection module and is used for respectively carrying out silence interval analysis, speech speed analysis and emotion analysis on the voice file;
the text analysis module is used for analyzing the text content;
a database unit 600 for storing the voice file, text content and analysis report;
the content indexing unit 700 retrieves data stored in the database according to the index command, and specifically includes:
the data storage module is used for storing the text content and the associated data generated in the transcription process;
the data query module is used for querying and aggregating the voice file and the associated data according to preset query conditions and displaying the result;
the word segmentation lexicon module has a built-in custom segmentation lexicon and can accurately segment power-related sentences;
the text word segmentation processing module is used for carrying out word segmentation marking on the text content based on the word segmentation result and extracting each word segmentation to generate structured text data;
and the text clustering module is used for performing text clustering processing and deep cross analysis on the structured text data generated in a time period to obtain text related clustering information in the time period.
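The indexing pipeline described above (custom-lexicon word segmentation, structured text, keyword retrieval) can be sketched as follows. The toy power-domain lexicon and transcripts are invented for illustration; a real system would use the custom power-domain segmentation lexicon the patent describes.

```python
# Content-indexing sketch: segment each transcript by greedy longest
# match against a lexicon, build an inverted index, answer keyword
# queries.
from collections import defaultdict

LEXICON = {"power outage", "electricity bill", "report", "inquiry"}

def segment(text):
    """Greedy longest-match word segmentation against LEXICON."""
    tokens, words = text.split(), []
    i = 0
    while i < len(tokens):
        for n in (2, 1):  # try the longest lexicon entry first
            cand = " ".join(tokens[i:i + n])
            if cand in LEXICON or n == 1:
                words.append(cand)
                i += n
                break
    return words

def build_index(transcripts):
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for word in segment(text):
            index[word].add(doc_id)
    return index

calls = {1: "power outage report", 2: "electricity bill inquiry", 3: "power outage inquiry"}
index = build_index(calls)
print(sorted(index["power outage"]))  # → [1, 3]
```

The structured tokens produced by `segment` are also what the text clustering module would consume for time-window cluster analysis.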
In addition, the system also comprises a standardized sentence vector model base which contains a plurality of standardized sentence vector models;
the standardized sentence vector model is obtained by carrying out sentence vector similarity calculation on sentence samples in a corpus based on a neural network model and carrying out standardized training on sentence vectors meeting a similarity threshold.
Based on the standardized sentence vector model library, the voice transcription unit further comprises a standardized processing module for sentence segmentation of the text content, calculating a sentence vector of each sentence, and selecting a standardized sentence vector model corresponding to each sentence from the standardized sentence vector model library to perform standardized training on each sentence so as to output a corresponding standardized sentence;
the standardized sentences are recombined to generate standardized text content, so that more accurate text analysis can be performed subsequently.
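The sentence-normalization step can be illustrated with a bag-of-words stand-in for the neural sentence vectors: compute each sentence's vector, measure cosine similarity against the standardized library, and substitute the closest match when it clears a threshold. The standard sentences and the 0.5 threshold below are assumptions for the sketch, not values from the patent.

```python
# Sentence normalization sketch: bag-of-words vectors plus cosine
# similarity select the standardized sentence closest to the input.
import math
from collections import Counter

def vectorize(sentence):
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def normalize(sentence, standard_sentences, threshold=0.5):
    """Return the closest standardized sentence, or the original if none is close."""
    vec = vectorize(sentence)
    scored = [(cosine(vec, vectorize(s)), s) for s in standard_sentences]
    best_score, best = max(scored)
    return best if best_score >= threshold else sentence

standards = ["please report the power outage address", "please state your account number"]
print(normalize("report power outage address please", standards))
# prints "please report the power outage address"
```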
Based on the operation management analysis system provided by the invention, refined operation application on services can be realized, such as:
customer service voice quality inspection: by applying the voice analysis technology, various retrieval functions are flexibly combined and applied, different application parameter thresholds are set, and the problem of customer service call quality can be effectively and comprehensively analyzed and evaluated. The voice analysis can also locate the specific position where the problem occurs, thereby facilitating the further tracing and determining of the problem by the manager.
Operation management analysis: combining recordings with recognition results, the system performs service efficiency analysis, incoming-call reason mining, user demand analysis, call duration analysis, and hotspot and trend monitoring to uncover service shortcomings, providing auxiliary support for standardizing and optimizing the service process.
The voice analysis technology enables the system to detect variations in fundamental frequency, pitch, and the like within a call recording, predict emotional fluctuations that may occur, and locate where in the audio they happen. It detects and analyzes the average speech rate across a whole recording as well as speech-rate changes within a given segment, and it detects the silent periods in a recording during which neither the user nor the hotline agent speaks. It then generates an index file in standard XML format for keyword retrieval and for detection of abnormal conditions, abnormal speech, and abnormal dialogue.
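Two of the simpler audio analyses above, average speech rate and total silence time, reduce to direct arithmetic over the transcript length and the detected silent intervals; the figures below are illustrative.

```python
# Speech-rate and silence-time sketch for a single call recording.

def speech_rate(transcript_chars, duration_seconds):
    """Average speech rate in transcribed characters per minute."""
    return transcript_chars / (duration_seconds / 60.0)

def silence_time(intervals):
    """Total silence given (start_s, end_s) interval pairs."""
    return sum(end - start for start, end in intervals)

duration = 120.0  # a two-minute call
chars = 420       # characters in the transcript
print(speech_rate(chars, duration))                 # → 210.0 chars per minute
print(silence_time([(10.0, 14.5), (60.0, 62.0)]))   # → 6.5 seconds of silence
```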
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (8)
1. An operation management analysis system based on intelligent voice technology, comprising:
the recording acquisition unit is used for downloading a recording data file from the telephone recording platform, splicing and converting the file and generating a complete voice file;
the scene segmentation unit is used for carrying out scene segmentation or speaker segmentation on the voice file;
the voice transcription unit is used for recognizing the voice file based on an intelligent voice recognition engine and transcribing the voice file into text content;
the data analysis unit is used for analyzing the text content and the voice file based on a neural network model and outputting an analysis report;
the database unit is used for storing the voice file, the text content and the analysis report;
and the content indexing unit is used for retrieving the data stored in the database according to the indexing command.
2. The system of claim 1, wherein the recording acquisition unit comprises:
the recording downloading module is connected with the telephone recording platform and used for receiving, at regular intervals, the recording data file segments transmitted by the telephone recording platform;
the splicing transcoding module is used for splicing the recording data file segments, decompressing the spliced recording data file, and converting it into a complete, recognizable voice file;
and the transcription scheduling module is used for calling the corresponding voice file according to the transcription command and sending the voice file to the voice transcription unit.
3. The system according to claim 1 or 2, wherein the data analysis unit comprises:
the audio analysis module comprises a silence interval detection module, a speech speed detection module and an emotion detection module and is used for respectively carrying out silence interval analysis, speech speed analysis and emotion analysis on the voice file;
and the text analysis module is used for analyzing the text content.
4. The system of claim 1, further comprising a recording distribution module for transmitting the voice file to the voice transcription unit, the data analysis unit, and the database unit, respectively.
5. The system of claim 1, wherein the intelligent speech recognition engine comprises an acoustic model and a language model, used respectively to calculate the probability of syllables given audio features extracted from a voice file and the probability of text given those syllables;
the language model is modeled by adopting an N-Gram model;
the acoustic model is modeled by adopting a deep neural network and a hidden Markov model.
6. The system of claim 1, wherein the content indexing unit comprises:
the data storage module is used for storing the text content and the associated data generated in the transcription process;
the data query module is used for querying and aggregating the voice file and the associated data according to preset query conditions and displaying the result;
the word segmentation lexicon, provided with a built-in custom dictionary, which can accurately segment power-industry-related sentences;
the text word segmentation processing module is used for carrying out word segmentation marking on the text content based on the word segmentation result and extracting each word segmentation to generate structured text data;
and the text clustering module is used for performing text clustering processing and deep cross analysis on the structured text data generated in a time period to obtain text related clustering information in the time period.
7. The system of claim 1, further comprising a library of standardized sentence vector models comprising a plurality of standardized sentence vector models;
the standardized sentence vector model is obtained by carrying out sentence vector similarity calculation on sentence samples in a corpus based on a neural network model and carrying out standardized training on sentence vectors meeting a similarity threshold.
8. The system of claim 7, wherein the voice transcription unit further comprises a normalization processing module for segmenting the text content into sentences, calculating a sentence vector for each sentence, and selecting from the normalized sentence vector model library the normalized sentence vector model corresponding to each sentence to perform normalization training on that sentence, so as to output a corresponding normalized sentence.
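The N-gram language model recited in claim 5 can be illustrated with a minimal count-based bigram sketch. The toy corpus, add-one smoothing, and word-level (rather than syllable-level) tokens are illustrative assumptions, not details taken from the patent:

```python
from collections import defaultdict

def train_bigram(corpus):
    """Train a bigram model with add-one smoothing from a toy
    corpus given as a list of token lists; returns P(b | a)."""
    unigram = defaultdict(int)
    bigram = defaultdict(int)
    vocab = set()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]  # sentence boundary markers
        vocab.update(tokens)
        for a, b in zip(tokens, tokens[1:]):
            unigram[a] += 1
            bigram[(a, b)] += 1
    V = len(vocab)
    def prob(a, b):
        # add-one (Laplace) smoothing so unseen bigrams get mass
        return (bigram[(a, b)] + 1) / (unigram[a] + V)
    return prob

corpus = [["turn", "on", "the", "light"],
          ["turn", "off", "the", "light"]]
p = train_bigram(corpus)
```

During decoding, such a model is combined with the acoustic model (the patent specifies a DNN-HMM) to rank candidate transcriptions; production systems use higher-order N-grams with more sophisticated smoothing such as Kneser-Ney.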
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910082514.9A CN111489743A (en) | 2019-01-28 | 2019-01-28 | Operation management analysis system based on intelligent voice technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111489743A true CN111489743A (en) | 2020-08-04 |
Family
ID=71810764
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910082514.9A Pending CN111489743A (en) | 2019-01-28 | 2019-01-28 | Operation management analysis system based on intelligent voice technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111489743A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009175943A (en) * | 2008-01-23 | 2009-08-06 | Seiko Epson Corp | Database system for call center, information management method for database and information management program for database |
CN103118361A (en) * | 2013-01-21 | 2013-05-22 | 吴建进 | Recording method and device based on signaling detection system |
CN103793515A (en) * | 2014-02-11 | 2014-05-14 | 安徽科大讯飞信息科技股份有限公司 | Service voice intelligent search and analysis system and method |
US20180308487A1 (en) * | 2017-04-21 | 2018-10-25 | Go-Vivace Inc. | Dialogue System Incorporating Unique Speech to Text Conversion Method for Meaningful Dialogue Response |
Non-Patent Citations (2)
Title |
---|
Chang Pei; Liu Haizhou: "Research and analysis of an intelligent voice customer service platform for telecom operators", 邮电设计技术 (Designing Techniques of Posts and Telecommunications), no. 09, pages 63 - 67 *
Huang Yi: "Solution for an intelligent customer service operation management system based on intelligent voice analysis", 《科技传播》 (Public Communication of Science & Technology), pages 121 - 123 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113744712A (en) * | 2021-07-29 | 2021-12-03 | 中国工商银行股份有限公司 | Intelligent outbound voice splicing method, device, equipment, medium and program product |
CN113743983A (en) * | 2021-08-09 | 2021-12-03 | 太逗科技集团有限公司 | Android application-based electric pin management method, device, equipment and medium |
CN114666449A (en) * | 2022-03-29 | 2022-06-24 | 深圳市银服通企业管理咨询有限公司 | Voice data processing method of calling system and calling system |
CN116978384A (en) * | 2023-09-25 | 2023-10-31 | 成都市青羊大数据有限责任公司 | Public security integrated big data management system |
CN116978384B (en) * | 2023-09-25 | 2024-01-02 | 成都市青羊大数据有限责任公司 | Public security integrated big data management system |
CN117672266A (en) * | 2023-12-05 | 2024-03-08 | 绍兴大明电力建设有限公司 | Voiceprint recognition method based on DCN |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111933129B (en) | Audio processing method, language model training method and device and computer equipment | |
CN110853649A (en) | Label extraction method, system, device and medium based on intelligent voice technology | |
CN108305634B (en) | Decoding method, decoder and storage medium | |
US11189272B2 (en) | Dialect phoneme adaptive training system and method | |
JP6772198B2 (en) | Language model speech end pointing | |
EP1564722B1 (en) | Automatic identification of telephone callers based on voice characteristics | |
Juang et al. | Automatic recognition and understanding of spoken language-a first step toward natural human-machine communication | |
US8831947B2 (en) | Method and apparatus for large vocabulary continuous speech recognition using a hybrid phoneme-word lattice | |
CN111489743A (en) | Operation management analysis system based on intelligent voice technology | |
Mao et al. | Speech recognition and multi-speaker diarization of long conversations | |
CN111489765A (en) | Telephone traffic service quality inspection method based on intelligent voice technology | |
Rabiner et al. | An overview of automatic speech recognition | |
US11056100B2 (en) | Acoustic information based language modeling system and method | |
CN111489754A (en) | Telephone traffic data analysis method based on intelligent voice technology | |
CN111105785B (en) | Text prosody boundary recognition method and device | |
CN100354929C (en) | Voice processing device and method, recording medium, and program | |
CN114818649A (en) | Service consultation processing method and device based on intelligent voice interaction technology | |
CN111081219A (en) | End-to-end voice intention recognition method | |
CN112397054A (en) | Power dispatching voice recognition method | |
CN114120985A (en) | Pacifying interaction method, system and equipment of intelligent voice terminal and storage medium | |
CN111414748A (en) | Traffic data processing method and device | |
CN111402887A (en) | Method and device for escaping characters by voice | |
EP0177854B1 (en) | Keyword recognition system using template-concatenation model | |
Thakur et al. | NLP & AI speech recognition: an analytical review | |
Žgank et al. | Slovenian spontaneous speech recognition and acoustic modeling of filled pauses and onomatopoeas |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||