CN111489743A - Operation management analysis system based on intelligent voice technology - Google Patents
- Publication number
- CN111489743A (application number CN201910082514.9A)
- Authority
- CN
- China
- Prior art keywords
- voice
- file
- analysis
- recording
- text
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/16—Speech classification or search using artificial neural networks
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges; H04M3/42—Systems providing special services or facilities to subscribers; H04M3/50—Centralised arrangements for answering calls; centralised arrangements for recording messages
- H04M3/5166—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing, in combination with interactive voice response systems or voice portals, e.g. as front-ends
- H04M3/53—Centralised arrangements for recording incoming messages, i.e. mailbox systems
Abstract
The invention provides an operation management analysis system based on intelligent voice technology, comprising: a recording acquisition unit, used for downloading recording data files from the telephone recording platform, splicing and converting the files, and generating complete voice files; a scene segmentation unit, used for performing scene segmentation or speaker segmentation on the voice file; a voice transcription unit, used for recognizing the voice file with an intelligent speech recognition engine and transcribing it into text content; a data analysis unit, used for analyzing the text content and the voice file with a neural network model and outputting an analysis report; a database unit, used for storing the voice file, the text content, and the analysis report; and a content indexing unit, used for retrieving the data stored in the database according to an index command. By intelligently analyzing the call center's large volume of recording data files, the invention discovers problems and defects in the service process and grasps user appeals in time, thereby improving user satisfaction.
Description
Technical Field
The invention relates to the technical field of information analysis, in particular to an operation management analysis system based on an intelligent voice technology.
Background
With the development of mobile communication technology, the customer service call center plays a crucial role as a bridge between an operation platform and its users. In recent years, driven by global user demand, national strategic guidance, and enterprise competition, the intelligent voice technology industry has grown rapidly and continuously, and has been applied in depth to fields such as the mobile internet, smart home, automotive electronics, financial payment, online education, and healthcare. Propelled by massive data and deep learning, intelligent voice technologies such as speech recognition, speech synthesis, and voiceprint recognition are maturing day by day and entering the practical stage.
The State Grid 95598 call center serves as an important bridge between the State Grid corporation and its users. Chinese speech recognition trained with the internationally mainstream DNN (deep neural network) + HMM (hidden Markov model) method can adapt to application environments across different ages, regions, user groups, channels, terminals, and noise conditions. By additionally performing customized model training on the massive speech and text corpora accumulated by the 95598 call center, a speech transcription and analysis platform with high availability and a high recognition rate is established, greatly remedying the unclear and inaccurate speech recognition and transcription of the prior art and reducing the speech recognition error rate.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide an operation management analysis system based on intelligent voice technology. The system establishes an intelligent speech recognition model with high availability and a high recognition rate from the large speech and text corpora accumulated by the State Grid 95598 call center, and continuously trains and optimizes the model with deep learning so as to keep improving recognition accuracy and applicability to the power service industry. Based on this model, the large volume of recording data generated by the call center is transcribed and analyzed in time, so that defects in the service process are discovered promptly, user demands are grasped, and service quality is improved.
The technical scheme adopted by the invention is an operation management analysis system based on intelligent voice technology, which comprises the following units:
the recording acquisition unit is used for downloading a recording data file from the telephone recording platform, splicing and converting the file and generating a complete voice file;
the scene segmentation unit is used for carrying out scene segmentation or speaker segmentation on the voice file;
the voice transcription unit is used for recognizing the voice file based on an intelligent voice recognition engine and transcribing the voice file into text content;
the data analysis unit is used for analyzing the text content and the voice file based on a neural network model and outputting an analysis report;
the database unit is used for storing the voice file, the text content and the analysis report;
and the content indexing unit is used for retrieving the data stored in the database according to the indexing command.
Wherein, the recording acquisition unit includes:
the recording downloading module is connected with the telephone recording platform and is used for receiving the recording data file segments transmitted periodically by the platform;
the splicing transcoding module is used for splicing the recording data file segments, decompressing the spliced recording data file, and converting it into a complete, recognizable voice file;
and the transcription scheduling module is used for calling the corresponding voice file according to the transcription command and sending the voice file to the voice transcription unit.
Wherein the data analysis unit includes:
the audio analysis module comprises a silence interval detection module, a speech speed detection module and an emotion detection module and is used for respectively carrying out silence interval analysis, speech speed analysis and emotion analysis on the voice file;
and the text analysis module is used for analyzing the text content.
In a further improvement, the system also comprises a recording distribution module, used for transmitting the voice file to the voice transcription unit, the data analysis unit, and the database unit respectively.
Wherein the intelligent speech recognition engine comprises an acoustic model and a language model, corresponding respectively to computing the probability of audio features extracted from the voice file mapping to syllables, and of syllables mapping to text;
the language model is modeled by adopting an N-Gram model;
the acoustic model is modeled by adopting a deep neural network and a hidden Markov model.
Wherein the content indexing unit includes:
the data storage module is used for storing the text content and the associated data generated in the transcription process;
the data query module is used for querying and aggregating the voice file and the associated data according to preset query conditions and displaying the result;
the word bank is internally provided with a self-defined word segmentation word bank, and can accurately segment the electric power related sentences;
the text word segmentation processing module is used for carrying out word segmentation marking on the text content based on the word segmentation result and extracting each word segmentation to generate structured text data;
and the text clustering module is used for performing text clustering processing and deep cross analysis on the structured text data generated in a time period to obtain text related clustering information in the time period.
In a further improvement, the system also comprises a standardized sentence vector model library containing a plurality of standardized sentence vector models;
the standardized sentence vector model is obtained by carrying out sentence vector similarity calculation on sentence samples in a corpus based on a neural network model and carrying out standardized training on sentence vectors meeting a similarity threshold.
In a further improvement, the speech transcription unit further includes a normalization processing module, configured to perform sentence segmentation on the text content, calculate a sentence vector of each sentence, and select a normalization sentence vector model corresponding to each sentence from the normalization sentence vector model library to perform normalization training on each sentence, so as to output a corresponding normalization sentence.
Drawings
Fig. 1 is a block diagram of an operation management analysis system based on intelligent voice technology.
Detailed Description
The main objective of the invention is to provide an operation management analysis system based on intelligent voice technology. It establishes an intelligent speech recognition model with high availability and a high recognition rate from the massive speech and text corpora accumulated by the State Grid 95598 call center, continuously trains and optimizes the model with deep learning to improve recognition accuracy and applicability to the power service industry, and, based on this model, transcribes and analyzes in time the large volume of recording data generated by the call center, thereby promptly discovering defects in the service process, grasping user demands, and improving service quality.
By adopting Chinese speech recognition trained with the internationally mainstream DNN (deep neural network) + HMM (hidden Markov model) method, the system can adapt to application environments of different ages, regions, user groups, channels, terminals, and noise conditions; customized model training on the massive speech and text corpora accumulated by the State Grid 95598 call center then yields a speech transcription platform with high availability and a high recognition rate;
the core technology of the voice transcription platform is an intelligent voice recognition technology, the intelligent voice recognition technology adopts a latest generation recognition algorithm, a decoder core and an advanced acoustic model and language model training method, and the intelligent voice recognition technology mainly comprises three important components: training a voice recognition model, processing front-end voice and processing rear-end recognition;
1. speech recognition model training
The speech recognition model is usually composed of two parts, an acoustic model and a language model, corresponding respectively to computing the probability of features extracted from the speech signal mapping to syllables, and of syllables mapping to words.
At present, DNN (deep neural network) + HMM (hidden Markov model) is generally adopted as the modeling method for the acoustic model; compared with the GMM (Gaussian mixture model) + HMM method of the previous generation, it reduces the speech recognition error rate by 30%, the fastest progress made in speech recognition technology in the last 20 years. For the language model, a statistical modeling method is usually adopted, namely an N-Gram model, which corresponds to an (N-1)th-order Markov chain. Its basic idea is to slide a window of size N over the content of the text, forming a sequence of fragments of length N; each fragment is called a gram. The occurrence frequency of all grams is counted and filtered against a preset threshold to form a key-gram list, i.e. the vector feature space of the text, in which each gram is one feature vector dimension;
the algorithm has the advantages of strong fault tolerance and language independence, is universal for Chinese, English and Chinese, does not need to be processed in linguistics, is a common language model in large-vocabulary continuous speech recognition, is simple and effective, and is widely used.
To adapt to application environments of different ages, regions, user groups, channels, terminals, and noise conditions, training requires large amounts of speech and text corpora, which effectively improves the recognition rate. With the rapid development of the internet and the popularization of mobile terminals such as mobile phones, large amounts of text and speech corpora can now be obtained from many channels, providing rich resources for training the language and acoustic models within a speech recognition model and making general large-scale language and acoustic models possible.
2. Front-end speech processing
Front-end speech processing refers to preprocessing such as detecting and denoising the speaker's speech by using a signal processing method so as to obtain the speech most suitable for the recognition engine to process. The main functions include:
(1) endpoint detection
Endpoint detection analyzes the input audio stream, distinguishes the speech periods from the non-speech periods in the signal, and accurately determines the starting point of the speech signal. After endpoint detection, subsequent processing can be applied to the speech signal only, which plays an important role in improving model accuracy and recognition accuracy.
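The endpoint detection described above can be sketched with a simple frame-energy comparison. Real engines use far more robust statistics; the frame length and threshold below are illustrative assumptions.

```python
# Energy-based endpoint detection sketch: split the signal into frames,
# compare each frame's mean energy to a threshold, and report the first
# and last speech frames.

def frame_energies(samples, frame_len):
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return [sum(x * x for x in f) / len(f) for f in frames if f]

def detect_endpoints(samples, frame_len=4, threshold=0.01):
    """Return (start_frame, end_frame) of speech, or None if all silence."""
    energies = frame_energies(samples, frame_len)
    speech = [i for i, e in enumerate(energies) if e > threshold]
    if not speech:
        return None
    return speech[0], speech[-1]

# Silence, then four higher-amplitude "speech" samples, then silence.
signal = [0.0] * 8 + [0.5, -0.4, 0.6, -0.5] + [0.0] * 8
print(detect_endpoints(signal))  # → (2, 2)
```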
(2) Noise cancellation
In practical applications, background noise is a real challenge for speech recognition applications, and even if a speaker is in a quiet office environment, it is difficult to avoid certain noise during a telephone voice call. A good speech recognition engine needs to have efficient noise cancellation capabilities to accommodate the user's requirements for use in a wide variety of environments.
(3) Feature extraction
The features commonly used at present include MFCC (Mel-Frequency Cepstral Coefficients) and PLP (Perceptual Linear Prediction), etc.
3. Backend recognition processing
Back-end recognition processing is the process of recognizing (also called "decoding") the extracted feature vectors with the trained acoustic model and language model to obtain text information. The acoustic model mainly corresponds to computing the probability of speech features mapping to syllables (or phonemes), and the language model to computing the probability of syllables mapping to words. The most important part is the decoder: it scores the original speech features against the acoustic model and the language model and, on that basis, finds the optimal word sequence path; the text corresponding to that path is the final recognition result.
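The decoder's combination of acoustic and language model scores can be illustrated with a toy Viterbi search over a two-step lattice. Every candidate word and score below is invented for illustration and does not reflect the patent's actual models.

```python
# Toy decoder sketch: each frame step proposes candidate words with
# acoustic log-scores, a bigram language model scores transitions, and
# dynamic programming (Viterbi) keeps the best-scoring path per word.

def viterbi(candidates, lm):
    """candidates: list of {word: acoustic_log_score} per step.
    lm: dict mapping (prev_word, word) -> language-model log score;
    unseen transitions get a flat -5.0 backoff penalty (an assumption)."""
    best = {w: (s, [w]) for w, s in candidates[0].items()}
    for step in candidates[1:]:
        new_best = {}
        for w, ac in step.items():
            scored = [
                (score + lm.get((prev, w), -5.0) + ac, path + [w])
                for prev, (score, path) in best.items()
            ]
            new_best[w] = max(scored)
        best = new_best
    return max(best.values())  # (total score, best word sequence)

candidates = [
    {"power": -0.1, "powder": -0.9},
    {"outage": -0.3, "outrage": -0.2},
]
lm = {("power", "outage"): -0.5, ("power", "outrage"): -3.0}
score, path = viterbi(candidates, lm)
print(path)  # → ['power', 'outage']
```

The language model rescues "outage" even though "outrage" scored slightly better acoustically, which is exactly the interplay the decoder exploits.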
Early decoders based on syntax-tree structures were complex in design, and under those technical conditions decoder speedup had already hit a bottleneck. Most current mainstream speech recognition decoders instead adopt a decoding network based on weighted finite-state transducers (WFSTs), which can compile the language model, the lexicon, and the acoustic context-dependent phone set into one large decoding network. This greatly improves decoding speed and allows the decoding process to be separated from its knowledge sources.
Based on the massive speech and text corpora accumulated by the State Grid 95598 call center and the intelligent speech recognition model obtained by customized DNN (deep neural network) + HMM (hidden Markov model) training, a preferred embodiment of the invention provides an operation management analysis system. For integration, the system connects directly to the 95598 telephone recording platform, the 95598 business management system, and the quality inspection management module, and connects indirectly to the 95598 business support system through the quality inspection management module.
The overall architecture of the system strives for a clear division into layers, and the technologies adopted for its core components balance advancement with maturity and stability. High availability requirements are fully considered across the system's interfaces, services, and engines: interfaces adopt an active/standby mode in principle, services and engines use load balancing, there is no single point of failure, and the failure of a few nodes does not interrupt service or block traffic.
The system deploys corresponding components in both the southern and northern branch centers. The southern branch center interfaces with its local telephone recording platform to obtain recording data nearby; when a recording is retrieved across centers, the binary voice stream is transmitted over the network between south and north.
The northern branch center likewise interfaces with its local recording platform to acquire and transcribe recordings. In addition, because the users of voice analysis are mainly near the northern branch center, and because data must be summarized, quality control text processing and voice content analysis for the whole customer service center are centralized in the northern branch center, where the content retrieval service and the database also need to be deployed.
Data management is realized by centrally storing the text data according to actual conditions, and the various storage tiers are planned against currently expected traffic so as to meet the processing demand of the maximum voice volume during the summer peak period.
The present system is described in detail below with reference to fig. 1, and the present operation management analysis system includes:
the recording acquisition unit 100 specifically comprises a recording downloading module, which is connected to the telephone recording platform and receives the recording data file segments transmitted by the telephone recording platform at regular time; the splicing transcoding module is used for splicing the recording data file segments, decompressing the spliced recording data file and converting the recording data file into a complete voice file which can be identified; the transcription scheduling module is used for calling the corresponding voice file to send according to the transcription command;
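The splicing and transcoding performed by the recording acquisition unit can be sketched using only the Python standard library, under the simplifying assumption that the segments are already uncompressed WAV files with identical parameters; decompressing proprietary recording formats is outside this sketch.

```python
# Sketch of the splicing step: append the audio frames of each recording
# segment, in order, into one complete voice file.
import wave

def splice_segments(segment_paths, out_path):
    """Concatenate WAV segments (same rate/width/channels) into out_path."""
    with wave.open(out_path, "wb") as out:
        for i, path in enumerate(segment_paths):
            with wave.open(path, "rb") as seg:
                if i == 0:
                    # Copy sample rate, sample width, and channel count
                    # from the first segment; wave fixes the frame count
                    # in the header when the file is closed.
                    out.setparams(seg.getparams())
                out.writeframes(seg.readframes(seg.getnframes()))
```

Called with the segment paths received from the recording platform, this yields the single recognizable voice file handed to the transcription scheduler.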
the recording distribution module 200 is configured to transmit the voice files respectively;
the voice transcription unit 300 is used for recognizing the voice file based on an intelligent voice recognition engine and transcribing the voice file into text content;
the intelligent speech recognition engine comprises an acoustic model and a language model, and the acoustic model and the language model respectively correspond to the calculation of the probability from the audio features extracted from the speech file to the syllables and the calculation of the probability from the syllables to the characters;
the language model is modeled by adopting an N-Gram model;
the acoustic model is modeled by adopting a deep neural network and a hidden Markov model;
A scene division unit 400, configured to perform scene division or speaker division on the voice file;
In order to save cost, current call centers usually use single-channel recording, that is, the user and the customer service agent are recorded simultaneously into the same channel. However, the agent's recording and the user's recording generally need to be analyzed separately: the agent's recording is mainly used to evaluate the agent's service capability, while the user's recording contains potential user demand information, competitor information, and the like, of obvious business value. The function that separates them is commonly referred to as "speaker separation", also known as "scene segmentation";
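The turn-taking structure that speaker separation recovers can be illustrated with a deliberately naive sketch: cut the single channel at silent gaps and alternate the speaker label. Production systems use voiceprint-based diarization instead; the energies, threshold, and alternating-label assumption below are all illustrative.

```python
# Naive single-channel "scene segmentation" sketch: group consecutive
# above-threshold frames into speech segments, then alternate the
# speaker label between agent and user.

def split_turns(frame_energies, threshold=0.01):
    """Group consecutive above-threshold frames into (start, end) segments."""
    segments, current = [], []
    for i, e in enumerate(frame_energies):
        if e > threshold:
            current.append(i)
        elif current:
            segments.append((current[0], current[-1]))
            current = []
    if current:
        segments.append((current[0], current[-1]))
    return segments

def label_speakers(segments, first="agent"):
    order = [first, "user" if first == "agent" else "agent"]
    return [(seg, order[i % 2]) for i, seg in enumerate(segments)]

energies = [0.5, 0.6, 0.0, 0.0, 0.4, 0.0, 0.7, 0.7]
print(label_speakers(split_turns(energies)))
# → [((0, 1), 'agent'), ((4, 4), 'user'), ((6, 7), 'agent')]
```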
the data analysis unit 500 analyzes the text content and the voice file based on the neural network model, and outputs an analysis report, which includes an audio analysis module and a text analysis module;
the audio analysis module comprises a silence interval detection module, a speech speed detection module and an emotion detection module and is used for respectively carrying out silence interval analysis, speech speed analysis and emotion analysis on the voice file;
the text analysis module is used for analyzing the text content;
a database unit 600 for storing the voice file, text content and analysis report;
the content indexing unit 700 retrieves data stored in the database according to the index command, and specifically includes:
the data storage module is used for storing the text content and the associated data generated in the transcription process;
the data query module is used for querying and aggregating the voice file and the associated data according to preset query conditions and displaying the result;
the word segmentation lexicon module has a built-in custom segmentation lexicon and can accurately segment power-related sentences;
the text word segmentation processing module is used for carrying out word segmentation marking on the text content based on the word segmentation result and extracting each word segmentation to generate structured text data;
and the text clustering module is used for performing text clustering processing and deep cross analysis on the structured text data generated in a time period to obtain text related clustering information in the time period.
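The indexing pipeline described above (custom-lexicon word segmentation, structured text, keyword retrieval) can be sketched as follows. The toy power-domain lexicon and transcripts are invented for illustration; a real system would use the custom power-domain segmentation lexicon the patent describes.

```python
# Content-indexing sketch: segment each transcript by greedy longest
# match against a lexicon, build an inverted index, answer keyword
# queries.
from collections import defaultdict

LEXICON = {"power outage", "electricity bill", "report", "inquiry"}

def segment(text):
    """Greedy longest-match word segmentation against LEXICON."""
    tokens, words = text.split(), []
    i = 0
    while i < len(tokens):
        for n in (2, 1):  # try the longest lexicon entry first
            cand = " ".join(tokens[i:i + n])
            if cand in LEXICON or n == 1:
                words.append(cand)
                i += n
                break
    return words

def build_index(transcripts):
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for word in segment(text):
            index[word].add(doc_id)
    return index

calls = {1: "power outage report", 2: "electricity bill inquiry", 3: "power outage inquiry"}
index = build_index(calls)
print(sorted(index["power outage"]))  # → [1, 3]
```

The structured tokens produced by `segment` are also what the text clustering module would consume for time-window cluster analysis.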
In addition, the system also comprises a standardized sentence vector model base which contains a plurality of standardized sentence vector models;
the standardized sentence vector model is obtained by carrying out sentence vector similarity calculation on sentence samples in a corpus based on a neural network model and carrying out standardized training on sentence vectors meeting a similarity threshold.
Based on the standardized sentence vector model library, the voice transcription unit further comprises a standardized processing module for sentence segmentation of the text content, calculating a sentence vector of each sentence, and selecting a standardized sentence vector model corresponding to each sentence from the standardized sentence vector model library to perform standardized training on each sentence so as to output a corresponding standardized sentence;
the standardized sentences are recombined to generate standardized text content, so that more accurate text analysis can be performed subsequently.
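The sentence-normalization step can be illustrated with a bag-of-words stand-in for the neural sentence vectors: compute each sentence's vector, measure cosine similarity against the standardized library, and substitute the closest match when it clears a threshold. The standard sentences and the 0.5 threshold below are assumptions for the sketch, not values from the patent.

```python
# Sentence normalization sketch: bag-of-words vectors plus cosine
# similarity select the standardized sentence closest to the input.
import math
from collections import Counter

def vectorize(sentence):
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def normalize(sentence, standard_sentences, threshold=0.5):
    """Return the closest standardized sentence, or the original if none is close."""
    vec = vectorize(sentence)
    scored = [(cosine(vec, vectorize(s)), s) for s in standard_sentences]
    best_score, best = max(scored)
    return best if best_score >= threshold else sentence

standards = ["please report the power outage address", "please state your account number"]
print(normalize("report power outage address please", standards))
# prints "please report the power outage address"
```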
Based on the operation management analysis system provided by the invention, refined operation application on services can be realized, such as:
customer service voice quality inspection: by applying the voice analysis technology, various retrieval functions are flexibly combined and applied, different application parameter thresholds are set, and the problem of customer service call quality can be effectively and comprehensively analyzed and evaluated. The voice analysis can also locate the specific position where the problem occurs, thereby facilitating the further tracing and determining of the problem by the manager.
Operation management analysis: combining recordings with recognition results, the system performs service efficiency analysis, incoming-call reason mining, user demand analysis, call duration analysis, and hotspot and trend monitoring to uncover service shortcomings, providing auxiliary support for standardizing and optimizing the service process.
The voice analysis technology enables the system to detect variations in fundamental frequency, pitch, and the like within a call recording, predict emotional fluctuations that may occur, and locate where in the audio they happen. It detects and analyzes the average speech rate across a whole recording as well as speech-rate changes within a given segment, and it detects the silent periods in a recording during which neither the user nor the hotline agent speaks. It then generates an index file in standard XML format for keyword retrieval and for detection of abnormal conditions, abnormal speech, and abnormal dialogue.
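Two of the simpler audio analyses above, average speech rate and total silence time, reduce to direct arithmetic over the transcript length and the detected silent intervals; the figures below are illustrative.

```python
# Speech-rate and silence-time sketch for a single call recording.

def speech_rate(transcript_chars, duration_seconds):
    """Average speech rate in transcribed characters per minute."""
    return transcript_chars / (duration_seconds / 60.0)

def silence_time(intervals):
    """Total silence given (start_s, end_s) interval pairs."""
    return sum(end - start for start, end in intervals)

duration = 120.0  # a two-minute call
chars = 420       # characters in the transcript
print(speech_rate(chars, duration))                 # → 210.0 chars per minute
print(silence_time([(10.0, 14.5), (60.0, 62.0)]))   # → 6.5 seconds of silence
```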
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (8)
1. An operation management analysis system based on intelligent voice technology, comprising:
the recording acquisition unit is used for downloading a recording data file from the telephone recording platform, splicing and converting the file and generating a complete voice file;
the scene segmentation unit is used for carrying out scene segmentation or speaker segmentation on the voice file;
the voice transcription unit is used for recognizing the voice file based on an intelligent voice recognition engine and transcribing the voice file into text content;
the data analysis unit is used for analyzing the text content and the voice file based on a neural network model and outputting an analysis report;
the database unit is used for storing the voice file, the text content and the analysis report;
and the content indexing unit is used for retrieving the data stored in the database according to the indexing command.
2. The system of claim 1, wherein the recording acquisition unit comprises:
the recording downloading module is connected with the telephone recording platform and used for receiving, at regular intervals, the recording data file segments transmitted by the telephone recording platform;
the splicing transcoding module is used for splicing the recording data file segments, decompressing the spliced recording data file, and converting it into a complete, recognizable voice file;
and the transcription scheduling module is used for calling the corresponding voice file according to the transcription command and sending the voice file to the voice transcription unit.
3. The system according to claim 1 or 2, wherein the data analysis unit comprises:
the audio analysis module comprises a silence interval detection module, a speech speed detection module and an emotion detection module and is used for respectively carrying out silence interval analysis, speech speed analysis and emotion analysis on the voice file;
and the text analysis module is used for analyzing the text content.
4. The system of claim 1, further comprising a recording distribution module for transmitting the voice file to the voice transcription unit, the data analysis unit, and the database unit, respectively.
5. The system of claim 1, wherein the intelligent speech recognition engine comprises an acoustic model and a language model, used respectively to calculate the probability of syllables given audio features extracted from a voice file and the probability of text given those syllables;
the language model is modeled by adopting an N-Gram model;
the acoustic model is modeled by adopting a deep neural network and a hidden Markov model.
6. The system of claim 1, wherein the content indexing unit comprises:
the data storage module is used for storing the text content and the associated data generated in the transcription process;
the data query module is used for querying and aggregating the voice file and the associated data according to preset query conditions and displaying the result;
the word segmentation lexicon, provided with a built-in custom dictionary, which can accurately segment power-industry-related sentences;
the text word segmentation processing module is used for carrying out word segmentation marking on the text content based on the word segmentation result and extracting each word segmentation to generate structured text data;
and the text clustering module is used for performing text clustering processing and deep cross analysis on the structured text data generated in a time period to obtain text related clustering information in the time period.
7. The system of claim 1, further comprising a library of standardized sentence vector models comprising a plurality of standardized sentence vector models;
the standardized sentence vector model is obtained by carrying out sentence vector similarity calculation on sentence samples in a corpus based on a neural network model and carrying out standardized training on sentence vectors meeting a similarity threshold.
8. The system of claim 7, wherein the voice transcription unit further comprises a normalization processing module for segmenting the text content into sentences, calculating a sentence vector for each sentence, and selecting from the normalized sentence vector model library the normalized sentence vector model corresponding to each sentence to perform normalization training on that sentence, so as to output a corresponding normalized sentence.
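The N-gram language model recited in claim 5 can be illustrated with a minimal count-based bigram sketch. The toy corpus, add-one smoothing, and word-level (rather than syllable-level) tokens are illustrative assumptions, not details taken from the patent:

```python
from collections import defaultdict

def train_bigram(corpus):
    """Train a bigram model with add-one smoothing from a toy
    corpus given as a list of token lists; returns P(b | a)."""
    unigram = defaultdict(int)
    bigram = defaultdict(int)
    vocab = set()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]  # sentence boundary markers
        vocab.update(tokens)
        for a, b in zip(tokens, tokens[1:]):
            unigram[a] += 1
            bigram[(a, b)] += 1
    V = len(vocab)
    def prob(a, b):
        # add-one (Laplace) smoothing so unseen bigrams get mass
        return (bigram[(a, b)] + 1) / (unigram[a] + V)
    return prob

corpus = [["turn", "on", "the", "light"],
          ["turn", "off", "the", "light"]]
p = train_bigram(corpus)
```

During decoding, such a model is combined with the acoustic model (the patent specifies a DNN-HMM) to rank candidate transcriptions; production systems use higher-order N-grams with more sophisticated smoothing such as Kneser-Ney.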
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910082514.9A CN111489743A (en) | 2019-01-28 | 2019-01-28 | Operation management analysis system based on intelligent voice technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111489743A true CN111489743A (en) | 2020-08-04 |
Family
ID=71810764
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910082514.9A Pending CN111489743A (en) | 2019-01-28 | 2019-01-28 | Operation management analysis system based on intelligent voice technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111489743A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009175943A (en) * | 2008-01-23 | 2009-08-06 | Seiko Epson Corp | Database system for call center, information management method for database and information management program for database |
CN103118361A (en) * | 2013-01-21 | 2013-05-22 | 吴建进 | Recording method and device based on signaling detection system |
CN103793515A (en) * | 2014-02-11 | 2014-05-14 | 安徽科大讯飞信息科技股份有限公司 | Service voice intelligent search and analysis system and method |
US20180308487A1 (en) * | 2017-04-21 | 2018-10-25 | Go-Vivace Inc. | Dialogue System Incorporating Unique Speech to Text Conversion Method for Meaningful Dialogue Response |
Non-Patent Citations (2)
Title |
---|
Chang Pei; Liu Haizhou: "Research and analysis of an intelligent voice customer service platform for telecom operators", 邮电设计技术 (Designing Techniques of Posts and Telecommunications), no. 09, pages 63 - 67 *
Huang Yi: "Solution for an intelligent customer service operation management system based on intelligent voice analysis", 《科技传播》 (Public Communication of Science & Technology), pages 121 - 123 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113744712A (en) * | 2021-07-29 | 2021-12-03 | 中国工商银行股份有限公司 | Intelligent outbound voice splicing method, device, equipment, medium and program product |
CN113743983A (en) * | 2021-08-09 | 2021-12-03 | 太逗科技集团有限公司 | Android application-based electric pin management method, device, equipment and medium |
CN114666449A (en) * | 2022-03-29 | 2022-06-24 | 深圳市银服通企业管理咨询有限公司 | Voice data processing method of calling system and calling system |
CN116978384A (en) * | 2023-09-25 | 2023-10-31 | 成都市青羊大数据有限责任公司 | Public security integrated big data management system |
CN116978384B (en) * | 2023-09-25 | 2024-01-02 | 成都市青羊大数据有限责任公司 | Public security integrated big data management system |
CN117672266A (en) * | 2023-12-05 | 2024-03-08 | 绍兴大明电力建设有限公司 | Voiceprint recognition method based on DCN |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111933129B (en) | Audio processing method, language model training method and device and computer equipment | |
CN110853649A (en) | Label extraction method, system, device and medium based on intelligent voice technology | |
CN108305634B (en) | Decoding method, decoder and storage medium | |
US11189272B2 (en) | Dialect phoneme adaptive training system and method | |
JP6772198B2 (en) | Language model speech end pointing | |
EP1564722B1 (en) | Automatic identification of telephone callers based on voice characteristics | |
Juang et al. | Automatic recognition and understanding of spoken language-a first step toward natural human-machine communication | |
US8831947B2 (en) | Method and apparatus for large vocabulary continuous speech recognition using a hybrid phoneme-word lattice | |
CN111489743A (en) | Operation management analysis system based on intelligent voice technology | |
Mao et al. | Speech recognition and multi-speaker diarization of long conversations | |
CN111489765A (en) | Telephone traffic service quality inspection method based on intelligent voice technology | |
Rabiner et al. | An overview of automatic speech recognition | |
US11056100B2 (en) | Acoustic information based language modeling system and method | |
CN111489754A (en) | Telephone traffic data analysis method based on intelligent voice technology | |
CN111105785B (en) | Text prosody boundary recognition method and device | |
CN100354929C (en) | Voice processing device and method, recording medium, and program | |
CN114818649A (en) | Service consultation processing method and device based on intelligent voice interaction technology | |
CN111081219A (en) | End-to-end voice intention recognition method | |
CN112397054A (en) | Power dispatching voice recognition method | |
CN114120985A (en) | Pacifying interaction method, system and equipment of intelligent voice terminal and storage medium | |
CN111414748A (en) | Traffic data processing method and device | |
CN111402887A (en) | Method and device for escaping characters by voice | |
EP0177854B1 (en) | Keyword recognition system using template-concatenation model | |
Thakur et al. | NLP & AI speech recognition: an analytical review | |
Žgank et al. | Slovenian spontaneous speech recognition and acoustic modeling of filled pauses and onomatopoeas |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||