WO2020228173A1 - Illegal speech detection method, apparatus and device and computer-readable storage medium - Google Patents

Illegal speech detection method, apparatus and device and computer-readable storage medium

Info

Publication number
WO2020228173A1
Authority
WO
WIPO (PCT)
Prior art keywords
illegal speech
illegal
speech
call
output
Prior art date
Application number
PCT/CN2019/102261
Other languages
French (fr)
Chinese (zh)
Inventor
岳鹏昱
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020228173A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/50 Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M3/51 Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M3/5175 Call or contact centers supervision arrangements

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, equipment and computer-readable storage medium for detecting illegal speech.
  • In existing telephone customer service management, customer service calls are generally recorded in order to improve service quality and to allow later spot checks of the service provided. For example, a spot check may examine whether any illegal speech occurred while communicating with customers, which helps in evaluating the customer service work.
  • The spot check of call content is usually done manually. The inventor realized that when the volume of call content is large, manual quality inspection is inevitably time-consuming and labor-intensive, so its execution efficiency is low.
  • The main purpose of this application is to provide a method, device, equipment, and computer-readable storage medium for detecting illegal speech, aiming to solve the technical problem that manual quality inspection in existing telephone customer service management has low execution efficiency.
  • The present application provides a method for detecting illegal speech.
  • The method for detecting illegal speech includes the following steps:
  • according to the output order, inputting the character or word corresponding to each piece of phoneme information into the preset language model for processing, and outputting the probability that the individual characters or words are associated with one another;
  • if illegal speech exists, associating the illegal speech with the corresponding call audio and saving it in the call record library.
  • The method for detecting illegal speech further includes:
  • outputting related information of the call audio associated with the illegal speech, where the related information includes customer service information, customer information, and the subject of the call.
  • Using the call content containing illegal speech as a training sample and training with a machine learning method to obtain the illegal speech recognition model includes:
  • using the call content containing illegal speech as a training sample, and segmenting the training sample into words to obtain the corresponding word segmentation data;
  • The method further includes:
  • using the illegal speech keyword as a query condition, searching the call record library to detect whether call audio associated with the illegal speech keyword exists in the call record library;
  • Searching the call record library using the illegal speech keyword as a query condition, to detect whether call audio associated with the illegal speech keyword exists in the call record library, includes:
  • The present application also provides a device for detecting illegal speech.
  • The device for detecting illegal speech includes:
  • a model training module, used to take call content containing illegal speech as training samples and train with a machine learning method to obtain the illegal speech recognition model;
  • an acquisition module, used to acquire the call audio between the customer service and the customer from the call record library;
  • a voice recognition module, used to: divide the call audio into frames to obtain multiple time-ordered speech frames; extract the sound features of the speech frames in time order and generate multi-dimensional sound feature vectors containing sound information; input the sound feature vectors into the preset acoustic model for processing and output the phoneme information corresponding to the speech frames; look up the preset dictionary based on the phoneme information and output the character or word corresponding to each piece of phoneme information; input those characters or words, in output order, into the preset language model for processing and output the probability that the individual characters or words are associated with one another; and splice the output characters or words with the highest probability into call content in text format and output it;
  • an illegal speech recognition module, used to input the call content between the customer service and the customer output by voice recognition into the illegal speech recognition model for recognition, output the recognition result, and determine based on the recognition result whether illegal speech exists in the call content;
  • an association saving module, used to associate the illegal speech with the corresponding call audio and save it in the call record library if illegal speech exists.
  • The present application also provides an illegal speech detection device, which includes a memory, a processor, and an illegal speech detection program that is stored in the memory and can run on the processor; when the illegal speech detection program is executed by the processor, the steps of the illegal speech detection method described in any one of the above are implemented.
  • The present application also provides a computer-readable storage medium that stores an illegal speech detection program; when the illegal speech detection program is executed by a processor, the steps of any one of the above methods for detecting illegal speech are implemented.
  • This application uses voice recognition technology to convert the call audio into call content in text format, and then performs illegal speech detection on that text. If illegal speech is detected, the illegal speech is automatically associated with the corresponding call audio and saved. This realizes automated quality inspection of illegal speech, avoids tedious and time-consuming manual quality inspection, improves the efficiency of quality inspection, and saves costs.
  • FIG. 1 is a schematic structural diagram of the device hardware operating environment involved in the embodiments of the illegal speech detection device of this application;
  • FIG. 2 is a schematic flowchart of the first embodiment of the method for detecting illegal speech in this application;
  • FIG. 3 is a schematic flowchart of the second embodiment of the method for detecting illegal speech in this application;
  • FIG. 4 is a schematic diagram of the functional modules of the first embodiment of the device for detecting illegal speech in this application;
  • FIG. 5 is a schematic diagram of the functional modules of the second embodiment of the device for detecting illegal speech in this application.
  • This application provides a device for detecting illegal speech.
  • Fig. 1 is a schematic structural diagram of a device hardware operating environment involved in an embodiment of a device for detecting illegal speech in this application.
  • The illegal speech detection device may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • The communication bus 1002 is used to implement connection and communication between these components.
  • The user interface 1003 may include a display screen and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface).
  • The memory 1005 may be a high-speed RAM or a non-volatile memory, such as a magnetic disk memory.
  • The memory 1005 may also be a storage device independent of the foregoing processor 1001.
  • The hardware structure shown in FIG. 1 does not constitute a limitation on the illegal speech detection device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
  • The memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and an illegal speech detection program.
  • The operating system is a program that manages and controls the illegal speech detection device and its software resources, and supports the operation of the network communication module, the user interface module, the illegal speech detection program, and other programs or software;
  • the network communication module is used to manage and control the network interface 1004;
  • the user interface module is used to manage and control the user interface 1003.
  • The network interface 1004 is mainly used to connect to the system backend and exchange data with it;
  • the user interface 1003 is mainly used to connect to the client (user side) and exchange data with it;
  • the illegal speech detection device calls the illegal speech detection program stored in the memory 1005 through the processor 1001 and performs the operations of the following embodiments of the illegal speech detection method.
  • FIG. 2 is a schematic flowchart of the first embodiment of the method for detecting illegal speech in this application.
  • the method for detecting illegal speech includes the following steps:
  • Step S10 taking the call content containing the illegal speech as a training sample, and adopting a machine learning method for training to obtain the illegal speech recognition model;
  • the illegal speech recognition model is specifically trained in the following manner:
  • Word embedding refers to a vector representation that maps words or phrases from the vocabulary to vectors of real numbers. Words normally appear as natural language, which machine learning techniques cannot process directly; the words must first be converted into a mathematical structure that can be processed, namely vectors in a space, so that any word can be represented as a distinct vector in that space. For example, arrange all words in one long sequence so that each word has a fixed position, then represent each word with an array whose length equals the number of words: the array holds 1 at the word's own position and 0 at every other position. In this way words are mapped to word vectors.
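The position-based mapping described above is what is commonly called one-hot encoding. A minimal sketch follows, assuming a tiny made-up vocabulary; the function names `build_vocab` and `one_hot` and the sample words are illustrative and do not come from the application.

```python
def build_vocab(words):
    """Give every distinct word a fixed position (sorted for determinism)."""
    return {word: index for index, word in enumerate(sorted(set(words)))}


def one_hot(word, vocab):
    """Return an array as long as the vocabulary: 1 at the word's position, 0 elsewhere."""
    vector = [0] * len(vocab)
    vector[vocab[word]] = 1
    return vector


# Hypothetical vocabulary; sorted order is account, hello, refund.
vocab = build_vocab(["hello", "refund", "account", "hello"])
vector = one_hot("hello", vocab)
print(vector)  # [0, 1, 0]
```

One-hot vectors grow with the vocabulary size, which is one reason practical systems often use denser embeddings; the application itself only describes the position-based form.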
  • A feedforward neural network is a type of artificial neural network in which each neuron, starting from the input layer, receives the output of the previous layer and passes its own output to the next layer, until the output layer is reached.
  • The feedforward neural network uses a unidirectional multilayer structure. Each layer contains several neurons; neurons in the same layer are not connected to each other, and information is passed between layers in one direction only. The first layer is called the input layer, the last layer is the output layer, and the layers in between are hidden layers.
  • There can be one hidden layer or several.
  • The single-hidden-layer feedforward neural network consists of an input layer (first layer), a hidden layer (second layer), and an output layer (third layer). The input layer takes the word vectors corresponding to the call content as training samples, and the output layer takes the illegal speech in the call content as training samples.
  • Based on the mathematical model of the single-hidden-layer feedforward neural network, the weights from the input-layer neurons to the hidden-layer neurons, the thresholds of the hidden-layer neurons, and the weights from the hidden-layer neurons to the output-layer neurons are adjusted repeatedly until the error between the model output and the expected output falls within an acceptable range. Model training is then complete, yielding an illegal speech recognition model that can recognize illegal speech in call content.
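The weight and threshold adjustment described above can be sketched with plain gradient descent on a squared error. This is only an illustration under stated assumptions: the application does not name the activation function, the loss, or the update rule, so the sigmoid units, the learning rate, the class name `SingleHiddenLayerNet`, and the toy one-hot samples are all hypothetical choices.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SingleHiddenLayerNet:
    """Input layer -> one hidden layer -> single output neuron."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Weights from input neurons to hidden neurons.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        # Thresholds (biases) of the hidden neurons.
        self.b1 = [0.0] * n_hidden
        # Weights from hidden neurons to the output neuron, plus its bias.
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        hidden, out = self.forward(x)
        # Gradient of 0.5 * (out - target)^2 at the output pre-activation.
        delta_out = (out - target) * out * (1.0 - out)
        for j, h in enumerate(hidden):
            # Hidden-layer gradient uses the output weight before it is updated.
            delta_h = delta_out * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * delta_out * h
            self.b1[j] -= lr * delta_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
        self.b2 -= lr * delta_out


def mean_squared_error(net, samples):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)


# Toy data: three one-hot "word vectors"; only the first is labelled as illegal speech.
samples = [([1, 0, 0], 1.0), ([0, 1, 0], 0.0), ([0, 0, 1], 0.0)]
net = SingleHiddenLayerNet(n_in=3, n_hidden=4)
initial_error = mean_squared_error(net, samples)
for _ in range(2000):
    for x, t in samples:
        net.train_step(x, t)
final_error = mean_squared_error(net, samples)
```

In this sketch training would stop after a fixed number of passes; the text instead describes stopping once the error is within an acceptable range, which could be done by checking `mean_squared_error` against a chosen tolerance.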
  • Step S20 Obtain the call audio between the customer service and the customer in the call log library
  • the call audio used in the illegal speech detection can be either quasi real-time or historical, which can be set according to actual needs.
  • This embodiment does not limit how the call audio is selected for each detection run. For example, each run may acquire the call audio from a fixed time or within a time limit, such as the call audio within 10 hours or within 1 day.
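As one possible reading of the time-limited selection above, the sketch below keeps the calls whose start time falls within the last N hours before a reference time. The record layout (a dict with `audio_id` and `start_time`) and the function name `select_recent_calls` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def select_recent_calls(calls, now, window_hours=10):
    """Return the calls whose start time lies within the last `window_hours` hours."""
    cutoff = now - timedelta(hours=window_hours)
    return [call for call in calls if cutoff <= call["start_time"] <= now]


# Hypothetical call records.
now = datetime(2019, 5, 16, 18, 0)
calls = [
    {"audio_id": "A", "start_time": datetime(2019, 5, 16, 9, 30)},
    {"audio_id": "B", "start_time": datetime(2019, 5, 16, 7, 0)},
    {"audio_id": "C", "start_time": datetime(2019, 5, 15, 20, 0)},
]
recent = select_recent_calls(calls, now)                     # 10-hour window
last_day = select_recent_calls(calls, now, window_hours=24)  # 1-day window
```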
  • When the call audio is stored, information related to the call audio is also stored, such as the customer service name, customer service ID, call start time, call end time, customer name, customer phone number, and call subject (for example, personal deposit business or personal transfer business).
  • Step S30 Framing the call audio to obtain multiple time-series voice frames
  • Framing in this embodiment means dividing the sound into small segments, each of which is called a speech frame. A moving window function is used to perform the framing, producing multiple time-ordered speech frames.
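A minimal sketch of framing with a moving window follows. The application does not say which window function or frame sizes are used; the Hamming window, the frame length, and the hop length below are assumptions chosen only to show the mechanics.

```python
import math


def hamming_window(length):
    """Hamming window coefficients (assumed window type; length must be > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_signal(samples, frame_len, hop_len):
    """Slide a window over the samples and return the windowed frames in time order."""
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        segment = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


# Toy signal of 10 samples, 4-sample frames, 2-sample hop (frames overlap by half).
signal = [1.0] * 10
frames = frame_signal(signal, frame_len=4, hop_len=2)
```

Real audio would use far longer frames (tens of milliseconds of samples); the overlap between consecutive frames is what the hop length controls.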
  • Step S40 sequentially extracting the sound features of the speech frame according to the time sequence and generating a multi-dimensional sound feature vector containing sound information
  • Feature extraction converts the sound signal from the time domain to the frequency domain to provide suitable input feature vectors for the acoustic model.
  • This embodiment mainly uses the linear prediction cepstral coefficient (LPCC) and Mel-frequency cepstral coefficient (MFCC) algorithms to extract sound features, converting the waveform of each speech frame into a multi-dimensional vector containing sound information.
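The time-domain to frequency-domain conversion that underlies feature extraction can be illustrated with a naive discrete Fourier transform. Note that this sketch is not LPCC or MFCC: it only computes a magnitude spectrum, which is the first stage that an MFCC pipeline would typically build on, and the function name `magnitude_spectrum` and the test tone are illustrative assumptions.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT: magnitudes of frequency bins 0 .. N/2 for one speech frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(total))
    return spectrum


# A 16-sample frame holding a pure tone with exactly 2 cycles per frame.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
spectrum = magnitude_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

The energy of the tone concentrates in bin 2, which is how a frequency-domain vector exposes information that is spread across every sample in the time domain.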
  • Step S50 Input the multi-dimensional sound feature vector into a preset acoustic model for processing, and output phoneme information corresponding to the speech frame;
  • The acoustic model is a knowledge representation of variations in acoustics, phonetics, environmental variables, speaker gender, accent, and so on.
  • The acoustic model is obtained by training on voice data.
  • Based on acoustic characteristics, the acoustic model computes a probability score for each feature vector against the acoustic features, that is, it establishes the mapping between sound features and phonemes.
  • Step S60, based on the phoneme information, look up a preset dictionary and output the character or word corresponding to each piece of phoneme information;
  • The dictionary is a collection of the phoneme indexes corresponding to characters and words, that is, a mapping between characters or words and phonemes. Looking up the dictionary determines the character or word corresponding to each piece of phoneme information.
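The dictionary lookup can be pictured as a mapping from a phoneme sequence to the candidate words that share that pronunciation. The entries and phoneme symbols below are invented for illustration; the application does not specify the dictionary format.

```python
# Hypothetical pronunciation dictionary: phoneme sequence -> candidate words.
PRONUNCIATION_DICT = {
    ("T", "UW"): ["two", "to", "too"],
    ("M", "AH", "N", "IY"): ["money"],
}


def lookup_words(phonemes):
    """Return the candidate words for a phoneme sequence, or an empty list if unknown."""
    return list(PRONUNCIATION_DICT.get(tuple(phonemes), []))


candidates = lookup_words(["T", "UW"])
unknown = lookup_words(["Z", "Z"])
```

Homophones such as the three candidates for the first entry are the reason a language model is needed afterwards to choose among them.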
  • Step S70, according to the output order, input the character or word corresponding to each piece of phoneme information into a preset language model for processing, and output the probability that the individual characters or words are associated with one another;
  • The language model represents the probability that a given character sequence occurs, and it can be obtained by training on text data.
  • Based on linguistic characteristics, the language model computes the probability that the sound signal corresponds to a candidate phrase sequence, that is, it establishes the mapping from the characters or words corresponding to the phonemes to the phrase sequence composed of them.
  • Step S80, splice the output characters or words with the highest probability into call content in text format and output it;
  • The characters or words with the highest probability are spliced into call content in text format, which is output as the voice recognition result.
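Steps S70 and S80 can be sketched with a bigram language model that, at each position, keeps the candidate most probable given the previous word and then splices the choices into text. The application does not say what kind of language model is used, so the bigram counts, the greedy choice, the tiny corpus, and the function names here are assumptions.

```python
from collections import Counter


def train_bigram(corpus):
    """Count single words and adjacent word pairs in a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) from raw counts (no smoothing)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


def decode(candidates, unigrams, bigrams):
    """Pick the most probable candidate at each position, then splice into text."""
    prev, chosen = "<s>", []
    for options in candidates:
        best = max(options, key=lambda w: bigram_prob(prev, w, unigrams, bigrams))
        chosen.append(best)
        prev = best
    return " ".join(chosen)


# Hypothetical training text and dictionary output (one candidate list per position).
corpus = ["i want to transfer money", "i want to open an account", "two accounts please"]
unigrams, bigrams = train_bigram(corpus)
text = decode([["i"], ["want"], ["two", "to", "too"], ["transfer"], ["money"]],
              unigrams, bigrams)
print(text)  # i want to transfer money
```

A greedy choice is the simplest option; a production recognizer would more likely search over whole sequences, which this sketch does not attempt.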
  • the voice recognition technology is used to perform voice recognition on the call audio, and output the call content in text format.
  • the call content in text format is further organized into customer service call content and customer call content.
  • Step S90, input the call content between the customer service and the customer output by voice recognition into the illegal speech recognition model for recognition, and output the recognition result;
  • Step S100, based on the recognition result, determine whether illegal speech exists in the call content;
  • Speech here refers to the prescribed wording used in communication. Different industries and businesses use different speech, such as polite language or business language. This embodiment does not limit how illegal speech is defined.
  • The speech may be used by the customer service alone, or in pairs within the dialogue with the customer.
  • Illegal speech refers to wording that does not meet the requirements, such as impolite words or terms that do not meet business requirements.
  • This embodiment does not limit the manner of detecting illegal speech. For example, detection may be performed through character matching or through classification and recognition based on a mathematical model.
  • Call content containing illegal speech is used in advance as training samples, and a machine learning method is used for training to obtain the illegal speech recognition model. The call content between the customer service and the customer output by voice recognition is then input into the illegal speech recognition model for recognition, and the recognition result is output. Finally, based on the recognition result, it is determined whether illegal speech exists in the call content.
  • A classification model matching method is used to detect illegal speech.
  • Call content containing illegal speech is used in advance as training samples, and a machine learning method is used to train a classification model that can recognize call content containing illegal speech (that is, the illegal speech recognition model).
  • Supervised learning is used for model training. The training set for supervised learning must include both inputs and outputs: the call content is the input and the illegal speech is the output.
  • Common supervised learning algorithms include regression analysis and statistical classification.
  • The classification model trained by machine learning is used to check the call content. If the output is not empty, it is determined that illegal speech exists in the call content being checked; if the output is empty, it is determined that no illegal speech exists in that call content.
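The "empty versus non-empty output" decision can be sketched as follows. To keep the example short it uses the character-matching alternative that the text also allows, not a trained classifier; the banned phrases and function names are hypothetical.

```python
def detect_illegal_speech(call_text, banned_phrases):
    """Return every banned phrase found in the call text (case-insensitive match)."""
    lowered = call_text.lower()
    return [phrase for phrase in banned_phrases if phrase.lower() in lowered]


def has_illegal_speech(result):
    """A non-empty result means illegal speech exists in the call content."""
    return len(result) > 0


# Hypothetical list of wording that violates the service rules.
BANNED = ["not my problem", "stop wasting my time"]
result = detect_illegal_speech("Sorry, that is NOT my problem.", BANNED)
clean = detect_illegal_speech("Thank you for calling, how can I help?", BANNED)
```

A trained classification model would replace `detect_illegal_speech` while the empty-output check in `has_illegal_speech` stays the same.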
  • Step S110, if illegal speech exists, associate the illegal speech with the corresponding call audio and save it in the call record library.
  • For example, call record 1 stores call audio A together with the illegal speech a found in call audio A;
  • call record 2 stores call audio B together with the illegal speech b1, b2, and b3 found in call audio B.
  • Related information of the call audio associated with the illegal speech is output; the related information includes customer service information, customer information, and the subject of the call.
  • For call audio in which illegal speech exists, the related information is further output: customer service information, such as the customer service name, job ID, and call start and end times; customer information, such as the customer name and mobile phone number; and the subject of the call, such as personal deposit business or personal transfer business.
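The association between call audio, detected illegal speech, and the related information can be sketched with an in-memory record library. The field names, the class names `CallRecord` and `CallRecordLibrary`, and the sample values are assumptions; the application does not describe the storage schema.

```python
from dataclasses import dataclass, field


@dataclass
class CallRecord:
    audio_id: str
    agent_name: str
    customer_name: str
    subject: str
    violations: list = field(default_factory=list)


class CallRecordLibrary:
    """In-memory stand-in for the call record library."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        self._records[record.audio_id] = record

    def associate(self, audio_id, violations):
        """Attach detected illegal speech to the call audio it came from."""
        self._records[audio_id].violations.extend(violations)

    def flagged(self):
        """Return the records that have at least one piece of illegal speech."""
        return [r for r in self._records.values() if r.violations]

    def related_info(self, audio_id):
        record = self._records[audio_id]
        return {
            "customer_service": record.agent_name,
            "customer": record.customer_name,
            "subject": record.subject,
        }


library = CallRecordLibrary()
library.add(CallRecord("A", "Agent 1", "Customer 1", "personal deposit business"))
library.add(CallRecord("B", "Agent 2", "Customer 2", "personal transfer business"))
library.add(CallRecord("C", "Agent 1", "Customer 3", "personal deposit business"))
library.associate("A", ["a"])
library.associate("B", ["b1", "b2", "b3"])
flagged_ids = [record.audio_id for record in library.flagged()]
info = library.related_info("B")
```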
  • This embodiment uses voice recognition technology to convert the call audio into call content in text format, and then performs illegal speech detection on that text. If illegal speech is detected, the illegal speech and the corresponding call audio are automatically associated and saved. This realizes automated quality inspection of illegal speech, avoids tedious and time-consuming manual quality inspection, improves the efficiency of quality inspection, and saves costs.
  • FIG. 3 is a schematic flowchart of a second embodiment of a method for detecting illegal speech in this application. Based on the first embodiment of the foregoing method, in this embodiment, after the foregoing step S110, the method further includes:
  • Step S120 Obtain the illegal speech keywords entered in the preset query page
  • Step S130 searching the call log library with the illegal speech keyword as a query condition to detect whether there is a call audio associated with the illegal speech keyword in the call log library;
  • Step S140 if it exists, play the call audio associated with the illegal speech keyword.
  • A query page for user retrieval is further provided.
  • The query function in this embodiment uses the query keyword as the query condition and searches the call content through keyword matching. If the call content contains text matching the query keyword, the call audio corresponding to the matching call content is played automatically.
  • The query page makes it convenient for the user to perform spot checks flexibly.
  • The user can search the illegal call record library with any illegal speech as the query keyword, obtain the call audio matching the target illegal speech, and have it played automatically, which improves retrieval flexibility.
  • To improve retrieval efficiency, step S130 further includes:
  • The illegal speech associated with each call audio is extracted from the call record library in advance, the extracted illegal speech is spliced into a single illegal speech string, and the string is loaded into memory. The entered illegal speech keyword is then used as the query condition to search the illegal speech string. A single retrieval operation is enough to search the multiple pieces of illegal speech in the call record library, which improves retrieval efficiency.
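The spliced-string retrieval can be sketched as below. The application only says that the illegal speech is joined into one in-memory string and searched with the keyword; the newline separator, the offset table used to map a match back to its call audio, and the function names are assumptions added for illustration.

```python
import bisect


def build_index(records):
    """Splice all illegal speech into one string and remember where each piece starts.

    `records` is a list of (audio_id, [phrases]) pairs.
    """
    parts, starts, owners = [], [], []
    offset = 0
    for audio_id, phrases in records:
        for phrase in phrases:
            starts.append(offset)
            owners.append(audio_id)
            parts.append(phrase)
            offset += len(phrase) + 1  # +1 for the newline separator
    return "\n".join(parts), starts, owners


def find_audio(keyword, index):
    """Search the spliced string and return the audio ids whose phrases match."""
    joined, starts, owners = index
    matches = []
    position = joined.find(keyword)
    while position != -1:
        owner = owners[bisect.bisect_right(starts, position) - 1]
        if owner not in matches:
            matches.append(owner)
        position = joined.find(keyword, position + 1)
    return matches


# Hypothetical illegal speech already associated with two call audios.
index = build_index([
    ("A", ["that is not my problem"]),
    ("B", ["stop wasting my time", "not my problem anyway"]),
])
hits = find_audio("not my problem", index)
```

Checking whether any match exists takes a single `find` call on the spliced string; the loop is only needed here to list every call audio that matches.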
  • the application also provides a device for detecting illegal speech.
  • Fig. 4 is a schematic diagram of the functional modules of the first embodiment of the device for detecting illegal speech in this application.
  • the device for detecting illegal speech includes:
  • the model training module 10, used to take call content containing illegal speech as training samples and train with a machine learning method to obtain the illegal speech recognition model;
  • the model training module 10 is specifically configured to:
  • the obtaining module 20, used to obtain the call audio between the customer service and the customer from the call record library;
  • the voice recognition module 30, configured to: divide the call audio into frames to obtain multiple time-ordered speech frames; extract the sound features of the speech frames in time order and generate multi-dimensional sound feature vectors containing sound information; input the multi-dimensional sound feature vectors into a preset acoustic model for processing and output the phoneme information corresponding to the speech frames; look up the preset dictionary based on the phoneme information and output the character or word corresponding to each piece of phoneme information; input those characters or words, in output order, into a preset language model for processing and output the probability that the individual characters or words are associated with one another; and splice the output characters or words with the highest probability into call content in text format and output it;
  • the illegal speech recognition module 40, configured to input the call content between the customer service and the customer output by voice recognition into the illegal speech recognition model for recognition, output the recognition result, and determine based on the recognition result whether illegal speech exists in the call content;
  • the association saving module 50, configured to associate the illegal speech with the corresponding call audio and save it in the call record library if illegal speech exists.
  • This embodiment uses voice recognition technology to convert the call audio into call content in text format, and then performs illegal speech detection on that text. If illegal speech is detected, the illegal speech and the corresponding call audio are automatically associated and saved. This realizes automated quality inspection of illegal speech, avoids tedious and time-consuming manual quality inspection, improves the efficiency of quality inspection, and saves costs.
  • Fig. 5 is a schematic diagram of the functional modules of the second embodiment of the device for detecting illegal speech in this application.
  • the device for detecting illegal speech includes:
  • the keyword acquisition module 60 is used to acquire the illegal speech keywords entered in the preset query page;
  • the retrieval module 70 is configured to search the call log library using the illegal speech keyword as a query condition to detect whether there is a call audio associated with the illegal speech keyword in the call log library;
  • the playing module 80 is configured to play the call audio associated with the illegal speech keyword if such call audio exists in the call record library.
  • a query page is further provided to facilitate the user to perform random inspections flexibly.
  • the user can search the illegal call record library with any illegal speech technique as a query keyword, so as to obtain the call audio matching the target illegal speech technique and play it automatically, which improves the retrieval flexibility.
  • the application also provides a non-volatile computer-readable storage medium.
  • the computer-readable storage medium stores an illegal speech detection program, and when the illegal speech detection program is executed by a processor, the illegal speech detection method as described in any of the above embodiments is implemented.
  • according to the output order, inputting the character or word corresponding to each piece of phoneme information into the preset language model for processing, and outputting the probability that the individual characters or words are associated with one another;
  • if illegal speech exists, associating the illegal speech with the corresponding call audio and saving it in the call record library.
  • the following steps of the illegal speech detection method are further implemented:
  • outputting related information of the call audio associated with the illegal speech, where the related information includes customer service information, customer information, and the subject of the call.
  • the following steps of the illegal speech detection method are further implemented:
  • using the call content containing illegal speech as a training sample, and segmenting the training sample into words to obtain the corresponding word segmentation data;
  • the following steps of the illegal speech detection method are further implemented:
  • using the illegal speech keyword as a query condition, searching the call record library to detect whether call audio associated with the illegal speech keyword exists in the call record library;
  • the following steps of the illegal speech detection method are further implemented:
  • The methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform. They can of course also be implemented by hardware, but in many cases the former is the better implementation.
  • The technical solution of this application, in essence or in the part that contributes over the existing technology, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present application.

Abstract

The present application relates to the technical field of artificial intelligence, and disclosed is an illegal speech detection method, comprising: taking call content comprising illegal speech as training samples, and employing a machine learning method for training to obtain an illegal speech recognition model; acquiring in a call record library call audio between customer service and a customer; performing voice recognition on the call audio and outputting the call content in a text format; inputting the call content outputted by voice recognition into the illegal speech recognition model for recognition, and outputting a recognition result; on the basis of the recognition result, determining whether there is illegal speech in the call content; if there is illegal speech, then associating the illegal speech with the corresponding call audio and saving same in the call record library. Further disclosed by the present application are an illegal speech detection apparatus and device, and a computer-readable storage medium. The present application achieves the automated quality inspection of illegal speech, avoids tedious and time-consuming manual quality inspection, and improves the efficiency of quality inspection.

Description

Illegal speech detection method, apparatus and device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 201910411437.7, filed with the Chinese Patent Office on May 16, 2019 and entitled "Illegal Speech Detection Method, Apparatus and Device, and Computer-Readable Storage Medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence technology, and in particular to an illegal speech detection method, apparatus and device, and a computer-readable storage medium.
Background
In existing telephone customer service management, customer service calls are generally recorded to improve service quality, so that the service quality of the telephone agents can later be spot-checked, for example to determine whether illegal speech was used while communicating with customers, which in turn supports evaluation of the customer service work. The spot check of call content is usually performed manually. The inventor realized that when there is a large amount of call content, manual quality inspection is inevitably time-consuming and labor-intensive, so its execution efficiency is low.
Summary of the invention
The main purpose of this application is to provide an illegal speech detection method, apparatus and device, and a computer-readable storage medium, aiming to solve the technical problem that manual quality inspection in existing telephone customer service management has low execution efficiency.
To achieve the above purpose, this application provides an illegal speech detection method, which includes the following steps:
Taking call content containing illegal speech as training samples, and training by machine learning to obtain an illegal speech recognition model;
Acquiring call audio between a customer service agent and a customer from a call record library;
Framing the call audio to obtain multiple time-ordered speech frames;
Extracting the sound features of the speech frames in time order and generating multi-dimensional sound feature vectors containing sound information;
Inputting the multi-dimensional sound feature vectors into a preset acoustic model for processing, and outputting the phoneme information corresponding to the speech frames;
Looking up a preset dictionary based on the phoneme information, and outputting the character or word corresponding to each piece of phoneme information;
Inputting the characters or words corresponding to the phoneme information, in output order, into a preset language model for processing, and outputting the probabilities with which the individual characters or words are associated with one another;
Splicing the output characters or words with the highest probability into call content in text format and outputting it;
Inputting the call content between the customer service agent and the customer, as output by speech recognition, into the illegal speech recognition model for illegal speech recognition, and outputting a recognition result;
Determining, based on the recognition result, whether illegal speech exists in the call content;
If illegal speech exists, associating the illegal speech with the corresponding call audio and saving it to the call record library.
Optionally, the illegal speech detection method further includes:
If illegal speech exists, outputting the related information of the call audio associated with that illegal speech, the related information including customer service information, customer information and the call subject.
Optionally, taking call content containing illegal speech as training samples and training by machine learning to obtain the illegal speech recognition model includes:
Taking call content containing illegal speech as training samples, and performing word segmentation on the training samples to obtain the corresponding word segmentation data;
Mapping all word segmentation data into word vectors to be trained;
Constructing a mathematical model based on a single-hidden-layer feedforward neural network, taking the word vectors as the input of the mathematical model and the illegal speech in the training samples as the output of the mathematical model, and iteratively training the mathematical model to obtain the illegal speech recognition model.
Optionally, after the step of associating the illegal speech with the corresponding call audio and saving it to the call record library if illegal speech exists, the method further includes:
Obtaining an illegal speech keyword entered on a preset query page;
Searching the call record library with the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword;
If so, playing the call audio associated with the illegal speech keyword.
Optionally, searching the call record library with the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword, includes:
Splicing the characters of the illegal speech associated with each call audio in the call record library to form an illegal speech string, and loading the illegal speech string into memory;
Searching the illegal speech string with the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword.
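The in-memory retrieval described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the separator characters and the names `CALL_RECORDS`, `build_search_string` and `find_audio` are hypothetical and do not come from the application.

```python
# Hypothetical in-memory stand-in for the call record library.
CALL_RECORDS = [
    {"audio": "call_A.wav", "illegal_speech": ["phrase a"]},
    {"audio": "call_B.wav", "illegal_speech": ["phrase b1", "phrase b2"]},
]


def build_search_string(records):
    """Splice every record's illegal speech into one string held in memory.

    Each record contributes one line of the form "<index>:<phrase>|<phrase>",
    so a hit can be traced back to its call audio. Assumes phrases contain
    no newline characters.
    """
    return "\n".join(
        f"{i}:{'|'.join(r['illegal_speech'])}" for i, r in enumerate(records)
    )


def find_audio(keyword, records):
    """Return the audio files whose associated illegal speech contains keyword."""
    hits = []
    for line in build_search_string(records).split("\n"):
        index, _, phrases = line.partition(":")
        # An empty keyword never matches; this also skips the empty line
        # produced when there are no records at all.
        if keyword and keyword in phrases:
            hits.append(records[int(index)]["audio"])
    return hits
```

Searching one spliced string avoids a separate query per record, at the cost of rebuilding the string whenever the library changes; a real system might cache it.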
Further, to achieve the above purpose, this application also provides an illegal speech detection apparatus, which includes:
A model training module, configured to take call content containing illegal speech as training samples and train by machine learning to obtain an illegal speech recognition model;
An acquisition module, configured to acquire call audio between a customer service agent and a customer from the call record library;
A speech recognition module, configured to: frame the call audio to obtain multiple time-ordered speech frames; extract the sound features of the speech frames in time order and generate multi-dimensional sound feature vectors containing sound information; input the multi-dimensional sound feature vectors into a preset acoustic model for processing and output the phoneme information corresponding to the speech frames; look up a preset dictionary based on the phoneme information and output the character or word corresponding to each piece of phoneme information; input the characters or words corresponding to the phoneme information, in output order, into a preset language model for processing and output the probabilities with which the individual characters or words are associated with one another; and splice the output characters or words with the highest probability into call content in text format and output it;
An illegal speech recognition module, configured to input the call content between the customer service agent and the customer, as output by speech recognition, into the illegal speech recognition model for illegal speech recognition and output a recognition result, and to determine, based on the recognition result, whether illegal speech exists in the call content;
An association and saving module, configured to associate the illegal speech with the corresponding call audio and save it to the call record library if illegal speech exists.
Further, to achieve the above purpose, this application also provides an illegal speech detection device, which includes a memory, a processor, and an illegal speech detection program that is stored in the memory and can run on the processor. When executed by the processor, the illegal speech detection program implements the steps of the illegal speech detection method described in any one of the above.
Further, to achieve the above purpose, this application also provides a computer-readable storage medium on which an illegal speech detection program is stored. When executed by a processor, the illegal speech detection program implements the steps of the illegal speech detection method described in any one of the above.
This application uses speech recognition technology to recognize call audio as call content in text format, and then performs illegal speech detection on that text. If illegal speech is detected, it is automatically associated with the corresponding call audio and saved. This automates the quality inspection of illegal speech, avoids tedious and time-consuming manual quality inspection, improves the efficiency of quality inspection and saves cost.
Description of the drawings
FIG. 1 is a schematic structural diagram of the device hardware operating environment involved in an embodiment of the illegal speech detection device of this application;
FIG. 2 is a schematic flowchart of the first embodiment of the illegal speech detection method of this application;
FIG. 3 is a schematic flowchart of the second embodiment of the illegal speech detection method of this application;
FIG. 4 is a schematic diagram of the functional modules of the first embodiment of the illegal speech detection apparatus of this application;
FIG. 5 is a schematic diagram of the functional modules of the second embodiment of the illegal speech detection apparatus of this application.
The realization of the purpose, the functional characteristics and the advantages of this application will be further described with reference to the embodiments and the accompanying drawings.
Detailed description
It should be understood that the specific embodiments described herein are only used to explain this application and are not intended to limit it.
This application provides an illegal speech detection device.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the device hardware operating environment involved in an embodiment of the illegal speech detection device of this application.
As shown in FIG. 1, the illegal speech detection device may include a processor 1001 (for example a CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the hardware structure shown in FIG. 1 does not limit the illegal speech detection device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
As shown in FIG. 1, the memory 1005, as a computer-readable storage medium, may contain an operating system, a network communication module, a user interface module and an illegal speech detection program. The operating system is a program that manages and controls the hardware and software resources of the illegal speech detection device and supports the operation of the network communication module, the user interface module, the illegal speech detection program and other programs or software. The network communication module is used to manage and control the network interface 1004, and the user interface module is used to manage and control the user interface 1003.
In the hardware structure of the illegal speech detection device shown in FIG. 1, the network interface 1004 is mainly used to connect to the system backend and exchange data with it; the user interface 1003 is mainly used to connect to the client (user side) and exchange data with it; and the illegal speech detection device uses the processor 1001 to call the illegal speech detection program stored in the memory 1005 and perform the operations of the following embodiments of the illegal speech detection method.
Based on the above hardware structure of the illegal speech detection device, the embodiments of the illegal speech detection method of this application are proposed.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the illegal speech detection method of this application. In this embodiment, the illegal speech detection method includes the following steps:
Step S10: taking call content containing illegal speech as training samples, and training by machine learning to obtain an illegal speech recognition model;
In this embodiment, to automatically recognize illegal speech in call content, a corresponding illegal speech recognition model must be trained in advance. The specific way the training is implemented is not limited.
Optionally, in one embodiment, the illegal speech recognition model is trained as follows:
(1) Taking call content containing illegal speech as training samples, and performing word segmentation on the training samples to obtain the corresponding word segmentation data;
(2) Mapping all word segmentation data into word vectors to be trained;
(3) Constructing a mathematical model based on a single-hidden-layer feedforward neural network, taking the word vectors as the input of the mathematical model and the illegal speech in the training samples as the output of the mathematical model, and iteratively training the mathematical model to obtain the illegal speech recognition model.
A word vector (word embedding) is a representation that maps words or phrases from a vocabulary to vectors of real numbers. Words normally appear in natural-language form, which machine learning techniques cannot process directly, so they must be converted into a mathematical structure that can be processed, namely vectors in a space, where every word is represented by a different vector. For example, all words are arranged into one long sequence so that each word corresponds to a position. A word is then represented by an array whose length equals the number of words: the array value at that word's position is 1 and the values at all other positions are 0. In this way each word is mapped to a word vector.
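The position-based mapping in the example above is commonly called one-hot encoding. A minimal sketch follows; the function names and the sample words are illustrative assumptions, not taken from the application.

```python
def build_vocabulary(words):
    """Assign each distinct word a fixed position, here in sorted order."""
    return {word: index for index, word in enumerate(sorted(set(words)))}


def one_hot(word, vocabulary):
    """Return a vector as long as the vocabulary with a 1 at the word's position."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[word]] = 1
    return vector


# Hypothetical segmented words from a call transcript.
vocab = build_vocabulary(["refund", "hello", "guaranteed", "hello"])
hello_vector = one_hot("hello", vocab)  # "hello" is the second of three words
```

The vector length grows with the vocabulary, which is why this representation is usually only a starting point before training.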
A single-hidden-layer feedforward neural network is a special form of feedforward neural network, which in turn is a type of artificial neural network. In such a network, starting from the input layer, each neuron receives the output of the previous layer and passes its own output to the next layer, until the output layer is reached. A feedforward neural network has a unidirectional multilayer structure: each layer contains several neurons, neurons within the same layer are not connected to each other, and information travels between layers in one direction only. The first layer is called the input layer, the last layer is the output layer, and the layers in between are hidden layers. There may be one hidden layer or several.
In this optional embodiment, the single-hidden-layer feedforward neural network consists of a first layer, the input layer; a second layer, the hidden layer; and a third layer, the output layer. The input layer takes the word vectors corresponding to the call content as training samples, and the output layer takes the illegal speech in the call content as training samples. The constructed model is trained iteratively by repeatedly adjusting the weights from the input-layer neurons to the hidden-layer neurons, the thresholds of the hidden-layer neurons, and the weights from the hidden-layer neurons to the output-layer neurons, until the error between the model output and the expected output falls within an acceptable range. Training is then complete, yielding an illegal speech recognition model that can recognize illegal speech in call content.
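To make the training procedure concrete, here is a minimal pure-Python sketch of a single-hidden-layer feedforward network trained by gradient descent. Several choices are assumptions made for illustration, since the application does not specify them: sigmoid activations, a squared-error loss, per-sample updates, the layer sizes, the toy data, and a single output neuron that signals "contains illegal speech" rather than emitting the illegal phrase itself. The "thresholds" in the text correspond to the bias terms `b1` and `b2`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SingleHiddenLayerNet:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Input-to-hidden weights and hidden thresholds (biases).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        # Hidden-to-output weights and output thresholds (biases).
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
                  for row, b in zip(self.w2, self.b2)]
        return hidden, output

    def train_step(self, x, target, lr=0.5):
        """One gradient-descent update; returns the error before the update."""
        hidden, output = self.forward(x)
        # Error terms for squared error with sigmoid units.
        delta_out = [(o - t) * o * (1 - o) for o, t in zip(output, target)]
        delta_hid = [h * (1 - h) * sum(d * self.w2[k][j] for k, d in enumerate(delta_out))
                     for j, h in enumerate(hidden)]
        for k, d in enumerate(delta_out):
            for j, h in enumerate(hidden):
                self.w2[k][j] -= lr * d * h
            self.b2[k] -= lr * d
        for j, d in enumerate(delta_hid):
            for i, v in enumerate(x):
                self.w1[j][i] -= lr * d * v
            self.b1[j] -= lr * d
        return sum((o - t) ** 2 for o, t in zip(output, target)) / 2


def train(net, samples, epochs=500):
    """Iterate over the samples repeatedly and record the total error per epoch."""
    return [sum(net.train_step(x, t) for x, t in samples) for _ in range(epochs)]


# Toy data: 3-dimensional word-presence vectors; the label is 1 when the
# first (hypothetical) illegal word is present.
samples = [([1, 0, 0], [1]), ([0, 1, 0], [0]), ([0, 0, 1], [0]), ([1, 1, 0], [1])]
net = SingleHiddenLayerNet(n_in=3, n_hidden=4, n_out=1)
history = train(net, samples)
```

In practice training would stop once the recorded error is within the acceptable range mentioned above, rather than after a fixed number of epochs.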
Step S20: acquiring call audio between a customer service agent and a customer from the call record library;
In this embodiment, telephone conversations between customer service agents and customers are recorded automatically, and the resulting call audio is saved to the call record library. The call audio used for illegal speech detection may be near-real-time or historical, depending on actual needs.
The way call audio is selected for each detection run is not limited. For example, each run may take the call audio from a fixed time span or period, such as the call audio from the last 10 hours or from the last day.
Optionally, when call audio is stored, information related to the call audio is stored as well, such as the agent's name, the agent's employee ID, the call start time, the call end time, the customer's name, the customer's telephone number and the call subject (for example personal deposit business or personal transfer business).
Step S30: framing the call audio to obtain multiple time-ordered speech frames;
To extract sound features more effectively, the call audio also undergoes audio preprocessing such as filtering and framing. Framing in this embodiment divides the sound into short segments, each called a speech frame. A moving window function is used to implement the framing, which yields multiple time-ordered speech frames.
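Framing with a moving window can be sketched as below. The Hamming window, the frame length and the hop length are assumptions chosen for illustration; the application only states that a moving window function is used.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def split_into_frames(samples, frame_length, hop_length):
    """Slide a window over the samples and return the windowed frames in time order.

    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hamming(frame_length)
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        chunk = samples[start:start + frame_length]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


# Ten dummy audio samples, frames of 4 samples that overlap by 2.
frames = split_into_frames([float(i) for i in range(10)], frame_length=4, hop_length=2)
```

Overlapping frames (a hop shorter than the frame) keep information that would otherwise be attenuated at the window edges.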
Step S40: extracting the sound features of the speech frames in time order and generating multi-dimensional sound feature vectors containing sound information;
Feature extraction converts the sound signal from the time domain to the frequency domain, providing suitable input feature vectors for the acoustic model. This embodiment mainly uses the linear prediction cepstral coefficient (LPCC) and Mel-frequency cepstral coefficient (MFCC) algorithms to extract sound features, converting each waveform speech frame into a multi-dimensional vector containing sound information.
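The time-to-frequency conversion that underlies this step can be illustrated with a plain discrete Fourier transform. This sketch is not LPCC or MFCC; it shows only the first stage such algorithms build on, and the function name and test signal are assumptions.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the DFT bins 0..n/2 of one speech frame (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


# A 16-sample frame holding exactly three cycles of a sine wave.
frame = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
spectrum = magnitude_spectrum(frame)  # the energy concentrates in bin 3
```

An MFCC pipeline would go on to apply a Mel filter bank, a logarithm and a discrete cosine transform to such a spectrum; a production system would use an FFT rather than this naive loop.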
Step S50: inputting the multi-dimensional sound feature vectors into a preset acoustic model for processing, and outputting the phoneme information corresponding to the speech frames;
The acoustic model is a knowledge representation of differences in acoustics, phonetics, environmental variables, speaker gender, accent and so on. It is obtained by training on speech data. Based on acoustic characteristics, the acoustic model can compute a probability score for each feature vector, which establishes the mapping from the sound features of speech to phonemes.
Step S60: looking up a preset dictionary based on the phoneme information, and outputting the character or word corresponding to each piece of phoneme information;
The dictionary is an index of the phonemes that correspond to characters and words, that is, a mapping between characters or words and phonemes. Looking up the dictionary determines the character or word that corresponds to each piece of phoneme information.
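A dictionary lookup of this kind can be sketched with an inverted index from phoneme groups to candidate characters. The entries reuse the characters and phoneme spellings from the worked example in this description; the names `LEXICON`, `invert` and `lookup` are illustrative assumptions.

```python
# Pronunciation dictionary: character -> phoneme group.
LEXICON = {
    "我": "w o", "窝": "w o",
    "是": "s i",
    "机": "j i", "级": "j i",
    "器": "q i",
    "人": "r n", "忍": "r n",
}


def invert(lexicon):
    """Build the phoneme group -> candidate characters index."""
    index = {}
    for word, phonemes in lexicon.items():
        index.setdefault(phonemes, []).append(word)
    return index


def lookup(phoneme_groups, lexicon):
    """Return the candidate characters for each phoneme group, in input order."""
    index = invert(lexicon)
    return [index.get(group, []) for group in phoneme_groups]


candidates = lookup(["w o", "s i", "j i", "q i", "r n"], LEXICON)
```

Because several characters share a pronunciation, the lookup returns candidates rather than a single answer; choosing among them is the job of the language model.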
Step S70: inputting the characters or words corresponding to the phoneme information, in output order, into a preset language model for processing, and outputting the probabilities with which the individual characters or words are associated with one another;
The language model represents the probability that a given character sequence occurs and can be obtained by training on text data. Based on linguistic characteristics, the language model can compute the probability of the phrase sequence corresponding to the sound signal, which establishes the mapping from the phonemes of the text to the phrase sequences the text forms.
Step S80: splicing the output characters or words with the highest probability into call content in text format and outputting it;
After the probability of each character or phrase that may correspond to the call audio has been obtained, the characters or words with the highest probability are spliced into call content in text format and output as the speech recognition result.
For example, suppose there is call audio whose text content is "我是机器人" ("I am a robot"). Feature extraction yields the feature vector [1 2 3 4 5 6....10]. This feature vector is input into the acoustic model, which outputs the corresponding phonemes, that is, [1 2 3 4 5 6....10] -> w o s i j i q i r n. Looking up the dictionary then gives the characters corresponding to each phoneme group: 窝 (nest): w o; 我 (I): w o; 是 (am): s i; 机 (machine): j i; 器 (device): q i; 人 (person): r n; 级 (level): j i; 忍 (endure): r n. Finally, these results are input into the language model, which yields the corresponding character or phrase sequences with their probabilities: 我 (I): 0.0786; 是 (am): 0.0546; 我是 (I am): 0.0898; 机器 (machine): 0.0967; 机器人 (robot): 0.6785. Comparing the probabilities selects the most probable entries, 我是 (I am): 0.0898 and 机器人 (robot): 0.6785, and the spliced output is "我是机器人" ("I am a robot"), which completes speech recognition of the call audio.
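The probability comparison in this example can be sketched as a search for the most probable sequence of candidates that covers all five syllables. The probabilities are those of the example; the syllable spans, the product-of-probabilities score and the dynamic-programming search are assumptions made for illustration, since the description only says that the probabilities are compared.

```python
# (start, end, text, probability), with spans counted in syllables.
CANDIDATES = [
    (0, 1, "我", 0.0786),
    (1, 2, "是", 0.0546),
    (0, 2, "我是", 0.0898),
    (2, 4, "机器", 0.0967),
    (2, 5, "机器人", 0.6785),
]


def best_transcript(candidates, length):
    """Return the highest-scoring text covering syllables 0..length, or None."""
    # best[i] holds (score, text) for the best way to cover syllables 0..i.
    best = [None] * (length + 1)
    best[0] = (1.0, "")
    for end in range(1, length + 1):
        for start, stop, text, prob in candidates:
            if stop == end and best[start] is not None:
                score = best[start][0] * prob
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, best[start][1] + text)
    return best[length][1] if best[length] is not None else None


transcript = best_transcript(CANDIDATES, 5)
```

With these numbers the path through 我是 and 机器人 wins, matching the spliced result given in the example.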
In this embodiment, speech recognition technology is used to recognize the call audio and output the call content in text format.
Optionally, to improve quality inspection efficiency, the call content in text format is further sorted into the agent's side of the conversation and the customer's side of the conversation.
Step S90: inputting the call content between the customer service agent and the customer, as output by speech recognition, into the illegal speech recognition model for illegal speech recognition, and outputting a recognition result;
Step S100: determining, based on the recognition result, whether illegal speech exists in the call content;
Speech here refers to the prescribed wording used in communication; different industries or businesses use different speech, such as polite phrases or business phrases. This embodiment does not limit how illegal speech is defined. Speech may be used by the customer service agent alone, or used in pairs by the agent in response to the dialogue with the customer.
Illegal speech is speech that does not comply with the rules, such as impolite words or wording that does not meet business requirements. This embodiment does not limit the way illegal speech is detected. For example, it may be detected by character matching or by classification with a mathematical model.
In this embodiment, call content containing illegal speech is first used as training samples to train an illegal speech recognition model by machine learning. The call content between the customer service agent and the customer, as output by speech recognition, is then input into the illegal speech recognition model, which outputs a recognition result. Finally, the recognition result is used to determine whether illegal speech exists in the call content.
This embodiment detects illegal speech by matching against a classification model. Call content containing illegal speech is used in advance as training samples, and machine learning is used to train a classification model (the illegal speech recognition model) that can recognize call content containing illegal speech. For example, the model may be trained by supervised learning, whose training sample set must include both inputs and outputs, here the call content as input and the illegal speech as output. Common supervised learning methods include regression analysis and statistical classification.
In this embodiment, the classification model obtained by machine learning is used to check the call content. If the output is non-empty, the checked call content is determined to contain illegal speech; if the output is empty, the checked call content is determined to contain no illegal speech.
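The decision rule above (a non-empty result means illegal speech is present) can be sketched as follows. The recognizer here is a stand-in that uses the character-matching option mentioned earlier instead of a trained model, and the phrase list and function names are hypothetical.

```python
def recognize_illegal_speech(call_text, illegal_phrases):
    """Stand-in for the trained model: return the illegal phrases found in the text."""
    return [phrase for phrase in illegal_phrases if phrase in call_text]


def has_illegal_speech(recognition_result):
    """A non-empty recognition result means the call content contains illegal speech."""
    return len(recognition_result) > 0


# Hypothetical phrase list and transcripts.
PHRASES = ["guaranteed returns", "that is not my problem"]
flagged = recognize_illegal_speech("this product has guaranteed returns", PHRASES)
clean = recognize_illegal_speech("thank you for calling", PHRASES)
```

Keeping the recognizer and the decision rule separate means the stand-in can later be replaced by the trained classification model without changing the rule.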
Step S110: if illegal speech exists, associating the illegal speech with the corresponding call audio and saving it to the call record library.
In this embodiment, a single call audio may contain one or more instances of illegal speech, or none. If the detection result contains illegal speech, the illegal speech is saved to the call record library in association with the corresponding call audio. For example, call record 1 contains call audio A and the illegal speech a found in call audio A; call record 2 contains call audio B and the illegal speech b1, b2 and b3 found in call audio B.
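The association between call audio and its illegal speech can be sketched with a small in-memory structure. The class names, fields and file names are assumptions for illustration; the application does not describe how the call record library is stored.

```python
from dataclasses import dataclass, field


@dataclass
class CallRecord:
    audio: str
    illegal_speech: list = field(default_factory=list)


class CallRecordLibrary:
    """Hypothetical in-memory call record library keyed by call audio."""

    def __init__(self):
        self._records = {}

    def associate(self, audio, illegal_speech):
        """Save the illegal speech with its call audio; do nothing if there is none."""
        if not illegal_speech:
            return None
        record = self._records.setdefault(audio, CallRecord(audio))
        for phrase in illegal_speech:
            if phrase not in record.illegal_speech:
                record.illegal_speech.append(phrase)
        return record

    def get(self, audio):
        return self._records.get(audio)


library = CallRecordLibrary()
library.associate("call_A.wav", ["a"])
library.associate("call_B.wav", ["b1", "b2", "b3"])
library.associate("call_C.wav", [])  # no illegal speech, so nothing is saved
```

This mirrors the two call records in the example: one phrase for call audio A and three for call audio B.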
Further optionally, in an optional embodiment, if illegal speech is detected in the current call content, related information about the call audio associated with the illegal speech is output. The related information includes customer service agent information, customer information, and the subject of the call.
In this optional embodiment, if the detection result for the call content contains illegal speech, related information about the call audio containing the illegal speech is also output. This includes customer service agent information, such as the agent's name, employee ID, and the start and end times of the call; customer information, such as the customer's name and mobile phone number; and the subject of the call, such as a personal deposit or personal transfer service. Outputting this information makes it possible to locate the responsible person quickly and thereby improve service quality, for example by correcting the agent concerned, apologizing to the customer and answering the customer's question again, or correcting the error.
This embodiment uses speech recognition to convert call audio into call content in text format and then checks the text for illegal speech. If illegal speech is detected, it is automatically associated with the corresponding call audio and saved. This automates quality inspection for illegal speech, avoids tedious and time-consuming manual inspection, improves inspection efficiency, and reduces cost.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of the illegal speech detection method of the present application. Based on the first embodiment of the method, in this embodiment the method further includes, after step S110:
Step S120: obtain the illegal speech keyword entered on a preset query page;
Step S130: search the call record library using the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword;
Step S140: if so, play the call audio associated with the illegal speech keyword.
In this embodiment, a query page that users can search is provided to make quality inspection more flexible. The query function uses a query keyword as the query condition and searches the call content by keyword matching. If the call content contains text matching the query keyword, the call audio corresponding to that call content is played automatically.
The query page lets users carry out spot checks flexibly. A user can take any illegal speech as the query keyword and search the call record library of illegal calls, obtaining the call audio that matches the target illegal speech and playing it automatically, which makes retrieval more flexible.
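Steps S120 to S140 can be sketched in Python as a substring match over the illegal speech associated with each call audio. The dictionary layout and the function name are assumptions; playback itself is left out, and the function only returns the identifiers of the call audio that would be played.

```python
from typing import Dict, List


def find_audio_by_keyword(associations: Dict[str, List[str]],
                          keyword: str) -> List[str]:
    """Return IDs of call audio whose associated illegal speech matches.

    `associations` maps a call audio ID to the illegal speech saved for it
    (an assumed layout). Matching is a plain substring test.
    """
    if not keyword:
        return []
    return [audio_id
            for audio_id, items in associations.items()
            if any(keyword in item for item in items)]


if __name__ == "__main__":
    associations = {
        "A": ["guaranteed return"],
        "B": ["no risk at all", "guaranteed profit"],
    }
    # The matching call audio would then be handed to a player (step S140).
    print(find_audio_by_keyword(associations, "guaranteed"))
```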
Further optionally, in an embodiment of the illegal speech detection method of the present application, step S130 further includes the following to improve retrieval efficiency:
concatenating the illegal speech associated with each call audio in the call record library to form an illegal speech string, and loading the illegal speech string into memory;
searching the illegal speech string using the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword.
In this embodiment, the illegal speech associated with each call audio is extracted from the call record library in advance, the extracted items are concatenated into an illegal speech string, and the string is loaded into memory. The illegal speech string is then searched using the entered illegal speech keyword as the query condition. A single search operation thus covers multiple illegal speech entries in the call record library, which improves retrieval efficiency.
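A minimal Python sketch of this concatenate-then-search approach follows. The application does not say how a match in the string is mapped back to its call audio, so the newline separator and the offset table used here are assumptions added to make the example complete.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

SEPARATOR = "\n"  # assumed delimiter between concatenated entries


def build_index(associations: Dict[str, List[str]]) -> Tuple[str, List[int], List[str]]:
    """Concatenate all illegal speech into one in-memory string.

    Returns the string, the start offset of every entry, and the call
    audio ID that owns each entry (parallel lists).
    """
    parts: List[str] = []
    starts: List[int] = []
    owners: List[str] = []
    offset = 0
    for audio_id, items in associations.items():
        for item in items:
            starts.append(offset)
            owners.append(audio_id)
            parts.append(item)
            offset += len(item) + len(SEPARATOR)
    return SEPARATOR.join(parts), starts, owners


def search(keyword: str, text: str, starts: List[int], owners: List[str]) -> List[str]:
    """Scan the concatenated string once and return the owning audio IDs."""
    if not keyword or SEPARATOR in keyword:
        return []
    found: List[str] = []
    pos = text.find(keyword)
    while pos != -1:
        owner = owners[bisect_right(starts, pos) - 1]  # entry containing pos
        if owner not in found:
            found.append(owner)
        pos = text.find(keyword, pos + 1)
    return found


if __name__ == "__main__":
    text, starts, owners = build_index({
        "A": ["guaranteed return"],
        "B": ["no risk at all", "guaranteed profit"],
    })
    print(search("guaranteed", text, starts, owners))
```

Because a keyword without the separator cannot span two entries, every match falls inside exactly one entry, and the binary search over `starts` identifies it.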
The present application also provides an illegal speech detection apparatus.
Referring to FIG. 4, FIG. 4 is a schematic diagram of the functional modules of a first embodiment of the illegal speech detection apparatus of the present application. In this embodiment, the illegal speech detection apparatus includes:
The model training module 10 is configured to take call content containing illegal speech as training samples and train by machine learning to obtain an illegal speech recognition model;
In an embodiment, the model training module 10 is specifically configured to:
take call content containing illegal speech as training samples and segment the training samples into words to obtain the corresponding word segmentation data; map all the word segmentation data into word vectors to be trained; construct a mathematical model based on a single-hidden-layer feedforward neural network, take the word vectors as the input of the mathematical model and the illegal speech in the training samples as its output, and train the mathematical model iteratively to obtain the illegal speech recognition model.
The acquisition module 20 is configured to obtain the call audio between a customer service agent and a customer from the call record library;
The speech recognition module 30 is configured to: divide the call audio into frames to obtain a plurality of time-ordered speech frames; extract the acoustic features of the speech frames in time order and generate multi-dimensional acoustic feature vectors containing the sound information; input the multi-dimensional acoustic feature vectors into a preset acoustic model for processing and output the phoneme information corresponding to the speech frames; look up a preset dictionary based on the phoneme information and output the characters or words corresponding to each piece of phoneme information; input the characters or words corresponding to each piece of phoneme information into a preset language model in output order for processing, and output the probabilities that individual characters or words are associated with one another; and concatenate the characters or words with the highest probability into call content in text format and output it;
The illegal speech identification module 40 is configured to input the call content between the customer service agent and the customer, as output by speech recognition, into the illegal speech recognition model for illegal speech identification, output the identification result, and determine, based on the identification result, whether the call content contains illegal speech;
The association and storage module 50 is configured to, if illegal speech exists, associate the illegal speech with the corresponding call audio and save the association in the call record library.
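The framing operation performed first by the speech recognition module can be illustrated with a short Python sketch. The 25 ms frame length and 10 ms hop are common choices in speech processing and are assumptions here; the application does not give frame parameters, and the function name is hypothetical.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], sample_rate: int,
                      frame_ms: int = 25, hop_ms: int = 10) -> List[List[float]]:
    """Split audio samples into overlapping frames kept in time order.

    frame_ms and hop_ms are assumed values; trailing samples that do not
    fill a whole frame are dropped.
    """
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must each span at least one sample")
    frames: List[List[float]] = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(list(samples[start:start + frame_len]))
    return frames


if __name__ == "__main__":
    one_second = [0.0] * 16000           # one second of silence at 16 kHz
    frames = split_into_frames(one_second, 16000)
    print(len(frames), len(frames[0]))
```

Each frame would then go to feature extraction, and the resulting feature vectors to the acoustic model, in the same order in which the frames were produced.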
Since the description of this embodiment is the same as that of the embodiments of the illegal speech detection method above, the embodiment of the illegal speech detection apparatus is not described in further detail here.
This embodiment uses speech recognition to convert call audio into call content in text format and then checks the text for illegal speech. If illegal speech is detected, it is automatically associated with the corresponding call audio and saved. This automates quality inspection for illegal speech, avoids tedious and time-consuming manual inspection, improves inspection efficiency, and reduces cost.
Referring to FIG. 5, FIG. 5 is a schematic diagram of the functional modules of a second embodiment of the illegal speech detection apparatus of the present application. In this embodiment, the illegal speech detection apparatus includes:
The keyword acquisition module 60 is configured to obtain the illegal speech keyword entered on a preset query page;
The retrieval module 70 is configured to search the call record library using the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword;
The playing module 80 is configured to play the call audio associated with the illegal speech keyword if the call record library contains such call audio.
In this embodiment, a query page is also provided so that users can carry out spot checks flexibly. A user can take any illegal speech as the query keyword and search the call record library of illegal calls, obtaining the call audio that matches the target illegal speech and playing it automatically, which makes retrieval more flexible.
The present application also provides a non-volatile computer-readable storage medium.
In this embodiment, an illegal speech detection program is stored on the computer-readable storage medium, and when the illegal speech detection program is executed by a processor, the steps of the illegal speech detection method described in any of the above embodiments are implemented. For the method implemented when the program is executed by the processor, refer to the embodiments of the illegal speech detection method of the present application; it is not described again here.
Optionally, in a specific embodiment, when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are implemented:
taking call content containing illegal speech as training samples and training by machine learning to obtain an illegal speech recognition model;
obtaining the call audio between a customer service agent and a customer from the call record library;
dividing the call audio into frames to obtain a plurality of time-ordered speech frames;
extracting the acoustic features of the speech frames in time order and generating multi-dimensional acoustic feature vectors containing the sound information;
inputting the multi-dimensional acoustic feature vectors into a preset acoustic model for processing, and outputting the phoneme information corresponding to the speech frames;
looking up a preset dictionary based on the phoneme information, and outputting the characters or words corresponding to each piece of phoneme information;
inputting the characters or words corresponding to each piece of phoneme information into a preset language model in output order for processing, and outputting the probabilities that individual characters or words are associated with one another;
concatenating the characters or words with the highest probability into call content in text format and outputting it;
inputting the call content between the customer service agent and the customer, as output by speech recognition, into the illegal speech recognition model for illegal speech identification, and outputting the identification result;
determining, based on the identification result, whether the call content contains illegal speech;
if illegal speech exists, associating the illegal speech with the corresponding call audio and saving the association in the call record library.
Optionally, in a specific embodiment, when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:
if illegal speech exists, outputting related information about the call audio associated with the illegal speech, the related information including customer service agent information, customer information, and the subject of the call.
Optionally, in a specific embodiment, when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:
taking call content containing illegal speech as training samples and segmenting the training samples into words to obtain the corresponding word segmentation data;
mapping all the word segmentation data into word vectors to be trained;
constructing a mathematical model based on a single-hidden-layer feedforward neural network, taking the word vectors as the input of the mathematical model and the illegal speech in the training samples as its output, and training the mathematical model iteratively to obtain the illegal speech recognition model.
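The training steps above (segmentation, vectorization, a single-hidden-layer feedforward network, iterative training) can be sketched in plain Python. This is a simplified illustration under several assumptions that are not in the application: whitespace splitting stands in for a real word segmenter, fixed bag-of-words count vectors stand in for trainable word vectors, and the output is reduced to a single probability that the call content contains illegal speech. All names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace split in place of a real word segmenter.
    return text.lower().split()


def build_vocab(samples: List[str]) -> Dict[str, int]:
    vocab: Dict[str, int] = {}
    for text in samples:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text: str, vocab: Dict[str, int]) -> List[float]:
    # Assumption: bag-of-words counts stand in for learned word vectors.
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


class SingleHiddenLayerNet:
    """Feedforward net: input -> tanh hidden layer -> sigmoid output."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: List[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def train_step(self, x: List[float], y: float, lr: float) -> None:
        h, p = self.forward(x)
        dz = p - y  # cross-entropy gradient w.r.t. the pre-sigmoid output
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] * (1.0 - hj * hj)  # uses w2[j] before update
            self.w2[j] -= lr * dz * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz

    def predict(self, x: List[float]) -> float:
        return self.forward(x)[1]


def train(samples: List[str], labels: List[float], epochs: int = 300,
          lr: float = 0.1, n_hidden: int = 4):
    vocab = build_vocab(samples)
    net = SingleHiddenLayerNet(len(vocab), n_hidden)
    data = [(vectorize(t, vocab), y) for t, y in zip(samples, labels)]
    for _ in range(epochs):          # iterative training over all samples
        for x, y in data:
            net.train_step(x, y, lr)
    return net, vocab
```

A trained network would be queried with `net.predict(vectorize(text, vocab))`, treating a value above a chosen threshold as an indication of illegal speech. A production system would more likely rely on an established machine learning library than on hand-written backpropagation.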
Optionally, in a specific embodiment, when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:
obtaining the illegal speech keyword entered on a preset query page;
searching the call record library using the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword;
if so, playing the call audio associated with the illegal speech keyword.
Optionally, in a specific embodiment, when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:
concatenating the illegal speech associated with each call audio in the call record library to form an illegal speech string, and loading the illegal speech string into memory;
searching the illegal speech string using the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.

Claims (20)

  1. An illegal speech detection method, comprising the following steps:
    taking call content containing illegal speech as training samples and training by machine learning to obtain an illegal speech recognition model;
    obtaining the call audio between a customer service agent and a customer from a call record library;
    dividing the call audio into frames to obtain a plurality of time-ordered speech frames;
    extracting the acoustic features of the speech frames in time order and generating multi-dimensional acoustic feature vectors containing the sound information;
    inputting the multi-dimensional acoustic feature vectors into a preset acoustic model for processing, and outputting the phoneme information corresponding to the speech frames;
    looking up a preset dictionary based on the phoneme information, and outputting the characters or words corresponding to each piece of phoneme information;
    inputting the characters or words corresponding to each piece of phoneme information into a preset language model in output order for processing, and outputting the probabilities that individual characters or words are associated with one another;
    concatenating the characters or words with the highest probability into call content in text format and outputting it;
    inputting the call content between the customer service agent and the customer, as output by speech recognition, into the illegal speech recognition model for illegal speech identification, and outputting the identification result;
    determining, based on the identification result, whether the call content contains illegal speech;
    if illegal speech exists, associating the illegal speech with the corresponding call audio and saving the association in the call record library.
  2. The illegal speech detection method according to claim 1, further comprising:
    if illegal speech exists, outputting related information about the call audio associated with the illegal speech, the related information including customer service agent information, customer information, and the subject of the call.
  3. The illegal speech detection method according to claim 1, wherein taking call content containing illegal speech as training samples and training by machine learning to obtain the illegal speech recognition model comprises:
    taking call content containing illegal speech as training samples and segmenting the training samples into words to obtain the corresponding word segmentation data;
    mapping all the word segmentation data into word vectors to be trained;
    constructing a mathematical model based on a single-hidden-layer feedforward neural network, taking the word vectors as the input of the mathematical model and the illegal speech in the training samples as its output, and training the mathematical model iteratively to obtain the illegal speech recognition model.
  4. The illegal speech detection method according to claim 1, further comprising, after the step of associating the illegal speech with the corresponding call audio and saving the association in the call record library if illegal speech exists:
    obtaining the illegal speech keyword entered on a preset query page;
    searching the call record library using the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword;
    if so, playing the call audio associated with the illegal speech keyword.
  5. The illegal speech detection method according to claim 4, wherein searching the call record library using the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword, comprises:
    concatenating the illegal speech associated with each call audio in the call record library to form an illegal speech string, and loading the illegal speech string into memory;
    searching the illegal speech string using the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword.
  6. An illegal speech detection apparatus, comprising:
    a model training module, configured to take call content containing illegal speech as training samples and train by machine learning to obtain an illegal speech recognition model;
    an acquisition module, configured to obtain the call audio between a customer service agent and a customer from a call record library;
    a speech recognition module, configured to: divide the call audio into frames to obtain a plurality of time-ordered speech frames; extract the acoustic features of the speech frames in time order and generate multi-dimensional acoustic feature vectors containing the sound information; input the multi-dimensional acoustic feature vectors into a preset acoustic model for processing and output the phoneme information corresponding to the speech frames; look up a preset dictionary based on the phoneme information and output the characters or words corresponding to each piece of phoneme information; input the characters or words corresponding to each piece of phoneme information into a preset language model in output order for processing, and output the probabilities that individual characters or words are associated with one another; and concatenate the characters or words with the highest probability into call content in text format and output it;
    an illegal speech identification module, configured to input the call content between the customer service agent and the customer, as output by speech recognition, into the illegal speech recognition model for illegal speech identification, output the identification result, and determine, based on the identification result, whether the call content contains illegal speech;
    an association and storage module, configured to, if illegal speech exists, associate the illegal speech with the corresponding call audio and save the association in the call record library.
  7. The illegal speech detection apparatus according to claim 6, wherein the model training module is specifically configured to:
    take call content containing illegal speech as training samples and segment the training samples into words to obtain the corresponding word segmentation data;
    map all the word segmentation data into word vectors to be trained;
    construct a mathematical model based on a single-hidden-layer feedforward neural network, take the word vectors as the input of the mathematical model and the illegal speech in the training samples as its output, and train the mathematical model iteratively to obtain the illegal speech recognition model.
  8. The illegal speech detection apparatus according to claim 6, further comprising:
    a keyword acquisition module, configured to obtain the illegal speech keyword entered on a preset query page;
    a retrieval module, configured to search the call record library using the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword;
    a playing module, configured to play the call audio associated with the illegal speech keyword if the call record library contains such call audio.
  9. The illegal speech detection apparatus according to claim 6, further comprising:
    an output module, configured to, if illegal speech exists, output related information about the call audio associated with the illegal speech, the related information including customer service agent information, customer information, and the subject of the call.
  10. The illegal speech detection apparatus according to claim 8, wherein the retrieval module is specifically configured to:
    concatenate the illegal speech associated with each call audio in the call record library to form an illegal speech string, and load the illegal speech string into memory;
    search the illegal speech string using the illegal speech keyword as the query condition, to detect whether the call record library contains call audio associated with the illegal speech keyword.
  11. An illegal speech detection device, comprising a memory, a processor, and an illegal speech detection program stored in the memory and executable on the processor, wherein when the illegal speech detection program is executed by the processor, the following steps of an illegal speech detection method are implemented:
    taking call content containing illegal speech as training samples and training by machine learning to obtain an illegal speech recognition model;
    obtaining the call audio between a customer service agent and a customer from a call record library;
    dividing the call audio into frames to obtain a plurality of time-ordered speech frames;
    extracting the acoustic features of the speech frames in time order and generating multi-dimensional acoustic feature vectors containing the sound information;
    inputting the multi-dimensional acoustic feature vectors into a preset acoustic model for processing, and outputting the phoneme information corresponding to the speech frames;
    looking up a preset dictionary based on the phoneme information, and outputting the characters or words corresponding to each piece of phoneme information;
    inputting the characters or words corresponding to each piece of phoneme information into a preset language model in output order for processing, and outputting the probabilities that individual characters or words are associated with one another;
    concatenating the characters or words with the highest probability into call content in text format and outputting it;
    inputting the call content between the customer service agent and the customer, as output by speech recognition, into the illegal speech recognition model for illegal speech identification, and outputting the identification result;
    determining, based on the identification result, whether the call content contains illegal speech;
    if illegal speech exists, associating the illegal speech with the corresponding call audio and saving the association in the call record library.
  12. The illegal speech detection device according to claim 11, wherein when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:
    if illegal speech exists, outputting related information about the call audio associated with the illegal speech, the related information including customer service agent information, customer information, and the subject of the call.
  13. The illegal speech detection device according to claim 11, wherein the illegal speech detection program, when executed by the processor, further implements the following steps of the illegal speech detection method:
    taking call content containing illegal speech as training samples, and performing word segmentation on the training samples to obtain corresponding word segmentation data;
    mapping all the word segmentation data into word vectors to be trained;
    constructing a mathematical model based on a single-hidden-layer feedforward neural network, and, with the word vectors as the input of the mathematical model and the illegal speech in the training samples as the output of the mathematical model, iteratively training the mathematical model to obtain the illegal speech recognition model.
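The training steps of this claim can be sketched roughly as follows. The sketch is a simplification under stated assumptions: whitespace splitting stands in for word segmentation, a bag-of-words encoding over a hypothetical vocabulary stands in for the word vectors, and the output is reduced to a single "illegal or not" probability. The vocabulary, sample texts, layer size and learning rate are all invented for illustration.

```python
import math
import random

# Hypothetical vocabulary; a real system would build it from the segmented corpus.
VOCAB = ["guaranteed", "return", "risk-free", "hello", "thanks", "policy"]


def vectorize(text):
    """Whitespace 'segmentation' followed by a bag-of-words encoding."""
    tokens = text.lower().split()
    return [1.0 if word in tokens else 0.0 for word in VOCAB]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """One hidden tanh layer followed by a sigmoid output unit."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    p = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, p


def train(samples, labels, hidden=4, epochs=300, lr=0.5, seed=0):
    """Iterative training by stochastic gradient descent on cross-entropy loss."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            h, p = forward(x, (w1, b1, w2, b2))
            d_out = p - y  # gradient of the loss w.r.t. the output pre-activation
            for j in range(hidden):
                d_h = d_out * w2[j] * (1.0 - h[j] ** 2)  # backpropagate through tanh
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h
                for i in range(n_in):
                    w1[j][i] -= lr * d_h * x[i]
            b2 -= lr * d_out
    return w1, b1, w2, b2


def predict(text, params):
    """Probability that the text contains illegal speech, per the toy model."""
    return forward(vectorize(text), params)[1]


# Hypothetical labelled call content: 1.0 = contains illegal speech, 0.0 = compliant.
TEXTS = ["guaranteed return", "risk-free guaranteed", "hello thanks", "thanks policy"]
LABELS = [1.0, 1.0, 0.0, 0.0]
PARAMS = train([vectorize(t) for t in TEXTS], LABELS)
```

After training, `predict("guaranteed return", PARAMS)` should score higher than `predict("hello thanks", PARAMS)`. With Chinese call content, a dedicated word segmenter would replace the whitespace split.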
  14. The illegal speech detection device according to claim 11, wherein the illegal speech detection program, when executed by the processor, further implements the following steps of the illegal speech detection method:
    obtaining an illegal speech keyword entered in a preset query page;
    searching the call record library with the illegal speech keyword as the query condition, to detect whether call audio associated with the illegal speech keyword exists in the call record library;
    if so, playing the call audio associated with the illegal speech keyword.
  15. The illegal speech detection device according to claim 14, wherein the illegal speech detection program, when executed by the processor, further implements the following steps of the illegal speech detection method:
    performing character concatenation on the illegal speech associated with each call audio in the call record library to form an illegal speech string, and loading the illegal speech string into memory;
    searching the illegal speech string with the illegal speech keyword as the query condition, to detect whether call audio associated with the illegal speech keyword exists in the call record library.
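One way to read the in-memory string search of this claim is sketched below. The record layout, the newline separator and the offset table used to map a match back to its call audio are assumptions made for illustration; the claim itself only recites concatenating the illegal speech into a string and searching that string.

```python
from bisect import bisect_right


def build_index(records, sep="\n"):
    """Concatenate the illegal speech of every call into one string.

    Also records the start offset of each call's text so that a match
    position can be mapped back to the call audio it belongs to.
    """
    parts, starts, ids = [], [], []
    pos = 0
    for call_id, phrase in records:
        starts.append(pos)
        ids.append(call_id)
        parts.append(phrase)
        pos += len(phrase) + len(sep)
    return sep.join(parts), starts, ids


def search(keyword, text, starts, ids):
    """Return the ids of call audios whose illegal speech contains the keyword."""
    hits = []
    pos = text.find(keyword)
    while pos != -1:
        call_id = ids[bisect_right(starts, pos) - 1]  # last call starting at or before pos
        if call_id not in hits:
            hits.append(call_id)
        pos = text.find(keyword, pos + 1)
    return hits


# Hypothetical call record library entries: (call audio id, associated illegal speech).
RECORDS = [
    ("call-001.wav", "guaranteed return, zero risk"),
    ("call-002.wav", "you must buy today"),
    ("call-003.wav", "zero risk product"),
]
TEXT, STARTS, IDS = build_index(RECORDS)
matches = search("zero risk", TEXT, STARTS, IDS)
```

Keeping one concatenated string in memory turns each query into a single substring scan rather than one database lookup per record. The separator prevents a keyword from matching across two adjacent calls, provided the keyword itself does not contain the separator.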
  16. A non-volatile computer-readable storage medium storing an illegal speech detection program, wherein the illegal speech detection program, when executed by a processor, implements the following steps of an illegal speech detection method:
    training, by machine learning, on call content containing illegal speech as training samples to obtain an illegal speech recognition model;
    obtaining call audio between a customer service agent and a customer from a call record library;
    dividing the call audio into frames to obtain a plurality of time-ordered speech frames;
    extracting sound features from the speech frames sequentially in time order, and generating multi-dimensional sound feature vectors containing sound information;
    inputting the multi-dimensional sound feature vectors into a preset acoustic model for processing, and outputting phoneme information corresponding to the speech frames;
    looking up a preset dictionary based on the phoneme information, and outputting the character or word corresponding to each piece of phoneme information;
    inputting the characters or words corresponding to the phoneme information into a preset language model in output order for processing, and outputting the probabilities that the individual characters or words are associated with one another;
    concatenating the characters or words with the highest output probability into call content in text format, and outputting the call content;
    inputting the call content between the customer service agent and the customer, as output by speech recognition, into the illegal speech recognition model for illegal speech recognition, and outputting a recognition result;
    determining, based on the recognition result, whether illegal speech exists in the call content;
    if illegal speech exists, associating the illegal speech with the corresponding call audio and saving it in the call record library.
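The framing and feature-extraction steps recited at the start of this claim can be sketched as follows. The frame length, hop size and the two features used here (short-time energy and zero-crossing rate) are illustrative assumptions; the claim does not name specific features, and practical recognizers typically use richer ones such as MFCCs.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames, returned in time order."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def frame_features(frame):
    """Return a small feature vector: [short-time energy, zero-crossing rate]."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / (len(frame) - 1)]


# Hypothetical audio: ten samples, framed with a length of 4 and a hop of 2.
frames = frame_signal(list(range(10)), frame_len=4, hop=2)
features = [frame_features(f) for f in frames]
```

Each entry of `features` corresponds to one time-ordered speech frame and would be the input to the acoustic model in the next step of the claim.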
  17. The non-volatile computer-readable storage medium according to claim 16, wherein the illegal speech detection program, when executed by the processor, further implements the following steps of the illegal speech detection method:
    if illegal speech exists, outputting related information of the call audio associated with the illegal speech, the related information including: customer service information, customer information, and the call subject.
  18. The non-volatile computer-readable storage medium according to claim 16, wherein the illegal speech detection program, when executed by the processor, further implements the following steps of the illegal speech detection method:
    taking call content containing illegal speech as training samples, and performing word segmentation on the training samples to obtain corresponding word segmentation data;
    mapping all the word segmentation data into word vectors to be trained;
    constructing a mathematical model based on a single-hidden-layer feedforward neural network, and, with the word vectors as the input of the mathematical model and the illegal speech in the training samples as the output of the mathematical model, iteratively training the mathematical model to obtain the illegal speech recognition model.
  19. The non-volatile computer-readable storage medium according to claim 16, wherein the illegal speech detection program, when executed by the processor, further implements the following steps of the illegal speech detection method:
    obtaining an illegal speech keyword entered in a preset query page;
    searching the call record library with the illegal speech keyword as the query condition, to detect whether call audio associated with the illegal speech keyword exists in the call record library;
    if so, playing the call audio associated with the illegal speech keyword.
  20. The non-volatile computer-readable storage medium according to claim 19, wherein the illegal speech detection program, when executed by the processor, further implements the following steps of the illegal speech detection method:
    performing character concatenation on the illegal speech associated with each call audio in the call record library to form an illegal speech string, and loading the illegal speech string into memory;
    searching the illegal speech string with the illegal speech keyword as the query condition, to detect whether call audio associated with the illegal speech keyword exists in the call record library.
PCT/CN2019/102261 2019-05-16 2019-08-23 Illegal speech detection method, apparatus and device and computer-readable storage medium WO2020228173A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910411437.7A CN110310663A (en) 2019-05-16 2019-05-16 Illegal speech detection method, apparatus and device and computer-readable storage medium
CN201910411437.7 2019-05-16

Publications (1)

Publication Number Publication Date
WO2020228173A1 (en) 2020-11-19

Family

ID=68074763

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/102261 WO2020228173A1 (en) 2019-05-16 2019-08-23 Illegal speech detection method, apparatus and device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN110310663A (en)
WO (1) WO2020228173A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111182162B (en) * 2019-12-26 2023-04-25 深圳壹账通智能科技有限公司 Telephone quality inspection method, device, equipment and storage medium based on artificial intelligence
CN111405128B (en) * 2020-03-24 2022-02-18 中国—东盟信息港股份有限公司 Call quality inspection system based on voice-to-text conversion
CN111598485A (en) * 2020-05-28 2020-08-28 成都晓多科技有限公司 Multi-dimensional intelligent quality inspection method, device, terminal equipment and medium
CN111696531A (en) * 2020-05-28 2020-09-22 升智信息科技(南京)有限公司 Recognition method for improving speech recognition accuracy by using jargon sentences
CN111738835A (en) * 2020-06-22 2020-10-02 中国银行股份有限公司 Monitoring method, device, equipment and storage medium
CN112183079A (en) * 2020-09-07 2021-01-05 绿瘦健康产业集团有限公司 Voice monitoring method, device, medium and terminal equipment
CN112466281A (en) * 2020-10-13 2021-03-09 讯飞智元信息科技有限公司 Harmful audio recognition decoding method and device
CN112507121B (en) * 2020-12-01 2023-06-30 平安科技(深圳)有限公司 Customer service violation quality inspection method and device, computer equipment and storage medium
CN112671985A (en) * 2020-12-22 2021-04-16 平安普惠企业管理有限公司 Agent quality inspection method, device, equipment and storage medium based on deep learning
CN113038153B (en) * 2021-02-26 2023-06-02 深圳道乐科技有限公司 Financial live broadcast violation detection method, device, equipment and readable storage medium
CN115396549A (en) * 2021-05-25 2022-11-25 中国联合网络通信集团有限公司 Method for processing illegal call service device and electronic apparatus
CN113593553B (en) * 2021-07-12 2022-05-24 深圳市明源云客电子商务有限公司 Voice recognition method, voice recognition apparatus, voice management server, and storage medium
CN113641795A (en) * 2021-08-20 2021-11-12 上海明略人工智能(集团)有限公司 Method and device for dialectical statistics, electronic equipment and storage medium
CN114969293A (en) * 2022-05-31 2022-08-30 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180308487A1 (en) * 2017-04-21 2018-10-25 Go-Vivace Inc. Dialogue System Incorporating Unique Speech to Text Conversion Method for Meaningful Dialogue Response
CN109151218A (en) * 2018-08-21 2019-01-04 平安科技(深圳)有限公司 Call voice quality detecting method, device, computer equipment and storage medium
CN109147768A (en) * 2018-09-13 2019-01-04 云南电网有限责任公司 A kind of audio recognition method and system based on deep learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107093431B (en) * 2016-02-18 2020-07-07 中国移动通信集团辽宁有限公司 Method and device for quality inspection of service quality
CN107222865B (en) * 2017-04-28 2019-08-13 北京大学 Communication swindle real-time detection method and system based on suspicious actions identification
CN109658923B (en) * 2018-10-19 2024-01-30 平安科技(深圳)有限公司 Speech quality inspection method, equipment, storage medium and device based on artificial intelligence

Also Published As

Publication number Publication date
CN110310663A (en) 2019-10-08

Similar Documents

Publication Publication Date Title
WO2020228173A1 (en) Illegal speech detection method, apparatus and device and computer-readable storage medium
CN109151218B (en) Call voice quality inspection method and device, computer equipment and storage medium
US11037553B2 (en) Learning-type interactive device
US10347244B2 (en) Dialogue system incorporating unique speech to text conversion method for meaningful dialogue response
US8010343B2 (en) Disambiguation systems and methods for use in generating grammars
JP4880258B2 (en) Method and apparatus for natural language call routing using reliability scores
US7603279B2 (en) Grammar update system and method for speech recognition
US6526380B1 (en) Speech recognition system having parallel large vocabulary recognition engines
US6269335B1 (en) Apparatus and methods for identifying homophones among words in a speech recognition system
US7103542B2 (en) Automatically improving a voice recognition system
US8311824B2 (en) Methods and apparatus for language identification
US20170287474A1 (en) Improving Automatic Speech Recognition of Multilingual Named Entities
US10019514B2 (en) System and method for phonetic search over speech recordings
CN104903954A (en) Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
JPWO2008114811A1 (en) Information search system, information search method, and information search program
CN104299623A (en) Automated confirmation and disambiguation modules in voice applications
CN109544104A (en) A kind of recruitment data processing method and device
CN113920986A (en) Conference record generation method, device, equipment and storage medium
CN112233680A (en) Speaker role identification method and device, electronic equipment and storage medium
US8423354B2 (en) Speech recognition dictionary creating support device, computer readable medium storing processing program, and processing method
KR102186641B1 (en) Method for examining applicant through automated scoring of spoken answer based on artificial intelligence
CN109119073A (en) Audio recognition method, system, speaker and storage medium based on multi-source identification
CN113129895A (en) Voice detection processing system
CN110853674A (en) Text collation method, apparatus, and computer-readable storage medium
CN110809796A (en) Speech recognition system and method with decoupled wake phrases

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19928857

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19928857

Country of ref document: EP

Kind code of ref document: A1