WO2020228173A1

WO2020228173A1 - Illegal speech detection method, apparatus and device and computer-readable storage medium

Info

Publication number: WO2020228173A1
Application number: PCT/CN2019/102261
Authority: WO
Inventors: 岳鹏昱
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-05-16
Filing date: 2019-08-23
Publication date: 2020-11-19
Also published as: CN110310663A

Abstract

The present application relates to the technical field of artificial intelligence, and disclosed is an illegal speech detection method, comprising: taking call content comprising illegal speech as training samples, and employing a machine learning method for training to obtain an illegal speech recognition model; acquiring in a call record library call audio between customer service and a customer; performing voice recognition on the call audio and outputting the call content in a text format; inputting the call content outputted by voice recognition into the illegal speech recognition model for recognition, and outputting a recognition result; on the basis of the recognition result, determining whether there is illegal speech in the call content; if there is illegal speech, then associating the illegal speech with the corresponding call audio and saving same in the call record library. Further disclosed by the present application are an illegal speech detection apparatus and device, and a computer-readable storage medium. The present application achieves the automated quality inspection of illegal speech, avoids tedious and time-consuming manual quality inspection, and improves the efficiency of quality inspection.

Description

Method, device, equipment and computer readable storage medium for detecting illegal speech

This application claims priority to Chinese patent application No. 201910411437.7, entitled "Illegal Speech Detection Method, Device, Equipment, and Computer-readable Storage Medium", filed with the Chinese Patent Office on May 16, 2019, the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of artificial intelligence technology, and in particular to a method, device, equipment and computer-readable storage medium for detecting illegal speech.

Background art

In existing telephone customer service management, customer service calls are generally recorded so that the service quality of the telephone customer service can later be spot-checked, for example to check whether any illegal speech was used while communicating with customers, which in turn supports the evaluation of customer service work. The spot check of call content is usually done manually. The inventor realized that when the volume of call content is large, manual quality inspection is time-consuming and labor-intensive, and its execution efficiency is low.

Summary of the invention

The main purpose of this application is to provide a method, device, equipment, and computer-readable storage medium for detecting illegal speech, aiming to solve the technical problem of low execution efficiency of manual quality inspection in the existing telephone customer service management.

In order to achieve the above objective, the present application provides a method for detecting illegal speech. The method for detecting illegal speech includes the following steps:

Take the call content containing the illegal speech as the training sample, and use the machine learning method to train to obtain the illegal speech recognition model;

Obtain the call audio between the customer service and the customer in the call log library;

Framing the call audio to obtain multiple speech frames with time sequence;

Sequentially extracting the sound features of the speech frames according to time sequence and generating a multi-dimensional sound feature vector containing sound information;

Inputting the multi-dimensional sound feature vector into a preset acoustic model for processing, and outputting phoneme information corresponding to the speech frame;

Based on the phoneme information, looking up a preset dictionary, and outputting the character or word corresponding to each piece of phoneme information;

Inputting, in output order, the characters or words corresponding to each piece of phoneme information into a preset language model for processing, and outputting the probability that the individual characters or words are associated with one another;

Splicing the characters or words with the highest output probability into call content in text format, and outputting the call content;

Input the content of the conversation between the customer service and the customer output by the voice recognition into the illegal speech recognition model to identify the illegal speech, and output the recognition result;

Based on the recognition result, determine whether there is illegal speech in the conversation content;

If there is an illegal speech, the illegal speech is associated with the corresponding call audio and saved in the call record library.

Optionally, the method for detecting illegal speech further includes:

If there is illegal speech, outputting related information of the call audio associated with the illegal speech, the related information including: customer service information, customer information, and the subject of the call.

Optionally, the step of taking the call content containing illegal speech as training samples and training with a machine learning method to obtain the illegal speech recognition model includes:

Taking the call content that contains illegal speech as a training sample, and performing word segmentation on the training sample to obtain corresponding word segmentation data;

Mapping all word segmentation data into word vectors to be trained;

Constructing a mathematical model based on a single hidden layer feedforward neural network, using the word vectors as the input of the mathematical model and the illegal speech in the training sample as the output of the mathematical model, and iteratively training the mathematical model to obtain the illegal speech recognition model.

Optionally, after the step of associating the illegal speech with the corresponding call audio if there is an illegal speech and saving it in the call record library, the method further includes:

Obtain the illegal speech keywords entered in the preset query page;

Using the illegal speech keywords as query conditions, search the call record library to detect whether there is a call audio associated with the illegal speech keywords in the call record library;

If it exists, the call audio associated with the illegal speech keyword is played.

Optionally, the step of searching the call record library using the illegal speech keyword as a query condition to detect whether there is a call audio associated with the illegal speech keyword in the call record library includes:

Perform character splicing on the illegal speech associated with each call audio in the call record library to form an illegal speech string, and load the illegal speech string into memory;

Using the illegal speech keyword as a query condition, search the illegal speech string to detect whether there is a call audio associated with the illegal speech keyword in the call record library.

Further, in order to achieve the above object, the present application also provides a device for detecting illegal speech, the device for detecting illegal speech includes:

The model training module is used to take the call content containing the illegal speech as the training sample, and use the machine learning method to train to obtain the illegal speech recognition model;

The acquisition module is used to acquire the call audio between the customer service and the customer in the call log library;

The voice recognition module is used to frame the call audio to obtain multiple time-ordered speech frames; extract the sound features of the speech frames sequentially in time order and generate a multi-dimensional sound feature vector containing sound information; input the multi-dimensional sound feature vector into a preset acoustic model for processing, and output the phoneme information corresponding to the speech frames; look up a preset dictionary based on the phoneme information, and output the character or word corresponding to each piece of phoneme information; input, in output order, the characters or words corresponding to each piece of phoneme information into a preset language model for processing, and output the probability that the individual characters or words are associated with one another; and splice the characters or words with the highest output probability into call content in text format and output it;

The illegal speech recognition module is used to input the conversation content between the customer service and the customer, as output by the voice recognition, into the illegal speech recognition model to identify illegal speech, and output the recognition result; and to determine, based on the recognition result, whether there is illegal speech in the conversation content;

The association saving module is used to, if there is illegal speech, associate the illegal speech with the corresponding call audio and save it in the call record library.

Further, in order to achieve the above object, the present application also provides an illegal speech detection device, which includes a memory, a processor, and an illegal speech detection program that is stored in the memory and can run on the processor, wherein the steps of the illegal speech detection method described in any one of the above are implemented when the illegal speech detection program is executed by the processor.

Further, in order to achieve the above object, the present application also provides a computer-readable storage medium that stores an illegal speech detection program, wherein the steps of any one of the above-mentioned methods for detecting illegal speech are implemented when the illegal speech detection program is executed by a processor.

This application uses voice recognition technology to recognize the call audio as call content in text format, and then performs illegal speech detection on that text. If illegal speech is detected, it is automatically associated with the corresponding call audio and saved, thereby realizing automatic quality inspection of illegal speech, avoiding tedious and time-consuming manual quality inspection, improving the efficiency of quality inspection, and saving costs.

Description of the drawings

FIG. 1 is a schematic structural diagram of the device hardware operating environment involved in the embodiments of the illegal speech detection device of this application;

FIG. 2 is a schematic flowchart of the first embodiment of the method for detecting illegal speech in this application;

FIG. 3 is a schematic flowchart of a second embodiment of the method for detecting illegal speech in this application;

FIG. 4 is a schematic diagram of functional modules of the first embodiment of the device for detecting illegal speech in this application;

Fig. 5 is a schematic diagram of functional modules of a second embodiment of a device for detecting illegal speech in this application.

The realization of the purpose, the functional characteristics, and the advantages of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed description

It should be understood that the specific embodiments described herein are only used to explain the application, and not used to limit the application.

This application provides a device for detecting illegal speech.

Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a device hardware operating environment involved in an embodiment of a device for detecting illegal speech in this application.

As shown in FIG. 1, the illegal speech detection device may include: a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display screen and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the foregoing processor 1001.

Those skilled in the art can understand that the hardware structure shown in FIG. 1 does not constitute a limitation on the illegal speech detection device, which may include more or fewer components than shown in the figure, a combination of some components, or a different arrangement of components.

As shown in FIG. 1, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and an illegal speech detection program. The operating system is a program that manages and controls the illegal speech detection device and its software resources, and supports the operation of the network communication module, the user interface module, the illegal speech detection program, and other programs or software; the network communication module is used to manage and control the network interface 1004; and the user interface module is used to manage and control the user interface 1003.

In the hardware structure of the illegal speech detection device shown in FIG. 1, the network interface 1004 is mainly used to connect to the system backend and to exchange data with it; the user interface 1003 is mainly used to connect to the client (user side) and to exchange data with it; the illegal speech detection device calls the illegal speech detection program stored in the memory 1005 through the processor 1001, and executes the operations of the following embodiments of the illegal speech detection method.

Based on the hardware structure of the above-mentioned illegal speech detection equipment, various embodiments of the illegal speech detection method of the present application are proposed.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the method for detecting illegal speech in this application. In this embodiment, the method for detecting illegal speech includes the following steps:

Step S10, taking the call content containing the illegal speech as a training sample, and adopting a machine learning method for training to obtain the illegal speech recognition model;

In this embodiment, in order to realize automatic identification of illegal speech in the content of the call, it is necessary to train a corresponding illegal speech recognition model in advance. There is no limit to the specific implementation of training.

Optionally, in an embodiment, the illegal speech recognition model is specifically trained in the following manner:

(1) Take the call content that contains illegal speech as a training sample, and perform word segmentation on the training sample to obtain corresponding word segmentation data;

(2) Map all word segmentation data into word vectors to be trained;

(3) Construct a mathematical model based on a single hidden layer feedforward neural network, use the word vectors as the input of the mathematical model and the illegal speech in the training sample as the output of the mathematical model, and iteratively train the mathematical model to obtain the illegal speech recognition model.

Word embedding refers to a vector representation that maps words or phrases from the vocabulary to vectors of real numbers. Words normally occur in natural-language form, which machine learning techniques cannot process directly; the words must first be converted into a mathematical structure that can be processed, namely a vector in space, so that each word is represented as a distinct vector. For example, all words can be sorted into one long sequence so that each word corresponds to a position in it; each word is then represented by an array whose length equals the number of words, in which the value at the word's own position is 1 and the values at all other positions are 0. In this way, words are mapped into word vectors.
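The position-based mapping described above corresponds to what is commonly called one-hot encoding. The following Python sketch illustrates it under stated assumptions: the function names `build_vocabulary` and `one_hot` and the sample words are hypothetical and do not come from this application.

```python
def build_vocabulary(words):
    """Sort the distinct words and give each one a fixed position."""
    return {word: index for index, word in enumerate(sorted(set(words)))}


def one_hot(word, vocabulary):
    """Return an array as long as the vocabulary: 1 at the word's position, 0 elsewhere."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[word]] = 1
    return vector


# Hypothetical word segmentation data from one training sample.
segmented_sample = ["please", "wait", "a", "moment", "please"]
vocabulary = build_vocabulary(segmented_sample)
vectors = [one_hot(word, vocabulary) for word in segmented_sample]
```

Each vector has exactly one non-zero entry, and repeated words such as "please" map to the same vector.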

A single hidden layer feedforward neural network is a special form of feedforward neural network, which is in turn a type of artificial neural network. In a feedforward neural network, the neurons are arranged in layers starting from the input layer; each neuron receives the output of the previous layer as its input and passes its own output to the next layer, up to the output layer. The network has a unidirectional multilayer structure: each layer contains several neurons, neurons in the same layer are not connected to each other, and information is transmitted between layers in one direction only. The first layer is called the input layer, the last layer is the output layer, and the layers in between are hidden layers. There can be one hidden layer or several.

In this optional embodiment, the single hidden layer feedforward neural network includes a first layer (the input layer), a second layer (the hidden layer), and a third layer (the output layer). The input layer uses the word vectors corresponding to the call content as training samples, and the output layer uses the illegal speech in the call content as training samples. The mathematical model based on the single hidden layer feedforward neural network is trained iteratively: the weights from the input layer neurons to the hidden layer neurons, the thresholds of the hidden layer neurons, and the weights from the hidden layer neurons to the output layer neurons are adjusted continually until the error between the model output and the expected output is within an acceptable range. Model training is then complete, yielding an illegal speech recognition model that can recognize illegal speech in call content.
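The iterative adjustment of weights and thresholds described above can be illustrated with a minimal Python sketch. It rests on several assumptions that this application does not specify: sigmoid activations, a squared-error loss, plain gradient descent, a single output neuron that signals whether illegal speech is present, and an output bias in addition to the hidden layer thresholds. The class name `SingleHiddenLayerNet` and the toy samples are hypothetical.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


class SingleHiddenLayerNet:
    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Input-to-hidden weights, hidden thresholds (biases), hidden-to-output weights.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_output = 0.0

    def forward(self, inputs):
        hidden = [sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
                  for weights, bias in zip(self.w_hidden, self.b_hidden)]
        output = sigmoid(sum(w * h for w, h in zip(self.w_output, hidden)) + self.b_output)
        return hidden, output

    def train_step(self, inputs, target, learning_rate):
        hidden, output = self.forward(inputs)
        # Gradient of 0.5 * (output - target)^2 through the output sigmoid.
        delta_output = (output - target) * output * (1.0 - output)
        for j, h in enumerate(hidden):
            # Compute the hidden delta before the output weight is updated.
            delta_hidden = delta_output * self.w_output[j] * h * (1.0 - h)
            self.w_output[j] -= learning_rate * delta_output * h
            for i, x in enumerate(inputs):
                self.w_hidden[j][i] -= learning_rate * delta_hidden * x
            self.b_hidden[j] -= learning_rate * delta_hidden
        self.b_output -= learning_rate * delta_output


def mean_squared_error(net, samples):
    return sum((net.forward(x)[1] - y) ** 2 for x, y in samples) / len(samples)


# Toy samples: word-presence vectors over a 4-word vocabulary; the label is 1.0
# when the (hypothetical) illegal word at position 1 occurs in the call content.
samples = [([1, 0, 0, 0], 0.0), ([0, 1, 0, 0], 1.0), ([0, 0, 1, 0], 0.0),
           ([0, 1, 1, 0], 1.0), ([1, 0, 0, 1], 0.0)]
net = SingleHiddenLayerNet(n_inputs=4, n_hidden=3)
initial_error = mean_squared_error(net, samples)
for _ in range(2000):
    for inputs, target in samples:
        net.train_step(inputs, target, learning_rate=0.5)
final_error = mean_squared_error(net, samples)
```

In practice, training would stop once the error falls within the acceptable range rather than after a fixed number of passes; the fixed loop here only keeps the sketch short.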

Step S20: Obtain the call audio between the customer service and the customer in the call log library;

In this embodiment, during the telephone communication between the customer service and the customer, recording is automatically performed to form the call audio and save it in the call record library. The call audio used in the illegal speech detection can be either quasi real-time or historical, which can be set according to actual needs.

There is no limit to the selection method of the call audio detected each time, for example, the call audio of a fixed time or within a time limit is acquired each time. For example, the call audio within 10 hours is acquired every time, or the call audio within 1 day is acquired every time.
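As an illustration of selecting call audio by time window, the following Python sketch filters recordings by their end time. The record layout (`audio_id`, `end_time`), the function name `select_call_audio`, and the sample timestamps are assumptions made for this example only.

```python
from datetime import datetime, timedelta


def select_call_audio(recordings, now, window):
    """Return the recordings whose end time falls within the last `window`."""
    return [r for r in recordings if now - window <= r["end_time"] <= now]


now = datetime(2019, 5, 16, 18, 0)
recordings = [
    {"audio_id": "call_1", "end_time": datetime(2019, 5, 16, 9, 30)},
    {"audio_id": "call_2", "end_time": datetime(2019, 5, 15, 20, 0)},
    {"audio_id": "call_3", "end_time": datetime(2019, 5, 14, 12, 0)},
]
last_10_hours = select_call_audio(recordings, now, timedelta(hours=10))
last_day = select_call_audio(recordings, now, timedelta(days=1))
```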

Optionally, when the call audio is stored, information related to the call audio is also stored, such as the customer service name, customer service ID, call start time, call end time, customer name, customer phone number, and call subject (such as personal deposit business or personal transfer business).

Step S30: Framing the call audio to obtain multiple time-series voice frames;

In order to extract sound features more effectively, the call audio also needs audio data preprocessing such as filtering and framing. The framing processing in this embodiment divides the sound into small segments, each of which is called a speech frame. Framing is implemented with a moving window function, which yields multiple speech frames in time order.
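Framing with a moving window can be sketched in Python as follows. The frame length, the hop length, and the choice of a Hamming window are assumptions for illustration; this application only states that a moving window function is used.

```python
import math


def frame_signal(samples, frame_length, hop_length):
    """Split samples into overlapping frames in time order; a trailing partial frame is dropped."""
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frames.append(samples[start:start + frame_length])
    return frames


def hamming_window(length):
    """Hamming window coefficients (assumed window type)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def apply_window(frame, window):
    return [sample * weight for sample, weight in zip(frame, window)]


signal = list(range(10))  # stand-in for audio samples
frames = frame_signal(signal, frame_length=4, hop_length=2)
windowed = [apply_window(frame, hamming_window(4)) for frame in frames]
```

With a hop shorter than the frame length, adjacent frames overlap, so no part of the signal is seen only at the tapered edge of a window.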

Step S40, sequentially extracting the sound features of the speech frame according to the time sequence and generating a multi-dimensional sound feature vector containing sound information;

Feature extraction converts the sound signal from the time domain to the frequency domain to provide a suitable input feature vector for the acoustic model. This embodiment mainly uses the linear prediction cepstral coefficient (LPCC) and Mel-frequency cepstral coefficient (MFCC) algorithms to extract sound features, converting each speech frame waveform into a multi-dimensional vector containing sound information.
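The time-domain to frequency-domain conversion can be illustrated with a naive discrete Fourier transform of one frame. This Python sketch shows only that first stage and is not an LPCC or MFCC implementation; the function names and the synthetic test frame are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(total))
    return spectrum


def log_energy_features(frame):
    """Log power per frequency bin; the small constant avoids log(0)."""
    return [math.log(m * m + 1e-10) for m in magnitude_spectrum(frame)]


# Synthetic frame: a sine wave completing 2 cycles over 16 samples.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
spectrum = magnitude_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
features = log_energy_features(frame)
```

A full MFCC pipeline would go on to apply a Mel filter bank and a discrete cosine transform to spectra of this kind.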

Step S50: Input the multi-dimensional sound feature vector into a preset acoustic model for processing, and output phoneme information corresponding to the speech frame;

The acoustic model is a knowledge representation of differences in acoustics, phonetics, environmental variables, speaker gender, accent, and so on, and is obtained by training on voice data. Based on acoustic characteristics, the acoustic model can calculate a probability score for each feature vector, that is, it establishes the mapping relationship between sound features and phonemes.

Step S60, based on the phoneme information, look up a preset dictionary, and output the character or word corresponding to each piece of phoneme information;

A dictionary is a collection of the phoneme indexes corresponding to characters or words, that is, a mapping between words and phonemes. By looking up the dictionary, the characters or words corresponding to each piece of phoneme information are determined.
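A dictionary lookup of this kind can be sketched in Python as a mapping from phoneme groups to candidate words. The entries below are hypothetical, written with ARPAbet-style phoneme symbols without stress markers, and are not taken from this application. Several words can share one pronunciation, which is why a language model is needed afterwards.

```python
# Hypothetical pronunciation dictionary: phoneme group -> candidate words.
PRONUNCIATION_DICTIONARY = {
    ("T", "UW"): ["to", "too", "two"],
    ("R", "OW", "B", "AA", "T"): ["robot"],
}


def look_up(phoneme_groups, dictionary):
    """Return the candidate words for each phoneme group; unknown groups give an empty list."""
    return [dictionary.get(tuple(group), []) for group in phoneme_groups]


candidates = look_up([["T", "UW"], ["R", "OW", "B", "AA", "T"], ["Z", "Z"]],
                     PRONUNCIATION_DICTIONARY)
```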

Step S70, in output order, input the characters or words corresponding to each piece of phoneme information into a preset language model for processing, and output the probability that the individual characters or words are associated with one another;

The language model represents the probability that a given character sequence occurs, and can be obtained by training on text data. Based on linguistic characteristics, the language model can calculate the probability that the sound signal corresponds to a given phrase sequence, that is, it establishes the mapping relationship from the characters corresponding to the phonemes to the phrase sequences composed of those characters.

Step S80, splice the characters or words with the highest output probability into call content in text format and output it;

After the probability of each character or phrase that may correspond to the call audio is obtained, the characters or words with the highest probability are spliced into call content in text format and output as the result of voice recognition.

For example, suppose the text content of a call audio is "I am a robot" (我是机器人). Feature extraction yields a feature vector such as [1 2 3 4 5 6....10]. This feature vector is input into the acoustic model for processing, which outputs the corresponding phonemes: [1 2 3 4 5 6....10] -> w o s i j i q i r n. The dictionary is then looked up to obtain the candidate characters for each phoneme group: 窝 (nest): w o; 我 (I): w o; 是 (yes/am): s i; 机 (machine): j i; 器 (device): q i; 人 (person): r n; 级 (level): j i; 忍 (tolerance): r n. Finally, these candidates are input into the language model for processing, which outputs the probability of each character or phrase sequence: I: 0.0786; yes: 0.0546; I am: 0.0898; machine: 0.0967; robot: 0.6785. Comparing the probabilities gives the most probable phrases, "I am" (0.0898) and "robot" (0.6785), which are spliced into the output "I am a robot", completing the voice recognition of the call audio.
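The probability comparison in this example can be sketched in Python as follows. The probabilities are the ones given in the example; grouping the candidates into two segments is an assumption made for illustration, since the example does not state how the candidates are aligned.

```python
# Candidate phrases and language model probabilities from the example,
# grouped by the stretch of audio they cover (the grouping is assumed).
candidates_by_segment = [
    {"I": 0.0786, "yes": 0.0546, "I am": 0.0898},
    {"machine": 0.0967, "robot": 0.6785},
]


def best_candidates(segments):
    """Pick the highest-probability candidate in each segment."""
    return [max(scores, key=scores.get) for scores in segments]


selected = best_candidates(candidates_by_segment)
text = " ".join(selected)
```

Here `selected` holds "I am" and "robot", matching the phrases chosen in the example before splicing.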

In this embodiment, the voice recognition technology is used to perform voice recognition on the call audio, and output the call content in text format.

Optionally, in order to improve the efficiency of quality inspection, the call content in text format is further organized into customer service call content and customer call content.

Step S90, input the content of the conversation between the customer service and the customer output by the voice recognition into the illegal speech recognition model to identify the illegal speech, and output the recognition result;

Step S100, based on the recognition result, determine whether there is illegal speech in the conversation content;

Speech here refers to the prescribed wording used in communication; different industries or businesses use different speech, such as polite language and business language. This embodiment does not limit how illegal speech is defined. Speech may be used by the customer service alone, or used in pairs based on the dialogue with the customer.

Illegal speech refers to wording that does not meet the requirements, such as impolite words or terms that do not meet business requirements. This embodiment does not limit the manner of detecting illegal speech; for example, detection may be performed through character matching, or through classification and recognition based on a mathematical model.

In this embodiment, call content containing illegal speech is used as training samples in advance, and a machine learning method is used for training to obtain the illegal speech recognition model; the conversation content between the customer service and the customer output by voice recognition is then input into the illegal speech recognition model to recognize illegal speech, and the recognition result is output; finally, based on the recognition result, it is determined whether there is illegal speech in the conversation content.

Specifically, this embodiment detects illegal speech by classification model matching. Call content containing illegal speech is used as training samples in advance, and a machine learning method is used to train a classification model that can identify call content containing illegal speech (that is, the illegal speech recognition model). For example, a supervised learning method is used for model training, where each sample in the training set for supervised learning must include both an input and an output: the call content is the input and the illegal speech is the output. Common supervised learning algorithms include regression analysis and statistical classification.

The classification model trained by the machine learning method is used to check the call content. If the output result is not empty, it is determined that there is illegal speech in the currently detected call content; if the output result is empty, it is determined that there is no illegal speech in the currently detected call content.

Step S110, if there is illegal speech, associate the illegal speech with the corresponding call audio and save it in the call record library.

In this embodiment, the same call audio may contain one or more instances of illegal speech, or none. If the detection result contains illegal speech, the illegal speech and the corresponding call audio are associated and saved in the call record library. For example, call record 1 records call audio A and the illegal speech a found in call audio A; call record 2 records call audio B and the illegal speech b1, b2, and b3 found in call audio B.
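The association between call audio and illegal speech can be sketched in Python with an in-memory mapping standing in for the call record library. The function name `save_if_illegal` and the identifiers are hypothetical; a real call record library would typically be a database.

```python
def save_if_illegal(call_record_library, audio_id, recognition_result):
    """Associate detected illegal speech with its call audio; do nothing for an empty result."""
    # An empty recognition result means no illegal speech was detected.
    if not recognition_result:
        return False
    call_record_library.setdefault(audio_id, []).extend(recognition_result)
    return True


call_record_library = {}
saved_a = save_if_illegal(call_record_library, "call_audio_A", ["a"])
saved_b = save_if_illegal(call_record_library, "call_audio_B", ["b1", "b2", "b3"])
saved_c = save_if_illegal(call_record_library, "call_audio_C", [])
```

After these calls, the library holds one entry for call audio A with one item and one entry for call audio B with three items, mirroring call records 1 and 2 in the example; call audio C is not recorded.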

Further optionally, in an optional embodiment, if illegal speech is detected in the current call content, related information of the call audio associated with the illegal speech is output, the related information including customer service information, customer information, and the subject of the call.

In this optional embodiment, if the detection result for the call content contains illegal speech, the related information of the call audio containing the illegal speech is further output: customer service information, such as the customer service name, job ID, and call start and end times; customer information, such as the customer name and mobile phone number; and the subject of the call, such as personal deposit business or personal transfer business. Outputting this information makes it easy to quickly locate the person responsible and thereby improve service quality, for example by correcting the customer service personnel involved, apologizing to the customer, answering the customer's questions again, and correcting errors.

This embodiment uses voice recognition technology to recognize the call audio as call content in text format, and then performs illegal speech detection on that text. If illegal speech is detected, the illegal speech and the corresponding call audio are automatically associated and saved, thereby realizing automatic quality inspection of illegal speech, avoiding tedious and time-consuming manual quality inspection, improving the efficiency of quality inspection, and saving costs.

Referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of a method for detecting illegal speech in this application. Based on the first embodiment of the foregoing method, in this embodiment, after the foregoing step S110, the method further includes:

Step S120: Obtain the illegal speech keywords entered in the preset query page;

Step S130, searching the call log library with the illegal speech keyword as a query condition to detect whether there is a call audio associated with the illegal speech keyword in the call log library;

Step S140, if it exists, play the call audio associated with the illegal speech keyword.

In this embodiment, in order to further enhance the flexibility of quality inspection, a query page on which the user can perform searches is further provided. The query function uses the query keyword as the query condition and searches the call content by keyword matching. If the call content contains content matching the query keyword, the call audio corresponding to the matching call content is played automatically.

The query page makes it convenient for the user to perform spot checks flexibly. The user can search the call record library with any illegal speech as the query keyword, obtaining the call audio that matches the target illegal speech and playing it automatically, which improves retrieval flexibility.

Further optionally, in an embodiment of the illegal speech detection method of the present application, in order to improve retrieval efficiency, the above step S130 further includes: performing character splicing on the illegal speech associated with each call audio in the call record library to form an illegal speech string, and loading the illegal speech string into memory; and searching the illegal speech string with the illegal speech keyword as the query condition, to detect whether there is a call audio associated with the illegal speech keyword in the call record library.

In this embodiment, the illegal speech associated with each call audio is extracted from the call record library in advance, the extracted illegal speech is spliced into one illegal speech string, and the string is loaded into memory; the entered illegal speech keyword is then used as the query condition to search the string. A single retrieval operation thus covers all the illegal speech in the call record library, which improves retrieval efficiency.
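The splice-then-search approach can be sketched in Python as follows. This application does not say how a match in the spliced string is traced back to its call audio; recording the start offset of each phrase and looking it up with `bisect` is an assumption of this sketch, as are the newline separator, the function names, and the sample phrases. Keywords are assumed not to contain the separator.

```python
import bisect

SEPARATOR = "\n"  # assumed delimiter between spliced phrases


def build_index(records):
    """Splice all illegal speech into one string and record where each phrase starts."""
    parts, starts, owners = [], [], []
    position = 0
    for audio_id, phrases in records.items():
        for phrase in phrases:
            starts.append(position)
            owners.append(audio_id)
            parts.append(phrase)
            position += len(phrase) + len(SEPARATOR)
    return SEPARATOR.join(parts), starts, owners


def search(keyword, spliced, starts, owners):
    """Return the call audio IDs whose illegal speech contains the keyword."""
    matches = []
    position = spliced.find(keyword)
    while position != -1:
        # The phrase containing this offset is the last one starting at or before it.
        owner = owners[bisect.bisect_right(starts, position) - 1]
        if owner not in matches:
            matches.append(owner)
        position = spliced.find(keyword, position + 1)
    return matches


records = {
    "call_audio_A": ["shut up"],
    "call_audio_B": ["not my problem", "you figure it out", "shut up already"],
}
spliced, starts, owners = build_index(records)
hits_shut_up = search("shut up", spliced, starts, owners)
hits_figure = search("figure", spliced, starts, owners)
hits_none = search("thank you", spliced, starts, owners)
```

The string is built once and kept in memory, so each query scans a single string instead of issuing one lookup per call record.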

The application also provides a device for detecting illegal speech.

Referring to Fig. 4, Fig. 4 is a schematic diagram of the functional modules of the first embodiment of the device for detecting illegal speech in this application. In this embodiment, the device for detecting illegal speech includes:

The model training module 10 is used to use the call content containing the illegal speech as a training sample, and use machine learning to train to obtain the illegal speech recognition model;

In an embodiment, the model training module 10 is specifically configured to:

Take the call content containing illegal speech as a training sample, and perform word segmentation on the training sample to obtain the corresponding word segmentation data; map all the word segmentation data into word vectors to be trained; construct a mathematical model based on a single hidden layer feedforward neural network, use the word vectors as the input of the mathematical model and the illegal speech in the training sample as the output of the mathematical model, and iteratively train the mathematical model to obtain the illegal speech recognition model.

The obtaining module 20 is used to obtain the call audio between the customer service and the customer in the call record library;

The voice recognition module 30 is configured to frame the call audio to obtain multiple time-ordered speech frames; sequentially extract the sound features of the speech frames in time order and generate multi-dimensional sound feature vectors containing sound information; input the multi-dimensional sound feature vectors into a preset acoustic model for processing, and output the phoneme information corresponding to the speech frames; look up a preset dictionary based on the phoneme information, and output the character or word corresponding to each piece of phoneme information; input the characters or words corresponding to each piece of phoneme information into a preset language model in output order for processing, and output the probability that the individual characters or words are associated with each other; and splice the output characters or words with the highest probability into call content in text format and output it;

The speech recognition module 40 is configured to input the call content between the customer service and the customer output by the voice recognition into the illegal speech recognition model to identify illegal speech, and output the recognition result; and determine, based on the recognition result, whether there is illegal speech in the call content;

The association saving module 50 is configured to, if there is an illegal speech technique, associate the illegal speech technique with the corresponding call audio, and save it in the call record library.
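
The training performed by the model training module 10 (word segmentation, mapping to word vectors, and iterative training of a single-hidden-layer feedforward network) might look roughly like the sketch below. Several simplifications are assumptions rather than part of the application: whitespace splitting stands in for a real word segmenter, a binary bag-of-words vector stands in for trained word vectors, the network output is a single "contains illegal speech" probability, and the sample sentences, layer size, and learning rate are invented for illustration.

```python
import math
import random


def segment(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def build_vocab(samples):
    vocab = {}
    for text, _ in samples:
        for token in segment(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab):
    # Assumption: a binary bag-of-words vector stands in for word vectors.
    vec = [0.0] * len(vocab)
    for token in segment(text):
        if token in vocab:
            vec[vocab[token]] = 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SingleHiddenLayerNet:
    """Feedforward network: input -> one tanh hidden layer -> sigmoid output."""

    def __init__(self, n_in, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train(self, data, epochs=300, lr=0.1):
        # Iterative training: per-sample gradient descent on cross-entropy loss.
        for _ in range(epochs):
            for x, target in data:
                h, y = self.forward(x)
                d_out = y - target  # loss gradient at the output pre-activation
                for j, hj in enumerate(h):
                    d_hid = d_out * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * d_out * hj
                    self.b1[j] -= lr * d_hid
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hid * xi
                self.b2 -= lr * d_out

    def predict(self, x):
        return self.forward(x)[1]


if __name__ == "__main__":
    # Hypothetical labelled samples: 1 = contains illegal speech, 0 = compliant.
    samples = [
        ("we promise guaranteed returns with zero risk", 1),
        ("this plan has guaranteed profit every month", 1),
        ("please confirm your mailing address", 0),
        ("thank you for calling have a nice day", 0),
    ]
    vocab = build_vocab(samples)
    net = SingleHiddenLayerNet(len(vocab))
    net.train([(vectorize(t, vocab), y) for t, y in samples])
    print(net.predict(vectorize("guaranteed returns", vocab)))
```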

Based on the description of the embodiment that is the same as the above-mentioned method for detecting illegal speech in this application, the content of the embodiment of the device for detecting illegal speech is not repeated in this embodiment.
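
The first two steps performed by the voice recognition module (framing the call audio into time-ordered frames and extracting a feature vector per frame) can be illustrated with the sketch below. The 25 ms frame length and 10 ms hop are common conventions, not values given in the application, and the two features used here (short-time energy and zero-crossing rate) are simple stand-ins for whatever multi-dimensional sound features the acoustic model actually expects.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split audio samples into overlapping frames, ordered by start time.

    Returns a list of (start_time_in_seconds, frame_samples) tuples.
    frame_ms and hop_ms are assumed defaults, not values from the application.
    """
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append((start / sample_rate, samples[start:start + frame_len]))
    return frames


def frame_features(frame):
    """Return a small feature vector: [short-time energy, zero-crossing rate]."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / (len(frame) - 1)]


if __name__ == "__main__":
    rate = 8000
    # One second of a synthetic 440 Hz tone stands in for real call audio.
    audio = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    frames = frame_signal(audio, rate)
    features = [frame_features(f) for _, f in frames]
    print(len(frames), features[0])
```

Each frame keeps its start time, so the feature vectors can be passed to a downstream acoustic model in time order.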

Referring to Fig. 5, Fig. 5 is a schematic diagram of the functional modules of the second embodiment of the device for detecting illegal speech in this application. In this embodiment, the device for detecting illegal speech includes:

The keyword acquisition module 60 is used to acquire the illegal speech keywords entered in the preset query page;

The retrieval module 70 is configured to search the call log library using the illegal speech keyword as a query condition to detect whether there is a call audio associated with the illegal speech keyword in the call log library;

The playing module 80 is configured to play the call audio associated with the illegal speech keyword if the call record library contains the call audio associated with the illegal speech keyword.
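
As a rough sketch of how association saving, keyword retrieval, and playback could fit together, the example below uses an in-memory SQLite database as the call record library. The table layout, the function names, and the `play` callback are assumptions made for illustration; the application does not specify a storage engine or a playback mechanism.

```python
import sqlite3


def open_library():
    """Create a hypothetical call record library as an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE illegal_speech (audio_path TEXT NOT NULL, phrase TEXT NOT NULL)"
    )
    return conn


def save_association(conn, audio_path, phrases):
    """Associate each detected illegal speech phrase with its call audio."""
    conn.executemany(
        "INSERT INTO illegal_speech (audio_path, phrase) VALUES (?, ?)",
        [(audio_path, phrase) for phrase in phrases],
    )
    conn.commit()


def find_audio(conn, keyword):
    """Return the call audio whose associated illegal speech contains the keyword."""
    rows = conn.execute(
        "SELECT DISTINCT audio_path FROM illegal_speech "
        "WHERE instr(phrase, ?) > 0 ORDER BY audio_path",
        (keyword,),
    )
    return [row[0] for row in rows]


def play_matches(conn, keyword, play):
    """Call `play` (a caller-supplied callback) for every matching call audio."""
    matches = find_audio(conn, keyword)
    for path in matches:
        play(path)
    return bool(matches)


if __name__ == "__main__":
    conn = open_library()
    save_association(conn, "call_001.wav", ["guaranteed returns"])
    save_association(conn, "call_002.wav", ["you must pay today"])
    play_matches(conn, "guaranteed", play=lambda path: print("playing", path))
```

Passing the player in as a callback keeps the retrieval logic independent of any particular audio backend.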

The application also provides a non-volatile computer-readable storage medium.

In this embodiment, the computer-readable storage medium stores an illegal speech detection program, and when the illegal speech detection program is executed by a processor, the steps of the illegal speech detection method described in any of the above embodiments are implemented. For the method implemented when the illegal speech detection program is executed by the processor, reference may be made to the various embodiments of the illegal speech detection method of the present application, so the details are not repeated here.

Optionally, in a specific embodiment, when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are implemented:

Framing the call audio to obtain multiple speech frames with time sequence;

Optionally, in a specific embodiment, when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:

Map all word segmentation data into word vectors to be trained;

Obtain the illegal speech keywords entered in the preset query page;

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform. They can of course also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present application.

Claims

A method for detecting illegal speech, comprising the following steps:

Take the call content containing the illegal speech as the training sample, and use the machine learning method to train to obtain the illegal speech recognition model;

Obtain the call audio between the customer service and the customer in the call log library;

Framing the call audio to obtain multiple speech frames with time sequence;

Sequentially extracting the sound features of the speech frames according to time sequence and generating a multi-dimensional sound feature vector containing sound information;

Inputting the multi-dimensional sound feature vector into a preset acoustic model for processing, and outputting phoneme information corresponding to the speech frame;

Based on the phoneme information, look up a preset dictionary, and output the character or word corresponding to each piece of phoneme information;

According to the output order, input the characters or words corresponding to each piece of phoneme information into the preset language model for processing, and output the probability that the individual characters or words are associated with each other;

Splice the output characters or words with the highest probability into call content in text format and output it;

Input the content of the conversation between the customer service and the customer output by the voice recognition into the illegal speech recognition model to identify the illegal speech, and output the recognition result;

Based on the recognition result, determine whether there is illegal speech in the conversation content;

If there is an illegal speech, the illegal speech is associated with the corresponding call audio and saved in the call record library.
The method for detecting illegal speech according to claim 1, wherein the method for detecting illegal speech further comprises:

If there is an illegal speech, output related information of the call audio associated with the illegal speech, and the relevant information includes: customer service information, customer information, and the subject of the call.
The method for detecting illegal speech as claimed in claim 1, wherein said using the call content containing illegal speech as a training sample and using machine learning to train to obtain the illegal speech recognition model comprises:

Using the content of the call that contains illegal speech as a training sample, segment the training sample to obtain corresponding segmentation data;

Map all word segmentation data into word vectors to be trained;

Construct a mathematical model based on a single-hidden-layer feedforward neural network, take the word vectors as the input of the mathematical model and the illegal speech in the training sample as the output of the mathematical model, and perform iterative training on the mathematical model to obtain the illegal speech recognition model.
The method for detecting illegal speech according to claim 1, wherein after the step of associating the illegal speech with the corresponding call audio and saving it in the call record library if there is illegal speech, the method further comprises:

Obtain the illegal speech keywords entered in the preset query page;

Using the illegal speech keywords as query conditions, search the call record library to detect whether there is a call audio associated with the illegal speech keywords in the call record library;

If it exists, the call audio associated with the illegal speech keyword is played.
The method for detecting illegal speech according to claim 4, wherein said searching the call record library using the illegal speech keyword as a query condition to detect whether there is call audio associated with the illegal speech keyword in the call record library comprises:

Perform character splicing on the illegal speech associated with each call audio in the call record library to form a string of illegal speech, and transfer the string of illegal speech into the memory;

Using the illegal speech keyword as a query condition, retrieve the illegal speech string to detect whether there is a call audio associated with the illegal speech keyword in the call record library.
An illegal speech detection device, the illegal speech detection device comprising:

The model training module is used to take the call content containing the illegal speech as the training sample, and use the machine learning method to train to obtain the illegal speech recognition model;

The acquisition module is used to acquire the call audio between the customer service and the customer in the call log library;

The voice recognition module is used to frame the call audio to obtain multiple time-ordered speech frames; sequentially extract the sound features of the speech frames in time order and generate multi-dimensional sound feature vectors containing sound information; input the multi-dimensional sound feature vectors into the preset acoustic model for processing, and output the phoneme information corresponding to the speech frames; look up the preset dictionary based on the phoneme information, and output the character or word corresponding to each piece of phoneme information; input the characters or words corresponding to each piece of phoneme information into a preset language model in output order for processing, and output the probability that the individual characters or words are associated with each other; and splice the output characters or words with the highest probability into call content in text format and output it;

The speech recognition module is used to input the call content between the customer service and the customer output by the voice recognition into the illegal speech recognition model to identify the illegal speech, and output the recognition result; and to determine, based on the recognition result, whether there is illegal speech in the call content;

The association saving module is used for associating the illegal speech with the corresponding call audio if there is an illegal speech technique, and saving it in the call record library.
The illegal speech detection device according to claim 6, wherein the model training module is specifically configured to:

Using the content of the call that contains illegal speech as a training sample, segment the training sample to obtain corresponding segmentation data;

Map all word segmentation data into word vectors to be trained;

Construct a mathematical model based on a single-hidden-layer feedforward neural network, take the word vectors as the input of the mathematical model and the illegal speech in the training sample as the output of the mathematical model, and perform iterative training on the mathematical model to obtain the illegal speech recognition model.
The illegal speech detection device according to claim 6, the illegal speech detection device further comprising:

A keyword acquisition module, used to acquire the illegal speech keywords entered in the preset query page;

A retrieval module, configured to search the call log library using the illegal speech keyword as a query condition to detect whether there is a call audio associated with the illegal speech keyword in the call log library;

The playing module is configured to play the call audio associated with the illegal speech keyword if the call record library contains the call audio associated with the illegal speech keyword.
The illegal speech detection device according to claim 6, the illegal speech detection device further comprising:

The output module is used for outputting related information of the call audio associated with the illegal speech if there is an illegal speech. The relevant information includes: customer service information, customer information, and the subject of the call.
The illegal speech detection device according to claim 8, wherein the retrieval module is specifically configured to:

Perform character splicing on the illegal speech associated with each call audio in the call record library to form a string of illegal speech, and transfer the string of illegal speech into the memory;

Using the illegal speech keyword as a query condition, retrieve the illegal speech string to detect whether there is a call audio associated with the illegal speech keyword in the call record library.
Illegal speech detection equipment, comprising a memory, a processor, and an illegal speech detection program stored on the memory and executable on the processor, wherein when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are implemented:

Take the call content containing the illegal speech as the training sample, and use the machine learning method to train to obtain the illegal speech recognition model;

Obtain the call audio between the customer service and the customer in the call log library;

Framing the call audio to obtain multiple speech frames with time sequence;

Sequentially extracting the sound features of the speech frames according to time sequence and generating a multi-dimensional sound feature vector containing sound information;

Inputting the multi-dimensional sound feature vector into a preset acoustic model for processing, and outputting phoneme information corresponding to the speech frame;

Based on the phoneme information, look up a preset dictionary, and output the character or word corresponding to each piece of phoneme information;

According to the output order, input the characters or words corresponding to each piece of phoneme information into the preset language model for processing, and output the probability that the individual characters or words are associated with each other;

Splice the output characters or words with the highest probability into call content in text format and output it;

Input the content of the conversation between the customer service and the customer output by the voice recognition into the illegal speech recognition model to identify the illegal speech, and output the recognition result;

Based on the recognition result, determine whether there is illegal speech in the conversation content;

If there is an illegal speech, the illegal speech is associated with the corresponding call audio and saved in the call record library.
The illegal speech detection equipment according to claim 11, wherein when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:

If there is an illegal speech, output related information of the call audio associated with the illegal speech, and the relevant information includes: customer service information, customer information, and the subject of the call.
The illegal speech detection equipment according to claim 11, wherein when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:

Using the content of the call that contains illegal speech as a training sample, segment the training sample to obtain corresponding segmentation data;

Map all word segmentation data into word vectors to be trained;

Construct a mathematical model based on a single-hidden-layer feedforward neural network, take the word vectors as the input of the mathematical model and the illegal speech in the training sample as the output of the mathematical model, and perform iterative training on the mathematical model to obtain the illegal speech recognition model.
The illegal speech detection equipment according to claim 11, wherein when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:

Obtain the illegal speech keywords entered in the preset query page;

Using the illegal speech keywords as query conditions, search the call record library to detect whether there is a call audio associated with the illegal speech keywords in the call record library;

If it exists, the call audio associated with the illegal speech keyword is played.
The illegal speech detection equipment according to claim 14, wherein when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:

Perform character splicing on the illegal speech associated with each call audio in the call record library to form a string of illegal speech, and transfer the string of illegal speech into the memory;

Using the illegal speech keyword as a query condition, retrieve the illegal speech string to detect whether there is a call audio associated with the illegal speech keyword in the call record library.
A non-volatile computer-readable storage medium, wherein the computer-readable storage medium stores an illegal speech detection program, and when the illegal speech detection program is executed by a processor, the following steps of the illegal speech detection method are implemented:

Take the call content containing the illegal speech as the training sample, and use the machine learning method to train to obtain the illegal speech recognition model;

Obtain the call audio between the customer service and the customer in the call log library;

Framing the call audio to obtain multiple speech frames with time sequence;

Sequentially extracting the sound features of the speech frames according to time sequence and generating a multi-dimensional sound feature vector containing sound information;

Inputting the multi-dimensional sound feature vector into a preset acoustic model for processing, and outputting phoneme information corresponding to the speech frame;

Based on the phoneme information, look up a preset dictionary, and output the character or word corresponding to each piece of phoneme information;

According to the output order, input the characters or words corresponding to each piece of phoneme information into the preset language model for processing, and output the probability that the individual characters or words are associated with each other;

Splice the output characters or words with the highest probability into call content in text format and output it;

Input the content of the conversation between the customer service and the customer output by the voice recognition into the illegal speech recognition model to identify the illegal speech, and output the recognition result;

Based on the recognition result, determine whether there is illegal speech in the conversation content;

If there is an illegal speech, the illegal speech is associated with the corresponding call audio and saved in the call record library.
The non-volatile computer-readable storage medium according to claim 16, wherein when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:

If there is an illegal speech, output related information of the call audio associated with the illegal speech, and the relevant information includes: customer service information, customer information, and the subject of the call.
The non-volatile computer-readable storage medium according to claim 16, wherein when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:

Using the content of the call that contains illegal speech as a training sample, segment the training sample to obtain corresponding segmentation data;

Map all word segmentation data into word vectors to be trained;

Construct a mathematical model based on a single-hidden-layer feedforward neural network, take the word vectors as the input of the mathematical model and the illegal speech in the training sample as the output of the mathematical model, and perform iterative training on the mathematical model to obtain the illegal speech recognition model.
The non-volatile computer-readable storage medium according to claim 16, wherein when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:

Obtain the illegal speech keywords entered in the preset query page;

Using the illegal speech keywords as query conditions, search the call record library to detect whether there is a call audio associated with the illegal speech keywords in the call record library;

If it exists, the call audio associated with the illegal speech keyword is played.
The non-volatile computer-readable storage medium according to claim 19, wherein when the illegal speech detection program is executed by the processor, the following steps of the illegal speech detection method are further implemented:

Perform character splicing on the illegal speech associated with each call audio in the call record library to form a string of illegal speech, and transfer the string of illegal speech into the memory;

Using the illegal speech keyword as a query condition, retrieve the illegal speech string to detect whether there is a call audio associated with the illegal speech keyword in the call record library.