WO2023173554A1 - Inappropriate agent language identification method and apparatus, electronic device and storage medium - Google Patents

Inappropriate agent language identification method and apparatus, electronic device and storage medium Download PDF

Info

Publication number
WO2023173554A1
Authority
WO
WIPO (PCT)
Prior art keywords
agent
speech
layer
information
training
Prior art date
Application number
PCT/CN2022/090717
Other languages
French (fr)
Chinese (zh)
Inventor
王彦
成逸吉
马骏
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2023173554A1 publication Critical patent/WO2023173554A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0281Customer communication at a business location, e.g. providing product or service information, consulting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Definitions

  • This application belongs to the field of artificial intelligence technology, and in particular relates to a method, device, electronic equipment, and storage medium for identifying illegal speech techniques by agents.
  • Pre-trained models for natural language processing in the related art mainly include BERT (Bidirectional Encoder Representations from Transformers) of various sizes, as well as some BERT variants such as ALBERT, RoBERTa, and ELECTRA. Some of these models suffer from an excessive number of parameters and overly slow training and inference; others reduce the parameter count through parameter sharing but perform poorly in the application scenario of illegal speech recognition.
  • embodiments of this application provide a method for identifying illegal speech by agents, including:
  • the agent speech information for training is obtained, the agent speech information for training is split into single sentences on the agent side, and text preprocessing is performed on the single sentences on the agent side;
  • based on the preprocessed agent-side single sentences, a three-layer BERT model is used for training to obtain an agent illegal speech recognition model;
  • the agent speech information to be identified is input into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • the illegal speech in the agent speech information to be identified is determined according to the probability distribution of the target classification.
  • embodiments of the present application provide a device for identifying illegal speech techniques by agents, including:
  • a preprocessing unit used to obtain the agent's speech information for training, split the agent's speech information for training into single sentences on the agent's side, and perform text preprocessing on the single sentences on the agent's side;
  • a training unit, used to train a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an agent illegal speech recognition model;
  • a processing unit, used to input the agent speech information to be identified into the agent illegal speech recognition model for inference and obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • An identification unit configured to determine illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
  • embodiments of the present application provide an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements a method for identifying illegal speech by agents, the method including:
  • the agent speech information for training is obtained, the agent speech information for training is split into single sentences on the agent side, and text preprocessing is performed on the single sentences on the agent side;
  • based on the preprocessed agent-side single sentences, a three-layer BERT model is used for training to obtain an agent illegal speech recognition model;
  • the agent speech information to be identified is input into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • the illegal speech in the agent speech information to be identified is determined according to the probability distribution of the target classification.
  • embodiments of the present application provide a computer-readable storage medium storing a computer program, the computer program being used to execute a method for identifying illegal speech by agents, where the method includes:
  • the agent speech information for training is obtained, the agent speech information for training is split into single sentences on the agent side, and text preprocessing is performed on the single sentences on the agent side;
  • based on the preprocessed agent-side single sentences, a three-layer BERT model is used for training to obtain an agent illegal speech recognition model;
  • the agent speech information to be identified is input into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • the illegal speech in the agent speech information to be identified is determined according to the probability distribution of the target classification.
  • the embodiments of the present application at least have the following beneficial effects:
  • the pre-trained three-layer BERT model of the embodiments of the present application, by integrating text semantics with different degrees of information extraction and different time spans, can enhance the performance of the three-layer BERT model without substantially increasing the number of parameters.
  • the three-layer BERT model proposed in the embodiments of this application is optimized at both the model and data levels and is applied to the identification of illegal speech; it can improve the efficiency with which quality inspection personnel identify illegal speech in business scenarios and has certain promotion value.
  • Figure 1 is a flow chart of a method for identifying illegal speech by agents provided by an embodiment of the present application
  • Figure 2 is a flow chart for processing agent speech information for training provided by another embodiment of the present application.
  • FIG. 3 is a flow chart of desensitization and random masking processing provided by another embodiment of the present application.
  • Figure 4 is a flow chart of a specific processing method for random masking provided by another embodiment of the present application.
  • Figure 5 is the structure of a three-layer BERT model provided by another embodiment of the present application.
  • Figure 6 is a flow chart for inputting agent speech information to be recognized and outputting a probability distribution of target classification provided by another embodiment of the present application;
  • Figure 7 is a flow chart for matching and identifying illegal words provided by another embodiment of the present application.
  • Figure 8 is a flow chart for optimizing the three-layer BERT model provided by another embodiment of the present application.
  • Figure 9 is a structural diagram of an agent illegal speech recognition device provided by another embodiment of the present application.
  • Figure 10 is a device diagram of an electronic device provided by another embodiment of the present application.
  • This application provides a method, device, electronic device, and storage medium for identifying illegal agent speech skills.
  • the method includes: obtaining agent speech information for training, splitting the agent speech information for training into single sentences on the agent side, and performing text preprocessing on the agent-side single sentences; based on the preprocessed agent-side single sentences, training with a three-layer BERT model to obtain an agent illegal speech recognition model; inputting the agent speech information to be identified into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario; and determining the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
  • the pre-trained three-layer BERT model is used to infer the agent's illegal speech, thereby improving the efficiency with which quality inspection personnel identify illegal speech in business scenarios.
  • artificial intelligence is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence; artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new type of intelligent machine capable of responding in a manner similar to human intelligence.
  • Research in this field includes robotics, language recognition, image recognition, natural language processing, and expert systems.
  • Artificial intelligence can simulate the information process of human consciousness and thinking.
  • Artificial intelligence is also a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and the like.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometric technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • the terminal mentioned in the embodiment of this application may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a vehicle-mounted computer, a smart home device, a wearable electronic device, a VR (Virtual Reality)/AR (Augmented Reality) device, etc.;
  • the server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms.
  • Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing technology usually includes text processing, semantic understanding, machine translation, question answering, knowledge graphs, and other technologies.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
  • Figure 1 is a flow chart of a method for identifying illegal speech by an agent provided by an embodiment of the present application.
  • the method for identifying illegal speech by an agent includes but is not limited to the following steps:
  • Step S100 Obtain the agent speech information for training, split the agent speech information for training into single sentences on the agent side, and perform text preprocessing on the single sentences on the agent side;
  • Step S200 Based on the preprocessed single sentences on the agent side, use the three-layer BERT model for training to obtain an agent illegal speech recognition model;
  • Step S300 Input the agent speech information to be identified into the agent illegal speech recognition model for inference, and obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • Step S400 Determine the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
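  • As an illustration only (not part of the original disclosure), the following Python sketch shows how steps S100 to S400 fit together; the callables passed in stand for the splitting, preprocessing, training, and inference procedures described in the rest of this document, and the 0.5 threshold is an assumed default rather than a value specified by the application.

```python
from typing import Callable, Dict, Iterable, List

def identify_illegal_speech(
    training_calls: Iterable[str],
    call_to_check: str,
    split_fn: Callable[[str], List[str]],        # S100: call -> agent-side single sentences
    preprocess_fn: Callable[[str], str],         # S100: desensitization + random masking
    train_fn: Callable[[List[str]], object],     # S200: returns the trained three-layer BERT model
    infer_fn: Callable[[object, str], Dict[str, float]],  # S300: class name -> probability
    threshold: float = 0.5,                      # assumed threshold, not given in the application
) -> Dict[str, float]:
    # S100: split the training calls into agent-side single sentences and preprocess them.
    sentences = [preprocess_fn(s) for call in training_calls for s in split_fn(call)]
    # S200: train the three-layer BERT model to obtain the recognition model.
    model = train_fn(sentences)
    # S300: run inference on the call to be checked to get the class probability distribution.
    probs = infer_fn(model, preprocess_fn(call_to_check))
    # S400: classes whose probability exceeds the threshold are reported as violations.
    return {label: p for label, p in probs.items() if p > threshold}
```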
  • the embodiment of this application uses a three-layer BERT model to identify agent violations.
  • First, a three-layer BERT model is constructed, and the labeled training set, that is, the agent speech information used for training, is used to train the three-layer BERT model to obtain the agent illegal speech recognition model.
  • Then the agent speech information to be identified is input into the agent illegal speech recognition model to obtain the probability distribution of the target classification.
  • This probability distribution indicates the probability that, under the corresponding classification, the agent speech information belongs to illegal speech; finally, the illegal speech in the agent speech information is determined based on the probability distribution of the target classification.
  • BERT is a pre-trained language representation model that uses MLM (masked language modeling) to pre-train bidirectional Transformers and generate deep bidirectional language representations. After pre-training, only an additional output layer needs to be added for fine-tuning to achieve state-of-the-art performance on a variety of downstream tasks. This process does not require task-specific structural modifications to BERT, so it ultimately generates a deep bidirectional language representation that integrates left and right contextual information.
  • the three-layer BERT model includes a first-layer BERT model, a second-layer BERT model, a third-layer BERT model, a fully connected layer, a convolution layer, and a classification layer; the first-layer, second-layer, and third-layer BERT models are stacked, and the hidden layer of each of the first-layer, second-layer, and third-layer BERT models outputs a CLS vector to the fully connected layer.
  • the fully connected layer, the convolution layer, and the classification layer are connected in sequence.
  • the output of the classification layer is used as the output of the three-layer BERT model.
  • each CLS vector represents the information contained in a preprocessed agent-side single sentence after information extraction by one of the BERT layers.
  • the fully connected layer is used to splice the three CLS vectors and then output a comprehensive longitudinal-dimension text information vector;
  • the convolution layer is used to perform convolution operations on the comprehensive longitudinal-dimension text information vector through multiple convolution kernels, and outputs a comprehensive horizontal-span text information vector to the classification layer;
  • after receiving the comprehensive horizontal-span text information vector, the classification layer obtains the probability distribution of the required classification through classification processing.
  • splitting the agent speech information for training into single sentences on the agent side can be achieved through the following steps:
  • Step S110 Label the agent speech information for training;
  • Step S120 Split the agent speech information into sentences according to the annotations, so that the speech information is split into multiple single speech sentences;
  • Step S130 Perform text conversion on the multiple speech sentences to obtain agent-side single sentences expressed in text.
  • Agent speech information is speech data.
  • the agent speech needs to be annotated to divide multiple sentences in the speech data to facilitate the construction of a training set in the form of a single sentence.
  • the MLM operation is performed on a per-sentence basis, and multiple speech sentences can be obtained through annotation.
  • each speech sentence then needs to be converted into text information, that is, each speech sentence is converted into a single sentence on the agent side.
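  • As a rough sketch of this step (the data layout and the speech-to-text interface are assumptions, not structures defined by this application), the annotations can be represented as speaker-labeled segments and any speech-to-text engine can be plugged in through the transcribe callable:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Segment:
    speaker: str      # e.g. "agent" or "customer", taken from the annotations
    start_ms: int     # start of the single sentence in the recording
    end_ms: int       # end of the single sentence in the recording
    audio: bytes      # raw audio of this single sentence

def agent_side_sentences(segments: List[Segment],
                         transcribe: Callable[[bytes], str]) -> List[str]:
    """Keep only agent-side segments and convert each one into a text sentence."""
    return [transcribe(seg.audio) for seg in segments if seg.speaker == "agent"]
```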
  • the BERT model was originally proposed for English, and English sentences are formed from multiple English words separated by spaces.
  • the three-layer BERT model in the embodiment of the present application can undoubtedly be applied to English speech scenarios, but Chinese sentences are composed of multiple consecutive Chinese characters, and Chinese "words" are also composed of several characters; that is, English is organized by phonetics (character sounds), while Chinese is organized by meaning (glyphs). Therefore, when applying the BERT model for MLM processing, certain additional processing, such as word segmentation, is required; there are several corresponding processing methods.
  • LSTM Long Short-Term Memory
  • these include classic mechanical segmentation methods such as forward/reverse maximum matching and bidirectional maximum matching, statistical segmentation methods with better performance such as the Hidden Markov Model (HMM) and the conditional random field (CRF), as well as deep neural network methods that have emerged in recent years such as RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory); the choice is not limited here.
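  • As a minimal illustration of one of the mechanical segmentation methods mentioned above, the sketch below implements forward maximum matching; the tiny dictionary is invented for the example, and a real system would use a full lexicon or one of the statistical methods instead.

```python
def forward_max_match(text: str, dictionary: set, max_word_len: int = 4) -> list:
    """Forward maximum matching: at each position, greedily take the longest
    dictionary word that matches; fall back to a single character otherwise."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

# Toy dictionary for illustration only.
vocab = {"信用卡", "年费", "办理"}
print(forward_max_match("办理信用卡免年费", vocab))  # ['办理', '信用卡', '免', '年费']
```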
  • the annotation method can label the agent speech information for training based on the speech differences and pause rhythm in the agent speech information for training.
  • the text preprocessing includes desensitization processing and random masking processing to adapt to the training needs of the three-layer BERT model. Specifically, referring to Figure 3, desensitization processing and random masking processing include the following steps:
  • Step S140 Desensitize the sensitive words in the single sentence on the agent side according to the preset sensitive word library or the preset sensitive word judgment rules;
  • Step S150 Perform random masking processing on the desensitized agent-side single sentences to obtain preprocessed agent-side single sentences.
  • desensitization processing can replace sensitive words in single sentences on the agent side with preset characters
  • random masking processing can include the following steps, as shown in Figure 4:
  • Step S151 Randomly select 15% of the words in the desensitized single sentence on the agent side;
  • Step S152 replace 80% of the selected words with [mask], keep 10% unchanged, and replace the remaining 10% with another random word;
  • Step S153 splice [CLS] characters at the starting position of the desensitized single sentence on the agent side.
  • Random mask processing is what frees BERT from the limitation of one-way language models. Simply put, tokens in each training sequence are randomly replaced with the mask token ([mask]) with a probability of 15%, and the original word at the [mask] position is then predicted. Specifically, in each training sequence, token positions are randomly selected for prediction with a probability of 15%; if the i-th token is selected, it is replaced with one of three tokens ([mask], a random token, or the original token), with probabilities of 80%, 10%, and 10%, respectively. This strategy makes BERT sensitive not only to [mask] but to all tokens, so that it can extract the representation information of any token.
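  • A minimal sketch of the desensitization and random masking described above is shown below; the sensitive-word list and the vocabulary used for random replacement are placeholders, and a real system would draw them from the preset sensitive word library and the model vocabulary.

```python
import random

SENSITIVE_WORDS = ["张三", "13800000000"]             # placeholder sensitive-word library
REPLACEMENT_VOCAB = ["信用", "卡片", "年费", "办理"]    # placeholder vocabulary for random tokens

def desensitize(sentence: str, preset_char: str = "*") -> str:
    """Replace sensitive words in the agent-side single sentence with a preset character."""
    for word in SENSITIVE_WORDS:
        sentence = sentence.replace(word, preset_char * len(word))
    return sentence

def random_mask(tokens):
    """BERT-style masking: 15% of tokens are selected; of those, 80% become [mask],
    10% stay unchanged, and 10% become a random token. [CLS] is prepended."""
    tokens = list(tokens)
    for i in range(len(tokens)):
        if random.random() < 0.15:            # select 15% of positions
            r = random.random()
            if r < 0.8:
                tokens[i] = "[mask]"
            elif r < 0.9:
                pass                          # keep the original token
            else:
                tokens[i] = random.choice(REPLACEMENT_VOCAB)
    return ["[CLS]"] + tokens
```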
  • a training set applied to the three-layer BERT model is obtained.
  • the training set is input into the three-layer BERT model for training, and the agent illegal speech recognition model is obtained.
  • the convolution layer of the three-layer BERT model includes three convolution kernels, namely the first convolution kernel, the second convolution kernel and the third convolution kernel.
  • the sizes of the first convolution kernel, the second convolution kernel and the third convolution kernel are different from each other.
  • the size of the first convolution kernel is 2, the size of the second convolution kernel is 3, and the size of the third convolution kernel is 4.
  • the first layer BERT model, the second layer BERT model and the third layer BERT model are represented by encoder1, encoder2 and encoder3 respectively.
  • the three CLS vectors are spliced in the Linear layer to obtain a comprehensive longitudinal-dimension text information vector.
  • after the comprehensive longitudinal-dimension text information vector is processed by the three convolution kernels of the convolution layer, the spliced output is input to the softmax classification layer.
  • the activation function of the classification layer can be the tanh function.
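  • Putting the structure described above together, the following is a minimal PyTorch sketch (not the authors' actual implementation): the Hugging Face BertModel truncated to three encoder layers stands in for the stacked first/second/third-layer BERT models, the [CLS] hidden states of the three layers are spliced in a fully connected layer, one-dimensional convolutions with kernel sizes 2, 3, and 4 followed by tanh and max-pooling produce the horizontal-span vector, and a softmax layer outputs the class probabilities; the pretrained checkpoint, hidden sizes, and number of filters are assumptions.

```python
# pip install torch transformers   (library choice is an assumption, not specified by the application)
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class ThreeLayerBertClassifier(nn.Module):
    def __init__(self, num_classes: int = 2, num_filters: int = 64,
                 pretrained: str = "bert-base-chinese"):
        super().__init__()
        # A BERT encoder truncated to 3 transformer layers (encoder1/encoder2/encoder3).
        config = BertConfig.from_pretrained(pretrained, num_hidden_layers=3,
                                            output_hidden_states=True)
        self.bert = BertModel.from_pretrained(pretrained, config=config)
        hidden = config.hidden_size
        # Fully connected layer that fuses the three spliced CLS vectors into the
        # "comprehensive longitudinal-dimension text information vector".
        self.fuse = nn.Linear(3 * hidden, 3 * hidden)
        # Convolution kernels of sizes 2, 3 and 4 applied over the fused vector.
        self.convs = nn.ModuleList(
            [nn.Conv1d(1, num_filters, kernel_size=k) for k in (2, 3, 4)])
        self.classifier = nn.Linear(3 * num_filters, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states[1:4] are the outputs of the three encoder layers;
        # position 0 of each sequence is the [CLS] token.
        cls_vectors = [h[:, 0, :] for h in out.hidden_states[1:4]]
        fused = self.fuse(torch.cat(cls_vectors, dim=-1))          # longitudinal vector
        x = fused.unsqueeze(1)                                     # [batch, 1, 3 * hidden]
        pooled = [torch.tanh(conv(x)).max(dim=-1).values for conv in self.convs]
        features = torch.cat(pooled, dim=-1)                       # horizontal-span vector
        return torch.softmax(self.classifier(features), dim=-1)    # class probabilities
```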
  • In step S300, the agent speech information to be identified is input into the agent illegal speech recognition model for inference, and the probability distribution of the target classification is obtained. Referring to Figure 6, this can be achieved through the following steps:
  • Step S310 convert the agent speech information to be recognized into text information
  • Step S320 perform desensitization processing on the text information
  • Step S330 Input the desensitized text information into the agent illegal speech recognition model to obtain probability distributions of several target categories.
  • for the agent speech information to be identified, the process of converting speech to text is also used, and the sensitive words in the text information are desensitized after the conversion; that is, compared with the agent speech information used for training, there is no need to annotate in advance to divide single sentences. After the above desensitization processing, the text information is input into the agent illegal speech recognition model to obtain the probability distribution of the required classification (that is, the target classification; the types of classification can be preset).
  • after the agent illegal speech recognition model outputs the probability distribution of the required classification, step S400 determines the recognition result of the agent's illegal speech by matching according to the target classification. Referring to Figure 7, this includes the following steps:
  • Step S410 match the target classification with the preset classification, and use the successfully matched target classification as the classification to be recognized;
  • Step S420 When the probability value corresponding to the category to be identified exceeds the probability threshold of the corresponding preset category, it is determined that the agent speech information to be identified contains the illegal speech of the category to be identified.
  • the preset classification in the embodiment of this application is a pre-set classification of the semantic types of agent illegal speech.
  • the target classification output by the pre-trained model (the three-layer BERT model) is matched with the preset classification to determine whether the corresponding target classification under the current probability distribution belongs to a classification corresponding to illegal speech.
  • if the probability corresponding to one of these target classifications (i.e., the classifications to be recognized) is higher than the set probability threshold, it is considered that illegal speech of the target type exists in the current recording.
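  • The sketch below illustrates steps S310 to S330 and S410 to S420, reusing the model class and the desensitize function from the earlier sketches; the class names, thresholds, and tokenizer checkpoint are illustrative assumptions only, not values given by this application.

```python
import torch
from transformers import BertTokenizer

# Per-class probability thresholds for the preset classifications (illustrative only).
PRESET_THRESHOLDS = {"exaggerated_benefit": 0.7, "unauthorized_promise": 0.8}

def detect_illegal_speech(model, text: str, class_names: list,
                          tokenizer_name: str = "bert-base-chinese") -> dict:
    """S310-S330: desensitize the transcript and run the recognition model;
    S410-S420: keep the target classes that match a preset class and whose
    probability exceeds that class's threshold."""
    tokenizer = BertTokenizer.from_pretrained(tokenizer_name)
    clean = desensitize(text)                              # reuses the earlier sketch
    enc = tokenizer(clean, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        probs = model(enc["input_ids"], enc["attention_mask"])[0]
    hits = {}
    for name, p in zip(class_names, probs.tolist()):
        threshold = PRESET_THRESHOLDS.get(name)            # match against preset classes
        if threshold is not None and p > threshold:
            hits[name] = p                                  # flagged as illegal speech
    return hits
```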
  • the optimization process of the embodiment of the present application includes:
  • Step S510 For the target classifications that failed to match, determine the corresponding agent speech information;
  • Step S520 Re-label the agent speech information corresponding to the failed target classifications according to the active learning idea and the margin sampling strategy, and input it into the agent illegal speech recognition model for retraining.
  • Active learning refers to using machine learning methods to find sample data that are "difficult" to classify, letting humans reconfirm and review them, and then using the manually re-labeled data to train the supervised or semi-supervised learning model again, gradually improving the effect of the model and integrating human experience into it. In other words, a batch of easily misclassified samples is selected, labeled by humans, and then fed back into the model training process.
  • The margin sampling strategy introduces the idea of hard-sample mining. Building on least-confidence sampling, it compares the class with the highest predicted probability against the class with the second-highest probability, that is, it examines whether the leading class has a large or only a small advantage, and the data where the advantage is small are selected for labeling.
  • the margin sampling method and the least-confidence method are equivalent in binary classification problems.
  • Active learning scenarios involve evaluating the informativeness of unlabeled instances, and the simplest and most commonly used query framework is uncertainty sampling.
  • the active learner queries the instances whose labels it is most uncertain about; when a probabilistic model is used for binary classification, uncertainty sampling simply queries the instances whose posterior probability of being positive is closest to 0.5.
  • This application uses margin sampling and selects, for judgment, the samples with the smallest difference between the largest and second-largest probabilities predicted by the model.
  • the recognition ability of the model can be enhanced, thereby continuously optimizing the recognition results and improving the recognition accuracy.
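  • A minimal sketch of the margin sampling criterion described above: the samples whose gap between the largest and second-largest predicted probabilities is smallest are the ones sent back for manual re-labeling (the numbers are illustrative only).

```python
import numpy as np

def margin_sample(prob_matrix: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k samples with the smallest top-1/top-2 margin.
    prob_matrix has shape [num_samples, num_classes]."""
    sorted_probs = np.sort(prob_matrix, axis=1)
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]    # largest minus second largest
    return np.argsort(margins)[:k]                         # hardest samples for re-labeling

# Example: the second sample (0.52 vs 0.48) has the smallest margin and is selected.
probs = np.array([[0.90, 0.10], [0.52, 0.48], [0.70, 0.30]])
print(margin_sample(probs, 1))   # [1]
```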
  • the Baseline model used in this application is a pre-trained three-layer BERT model. As shown in Figure 5, the BERT model is improved as follows:
  • Each CLS vector represents the information contained in each sentence after each layer of BERT is used to extract information
  • the experimental task is the identification of agent violations in credit card sales scenarios, which is converted into a binary classification task.
  • the training process and optimization process of the agent illegal speech recognition model are as follows:
  • the three-layer BERT model is trained with the cross-entropy loss function to obtain the agent illegal speech recognition model;
  • the model performs inference on the data to be identified and obtains the classification probability distribution.
  • the idea of active learning and the margin sampling strategy are used to re-label samples that are difficult for the model to distinguish, and the model is trained again to enhance its recognition capabilities.
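  • A minimal sketch of one training step with the cross-entropy loss is shown below; the optimizer and hyper-parameters are assumptions, and since the model sketch above already applies softmax, the negative log-likelihood of the log-probabilities is used, which is equivalent to cross-entropy on raw logits.

```python
import torch
import torch.nn as nn

def train_step(model, batch, labels, optimizer):
    """One optimization step of the three-layer BERT classifier (sketch only)."""
    model.train()
    probs = model(batch["input_ids"], batch["attention_mask"])
    # Cross-entropy: NLL of the log of the softmax output (epsilon for numerical stability).
    loss = nn.NLLLoss()(torch.log(probs + 1e-12), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```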
  • the embodiment of this application uses the improved pre-trained three-layer BERT model to integrate text semantics with different degrees of information extraction and different time spans, thereby enhancing the performance of the three-layer BERT model; at the same time, based on the improved three-layer BERT model, a set of illegal speech identification processes is proposed, which improves the efficiency with which quality inspection personnel identify illegal speech in business scenarios and has certain promotion value.
  • an embodiment of the present application provides a device for identifying illegal speech skills by agents.
  • the device includes:
  • the preprocessing unit is used to obtain the agent speech information for training, split the agent speech information for training into single sentences on the agent side, and perform text preprocessing on the single sentences on the agent side;
  • the training unit is used to train using the three-layer BERT model based on the pre-processed agent-side single sentences to obtain an agent illegal speech recognition model
  • the processing unit is used to input the agent speech information to be identified into the agent illegal speech recognition model for inference and obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • the identification unit is used to determine the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
  • the device for identifying illegal agent speech skills in the embodiment of the present application identifies the agent's illegal speech skills through a three-layer BERT model.
  • First, a three-layer BERT model is constructed, and the labeled training set, that is, the agent speech information used for training, is used to train the three-layer BERT model to obtain the agent illegal speech recognition model; then the agent speech information to be identified is input into the agent illegal speech recognition model to obtain the probability distribution of the target classification.
  • This probability distribution represents the probability that, under the corresponding classification, the agent speech information belongs to illegal speech; finally, the illegal speech in the agent speech information is determined based on the probability distribution of the target classification.
  • the improved pre-trained three-layer BERT model is used to integrate text semantics with different degrees of information extraction and different time spans, thereby enhancing the performance of the three-layer BERT model.
  • at the same time, a set of illegal speech identification processes is proposed, which improves the efficiency with which quality inspection personnel identify illegal speech in business scenarios and has certain promotion value.
  • an embodiment of the present application also provides an electronic device.
  • the electronic device 2000 includes: a memory 2002, a processor 2001, and a computer program stored on the memory 2002 and executable on the processor 2001.
  • the processor 2001 and the memory 2002 may be connected through a bus or other means.
  • the non-transitory software programs and instructions required to implement the agent illegal speech identification method in the above embodiment are stored in the memory 2002.
  • when executed by the processor 2001, the method for identifying the agent's illegal speech applied to the device in the above embodiment is performed, for example, the above-described method steps S100 to S400 in Figure 1, method steps S110 to S130 in Figure 2, method steps S140 to S150 in Figure 3, and method steps S151 to S153 in Figure 4.
  • Electronic equipment also includes components such as input units, display units, audio processing circuits, and power supplies.
  • this embodiment does not uniquely limit the structure of the electronic device, and may include more or fewer components than in this embodiment, or combine certain components, or arrange different components.
  • the input unit may be used to receive input numeric or character information, and to generate key signal input related to computer settings and function control.
  • the input unit may include a touch panel and other input devices.
  • A touch panel, also known as a touch screen, can collect touch operations on or near it (such as operations performed on or near the touch panel using a finger, a stylus, or any suitable object or accessory) and drive the corresponding connected device according to a preset program.
  • the touch panel may include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the touch position and the signal brought by the touch operation and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor, and can receive and execute commands from the processor.
  • touch panels can be implemented in various categories such as resistive, capacitive, infrared and surface acoustic wave.
  • the input unit may also include other input devices. Specifically, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), trackball, mouse, joystick, etc.
  • the display unit may be used to display input information or provided information as well as various menus of the electronic device.
  • the display unit may include a display panel.
  • the display panel may be configured in the form of a Liquid Crystal Display (LCD for short) or an Organic Light-Emitting Diode (OLED for short).
  • the touch panel can cover the display panel.
  • when the touch panel detects a touch operation on or near it, the operation is sent to the processor to determine the type of the touch event; the processor then provides corresponding visual output on the display panel according to the type of the touch event.
  • although the touch panel and the display panel are described as two independent components that implement the input and output functions of the electronic device, in some embodiments the touch panel and the display panel can be integrated to implement the input and output functions of the electronic device.
  • Audio processing circuitry provides an audio interface.
  • on the one hand, the audio processing circuit can transmit the electrical signal converted from the received audio data to the speaker, which converts it into a sound signal and outputs it; on the other hand, the microphone converts the collected sound signal into an electrical signal, which the audio processing circuit receives and converts into audio data. The audio data is then output to the processor for processing and sent to, for example, another computer via a wireless circuit, or output to a memory for further processing.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separate, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the computer also includes a power supply (such as a battery) that supplies power to various components.
  • the power supply can be logically connected to the processor through a power management system, so that functions such as charging, discharging, and power consumption management can be implemented through the power management system.
  • an embodiment of the present application also provides a computer-readable storage medium, which stores a computer program.
  • the computer program is executed by a processor or a controller, for example, by a processor in the above electronic device embodiment, which can cause the processor to execute the agent illegal speech recognition method in the above embodiment, for example, the above-described method steps S100 to S400 in Figure 1, method steps S110 to S130 in Figure 2, method steps S140 to S150 in Figure 3, method steps S151 to S153 in Figure 4, method steps S310 to S330 in Figure 6, method steps S410 to S420 in Figure 7, and method steps S510 to S520 in Figure 8.
  • the computer-readable storage medium may be non-volatile or volatile.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, tapes, disk storage or other magnetic storage devices, or may Any other storage medium used to store desired information and that can be accessed by a computer.
  • Computer storage media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery storage media.
  • the present application may be used in a variety of general-purpose or special-purpose computer device environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor devices, microprocessor-based devices, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above devices or equipment.
  • the application may be described in the general context of computer programs, such as program modules, executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.
  • each block in the flow chart or block diagram may represent a module, program segment, or part of the code.
  • the above module, program segment, or part of the code includes one or more programs for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special-purpose hardware-based means that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • the units involved in the embodiments of this application can be implemented in software or hardware, and the described units can also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.
  • the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and includes several instructions to cause a computing device (which can be a personal computer, server, touch terminal, or network device) to execute the method according to the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Technology Law (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Machine Translation (AREA)

Abstract

The present application belongs to the field of artificial intelligence, and provides an inappropriate agent language identification method and apparatus, an electronic device and a storage medium. The method comprises: acquiring agent language information for training, splitting the agent language information for training to obtain agent-side single sentences, and performing text preprocessing on the agent-side single sentences; on the basis of the preprocessed agent-side single sentences, training by means of a three-layer BERT model so as to obtain an inappropriate agent language identification model; inputting agent language information to be identified into the inappropriate agent language identification model for reasoning so as to obtain a probability distribution of target classification, wherein the agent language information to be identified is agent language information in a credit card service and sales scenario; and determining inappropriate language in the agent language information to be identified according to the probability distribution of the target classification. The embodiments of the present application are optimized at the two levels of model and data, are applied to inappropriate language identification, can improve the efficiency with which quality inspection personnel identify inappropriate language in a business scenario, and have a certain popularization value.

Description

Method, device, electronic equipment, and storage medium for identifying illegal speech by agents
This application claims priority to the Chinese patent application with application number 202210252453.8, filed with the China Patent Office on March 15, 2022 and entitled "Method, device, electronic equipment, and storage medium for identifying illegal speech by agents", the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the field of artificial intelligence technology, and in particular relates to a method, device, electronic equipment, and storage medium for identifying illegal speech by agents.
Background
Existing scenarios for identifying customer service agents' illegal speech, such as the identification of agent violations in the quality inspection of the credit card service and sales business, mainly rely on manual quality inspection and keyword matching. Faced with a large amount of complicated dialogue information, quality inspection personnel need to listen to the recordings sentence by sentence to screen for illegal speech, which consumes considerable manpower and time and is inefficient. If identification is performed by keyword matching, it relies on manual experience to summarize keywords; on the one hand, incomplete keyword summaries cause illegal speech to be missed, and on the other hand, semantic information is easily ignored, causing misjudgment.
Technical Problem
The following are technical problems in the prior art of which the inventors are aware: pre-trained models for natural language processing in the related art mainly include BERT (Bidirectional Encoder Representations from Transformers) of various sizes, as well as some BERT variants such as ALBERT, RoBERTa, and ELECTRA. Some of these models suffer from an excessive number of parameters and overly slow training and inference; others reduce the parameter count through parameter sharing but perform poorly in the application scenario of illegal speech recognition.
Technical Solutions
In a first aspect, embodiments of this application provide a method for identifying illegal speech by agents, including:
obtaining the agent speech information for training, splitting the agent speech information for training into single sentences on the agent side, and performing text preprocessing on the single sentences on the agent side;
based on the preprocessed agent-side single sentences, training with a three-layer BERT model to obtain an agent illegal speech recognition model;
inputting the agent speech information to be identified into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
determining the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
In a second aspect, embodiments of this application provide a device for identifying illegal speech by agents, including:
a preprocessing unit, used to obtain the agent speech information for training, split the agent speech information for training into single sentences on the agent side, and perform text preprocessing on the single sentences on the agent side;
a training unit, used to train a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an agent illegal speech recognition model;
a processing unit, used to input the agent speech information to be identified into the agent illegal speech recognition model for inference and obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
an identification unit, used to determine the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
In a third aspect, embodiments of this application provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements a method for identifying illegal speech by agents, the method including:
obtaining the agent speech information for training, splitting the agent speech information for training into single sentences on the agent side, and performing text preprocessing on the single sentences on the agent side;
based on the preprocessed agent-side single sentences, training with a three-layer BERT model to obtain an agent illegal speech recognition model;
inputting the agent speech information to be identified into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
determining the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program, the computer program being used to execute a method for identifying illegal speech by agents, the method including:
obtaining the agent speech information for training, splitting the agent speech information for training into single sentences on the agent side, and performing text preprocessing on the single sentences on the agent side;
based on the preprocessed agent-side single sentences, training with a three-layer BERT model to obtain an agent illegal speech recognition model;
inputting the agent speech information to be identified into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
determining the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
有益效果beneficial effects
The embodiments of the present application have at least the following beneficial effects: the pre-trained three-layer BERT model of the embodiments of the present application, by integrating text semantics at different levels of information extraction and over different time spans, can enhance the performance of the three-layer BERT model without substantially increasing the number of parameters. The three-layer BERT model proposed in the embodiments of the present application is optimized at both the model level and the data level; applied to illegal speech identification, it can improve the efficiency with which quality inspection personnel identify illegal speech in business scenarios and has certain promotion value.
本申请的其它特征和优点将在随后的说明书中阐述,并且,部分地从说明书中变得显而易见,或者通过实施本申请而了解。本申请的目的和其他优点可通过在说明书、权利要求书以及附图中所特别指出的结构来实现和获得。Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the application. The objectives and other advantages of the application may be realized and obtained by the structure particularly pointed out in the specification, claims and appended drawings.
附图说明Description of the drawings
附图用来提供对本申请技术方案的进一步理解,并且构成说明书的一部分,与本申请的实施例一起用于解释本申请的技术方案,并不构成对本申请技术方案的限制。The drawings are used to provide a further understanding of the technical solution of the present application and constitute a part of the specification. They are used to explain the technical solution of the present application together with the embodiments of the present application and do not constitute a limitation of the technical solution of the present application.
图1是本申请一个实施例提供的坐席违规话术识别方法的流程图;Figure 1 is a flow chart of a method for identifying illegal speech by agents provided by an embodiment of the present application;
图2是本申请另一个实施例提供的处理训练用的坐席话术信息的流程图;Figure 2 is a flow chart for processing agent speech information for training provided by another embodiment of the present application;
图3是本申请另一个实施例提供的脱敏和随机遮罩处理的流程图;Figure 3 is a flow chart of desensitization and random masking processing provided by another embodiment of the present application;
图4是本申请另一个实施例提供的随机遮罩具体处理方式的流程图;Figure 4 is a flow chart of a specific processing method for random masking provided by another embodiment of the present application;
图5是本申请另一个实施例提供的三层BERT模型的结构;Figure 5 is the structure of a three-layer BERT model provided by another embodiment of the present application;
图6是本申请另一个实施例提供的输入待识别的坐席话术信息并输出目标分类的概率分布的流程图;Figure 6 is a flow chart for inputting agent speech information to be recognized and outputting a probability distribution of target classification provided by another embodiment of the present application;
图7是本申请另一个实施例提供的匹配并识别违规话术的流程图;Figure 7 is a flow chart for matching and identifying illegal words provided by another embodiment of the present application;
图8是本申请另一个实施例提供的优化三层BERT模型的流程图;Figure 8 is a flow chart for optimizing the three-layer BERT model provided by another embodiment of the present application;
图9是本申请另一个实施例提供的坐席违规话术识别装置的结构图;Figure 9 is a structural diagram of an agent illegal speech recognition device provided by another embodiment of the present application;
图10是本申请另一个实施例提供的电子设备的装置图。Figure 10 is a device diagram of an electronic device provided by another embodiment of the present application.
本发明的实施方式Embodiments of the invention
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。In order to make the purpose, technical solutions and advantages of the present application more clear, the present application will be further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not used to limit the present application.
It should be noted that although functional modules are divided in the schematic diagram of the apparatus and a logical order is shown in the flowchart, in some cases the steps shown or described may be performed with a module division different from that in the apparatus, or in an order different from that in the flowchart. The terms "first", "second", etc. in the description, claims or the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
The present application provides a method, an apparatus, an electronic device and a storage medium for identifying agent illegal speech. The method includes: obtaining agent speech information for training, splitting the agent speech information for training into agent-side single sentences, and performing text preprocessing on the agent-side single sentences; training a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an agent illegal speech recognition model; inputting agent speech information to be identified into the agent illegal speech recognition model for inference to obtain a probability distribution of target classifications, wherein the agent speech information to be identified is agent speech information in a credit card service sales scenario; and determining the illegal speech in the agent speech information to be identified according to the probability distribution of the target classifications. The pre-trained three-layer BERT model performs inference on agent speech, thereby improving the efficiency of quality inspection personnel in identifying illegal speech in business scenarios.
The embodiments of the present application can obtain and process the relevant data based on artificial intelligence technology. Artificial Intelligence (AI) is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine capable of responding in a manner similar to human intelligence; research in this field includes robotics, language recognition, image recognition, natural language processing and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is also a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互装置、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、机器人技术、生物识别技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction devices, mechatronics and other technologies. Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometric technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
The terminal mentioned in the embodiments of the present application may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a vehicle-mounted computer, a smart home device, a wearable electronic device, a VR (Virtual Reality)/AR (Augmented Reality) device, and the like. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms, and so on.
It should be noted that the data of the embodiments of the present application may be stored in a server. The server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks, and big data and artificial intelligence platforms.
自然语言处理是计算机科学领域与人工智能领域中的一个重要方向。它研究能实现人与计算机之间用自然语言进行有效通信的各种理论和方法。自然语言处理是一门融语言学、计算机科学、数学于一体的科学。因此,这一领域的研究将涉及自然语言,即人们日常使用的语言,所以它与语言学的研究有着密切的联系。自然语言处理技术通常包括文本处理、语义理解、机器翻译、机器人问答、知识图谱等技术。Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Therefore, research in this field will involve natural language, that is, the language that people use every day, so it is closely related to the study of linguistics. Natural language processing technology usually includes text processing, semantic understanding, machine translation, robot question answering, knowledge graph and other technologies.
机器学习(Machine Learning,ML)是一门多领域交叉学科,涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机怎样模拟或实现人类的学习行为,以获取新的知识或技能,重新组织已有的知识结构使之不断改善自身的性能。机器学习是人工智能的核心,是使计算机具有智能的根本途径,其应用遍及人工智能的各个领域。机器学习和深度学习通常包括人工神经网络、置信网络、强化学习、迁移学习、归纳学习、式教学习等技术。Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
如图1所示,图1是本申请一个实施例提供的一种坐席违规话术识别方法的流程图,该坐席违规话术识别方法,包括但不限于有以下步骤:As shown in Figure 1, Figure 1 is a flow chart of a method for identifying illegal speech by an agent provided by an embodiment of the present application. The method for identifying illegal speech by an agent includes but is not limited to the following steps:
步骤S100,获取训练用的坐席话术信息,将训练用的坐席话术信息拆分得到坐席侧单句并对坐席侧单句进行文本预处理;Step S100, obtain the agent speaking information for training, split the agent speaking information for training into single sentences on the agent side, and perform text preprocessing on the single sentences on the agent side;
步骤S200,基于预处理后的坐席侧单句,使用三层BERT模型进行训练,得到坐席违规话术识别模型;Step S200: Based on the preprocessed single sentence on the agent side, use the three-layer BERT model for training to obtain an agent illegal speech recognition model;
步骤S300,将待识别的坐席话术信息输入到坐席违规话术识别模型进行推理,得到目标分类的概率分布,其中,待识别的坐席话术信息是信用卡服销场景下的坐席话术信息;Step S300, input the agent speech information to be identified into the agent violation speech recognition model for inference, and obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
步骤S400,根据目标分类的概率分布确定待识别的坐席话术信息中的违规话术。Step S400: Determine the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
The embodiments of the present application identify agent illegal speech through a three-layer BERT model. First, a three-layer BERT model is constructed and trained with a labeled training set, that is, the agent speech information for training, to obtain an agent illegal speech recognition model. Then, the agent speech information to be identified is input into the agent illegal speech recognition model to obtain a probability distribution of target classifications, where the probability distribution indicates, for each corresponding classification, the probability that the agent speech information belongs to illegal speech. Finally, the illegal speech in the agent speech information is determined based on the probability distribution of the target classifications.
BERT is a pre-trained language representation model that uses a masked language model (MLM) to pre-train bidirectional Transformers and thereby generate deep bidirectional language representations. After pre-training, only an additional output layer needs to be added for fine-tuning to achieve state-of-the-art performance on a wide variety of downstream tasks. This process does not require task-specific structural modifications to BERT, so it ultimately generates deep bidirectional language representations that fuse left and right context information. However, if the traditional BERT model, or a model evolved from BERT in the related art, is applied directly to agent illegal speech, there is the problem that either the number of parameters is too large, or the number of parameters is small but the effect is poor. Therefore, the embodiments of the present application improve the traditional BERT model. The specific structure is shown in Figure 5, and the improved three-layer BERT model includes:
a first-layer BERT model, a second-layer BERT model, a third-layer BERT model, a fully connected layer, a convolution layer and a classification layer. The first-layer BERT model, the second-layer BERT model and the third-layer BERT model are stacked, and the hidden layers of the first-layer, second-layer and third-layer BERT models each output a CLS vector to the fully connected layer. The fully connected layer, the convolution layer and the classification layer are connected in sequence, and the output of the classification layer serves as the output of the three-layer BERT model. Each CLS vector represents the information contained in each sentence after the preprocessed agent-side single sentence passes through one of the layers of the BERT model for information extraction.
The fully connected layer is used to concatenate the three CLS vectors and output a text information vector that integrates the longitudinal dimension; the convolution layer is used to perform convolution operations on this vector through multiple convolution kernels and output a text information vector that integrates the horizontal span to the classification layer; after receiving this vector, the classification layer obtains the probability distribution of the required classifications through classification processing.
具体来说,参照图2,上述步骤S100中将训练用的坐席话术信息拆分得到坐席侧单句,可以通过以下步骤实现:Specifically, referring to Figure 2, in the above-mentioned step S100, splitting the agent speech information for training into single sentences on the agent side can be achieved through the following steps:
步骤S110,对训练用的坐席话术信息进行标注;Step S110, label the agent speech information for training;
步骤S120,根据标注对坐席话术信息进行句子拆分,拆分得到多个语音单句;Step S120: Split the agent's speech information into sentences according to the annotations, and split the speech information into multiple single speech sentences;
步骤S130,对多个语音单句进行文字转换,得到以文字方式表示的坐席侧单句。Step S130: Perform text conversion on multiple voice sentences to obtain seat-side sentences expressed in text.
The agent speech information is voice data. When constructing the training set, the agent speech needs to be annotated so that the voice data can be divided into multiple sentences and the training set can be built in single-sentence form. This is because recognition based on the BERT model performs the MLM operation on a per-sentence basis, and multiple single voice sentences can be obtained through annotation. At the same time, because the MLM mechanism applies random masking to sentences, each single voice sentence needs to be converted into text, that is, into an agent-side single sentence. It can be understood that the BERT model was originally proposed for English, where a sentence consists of multiple English words separated by spaces, so the three-layer BERT model of the embodiments of the present application can certainly be applied to English agent speech scenarios. A Chinese sentence, however, consists of multiple consecutive Chinese characters, and a "word" is also composed of several Chinese characters; that is, English writing encodes sound while Chinese writing encodes meaning. Therefore, certain processing is required when applying the BERT model for MLM. Several processing methods are available: common Chinese word segmentation methods include classic mechanical segmentation (such as forward/backward maximum matching and bidirectional maximum matching), statistical segmentation methods with better performance (such as Hidden Markov Models (HMM) and Conditional Random Fields (CRF)), as well as deep-neural-network methods that have emerged in recent years, such as RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory), which are not limited here.
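By way of illustration only, the following is a minimal sketch of the forward maximum matching segmentation mentioned above; the vocabulary, sentence and window size are assumptions for demonstration and do not form part of the original disclosure.

```python
def forward_max_match(sentence, vocab, max_word_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word that matches; fall back to a single character."""
    tokens, i = [], 0
    while i < len(sentence):
        matched = None
        # Try the longest window first, then shrink it
        for length in range(min(max_word_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in vocab or length == 1:
                matched = candidate
                break
        tokens.append(matched)
        i += len(matched)
    return tokens

# Hypothetical vocabulary and sentence, for illustration only
vocab = {"信用卡", "额度", "提升", "可以"}
print(forward_max_match("信用卡额度可以提升", vocab))
# -> ['信用卡', '额度', '可以', '提升']
```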
其中,标注方法可以根据训练用的坐席话术信息中的语音差别和停顿节奏,对训练用的坐席话术信息进行标注。Among them, the annotation method can label the agent speech information for training based on the speech differences and pause rhythm in the agent speech information for training.
Of course, after annotation the agent-side single sentences still need to be preprocessed. In agent speech scenarios, sentences often contain sensitive information such as home addresses and other personally identifiable information. Therefore, to avoid leakage of user data, the agent-side single sentences need to be desensitized, and random masking needs to be applied to meet the training requirements of the three-layer BERT model. Specifically, referring to Figure 3, the desensitization and random masking include the following steps:
步骤S140,根据预设敏感字库或预设敏感字判断规则,对坐席侧单句中的敏感字进行脱敏处理;Step S140: Desensitize the sensitive words in the single sentence on the seat side according to the preset sensitive word library or the preset sensitive word judgment rules;
步骤S150,对脱敏处理后的坐席侧单句进行随机遮罩处理,得到预处理后的坐席侧单句。Step S150: Perform random masking processing on the desensitized agent-side single sentences to obtain preprocessed agent-side single sentences.
其中,脱敏处理可以将坐席侧单句中的敏感字替换成预设字符,随机遮罩处理可以包括以下步骤,如图4所示:Among them, desensitization processing can replace sensitive words in single sentences on the agent side with preset characters, and random masking processing can include the following steps, as shown in Figure 4:
步骤S151,随机选取脱敏处理后的坐席侧单句中的15%的词;Step S151: Randomly select 15% of the words in the desensitized single sentence on the seat side;
步骤S152,对所选词中的80%用[mask]代替,10%保持不变,余下10%用另一个随机词进行替换;Step S152, replace 80% of the selected words with [mask], keep 10% unchanged, and replace the remaining 10% with another random word;
步骤S153,在脱敏处理后的坐席侧单句的起始位置拼接[CLS]字符。Step S153: splice [CLS] characters at the starting position of the desensitized single sentence on the agent side.
随机遮罩处理是BERT能够不受单向语言模型所限制的原因。简单来说就是以15%的概率用mask token([mask])随机地对每一个训练序列中的token进行替换,然后预测出[mask]位置原有的单词。首先在每一个训练序列中以15%的概率随机地选中某个token位置用于预测,然后假如是第i个token被选中,则会被替换成三个token之一([mask]、随机token和原有token),分别占比是80%、10%、10%。该策略令到BERT不再只对[mask]敏感,而是对所有的token都敏感,以致能抽取出任何token的表征信息。Random mask processing is the reason why BERT is not limited by one-way language models. To put it simply, the token in each training sequence is randomly replaced with a mask token ([mask]) with a probability of 15%, and then the original word at the [mask] position is predicted. First, in each training sequence, a certain token position is randomly selected for prediction with a probability of 15%, and then if the i-th token is selected, it will be replaced with one of three tokens ([mask], random token and original token), accounting for 80%, 10%, and 10% respectively. This strategy makes BERT no longer only sensitive to [mask], but is sensitive to all tokens, so that it can extract the representation information of any token.
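The following is a minimal sketch of the desensitization and 80/10/10 random masking described above; the sensitive-word list, the placeholder character and the word-level tokenization are illustrative assumptions rather than the patented implementation.

```python
import random

SENSITIVE = {"身份证", "住址"}        # hypothetical sensitive-word list
PLACEHOLDER = "*"                     # hypothetical preset replacement character

def desensitize(tokens):
    # Replace any sensitive token with the preset placeholder
    return [PLACEHOLDER if t in SENSITIVE else t for t in tokens]

def random_mask(tokens, mask_ratio=0.15):
    """Pick 15% of the tokens; of those, 80% -> [mask], 10% unchanged,
    10% -> a random token drawn from the sentence. Prepend [CLS]."""
    tokens = list(tokens)
    labels = [None] * len(tokens)
    n_select = max(1, int(len(tokens) * mask_ratio))
    for i in random.sample(range(len(tokens)), n_select):
        labels[i] = tokens[i]                 # original token is the prediction target
        r = random.random()
        if r < 0.8:
            tokens[i] = "[mask]"
        elif r < 0.9:
            pass                              # keep the original token
        else:
            tokens[i] = random.choice(tokens)  # replace with a random token
    return ["[CLS]"] + tokens, [None] + labels

# Hypothetical segmented sentence, for illustration only
sentence = ["客户", "的", "住址", "是", "某某小区"]
masked, targets = random_mask(desensitize(sentence))
```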
总之,训练用的坐席话术信息通过标注和划分后,得到应用于三层BERT模型的训练集,将训练集输入到三层BERT模型中进行训练,得到坐席话术识别模型。In short, after the agent speech information for training is annotated and divided, a training set applied to the three-layer BERT model is obtained. The training set is input into the three-layer BERT model for training, and the agent speech recognition model is obtained.
It can be understood that, in one embodiment, the convolution layer of the three-layer BERT model includes three convolution kernels, namely a first convolution kernel, a second convolution kernel and a third convolution kernel, as shown in Figure 5. The sizes of the three kernels differ from one another: the size of the first convolution kernel is 2, the size of the second convolution kernel is 3, and the size of the third convolution kernel is 4. In Figure 5, the first-layer, second-layer and third-layer BERT models are denoted encoder1, encoder2 and encoder3, respectively; encoder1, encoder2 and encoder3 output the CLS1, CLS2 and CLS3 vectors to the fully connected layer (Liner) for concatenation, yielding a text information vector that integrates the longitudinal dimension. After this vector is processed by the three convolution kernels of the convolution layer, the concatenated output is fed into the classification layer (softmax). The activation function of the classification layer may be the tanh function.
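A minimal PyTorch sketch of this classification head is given below. The hidden size, channel count and number of output classes are assumptions for illustration, and the three CLS inputs stand in for the stacked BERT encoders described above; it is a sketch of the described head, not the exact patented model.

```python
import torch
import torch.nn as nn

class ThreeLayerBertHead(nn.Module):
    """Concatenate the [CLS] vectors from three stacked encoders, apply
    1-D convolutions with kernel sizes 2/3/4, then classify with softmax."""
    def __init__(self, hidden=768, num_classes=2):
        super().__init__()
        self.linear = nn.Linear(3 * hidden, 3 * hidden)        # fuse the three CLS vectors
        self.convs = nn.ModuleList(
            [nn.Conv1d(1, 16, kernel_size=k) for k in (2, 3, 4)]
        )
        self.classifier = nn.Linear(3 * 16, num_classes)

    def forward(self, cls1, cls2, cls3):
        x = torch.tanh(self.linear(torch.cat([cls1, cls2, cls3], dim=-1)))
        x = x.unsqueeze(1)                                      # (batch, 1, 3*hidden)
        # Each kernel scans a different horizontal span; max-pool each to one value per channel
        feats = [conv(x).max(dim=-1).values for conv in self.convs]
        logits = self.classifier(torch.cat(feats, dim=-1))      # (batch, num_classes)
        return torch.softmax(logits, dim=-1)

# Usage with dummy CLS vectors standing in for encoder1/encoder2/encoder3 outputs
head = ThreeLayerBertHead()
cls_vectors = [torch.randn(4, 768) for _ in range(3)]
probs = head(*cls_vectors)    # (4, 2) probability distribution over the two classes
```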
参照图6,上述步骤S300中,将待识别的坐席话术信息输入到坐席违规话术识别模型进行推理,得到目标分类的概率分布,可以参照以下步骤实现:Referring to Figure 6, in the above-mentioned step S300, the agent speech information to be identified is input into the agent violation speech recognition model for reasoning, and the probability distribution of the target classification is obtained. This can be achieved by referring to the following steps:
步骤S310,将待识别的坐席话术信息转换成文字信息;Step S310, convert the agent speech information to be recognized into text information;
步骤S320,对文字信息进行脱敏处理;Step S320, perform desensitization processing on the text information;
步骤S330,将脱敏处理后的文字信息输入到坐席违规话术识别模型,得到若干个目标分类的概率分布。Step S330: Input the desensitized text information into the agent illegal speech recognition model to obtain probability distributions of several target categories.
For the agent speech information to be identified, speech-to-text conversion is likewise performed, and after conversion the sensitive words in the text are desensitized; that is, compared with the agent speech information used for training, there is no need to annotate it in advance to divide it into single sentences. After the above desensitization, the text information is input into the agent illegal speech recognition model to obtain the probability distribution of the required classifications (that is, the target classifications, whose types can be preset).
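A minimal inference sketch following these steps might look as follows; the ASR callable, the tokenizer and the model object are placeholders for whatever speech-to-text and encoding components are actually deployed, and the sensitive-word list is an assumption.

```python
import torch

def identify_call(audio, asr, tokenizer, model, sensitive=("住址", "身份证")):
    """Speech-to-text -> desensitization -> model inference -> class probabilities.
    `asr`, `tokenizer` and `model` are placeholder callables for the deployed
    components and are not specified by the original text."""
    text = asr(audio)                                       # convert speech to text
    tokens = ["*" if t in sensitive else t for t in tokenizer(text)]
    with torch.no_grad():
        probs = model(tokens)                               # probability distribution
    return probs
```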
When the agent illegal speech recognition model outputs the probability distribution of the required classifications, step S400 determines how to output the identification result of agent illegal speech according to the probability distribution of the target classifications; this can be done by matching against the target classifications, for example, referring to Figure 7, through the following steps:
步骤S410,将目标分类与预设分类进行匹配,并将匹配成功的目标分类作为待识别分类;Step S410, match the target classification with the preset classification, and use the successfully matched target classification as the classification to be recognized;
步骤S420,当待识别分类对应的概率值超过对应的预设分类的概率阈值,确定待识别的坐席话术信息中包含待识别分类的违规话术。Step S420: When the probability value corresponding to the category to be identified exceeds the probability threshold of the corresponding preset category, it is determined that the agent speech information to be identified contains the illegal speech of the category to be identified.
A model trained with natural language processing (NLP) ultimately has to be matched against semantic types. The preset classifications in the embodiments of the present application are manually predefined divisions of the semantic types of agent illegal speech; the output of the pre-trained model (the three-layer BERT model) is matched against the preset classifications to determine whether the target classification corresponding to the current probability distribution belongs to a classification corresponding to illegal speech. On the other hand, when several target classifications match the preset classifications, the corresponding probabilities of these target classifications (that is, the classifications to be identified) are further examined; if a probability is higher than the set probability threshold, the current recording is considered to contain illegal speech.
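A minimal sketch of this matching-and-threshold step is shown below; the category names and threshold values are illustrative assumptions.

```python
def find_violations(probs, preset_thresholds):
    """probs: {category: probability} output by the model.
    preset_thresholds: {category: threshold} for the predefined violation categories.
    Returns the categories whose probability exceeds their threshold."""
    violations = []
    for category, p in probs.items():
        if category in preset_thresholds and p > preset_thresholds[category]:
            violations.append(category)
    return violations

# Hypothetical model output and thresholds, for illustration only
probs = {"exaggerated_benefit": 0.91, "threatening_tone": 0.12}
thresholds = {"exaggerated_benefit": 0.8, "threatening_tone": 0.8}
print(find_violations(probs, thresholds))   # -> ['exaggerated_benefit']
```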
可以理解的是,本申请实施例三层BERT模型并不能100%准确给出结果,对于三层BERT模型无法准确判断、难以区分的样本,三层BERT模型可以进行循环学习。具体来说,参照图8,本申请实施例的优化过程包括:It can be understood that the three-layer BERT model in the embodiment of this application cannot provide 100% accurate results. For samples that the three-layer BERT model cannot accurately judge and are difficult to distinguish, the three-layer BERT model can perform loop learning. Specifically, referring to Figure 8, the optimization process of the embodiment of the present application includes:
步骤S510,对匹配失败的目标分类,确定对应的坐席话术信息;Step S510: Classify the targets that failed to match and determine the corresponding agent speech information;
步骤S520,根据主动学习和边缘采样策略对匹配失败的目标分类对应的坐席话术信息进行再标注,并输入到坐席违规话术识别模型进行再训练。Step S520: Re-label the agent speech information corresponding to the failed target classification according to the active learning and edge sampling strategies, and input it into the agent violation speech technique recognition model for retraining.
Active learning refers to using machine learning methods to pick out sample data that are relatively "difficult" to classify, having them manually re-confirmed and reviewed, and then using the manually labeled data to train a supervised or semi-supervised learning model again, gradually improving the performance of the model and incorporating human experience into the machine learning model; that is, a batch of easily misclassified samples is selected, labeled manually, and then used to train the machine learning model. The margin sampling strategy is a metric learning method that introduces the idea of hard-sample mining. It is similar to least-confidence sampling, but it compares the class with the highest probability against the class with the second highest probability, that is, it checks whether the top classification holds a clear margin over the runner-up; samples with a small margin are selected for labeling. Margin sampling and least-confidence sampling are equivalent for binary classification problems. Active learning scenarios involve evaluating the informativeness of unlabeled instances, and the simplest and most commonly used query framework is uncertainty sampling. In this framework, the active learner queries the instances it is least certain how to label; when a probabilistic model is used for binary classification, uncertainty sampling simply queries the instance whose posterior probability of being positive is closest to 0.5. The present application adopts margin sampling: the samples with the smallest difference between the largest and second-largest predicted probabilities are selected for judgment.
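A minimal sketch of margin-sampling selection is given below; the batch of probability vectors and the selection size are illustrative assumptions.

```python
import numpy as np

def margin_sampling(prob_matrix, k=10):
    """Select the k samples whose top-1 / top-2 probability margin is smallest,
    i.e. the samples the model finds hardest to distinguish."""
    sorted_probs = np.sort(prob_matrix, axis=1)[:, ::-1]      # descending per row
    margins = sorted_probs[:, 0] - sorted_probs[:, 1]         # top-1 minus top-2
    return np.argsort(margins)[:k]                            # indices to re-label

# Hypothetical model outputs for 4 utterances (binary classification)
probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.51, 0.49]])
print(margin_sampling(probs, k=2))   # -> [3 1]: the two most ambiguous samples
```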
通过再次训练,可以增强模型的识别能力,从而不断优化识别结果和提高识别准确率。Through retraining, the recognition ability of the model can be enhanced, thereby continuously optimizing the recognition results and improving the recognition accuracy.
本申请所采用的Baseline模型为预训练的三层BERT模型。结合图5所示,BERT模型改进如下:The Baseline model used in this application is a pre-trained three-layer BERT model. As shown in Figure 5, the BERT model is improved as follows:
1. 将三层BERT模型的每一个隐层的CLS向量拿出,每个CLS向量代表每一次通过一层BERT进行信息抽取后,每个句子所包含的信息;1. Take out the CLS vector of each hidden layer of the three-layer BERT model. Each CLS vector represents the information contained in each sentence after each layer of BERT is used to extract information;
2. 将三个CLS向量拼接起来送入全连接层,得到综合纵向维度的文本信息向量;2. Splice the three CLS vectors and send them to the fully connected layer to obtain a comprehensive vertical dimension text information vector;
3. 分别用大小的为2,3,4的卷积核对上述向量进行卷积操作,将三个输出向量进行拼接,得到综合横向跨度的文本信息向量;3. Use convolution kernels of sizes 2, 3, and 4 to perform convolution operations on the above vectors, and splice the three output vectors to obtain a comprehensive horizontal span text information vector;
4. 将上述向量送入分类层,其中的激活函数为tanh函数,得到所需分类的概率分布。4. Send the above vector to the classification layer, where the activation function is the tanh function to obtain the probability distribution of the required classification.
The experimental task is agent illegal speech identification in a credit card service sales scenario, which is converted into a binary classification task. The training and optimization process of the agent illegal speech recognition model is as follows:
1. After the calls labeled as containing violations are obtained, they are split into agent-side single sentences. After single-sentence annotation and replacement with desensitization marks, the improved pre-trained three-layer BERT model is trained with a cross-entropy loss function to obtain the agent illegal speech recognition model (a fine-tuning sketch follows this list);
2. 模型在待识别数据上进行推理,得到分类概率分布。采用主动学习的思想及边缘采样策略,对模型难以区分的样本进行再标注。再次训练,增强模型的识别能力。2. The model performs inference on the data to be identified and obtains the classification probability distribution. The idea of active learning and edge sampling strategy are used to re-label samples that are difficult to distinguish by the model. Train again to enhance the model’s recognition capabilities.
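A minimal cross-entropy fine-tuning loop consistent with step 1 might look as follows; the dataloader, the model object that produces class logits, and the hyperparameters are illustrative assumptions rather than the patented training procedure.

```python
import torch
import torch.nn as nn

def fine_tune(model, dataloader, epochs=3, lr=2e-5):
    """Cross-entropy fine-tuning over labeled agent-side single sentences.
    `dataloader` is assumed to yield (input_tensor, label) batches and
    `model` to return class logits; both are placeholders."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for inputs, labels in dataloader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)   # logits vs. violation label
            loss.backward()
            optimizer.step()
    return model
```

Samples selected by the margin-sampling sketch above can then be re-labeled and appended to the dataloader for a further round of fine-tuning.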
The embodiments of the present application use the improved pre-trained three-layer BERT model to integrate text semantics at different levels of information extraction and over different time spans, thereby enhancing the performance of the three-layer BERT model; at the same time, based on the improved three-layer BERT model, a workflow for identifying illegal speech is proposed, which improves the efficiency of quality inspection personnel in identifying illegal speech in business scenarios and has certain promotion value.
另外,参照图9,本申请实施例提供了坐席违规话术识别装置,该装置包括:In addition, referring to Figure 9, an embodiment of the present application provides a device for identifying illegal speech skills by agents. The device includes:
预处理单元,用于获取训练用的坐席话术信息,将训练用的坐席话术信息拆分得到坐席侧单句并对坐席侧单句进行文本预处理;The preprocessing unit is used to obtain the agent speech information for training, split the agent speech information for training into single sentences on the agent side, and perform text preprocessing on the single sentences on the agent side;
训练单元,用于基于预处理后的坐席侧单句,使用三层BERT模型进行训练,得到坐席违规话术识别模型;The training unit is used to train using the three-layer BERT model based on the pre-processed agent-side single sentences to obtain an agent illegal speech recognition model;
The processing unit is used to input the agent speech information to be identified into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is agent speech information in a credit card service sales scenario;
识别单元,用于根据目标分类的概率分布确定待识别的坐席话术信息中的违规话术。The identification unit is used to determine the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
The apparatus for identifying agent illegal speech of the embodiments of the present application identifies agent illegal speech through a three-layer BERT model. First, a three-layer BERT model is constructed and trained with a labeled training set, that is, the agent speech information for training, to obtain an agent illegal speech recognition model. Then, the agent speech information to be identified is input into the agent illegal speech recognition model, thereby obtaining a probability distribution of target classifications, where the probability distribution indicates, for each corresponding classification, the probability that the agent speech information belongs to illegal speech. Finally, the illegal speech in the agent speech information is determined based on the probability distribution of the target classifications.
The improved pre-trained three-layer BERT model is used to integrate text semantics at different levels of information extraction and over different time spans, enhancing the performance of the three-layer BERT model; at the same time, based on the improved three-layer BERT model, a workflow for identifying illegal speech is proposed, which improves the efficiency of quality inspection personnel in identifying illegal speech in business scenarios and has certain promotion value.
另外,参照图10,本申请的一个实施例还提供了一种电子设备,该电子设备2000包括:存储器2002、处理器2001及存储在存储器2002上并可在处理器2001上运行的计算机程序。In addition, referring to FIG. 10 , an embodiment of the present application also provides an electronic device. The electronic device 2000 includes: a memory 2002, a processor 2001, and a computer program stored on the memory 2002 and executable on the processor 2001.
处理器2001和存储器2002可以通过总线或者其他方式连接。The processor 2001 and the memory 2002 may be connected through a bus or other means.
The non-transitory software programs and instructions required to implement the agent illegal speech identification method of the above embodiments are stored in the memory 2002; when executed by the processor 2001, they execute the agent illegal speech identification method applied to the device in the above embodiments, for example, performing the above-described method steps S100 to S400 in Figure 1, method steps S110 to S130 in Figure 2, method steps S140 to S150 in Figure 3, method steps S151 to S153 in Figure 4, method steps S310 to S330 in Figure 6, method steps S410 to S420 in Figure 7, and method steps S510 to S520 in Figure 8.
电子设备还包括输入单元、显示单元、音频处理电路以及电源等部件。本领域技术人员可以理解,本实施例不对电子设备的结构进行唯一限定,可以包括比本实施例更多或更少的部件,或者组合某些部件,或者不同的部件布置。Electronic equipment also includes components such as input units, display units, audio processing circuits, and power supplies. Those skilled in the art can understand that this embodiment does not uniquely limit the structure of the electronic device, and may include more or fewer components than in this embodiment, or combine certain components, or arrange different components.
The input unit may be used to receive input numeric or character information and to generate key signal inputs related to the settings and function control of the computer. Specifically, the input unit may include a touch panel and other input devices. The touch panel, also called a touch screen, can collect touch operations on or near it (such as operations performed on or near the touch panel with a finger, a stylus or any other suitable object or accessory) and drive the corresponding connected device according to a preset program. Optionally, the touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position, detects the signal produced by the touch operation and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends it to the processor, and can receive and execute commands sent by the processor. In addition, the touch panel may be implemented in various types, such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch panel, the input unit may also include other input devices. Specifically, the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit may be used to display input information or provided information as well as the various menus of the electronic device. The display unit may include a display panel; optionally, the display panel may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel may cover the display panel; when the touch panel detects a touch operation on or near it, it transmits the operation to the processor to determine the type of the touch event, and the processor then provides corresponding visual output on the display panel according to the type of the touch event. Although the touch panel and the display panel may be implemented as two separate components to realize the input and output functions of the electronic device, in some embodiments the touch panel and the display panel may be integrated to realize the input and output functions of the electronic device.
音频处理电路可提供音频接口。音频处理电路可将接收到的音频数据转换后的电信号,传输到扬声器,由扬声器转换为声音信号输出;另一方面,传声器将收集的声音信号转换为电信号,由音频处理电路接收后转换为音频数据,再将音频数据输出处理器处理后,经无线电路以发送给比如另一计算机,或者将音频数据输出至存储器以便进一步处理。Audio processing circuitry provides an audio interface. The audio processing circuit can transmit the electrical signal converted from the received audio data to the speaker, which converts it into a sound signal and outputs it; on the other hand, the microphone converts the collected sound signal into an electrical signal, which is received and converted by the audio processing circuit. The audio data is then output to the processor for processing and then sent to, for example, another computer via a wireless circuit, or the audio data is output to a memory for further processing.
以上所描述的装置实施例仅仅是示意性的,其中作为分离部件说明的单元可以是或者也可以不是物理上分开的, 即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。The device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separate, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
计算机还包括给各个部件供电的电源(比如电池),优选的,电源可以通过电源管理系统与处理器逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。The computer also includes a power supply (such as a battery) that supplies power to various components. Preferably, the power supply can be logically connected to the processor through a power management system, so that functions such as charging, discharging, and power consumption management can be implemented through the power management system.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor or controller, for example by a processor in the above electronic device embodiment, it can cause the processor to execute the agent illegal speech identification method of the above embodiments, for example, performing the above-described method steps S100 to S400 in Figure 1, method steps S110 to S130 in Figure 2, method steps S140 to S150 in Figure 3, method steps S151 to S153 in Figure 4, method steps S310 to S330 in Figure 6, method steps S410 to S420 in Figure 7, and method steps S510 to S520 in Figure 8. The computer-readable storage medium may be non-volatile or volatile.
Those of ordinary skill in the art can understand that all or some of the steps and devices in the methods disclosed above may be implemented as software, firmware, hardware or appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory storage media) and communication storage media (or transitory storage media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable storage media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other storage medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication storage media typically contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery storage media.
The present application may be used in numerous general-purpose or special-purpose computer apparatus environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor apparatuses, microprocessor-based apparatuses, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above apparatuses or devices, and the like. The present application may be described in the general context of computer programs executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions and operations of possible implementations of apparatuses, methods and computer program products according to various embodiments of the present application. Each block in a flowchart or block diagram may represent a module, a program segment or a part of code, and the module, program segment or part of code contains one or more programs for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented by a dedicated hardware-based apparatus that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
描述于本申请实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现,所描述的单元也可以设置在处理器中。其中,这些单元的名称在某种情况下并不构成对该单元本身的限定。The units involved in the embodiments of this application can be implemented in software or hardware, and the described units can also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.
应当注意,尽管在上文详细描述中提及了用于动作执行的设备的若干模块或者单元,但是这种划分并非强制性的。实际上,根据本申请的实施方式,上文描述的两个或更多模块或者单元的特征和功能可以在一个模块或者单元中具体化。反之,上文描述的一个模块或者单元的特征和功能可以进一步划分为由多个模块或者单元来具体化。It should be noted that although several modules or units of equipment for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into being embodied by multiple modules or units.
通过以上的实施方式的描述,本领域的技术人员易于理解,这里描述的示例实施方式可以通过软件实现,也可以通过软件结合必要的硬件的方式来实现。因此,根据本申请实施方式的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中或网络上,包括若干指令以使得一台计算设备(可以是个人计算机、服务器、触控终端、或者网络设备等)执行根据本申请实施方式的方法。Through the above description of the embodiments, those skilled in the art can easily understand that the example embodiments described here can be implemented by software, or can be implemented by software combined with necessary hardware. Therefore, the technical solution according to the embodiment of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, U disk, mobile hard disk, etc.) or on the network , including several instructions to cause a computing device (which can be a personal computer, server, touch terminal, or network device, etc.) to execute the method according to the embodiment of the present application.
Those skilled in the art will readily conceive of other embodiments of the present application after considering the specification and practicing the embodiments disclosed herein. The present application is intended to cover any variations, uses or adaptations of the present application that follow the general principles of the present application and include common knowledge or customary technical means in the technical field not disclosed in the present application.
应当理解的是,本申请并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本申请的范围仅由所附的权利要求来限制。It is to be understood that the present application is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
以上是对本申请的较佳实施进行了具体说明,但本申请并不局限于上述实施方式,熟悉本领域的技术人员在不违背本申请精神的前提下还可作出种种的等同变形或替换,这些等同的变形或替换均包含在本申请权利要求所限定的范围内。The above is a detailed description of the preferred implementation of the present application, but the present application is not limited to the above-mentioned embodiments. Those skilled in the art can also make various equivalent modifications or substitutions without violating the spirit of the present application. Equivalent modifications or substitutions are included within the scope defined by the claims of this application.

Claims (20)

  1. A method for identifying agent illegal speech, comprising:
    获取训练用的坐席话术信息,将所述训练用的坐席话术信息拆分得到坐席侧单句并对所述坐席侧单句进行文本预处理;Obtain the agent's speech information for training, split the agent's speech information for training into single sentences on the agent's side, and perform text preprocessing on the single sentences on the agent's side;
    基于预处理后的坐席侧单句,使用三层BERT模型进行训练,得到坐席违规话术识别模型;Based on the preprocessed agent-side single sentences, a three-layer BERT model is used for training to obtain an agent illegal speech recognition model;
    The agent speech information to be identified is input into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, wherein the agent speech information to be identified is agent speech information in a credit card service sales scenario;
    根据所述目标分类的概率分布确定所述待识别的坐席话术信息中的违规话术。The illegal speech in the agent speech information to be identified is determined according to the probability distribution of the target classification.
  2. The method for identifying inappropriate agent language according to claim 1, wherein splitting the agent speech information for training to obtain agent-side single sentences comprises:
    annotating the agent speech information for training;
    splitting the agent speech information into sentences according to the annotations to obtain a plurality of single speech sentences;
    performing text conversion on the plurality of single speech sentences to obtain agent-side single sentences expressed as text.
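A minimal sketch of the splitting and text-conversion steps of claim 2, under the assumption that the annotations are simply a list of boundary timestamps on the agent-side audio; the speech-to-text step is delegated to whatever ASR engine is available and appears here only as a hypothetical `transcribe` callable.

```python
import numpy as np

def split_by_annotations(waveform: np.ndarray, sample_rate: int,
                         boundaries_sec: list) -> list:
    """Cut one agent-side recording into single-sentence segments.

    `boundaries_sec` are the annotated sentence boundaries (see claim 3);
    the stretches between consecutive boundaries are the single speech sentences.
    """
    edges = [0.0] + sorted(boundaries_sec) + [len(waveform) / sample_rate]
    return [waveform[int(a * sample_rate):int(b * sample_rate)]
            for a, b in zip(edges, edges[1:])]

def segments_to_text(segments, transcribe) -> list:
    """Text-conversion step; `transcribe` is any ASR callable (not defined here)."""
    return [transcribe(seg) for seg in segments]

# usage with dummy audio and a dummy transcriber
sr = 16000
audio = np.random.randn(sr * 6).astype(np.float32)        # 6 s of fake agent audio
segments = split_by_annotations(audio, sr, [2.0, 4.5])     # three single sentences
texts = segments_to_text(segments, transcribe=lambda seg: f"<{len(seg)} samples>")
print(texts)
```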
  3. The method for identifying inappropriate agent language according to claim 2, wherein annotating the agent speech information for training comprises:
    annotating the agent speech information for training according to the voice differences and pause rhythm in the agent speech information for training.
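One plausible way to derive such annotations, assuming the pause rhythm can be approximated by frame-energy silence detection; the frame length, energy threshold and minimum pause duration below are illustrative values, not parameters given in the application.

```python
import numpy as np

def pause_boundaries(waveform: np.ndarray, sample_rate: int,
                     frame_ms: int = 25, energy_thresh: float = 0.01,
                     min_pause_ms: int = 300) -> list:
    """Return candidate sentence boundaries (in seconds) at long, quiet pauses."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(waveform) // frame_len
    energy = np.array([np.mean(waveform[i * frame_len:(i + 1) * frame_len] ** 2)
                       for i in range(n_frames)])
    silent = energy < energy_thresh
    boundaries, run_start = [], None
    for i, s in enumerate(silent):
        if s and run_start is None:
            run_start = i                         # a pause begins
        elif not s and run_start is not None:
            if (i - run_start) * frame_ms >= min_pause_ms:
                mid = (run_start + i) / 2 * frame_ms / 1000
                boundaries.append(mid)            # annotate the middle of the pause
            run_start = None
    return boundaries

sr = 16000
sig = np.concatenate([np.random.randn(sr), np.zeros(sr // 2), np.random.randn(sr)])
print(pause_boundaries(sig.astype(np.float32), sr))   # [1.25] — middle of the half-second pause
```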
  4. The method for identifying inappropriate agent language according to claim 1, wherein performing text preprocessing on the agent-side single sentences comprises:
    desensitizing the sensitive words in the agent-side single sentences according to a preset sensitive-word library or preset sensitive-word judgment rules;
    performing random mask processing on the desensitized agent-side single sentences to obtain the preprocessed agent-side single sentences.
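A minimal sketch of the two preprocessing operations of claim 4. The regular-expression rules stand in for a preset sensitive-word library, and the 15% masking rate merely echoes common BERT masked-language-model practice; none of these values are specified by the application.

```python
import random
import re

# Assumed sensitive-word judgment rules (regular expressions); a real system
# could instead look words up in the preset sensitive-word library of claim 4.
SENSITIVE_PATTERNS = [
    (re.compile(r"\d{15,19}"), "[CARD]"),   # card-number-like digit runs
    (re.compile(r"1\d{10}"), "[PHONE]"),    # mainland mobile numbers
]

def desensitize(sentence: str) -> str:
    """Replace sensitive character spans with neutral placeholders."""
    for pattern, placeholder in SENSITIVE_PATTERNS:
        sentence = pattern.sub(placeholder, sentence)
    return sentence

def random_mask(tokens, mask_token="[MASK]", mask_prob=0.15, seed=None):
    """Randomly mask tokens, BERT-MLM style; the 15% rate is an assumption."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_prob else t for t in tokens]

sentence = "您的卡号6222021234567890123已激活，联系电话13800000000。"
clean = desensitize(sentence)
print(clean)                              # 您的卡号[CARD]已激活，联系电话[PHONE]。
print(random_mask(list(clean), seed=0))
```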
  5. The method for identifying inappropriate agent language according to claim 1, wherein the three-layer BERT model comprises a first-layer BERT model, a second-layer BERT model, a third-layer BERT model, a fully connected layer, a convolution layer and a classification layer; the first-layer BERT model, the second-layer BERT model and the third-layer BERT model are stacked; the hidden layers of the first-layer BERT model, the second-layer BERT model and the third-layer BERT model each output a CLS vector to the fully connected layer; the fully connected layer, the convolution layer and the classification layer are connected in sequence, and the output of the classification layer serves as the output of the three-layer BERT model; wherein each CLS vector represents the information contained in each sentence after the preprocessed agent-side single sentence passes through one of the BERT model layers for information extraction.
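One plausible PyTorch reading of the architecture in claim 5, using the Hugging Face transformers library. It assumes the three stacked BERT models can be realized as a single BERT encoder configured with three transformer layers, that the three CLS vectors are concatenated before the fully connected layer, and that the convolution layer is a 1-D convolution over the projected representation; hidden sizes, kernel size and the number of classes are illustrative.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class ThreeLayerBertClassifier(nn.Module):
    """Sketch of claim 5: three stacked BERT layers each contribute a CLS vector,
    followed by a fully connected layer, a convolution layer and a classification layer."""

    def __init__(self, num_classes: int, hidden: int = 768):
        super().__init__()
        config = BertConfig(num_hidden_layers=3, hidden_size=hidden,
                            num_attention_heads=12, intermediate_size=4 * hidden)
        self.bert = BertModel(config)
        self.fc = nn.Linear(3 * hidden, hidden)                 # fully connected layer
        self.conv = nn.Conv1d(1, 8, kernel_size=3, padding=1)   # convolution layer
        self.classifier = nn.Linear(8 * hidden, num_classes)    # classification layer

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask,
                        output_hidden_states=True)
        # hidden_states[0] is the embedding output; [1], [2], [3] are the three layers
        cls_vectors = [out.hidden_states[i][:, 0, :] for i in (1, 2, 3)]
        x = self.fc(torch.cat(cls_vectors, dim=-1))             # (batch, hidden)
        x = torch.relu(self.conv(x.unsqueeze(1)))               # (batch, 8, hidden)
        return self.classifier(x.flatten(1))                    # logits over target classes

model = ThreeLayerBertClassifier(num_classes=5)
dummy_ids = torch.randint(0, 21128, (2, 32))       # fake token ids (bert-base-chinese vocab is 21128)
logits = model(dummy_ids, attention_mask=torch.ones_like(dummy_ids))
print(logits.shape)                                # torch.Size([2, 5])
```

Other readings are equally possible, for example three separately pre-trained BERT encoders chained through their hidden states; the claim itself does not fix these details.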
  6. The method for identifying inappropriate agent language according to claim 1, wherein inputting the agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes comprises:
    converting the agent speech information to be identified into text information;
    desensitizing the text information;
    inputting the desensitized text information into the inappropriate agent language identification model to obtain probability distributions of a number of target classes.
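The inference path of claim 6 might then look as follows, assuming a model with the interface of the claim-5 sketch above, a Hugging Face tokenizer, and a desensitization step like the one shown under claim 4; the class labels are illustrative.

```python
import re
import torch

LABELS = ["guaranteed_return", "induced_fee", "misleading_claim", "normal"]  # illustrative classes

def desensitize(text: str) -> str:
    # same idea as the claim-4 sketch: mask long digit runs before inference
    return re.sub(r"\d{11,19}", "[NUM]", text)

@torch.no_grad()
def classify(texts, model, tokenizer, max_len=64):
    """Desensitize ASR transcripts and return one label->probability dict per sentence."""
    clean = [desensitize(t) for t in texts]           # text conversion is assumed done upstream by ASR
    batch = tokenizer(clean, padding=True, truncation=True,
                      max_length=max_len, return_tensors="pt")
    logits = model(batch["input_ids"], batch["attention_mask"])   # claim-5 style model
    probs = torch.softmax(logits, dim=-1)             # probability distribution over target classes
    return [dict(zip(LABELS, p.tolist())) for p in probs]
```

With the `ThreeLayerBertClassifier` sketched after claim 5 (built with `num_classes=len(LABELS)`) and a tokenizer such as `BertTokenizerFast.from_pretrained("bert-base-chinese")`, `classify(["这款产品保证收益百分之十"], model, tokenizer)` would return one label-to-probability dictionary for that sentence.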
  7. The method for identifying inappropriate agent language according to claim 1, wherein determining the inappropriate language in the agent speech information to be identified according to the probability distribution of the target classes comprises:
    matching the target classes against preset classes, and taking a successfully matched target class as a class to be identified;
    when the probability value corresponding to the class to be identified exceeds the probability threshold of the corresponding preset class, determining that the agent speech information to be identified contains inappropriate language of the class to be identified.
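A sketch of the decision rule in claim 7, applied to per-class probability dictionaries like those produced by the claim-6 sketch; the preset classes and thresholds are invented for illustration.

```python
# Illustrative preset classes and per-class probability thresholds (claim 7);
# the application does not specify concrete classes or values.
PRESET_THRESHOLDS = {
    "guaranteed_return": 0.80,
    "induced_fee": 0.85,
    "misleading_claim": 0.70,
}

def detect_violations(prob_dist: dict) -> list:
    """Return the classes whose probability exceeds their preset threshold."""
    hits = []
    for target_class, prob in prob_dist.items():
        threshold = PRESET_THRESHOLDS.get(target_class)    # match against preset classes
        if threshold is not None and prob > threshold:     # matched class becomes the class to be identified
            hits.append((target_class, prob))
    return hits

print(detect_violations({"guaranteed_return": 0.91, "induced_fee": 0.12, "normal": 0.40}))
# [('guaranteed_return', 0.91)]
```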
  8. An apparatus for identifying inappropriate agent language, comprising:
    a preprocessing unit, configured to obtain agent speech information for training, split the agent speech information for training to obtain agent-side single sentences, and perform text preprocessing on the agent-side single sentences;
    a training unit, configured to train a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an inappropriate agent language identification model;
    a processing unit, configured to input agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes, wherein the agent speech information to be identified is agent speech information from a credit card service and sales scenario;
    an identification unit, configured to determine the inappropriate language in the agent speech information to be identified according to the probability distribution of the target classes.
  9. An electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements a method for identifying inappropriate agent language, the method comprising:
    obtaining agent speech information for training, splitting the agent speech information for training to obtain agent-side single sentences, and performing text preprocessing on the agent-side single sentences;
    training a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an inappropriate agent language identification model;
    inputting agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes, wherein the agent speech information to be identified is agent speech information from a credit card service and sales scenario;
    determining the inappropriate language in the agent speech information to be identified according to the probability distribution of the target classes.
  10. The electronic device according to claim 9, wherein splitting the agent speech information for training to obtain agent-side single sentences comprises:
    annotating the agent speech information for training;
    splitting the agent speech information into sentences according to the annotations to obtain a plurality of single speech sentences;
    performing text conversion on the plurality of single speech sentences to obtain agent-side single sentences expressed as text.
  11. The electronic device according to claim 10, wherein annotating the agent speech information for training comprises:
    annotating the agent speech information for training according to the voice differences and pause rhythm in the agent speech information for training.
  12. The electronic device according to claim 9, wherein performing text preprocessing on the agent-side single sentences comprises:
    desensitizing the sensitive words in the agent-side single sentences according to a preset sensitive-word library or preset sensitive-word judgment rules;
    performing random mask processing on the desensitized agent-side single sentences to obtain the preprocessed agent-side single sentences.
  13. The electronic device according to claim 9, wherein the three-layer BERT model comprises a first-layer BERT model, a second-layer BERT model, a third-layer BERT model, a fully connected layer, a convolution layer and a classification layer; the first-layer BERT model, the second-layer BERT model and the third-layer BERT model are stacked; the hidden layers of the first-layer BERT model, the second-layer BERT model and the third-layer BERT model each output a CLS vector to the fully connected layer; the fully connected layer, the convolution layer and the classification layer are connected in sequence, and the output of the classification layer serves as the output of the three-layer BERT model; wherein each CLS vector represents the information contained in each sentence after the preprocessed agent-side single sentence passes through one of the BERT model layers for information extraction.
  14. The electronic device according to claim 9, wherein inputting the agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes comprises:
    converting the agent speech information to be identified into text information;
    desensitizing the text information;
    inputting the desensitized text information into the inappropriate agent language identification model to obtain probability distributions of a number of target classes.
  15. A computer-readable storage medium storing a computer program, wherein the computer program is used to execute a method for identifying inappropriate agent language, the method comprising:
    obtaining agent speech information for training, splitting the agent speech information for training to obtain agent-side single sentences, and performing text preprocessing on the agent-side single sentences;
    training a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an inappropriate agent language identification model;
    inputting agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes, wherein the agent speech information to be identified is agent speech information from a credit card service and sales scenario;
    determining the inappropriate language in the agent speech information to be identified according to the probability distribution of the target classes.
  16. The computer-readable storage medium according to claim 15, wherein splitting the agent speech information for training to obtain agent-side single sentences comprises:
    annotating the agent speech information for training;
    splitting the agent speech information into sentences according to the annotations to obtain a plurality of single speech sentences;
    performing text conversion on the plurality of single speech sentences to obtain agent-side single sentences expressed as text.
  17. The computer-readable storage medium according to claim 16, wherein annotating the agent speech information for training comprises:
    annotating the agent speech information for training according to the voice differences and pause rhythm in the agent speech information for training.
  18. The computer-readable storage medium according to claim 15, wherein performing text preprocessing on the agent-side single sentences comprises:
    desensitizing the sensitive words in the agent-side single sentences according to a preset sensitive-word library or preset sensitive-word judgment rules;
    performing random mask processing on the desensitized agent-side single sentences to obtain the preprocessed agent-side single sentences.
  19. The computer-readable storage medium according to claim 15, wherein the three-layer BERT model comprises a first-layer BERT model, a second-layer BERT model, a third-layer BERT model, a fully connected layer, a convolution layer and a classification layer; the first-layer BERT model, the second-layer BERT model and the third-layer BERT model are stacked; the hidden layers of the first-layer BERT model, the second-layer BERT model and the third-layer BERT model each output a CLS vector to the fully connected layer; the fully connected layer, the convolution layer and the classification layer are connected in sequence, and the output of the classification layer serves as the output of the three-layer BERT model; wherein each CLS vector represents the information contained in each sentence after the preprocessed agent-side single sentence passes through one of the BERT model layers for information extraction.
  20. The computer-readable storage medium according to claim 15, wherein inputting the agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes comprises:
    converting the agent speech information to be identified into text information;
    desensitizing the text information;
    inputting the desensitized text information into the inappropriate agent language identification model to obtain probability distributions of a number of target classes.
PCT/CN2022/090717 2022-03-15 2022-04-29 Inappropriate agent language identification method and apparatus, electronic device and storage medium WO2023173554A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210252453.8A CN114610887A (en) 2022-03-15 2022-03-15 Seat illegal speech recognition method and device, electronic equipment and storage medium
CN202210252453.8 2022-03-15

Publications (1)

Publication Number Publication Date
WO2023173554A1 true WO2023173554A1 (en) 2023-09-21

Family

ID=81862285

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090717 WO2023173554A1 (en) 2022-03-15 2022-04-29 Inappropriate agent language identification method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN114610887A (en)
WO (1) WO2023173554A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117079640A (en) * 2023-10-12 2023-11-17 深圳依时货拉拉科技有限公司 Voice monitoring method, device, computer equipment and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288192A (en) * 2019-05-23 2019-09-27 平安科技(深圳)有限公司 Quality detecting method, device, equipment and storage medium based on multiple Checking models
CN111191030A (en) * 2019-12-20 2020-05-22 北京淇瑀信息科技有限公司 Single sentence intention identification method, device and system based on classification
CN111641757A (en) * 2020-05-15 2020-09-08 北京青牛技术股份有限公司 Real-time quality inspection and auxiliary speech pushing method for seat call
WO2022048173A1 (en) * 2020-09-04 2022-03-10 平安科技(深圳)有限公司 Artificial intelligence-based customer intent identification method and apparatus, device, and medium
CN112671985A (en) * 2020-12-22 2021-04-16 平安普惠企业管理有限公司 Agent quality inspection method, device, equipment and storage medium based on deep learning

Also Published As

Publication number Publication date
CN114610887A (en) 2022-06-10

Similar Documents

Publication Publication Date Title
US20230016365A1 (en) Method and apparatus for training text classification model
US11893345B2 (en) Inducing rich interaction structures between words for document-level event argument extraction
WO2021121198A1 (en) Semantic similarity-based entity relation extraction method and apparatus, device and medium
WO2021042904A1 (en) Conversation intention recognition method, apparatus, computer device, and storage medium
CN110727779A (en) Question-answering method and system based on multi-model fusion
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
CN113268609B (en) Knowledge graph-based dialogue content recommendation method, device, equipment and medium
US11720759B2 (en) Electronic apparatus, controlling method of thereof and non-transitory computer readable recording medium
WO2018045646A1 (en) Artificial intelligence-based method and device for human-machine interaction
CN110502610A (en) Intelligent sound endorsement method, device and medium based on text semantic similarity
CN111563158B (en) Text ranking method, ranking apparatus, server and computer-readable storage medium
CN111222330B (en) Chinese event detection method and system
CN112926308B (en) Method, device, equipment, storage medium and program product for matching text
WO2021129411A1 (en) Text processing method and device
CN113761190A (en) Text recognition method and device, computer readable medium and electronic equipment
Li et al. Intention understanding in human–robot interaction based on visual-NLP semantics
CN113705191A (en) Method, device and equipment for generating sample statement and storage medium
CN110781666A (en) Natural language processing text modeling based on generative countermeasure networks
CN112036186A (en) Corpus labeling method and device, computer storage medium and electronic equipment
CN116821307B (en) Content interaction method, device, electronic equipment and storage medium
WO2023173554A1 (en) Inappropriate agent language identification method and apparatus, electronic device and storage medium
CN111767720B (en) Title generation method, computer and readable storage medium
CN116757195B (en) Implicit emotion recognition method based on prompt learning
CN113705207A (en) Grammar error recognition method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22931581

Country of ref document: EP

Kind code of ref document: A1