WO2023173554A1 - Inappropriate agent language identification method and apparatus, electronic device and storage medium - Google Patents

Inappropriate agent language identification method and apparatus, electronic device and storage medium Download PDF

Info

Publication number
WO2023173554A1
Authority
WO
WIPO (PCT)
Prior art keywords
agent
speech
layer
information
training
Prior art date
Application number
PCT/CN2022/090717
Other languages
French (fr)
Chinese (zh)
Inventor
王彦
成逸吉
马骏
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2023173554A1 publication Critical patent/WO2023173554A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0281Customer communication at a business location, e.g. providing product or service information, consulting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Definitions

  • This application belongs to the field of artificial intelligence technology, and in particular relates to a method, device, electronic equipment, and storage medium for identifying illegal speech techniques by agents.
  • Pre-trained models for natural language processing in the related art mainly include BERT (Bidirectional Encoder Representations from Transformers) of various sizes, as well as some BERT variants such as ALBERT, RoBERTa, and ELECTRA. Some of these models suffer from an excessive number of parameters and overly slow training and inference; others reduce the parameter count through parameter sharing but perform poorly in the application scenario of illegal speech recognition.
  • embodiments of this application provide a method for identifying illegal speech by agents, including:
  • the agent speech information for training is obtained, the agent speech information for training is split into single sentences on the agent side, and text preprocessing is performed on the single sentences on the agent side;
  • based on the preprocessed agent-side single sentences, a three-layer BERT model is used for training to obtain an agent illegal speech recognition model;
  • the agent speech information to be identified is input into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • the illegal speech in the agent speech information to be identified is determined according to the probability distribution of the target classification.
  • embodiments of the present application provide a device for identifying illegal speech techniques by agents, including:
  • a preprocessing unit used to obtain the agent's speech information for training, split the agent's speech information for training into single sentences on the agent's side, and perform text preprocessing on the single sentences on the agent's side;
  • a training unit, used to train a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an agent illegal speech recognition model;
  • a processing unit, used to input the agent speech information to be identified into the agent illegal speech recognition model for inference and obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • An identification unit configured to determine illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
  • embodiments of the present application provide an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements a method for identifying illegal speech by agents, the method including:
  • the agent speech information for training is obtained, the agent speech information for training is split into single sentences on the agent side, and text preprocessing is performed on the single sentences on the agent side;
  • based on the preprocessed agent-side single sentences, a three-layer BERT model is used for training to obtain an agent illegal speech recognition model;
  • the agent speech information to be identified is input into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • the illegal speech in the agent speech information to be identified is determined according to the probability distribution of the target classification.
  • embodiments of the present application provide a computer-readable storage medium storing a computer program, the computer program being used to execute a method for identifying illegal speech by agents, where the method includes:
  • the agent speech information for training is obtained, the agent speech information for training is split into single sentences on the agent side, and text preprocessing is performed on the single sentences on the agent side;
  • based on the preprocessed agent-side single sentences, a three-layer BERT model is used for training to obtain an agent illegal speech recognition model;
  • the agent speech information to be identified is input into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • the illegal speech in the agent speech information to be identified is determined according to the probability distribution of the target classification.
  • the embodiments of the present application at least have the following beneficial effects:
  • the pre-trained three-layer BERT model of the embodiments of the present application, by integrating text semantics with different degrees of information extraction and different time spans, can enhance the performance of the three-layer BERT model without substantially increasing the number of parameters.
  • the three-layer BERT model proposed in the embodiments of this application is optimized at both the model and data levels and is applied to the identification of illegal speech; it can improve the efficiency with which quality inspection personnel identify illegal speech in business scenarios and has certain promotion value.
  • Figure 1 is a flow chart of a method for identifying illegal speech by agents provided by an embodiment of the present application
  • Figure 2 is a flow chart for processing agent speech information for training provided by another embodiment of the present application.
  • FIG. 3 is a flow chart of desensitization and random masking processing provided by another embodiment of the present application.
  • Figure 4 is a flow chart of a specific processing method for random masking provided by another embodiment of the present application.
  • Figure 5 is the structure of a three-layer BERT model provided by another embodiment of the present application.
  • Figure 6 is a flow chart for inputting agent speech information to be recognized and outputting a probability distribution of target classification provided by another embodiment of the present application;
  • Figure 7 is a flow chart for matching and identifying illegal words provided by another embodiment of the present application.
  • Figure 8 is a flow chart for optimizing the three-layer BERT model provided by another embodiment of the present application.
  • Figure 9 is a structural diagram of an agent illegal speech recognition device provided by another embodiment of the present application.
  • Figure 10 is a device diagram of an electronic device provided by another embodiment of the present application.
  • This application provides a method, device, electronic device, and storage medium for identifying illegal agent speech skills.
  • the method includes: obtaining agent speech information for training, splitting the agent speech information for training into single sentences on the agent side, and performing text preprocessing on the agent-side single sentences; based on the preprocessed agent-side single sentences, training with a three-layer BERT model to obtain an agent illegal speech recognition model; inputting the agent speech information to be identified into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario; and determining the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
  • the pre-trained three-layer BERT model is used to infer the agent's illegal speech, thereby improving the efficiency with which quality inspection personnel identify illegal speech in business scenarios.
  • artificial intelligence is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence; artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new type of intelligent machine capable of responding in a manner similar to human intelligence.
  • Research in this field includes robotics, language recognition, image recognition, natural language processing, and expert systems.
  • Artificial intelligence can simulate the information process of human consciousness and thinking.
  • Artificial intelligence is also a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and the like.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometric technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • the terminal mentioned in the embodiment of this application may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a vehicle-mounted computer, a smart home device, a wearable electronic device, a VR (Virtual Reality)/AR (Augmented Reality) device, etc.;
  • the server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms.
  • Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing technology usually includes text processing, semantic understanding, machine translation, question answering, knowledge graphs, and other technologies.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
  • Figure 1 is a flow chart of a method for identifying illegal speech by an agent provided by an embodiment of the present application.
  • the method for identifying illegal speech by an agent includes but is not limited to the following steps:
  • Step S100 Obtain the agent speech information for training, split the agent speech information for training into single sentences on the agent side, and perform text preprocessing on the single sentences on the agent side;
  • Step S200 Based on the preprocessed single sentences on the agent side, use the three-layer BERT model for training to obtain an agent illegal speech recognition model;
  • Step S300 Input the agent speech information to be identified into the agent illegal speech recognition model for inference, and obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • Step S400 Determine the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
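  • As an illustration only (not part of the original disclosure), the following Python sketch shows how steps S100 to S400 fit together; the callables passed in stand for the splitting, preprocessing, training, and inference procedures described in the rest of this document, and the 0.5 threshold is an assumed default rather than a value specified by the application.

```python
from typing import Callable, Dict, Iterable, List

def identify_illegal_speech(
    training_calls: Iterable[str],
    call_to_check: str,
    split_fn: Callable[[str], List[str]],        # S100: call -> agent-side single sentences
    preprocess_fn: Callable[[str], str],         # S100: desensitization + random masking
    train_fn: Callable[[List[str]], object],     # S200: returns the trained three-layer BERT model
    infer_fn: Callable[[object, str], Dict[str, float]],  # S300: class name -> probability
    threshold: float = 0.5,                      # assumed threshold, not given in the application
) -> Dict[str, float]:
    # S100: split the training calls into agent-side single sentences and preprocess them.
    sentences = [preprocess_fn(s) for call in training_calls for s in split_fn(call)]
    # S200: train the three-layer BERT model to obtain the recognition model.
    model = train_fn(sentences)
    # S300: run inference on the call to be checked to get the class probability distribution.
    probs = infer_fn(model, preprocess_fn(call_to_check))
    # S400: classes whose probability exceeds the threshold are reported as violations.
    return {label: p for label, p in probs.items() if p > threshold}
```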
  • the embodiment of this application uses a three-layer BERT model to identify agent violations.
  • First, a three-layer BERT model is constructed, and the labeled training set, that is, the agent speech information used for training, is used to train the three-layer BERT model to obtain the agent illegal speech recognition model.
  • Then the agent speech information to be identified is input into the agent illegal speech recognition model to obtain the probability distribution of the target classification.
  • This probability distribution indicates the probability that, under the corresponding classification, the agent speech information belongs to illegal speech; finally, the illegal speech in the agent speech information is determined based on the probability distribution of the target classification.
  • BERT is a pre-trained language representation model that uses MLM (masked language modeling) to pre-train bidirectional Transformers and generate deep bidirectional language representations. After pre-training, only an additional output layer needs to be added for fine-tuning to achieve state-of-the-art performance on a variety of downstream tasks. This process does not require task-specific structural modifications to BERT, so it ultimately generates a deep bidirectional language representation that integrates left and right contextual information.
  • the three-layer BERT model includes a first-layer BERT model, a second-layer BERT model, a third-layer BERT model, a fully connected layer, a convolution layer, and a classification layer; the first-layer, second-layer, and third-layer BERT models are stacked, and the hidden layer of each of the first-layer, second-layer, and third-layer BERT models outputs a CLS vector to the fully connected layer.
  • the fully connected layer, the convolution layer, and the classification layer are connected in sequence.
  • the output of the classification layer is used as the output of the three-layer BERT model.
  • each CLS vector represents the information contained in a preprocessed agent-side single sentence after information extraction by one of the BERT layers.
  • the fully connected layer is used to splice the three CLS vectors and then output a comprehensive longitudinal-dimension text information vector;
  • the convolution layer is used to perform convolution operations on the comprehensive longitudinal-dimension text information vector through multiple convolution kernels, and outputs a comprehensive horizontal-span text information vector to the classification layer;
  • after receiving the comprehensive horizontal-span text information vector, the classification layer obtains the probability distribution of the required classification through classification processing.
  • splitting the agent speech information for training into single sentences on the agent side can be achieved through the following steps:
  • Step S110 Label the agent speech information for training;
  • Step S120 Split the agent speech information into sentences according to the annotations, so that the speech information is split into multiple single speech sentences;
  • Step S130 Perform text conversion on the multiple speech sentences to obtain agent-side single sentences expressed in text.
  • Agent speech information is speech data.
  • the agent speech needs to be annotated to divide multiple sentences in the speech data to facilitate the construction of a training set in the form of a single sentence.
  • the MLM operation is performed on a per-sentence basis, and multiple speech sentences can be obtained through annotation.
  • each speech sentence then needs to be converted into text information, that is, each speech sentence is converted into a single sentence on the agent side.
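  • As a rough sketch of this step (the data layout and the speech-to-text interface are assumptions, not structures defined by this application), the annotations can be represented as speaker-labeled segments and any speech-to-text engine can be plugged in through the transcribe callable:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Segment:
    speaker: str      # e.g. "agent" or "customer", taken from the annotations
    start_ms: int     # start of the single sentence in the recording
    end_ms: int       # end of the single sentence in the recording
    audio: bytes      # raw audio of this single sentence

def agent_side_sentences(segments: List[Segment],
                         transcribe: Callable[[bytes], str]) -> List[str]:
    """Keep only agent-side segments and convert each one into a text sentence."""
    return [transcribe(seg.audio) for seg in segments if seg.speaker == "agent"]
```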
  • the BERT model was originally proposed for English, and English sentences are formed from multiple English words separated by spaces.
  • the three-layer BERT model in the embodiment of the present application can undoubtedly be applied to English speech scenarios, but Chinese sentences are composed of multiple consecutive Chinese characters, and Chinese "words" are also composed of several characters; that is, English is organized by phonetics (character sounds), while Chinese is organized by meaning (glyphs). Therefore, when applying the BERT model for MLM processing, certain additional processing, such as word segmentation, is required; there are several corresponding processing methods.
  • LSTM Long Short-Term Memory
  • these include classic mechanical segmentation methods such as forward/reverse maximum matching and bidirectional maximum matching, statistical segmentation methods with better performance such as the Hidden Markov Model (HMM) and the conditional random field (CRF), as well as deep neural network methods that have emerged in recent years such as RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory); the choice is not limited here.
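  • As a minimal illustration of one of the mechanical segmentation methods mentioned above, the sketch below implements forward maximum matching; the tiny dictionary is invented for the example, and a real system would use a full lexicon or one of the statistical methods instead.

```python
def forward_max_match(text: str, dictionary: set, max_word_len: int = 4) -> list:
    """Forward maximum matching: at each position, greedily take the longest
    dictionary word that matches; fall back to a single character otherwise."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

# Toy dictionary for illustration only.
vocab = {"信用卡", "年费", "办理"}
print(forward_max_match("办理信用卡免年费", vocab))  # ['办理', '信用卡', '免', '年费']
```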
  • the annotation method can label the agent speech information for training based on the speech differences and pause rhythm in the agent speech information for training.
  • the text preprocessing includes desensitization processing and random masking processing to adapt to the training needs of the three-layer BERT model. Specifically, referring to Figure 3, desensitization processing and random masking processing include the following steps:
  • Step S140 Desensitize the sensitive words in the single sentence on the agent side according to the preset sensitive word library or the preset sensitive word judgment rules;
  • Step S150 Perform random masking processing on the desensitized agent-side single sentences to obtain preprocessed agent-side single sentences.
  • desensitization processing can replace sensitive words in single sentences on the agent side with preset characters
  • random masking processing can include the following steps, as shown in Figure 4:
  • Step S151 Randomly select 15% of the words in the desensitized single sentence on the agent side;
  • Step S152 replace 80% of the selected words with [mask], keep 10% unchanged, and replace the remaining 10% with another random word;
  • Step S153 splice [CLS] characters at the starting position of the desensitized single sentence on the agent side.
  • Random mask processing is what frees BERT from the limitation of one-way language models. Simply put, tokens in each training sequence are randomly replaced with the mask token ([mask]) with a probability of 15%, and the original word at the [mask] position is then predicted. Specifically, in each training sequence, token positions are randomly selected for prediction with a probability of 15%; if the i-th token is selected, it is replaced with one of three tokens ([mask], a random token, or the original token), with probabilities of 80%, 10%, and 10%, respectively. This strategy makes BERT sensitive not only to [mask] but to all tokens, so that it can extract the representation information of any token.
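  • A minimal sketch of the desensitization and random masking described above is shown below; the sensitive-word list and the vocabulary used for random replacement are placeholders, and a real system would draw them from the preset sensitive word library and the model vocabulary.

```python
import random

SENSITIVE_WORDS = ["张三", "13800000000"]             # placeholder sensitive-word library
REPLACEMENT_VOCAB = ["信用", "卡片", "年费", "办理"]    # placeholder vocabulary for random tokens

def desensitize(sentence: str, preset_char: str = "*") -> str:
    """Replace sensitive words in the agent-side single sentence with a preset character."""
    for word in SENSITIVE_WORDS:
        sentence = sentence.replace(word, preset_char * len(word))
    return sentence

def random_mask(tokens):
    """BERT-style masking: 15% of tokens are selected; of those, 80% become [mask],
    10% stay unchanged, and 10% become a random token. [CLS] is prepended."""
    tokens = list(tokens)
    for i in range(len(tokens)):
        if random.random() < 0.15:            # select 15% of positions
            r = random.random()
            if r < 0.8:
                tokens[i] = "[mask]"
            elif r < 0.9:
                pass                          # keep the original token
            else:
                tokens[i] = random.choice(REPLACEMENT_VOCAB)
    return ["[CLS]"] + tokens
```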
  • a training set applied to the three-layer BERT model is obtained.
  • the training set is input into the three-layer BERT model for training, and the agent illegal speech recognition model is obtained.
  • the convolution layer of the three-layer BERT model includes three convolution kernels, namely the first convolution kernel, the second convolution kernel and the third convolution kernel.
  • the sizes of the first convolution kernel, the second convolution kernel and the third convolution kernel are different from each other.
  • the size of the first convolution kernel is 2, the size of the second convolution kernel is 3, and the size of the third convolution kernel is 4.
  • the first layer BERT model, the second layer BERT model and the third layer BERT model are represented by encoder1, encoder2 and encoder3 respectively.
  • the three CLS vectors are spliced in the Linear layer to obtain a comprehensive longitudinal-dimension text information vector.
  • after the comprehensive longitudinal-dimension text information vector is processed by the three convolution kernels of the convolution layer, the spliced output is input to the softmax classification layer.
  • the activation function of the classification layer can be the tanh function.
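  • Putting the structure described above together, the following is a minimal PyTorch sketch (not the authors' actual implementation): the Hugging Face BertModel truncated to three encoder layers stands in for the stacked first/second/third-layer BERT models, the [CLS] hidden states of the three layers are spliced in a fully connected layer, one-dimensional convolutions with kernel sizes 2, 3, and 4 followed by tanh and max-pooling produce the horizontal-span vector, and a softmax layer outputs the class probabilities; the pretrained checkpoint, hidden sizes, and number of filters are assumptions.

```python
# pip install torch transformers   (library choice is an assumption, not specified by the application)
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class ThreeLayerBertClassifier(nn.Module):
    def __init__(self, num_classes: int = 2, num_filters: int = 64,
                 pretrained: str = "bert-base-chinese"):
        super().__init__()
        # A BERT encoder truncated to 3 transformer layers (encoder1/encoder2/encoder3).
        config = BertConfig.from_pretrained(pretrained, num_hidden_layers=3,
                                            output_hidden_states=True)
        self.bert = BertModel.from_pretrained(pretrained, config=config)
        hidden = config.hidden_size
        # Fully connected layer that fuses the three spliced CLS vectors into the
        # "comprehensive longitudinal-dimension text information vector".
        self.fuse = nn.Linear(3 * hidden, 3 * hidden)
        # Convolution kernels of sizes 2, 3 and 4 applied over the fused vector.
        self.convs = nn.ModuleList(
            [nn.Conv1d(1, num_filters, kernel_size=k) for k in (2, 3, 4)])
        self.classifier = nn.Linear(3 * num_filters, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states[1:4] are the outputs of the three encoder layers;
        # position 0 of each sequence is the [CLS] token.
        cls_vectors = [h[:, 0, :] for h in out.hidden_states[1:4]]
        fused = self.fuse(torch.cat(cls_vectors, dim=-1))          # longitudinal vector
        x = fused.unsqueeze(1)                                     # [batch, 1, 3 * hidden]
        pooled = [torch.tanh(conv(x)).max(dim=-1).values for conv in self.convs]
        features = torch.cat(pooled, dim=-1)                       # horizontal-span vector
        return torch.softmax(self.classifier(features), dim=-1)    # class probabilities
```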
  • In step S300, the agent speech information to be identified is input into the agent illegal speech recognition model for inference, and the probability distribution of the target classification is obtained. Referring to Figure 6, this can be achieved through the following steps:
  • Step S310 convert the agent speech information to be recognized into text information
  • Step S320 perform desensitization processing on the text information
  • Step S330 Input the desensitized text information into the agent illegal speech recognition model to obtain probability distributions of several target categories.
  • for the agent speech information to be identified, the process of converting speech to text is also used, and the sensitive words in the text information are desensitized after the conversion; that is, compared with the agent speech information used for training, there is no need to annotate in advance to divide single sentences. After the above desensitization processing, the text information is input into the agent illegal speech recognition model to obtain the probability distribution of the required classification (that is, the target classification; the types of classification can be preset).
  • after the agent illegal speech recognition model outputs the probability distribution of the required classification, step S400 determines the recognition result of the agent's illegal speech by matching according to the target classification. Referring to Figure 7, this includes the following steps:
  • Step S410 match the target classification with the preset classification, and use the successfully matched target classification as the classification to be recognized;
  • Step S420 When the probability value corresponding to the category to be identified exceeds the probability threshold of the corresponding preset category, it is determined that the agent speech information to be identified contains the illegal speech of the category to be identified.
  • the preset classification in the embodiment of this application is a pre-set classification of the semantic types of agent illegal speech.
  • the target classification output by the pre-trained model (the three-layer BERT model) is matched with the preset classification to determine whether the corresponding target classification under the current probability distribution belongs to a classification corresponding to illegal speech.
  • if the probability corresponding to one of these target classifications (i.e., the classifications to be recognized) is higher than the set probability threshold, it is considered that illegal speech of the target type exists in the current recording.
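  • The sketch below illustrates steps S310 to S330 and S410 to S420, reusing the model class and the desensitize function from the earlier sketches; the class names, thresholds, and tokenizer checkpoint are illustrative assumptions only, not values given by this application.

```python
import torch
from transformers import BertTokenizer

# Per-class probability thresholds for the preset classifications (illustrative only).
PRESET_THRESHOLDS = {"exaggerated_benefit": 0.7, "unauthorized_promise": 0.8}

def detect_illegal_speech(model, text: str, class_names: list,
                          tokenizer_name: str = "bert-base-chinese") -> dict:
    """S310-S330: desensitize the transcript and run the recognition model;
    S410-S420: keep the target classes that match a preset class and whose
    probability exceeds that class's threshold."""
    tokenizer = BertTokenizer.from_pretrained(tokenizer_name)
    clean = desensitize(text)                              # reuses the earlier sketch
    enc = tokenizer(clean, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        probs = model(enc["input_ids"], enc["attention_mask"])[0]
    hits = {}
    for name, p in zip(class_names, probs.tolist()):
        threshold = PRESET_THRESHOLDS.get(name)            # match against preset classes
        if threshold is not None and p > threshold:
            hits[name] = p                                  # flagged as illegal speech
    return hits
```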
  • the optimization process of the embodiment of the present application includes:
  • Step S510 For the target classifications that failed to match, determine the corresponding agent speech information;
  • Step S520 Re-label the agent speech information corresponding to the failed target classifications according to the active learning idea and the margin sampling strategy, and input it into the agent illegal speech recognition model for retraining.
  • Active learning refers to using machine learning methods to find sample data that are "difficult" to classify, letting humans reconfirm and review them, and then using the manually re-labeled data to train the supervised or semi-supervised learning model again, gradually improving the effect of the model and integrating human experience into it. In other words, a batch of easily misclassified samples is selected, labeled by humans, and then fed back into the model training process.
  • The margin sampling strategy introduces the idea of hard-sample mining. Building on least-confidence sampling, it compares the class with the highest predicted probability against the class with the second-highest probability, that is, it examines whether the leading class has a large or only a small advantage, and the data where the advantage is small are selected for labeling.
  • the margin sampling method and the least-confidence method are equivalent in binary classification problems.
  • Active learning scenarios involve evaluating the informativeness of unlabeled instances, and the simplest and most commonly used query framework is uncertainty sampling.
  • the active learner queries the instances whose labels it is most uncertain about; when a probabilistic model is used for binary classification, uncertainty sampling simply queries the instances whose posterior probability of being positive is closest to 0.5.
  • This application uses margin sampling and selects, for judgment, the samples with the smallest difference between the largest and second-largest probabilities predicted by the model.
  • the recognition ability of the model can be enhanced, thereby continuously optimizing the recognition results and improving the recognition accuracy.
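  • A minimal sketch of the margin sampling criterion described above: the samples whose gap between the largest and second-largest predicted probabilities is smallest are the ones sent back for manual re-labeling (the numbers are illustrative only).

```python
import numpy as np

def margin_sample(prob_matrix: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k samples with the smallest top-1/top-2 margin.
    prob_matrix has shape [num_samples, num_classes]."""
    sorted_probs = np.sort(prob_matrix, axis=1)
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]    # largest minus second largest
    return np.argsort(margins)[:k]                         # hardest samples for re-labeling

# Example: the second sample (0.52 vs 0.48) has the smallest margin and is selected.
probs = np.array([[0.90, 0.10], [0.52, 0.48], [0.70, 0.30]])
print(margin_sample(probs, 1))   # [1]
```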
  • the Baseline model used in this application is a pre-trained three-layer BERT model. As shown in Figure 5, the BERT model is improved as follows:
  • Each CLS vector represents the information contained in each sentence after each layer of BERT is used to extract information
  • the experimental task is the identification of agent violations in credit card sales scenarios, which is converted into a binary classification task.
  • the training process and optimization process of the agent illegal speech recognition model are as follows:
  • the three-layer BERT model is trained with the cross-entropy loss function to obtain the agent illegal speech recognition model;
  • the model performs inference on the data to be identified and obtains the classification probability distribution.
  • the idea of active learning and the margin sampling strategy are used to re-label samples that are difficult for the model to distinguish, and the model is trained again to enhance its recognition capabilities.
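  • A minimal sketch of one training step with the cross-entropy loss is shown below; the optimizer and hyper-parameters are assumptions, and since the model sketch above already applies softmax, the negative log-likelihood of the log-probabilities is used, which is equivalent to cross-entropy on raw logits.

```python
import torch
import torch.nn as nn

def train_step(model, batch, labels, optimizer):
    """One optimization step of the three-layer BERT classifier (sketch only)."""
    model.train()
    probs = model(batch["input_ids"], batch["attention_mask"])
    # Cross-entropy: NLL of the log of the softmax output (epsilon for numerical stability).
    loss = nn.NLLLoss()(torch.log(probs + 1e-12), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```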
  • the embodiment of this application uses the improved pre-trained three-layer BERT model to integrate text semantics with different degrees of information extraction and different time spans, thereby enhancing the performance of the three-layer BERT model; at the same time, based on the improved three-layer BERT model, a set of illegal speech identification processes is proposed, which improves the efficiency with which quality inspection personnel identify illegal speech in business scenarios and has certain promotion value.
  • an embodiment of the present application provides a device for identifying illegal speech skills by agents.
  • the device includes:
  • the preprocessing unit is used to obtain the agent speech information for training, split the agent speech information for training into single sentences on the agent side, and perform text preprocessing on the single sentences on the agent side;
  • the training unit is used to train using the three-layer BERT model based on the pre-processed agent-side single sentences to obtain an agent illegal speech recognition model
  • the processing unit is used to input the agent speech information to be identified into the agent illegal speech recognition model for inference and obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
  • the identification unit is used to determine the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
  • the device for identifying illegal agent speech skills in the embodiment of the present application identifies the agent's illegal speech skills through a three-layer BERT model.
  • First, a three-layer BERT model is constructed, and the labeled training set, that is, the agent speech information used for training, is used to train the three-layer BERT model to obtain the agent illegal speech recognition model; then the agent speech information to be identified is input into the agent illegal speech recognition model to obtain the probability distribution of the target classification.
  • This probability distribution represents the probability that, under the corresponding classification, the agent speech information belongs to illegal speech; finally, the illegal speech in the agent speech information is determined based on the probability distribution of the target classification.
  • the improved pre-trained three-layer BERT model is used to integrate text semantics with different degrees of information extraction and different time spans, thereby enhancing the performance of the three-layer BERT model.
  • at the same time, a set of illegal speech identification processes is proposed, which improves the efficiency with which quality inspection personnel identify illegal speech in business scenarios and has certain promotion value.
  • an embodiment of the present application also provides an electronic device.
  • the electronic device 2000 includes: a memory 2002, a processor 2001, and a computer program stored on the memory 2002 and executable on the processor 2001.
  • the processor 2001 and the memory 2002 may be connected through a bus or other means.
  • the non-transitory software programs and instructions required to implement the agent illegal speech identification method in the above embodiment are stored in the memory 2002.
  • when executed by the processor 2001, the method for identifying the agent's illegal speech applied to the device in the above embodiment is performed, for example, the above-described method steps S100 to S400 in Figure 1, method steps S110 to S130 in Figure 2, method steps S140 to S150 in Figure 3, and method steps S151 to S153 in Figure 4.
  • Electronic equipment also includes components such as input units, display units, audio processing circuits, and power supplies.
  • this embodiment does not uniquely limit the structure of the electronic device, and may include more or fewer components than in this embodiment, or combine certain components, or arrange different components.
  • the input unit may be used to receive input numeric or character information, and to generate key signal input related to computer settings and function control.
  • the input unit may include a touch panel and other input devices.
  • A touch panel, also known as a touch screen, can collect touch operations on or near it (such as operations performed on or near the touch panel using a finger, a stylus, or any suitable object or accessory) and drive the corresponding connected device according to a preset program.
  • the touch panel may include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the touch position and the signal brought by the touch operation and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor, and can receive and execute commands from the processor.
  • touch panels can be implemented in various categories such as resistive, capacitive, infrared and surface acoustic wave.
  • the input unit may also include other input devices. Specifically, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), trackball, mouse, joystick, etc.
  • the display unit may be used to display input information or provided information as well as various menus of the electronic device.
  • the display unit may include a display panel.
  • the display panel may be configured in the form of a Liquid Crystal Display (LCD for short) or an Organic Light-Emitting Diode (OLED for short).
  • the touch panel can cover the display panel.
  • when the touch panel detects a touch operation on or near it, the operation is sent to the processor to determine the type of the touch event; the processor then provides corresponding visual output on the display panel according to the type of the touch event.
  • although the touch panel and the display panel are described as two independent components that implement the input and output functions of the electronic device, in some embodiments the touch panel and the display panel can be integrated to implement the input and output functions of the electronic device.
  • Audio processing circuitry provides an audio interface.
  • on the one hand, the audio processing circuit can transmit the electrical signal converted from the received audio data to the speaker, which converts it into a sound signal and outputs it; on the other hand, the microphone converts the collected sound signal into an electrical signal, which the audio processing circuit receives and converts into audio data. The audio data is then output to the processor for processing and sent to, for example, another computer via a wireless circuit, or output to a memory for further processing.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separate, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the computer also includes a power supply (such as a battery) that supplies power to various components.
  • the power supply can be logically connected to the processor through a power management system, so that functions such as charging, discharging, and power consumption management can be implemented through the power management system.
  • an embodiment of the present application also provides a computer-readable storage medium, which stores a computer program.
  • the computer program is executed by a processor or a controller, for example, by a processor in the above electronic device embodiment, which can cause the processor to execute the agent illegal speech recognition method in the above embodiment, for example, the above-described method steps S100 to S400 in Figure 1, method steps S110 to S130 in Figure 2, method steps S140 to S150 in Figure 3, method steps S151 to S153 in Figure 4, method steps S310 to S330 in Figure 6, method steps S410 to S420 in Figure 7, and method steps S510 to S520 in Figure 8.
  • the computer-readable storage medium may be non-volatile or volatile.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, tapes, disk storage or other magnetic storage devices, or may Any other storage medium used to store desired information and that can be accessed by a computer.
  • Computer storage media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery storage media.
  • the present application may be used in a variety of general-purpose or special-purpose computer device environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor devices, microprocessor-based devices, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above devices or equipment.
  • the application may be described in the general context of computer programs, such as program modules, executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.
  • each block in the flow chart or block diagram may represent a module, program segment, or part of the code.
  • the above module, program segment, or part of the code includes one or more programs for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special-purpose hardware-based means that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • the units involved in the embodiments of this application can be implemented in software or hardware, and the described units can also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.
  • the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and includes several instructions to cause a computing device (which can be a personal computer, server, touch terminal, or network device) to execute the method according to the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Technology Law (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Machine Translation (AREA)

Abstract

The present application belongs to the field of artificial intelligence, and provides an inappropriate agent language identification method and apparatus, an electronic device and a storage medium. The method comprises: acquiring agent language information for training, splitting the agent language information for training to obtain agent-side single sentences, and performing text preprocessing on the agent-side single sentences; on the basis of the preprocessed agent-side single sentences, training by means of a three-layer BERT model so as to obtain an inappropriate agent language identification model; inputting agent language information to be identified into the inappropriate agent language identification model for reasoning so as to obtain a probability distribution of target classification, wherein the agent language information to be identified is agent language information in a credit card service and sales scenario; and determining inappropriate language in the agent language information to be identified according to the probability distribution of the target classification. The embodiments of the present application are optimized at the two levels of model and data, are applied to inappropriate language identification, can improve the efficiency with which quality inspection personnel identify inappropriate language in a business scenario, and have a certain popularization value.

Description

Method, device, electronic equipment, and storage medium for identifying illegal speech by agents
This application claims priority to the Chinese patent application with application number 202210252453.8, filed with the China Patent Office on March 15, 2022 and entitled "Method, device, electronic equipment, and storage medium for identifying illegal speech by agents", the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the field of artificial intelligence technology, and in particular relates to a method, device, electronic equipment, and storage medium for identifying illegal speech by agents.
Background
Existing scenarios for identifying customer service agents' illegal speech, such as the identification of agent violations in the quality inspection of the credit card service and sales business, mainly rely on manual quality inspection and keyword matching. Faced with a large amount of complicated dialogue information, quality inspection personnel need to listen to the recordings sentence by sentence to screen for illegal speech, which consumes considerable manpower and time and is inefficient. If identification is performed by keyword matching, it relies on manual experience to summarize keywords; on the one hand, incomplete keyword summaries cause illegal speech to be missed, and on the other hand, semantic information is easily ignored, causing misjudgment.
Technical Problem
The following are technical problems in the prior art of which the inventors are aware: pre-trained models for natural language processing in the related art mainly include BERT (Bidirectional Encoder Representations from Transformers) of various sizes, as well as some BERT variants such as ALBERT, RoBERTa, and ELECTRA. Some of these models suffer from an excessive number of parameters and overly slow training and inference; others reduce the parameter count through parameter sharing but perform poorly in the application scenario of illegal speech recognition.
Technical Solutions
In a first aspect, embodiments of this application provide a method for identifying illegal speech by agents, including:
obtaining the agent speech information for training, splitting the agent speech information for training into single sentences on the agent side, and performing text preprocessing on the single sentences on the agent side;
based on the preprocessed agent-side single sentences, training with a three-layer BERT model to obtain an agent illegal speech recognition model;
inputting the agent speech information to be identified into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
determining the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
In a second aspect, embodiments of this application provide a device for identifying illegal speech by agents, including:
a preprocessing unit, used to obtain the agent speech information for training, split the agent speech information for training into single sentences on the agent side, and perform text preprocessing on the single sentences on the agent side;
a training unit, used to train a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an agent illegal speech recognition model;
a processing unit, used to input the agent speech information to be identified into the agent illegal speech recognition model for inference and obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
an identification unit, used to determine the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
In a third aspect, embodiments of this application provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements a method for identifying illegal speech by agents, the method including:
obtaining the agent speech information for training, splitting the agent speech information for training into single sentences on the agent side, and performing text preprocessing on the single sentences on the agent side;
based on the preprocessed agent-side single sentences, training with a three-layer BERT model to obtain an agent illegal speech recognition model;
inputting the agent speech information to be identified into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
determining the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program, the computer program being used to execute a method for identifying illegal speech by agents, the method including:
obtaining the agent speech information for training, splitting the agent speech information for training into single sentences on the agent side, and performing text preprocessing on the single sentences on the agent side;
based on the preprocessed agent-side single sentences, training with a three-layer BERT model to obtain an agent illegal speech recognition model;
inputting the agent speech information to be identified into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
determining the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
有益效果beneficial effects
The embodiments of the present application have at least the following beneficial effects: the pre-trained three-layer BERT model of the embodiments of the present application, by integrating text semantics at different levels of information extraction and over different time spans, can enhance the performance of the three-layer BERT model without substantially increasing the number of parameters. The three-layer BERT model proposed in the embodiments of the present application is optimized at both the model level and the data level; applied to illegal speech identification, it can improve the efficiency with which quality inspection personnel identify illegal speech in business scenarios and has certain promotion value.
本申请的其它特征和优点将在随后的说明书中阐述,并且,部分地从说明书中变得显而易见,或者通过实施本申请而了解。本申请的目的和其他优点可通过在说明书、权利要求书以及附图中所特别指出的结构来实现和获得。Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the application. The objectives and other advantages of the application may be realized and obtained by the structure particularly pointed out in the specification, claims and appended drawings.
附图说明Description of the drawings
附图用来提供对本申请技术方案的进一步理解,并且构成说明书的一部分,与本申请的实施例一起用于解释本申请的技术方案,并不构成对本申请技术方案的限制。The drawings are used to provide a further understanding of the technical solution of the present application and constitute a part of the specification. They are used to explain the technical solution of the present application together with the embodiments of the present application and do not constitute a limitation of the technical solution of the present application.
图1是本申请一个实施例提供的坐席违规话术识别方法的流程图;Figure 1 is a flow chart of a method for identifying illegal speech by agents provided by an embodiment of the present application;
图2是本申请另一个实施例提供的处理训练用的坐席话术信息的流程图;Figure 2 is a flow chart for processing agent speech information for training provided by another embodiment of the present application;
图3是本申请另一个实施例提供的脱敏和随机遮罩处理的流程图;Figure 3 is a flow chart of desensitization and random masking processing provided by another embodiment of the present application;
图4是本申请另一个实施例提供的随机遮罩具体处理方式的流程图;Figure 4 is a flow chart of a specific processing method for random masking provided by another embodiment of the present application;
图5是本申请另一个实施例提供的三层BERT模型的结构;Figure 5 is the structure of a three-layer BERT model provided by another embodiment of the present application;
图6是本申请另一个实施例提供的输入待识别的坐席话术信息并输出目标分类的概率分布的流程图;Figure 6 is a flow chart for inputting agent speech information to be recognized and outputting a probability distribution of target classification provided by another embodiment of the present application;
图7是本申请另一个实施例提供的匹配并识别违规话术的流程图;Figure 7 is a flow chart for matching and identifying illegal words provided by another embodiment of the present application;
图8是本申请另一个实施例提供的优化三层BERT模型的流程图;Figure 8 is a flow chart for optimizing the three-layer BERT model provided by another embodiment of the present application;
图9是本申请另一个实施例提供的坐席违规话术识别装置的结构图;Figure 9 is a structural diagram of an agent illegal speech recognition device provided by another embodiment of the present application;
图10是本申请另一个实施例提供的电子设备的装置图。Figure 10 is a device diagram of an electronic device provided by another embodiment of the present application.
本发明的实施方式Embodiments of the invention
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。In order to make the purpose, technical solutions and advantages of the present application more clear, the present application will be further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not used to limit the present application.
It should be noted that although functional modules are divided in the schematic diagram of the apparatus and a logical order is shown in the flowchart, in some cases the steps shown or described may be performed with a module division different from that in the apparatus, or in an order different from that in the flowchart. The terms "first", "second", etc. in the description, claims or the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
The present application provides a method, an apparatus, an electronic device and a storage medium for identifying agent illegal speech. The method includes: obtaining agent speech information for training, splitting the agent speech information for training into agent-side single sentences, and performing text preprocessing on the agent-side single sentences; training a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an agent illegal speech recognition model; inputting agent speech information to be identified into the agent illegal speech recognition model for inference to obtain a probability distribution of target classifications, wherein the agent speech information to be identified is agent speech information in a credit card service sales scenario; and determining the illegal speech in the agent speech information to be identified according to the probability distribution of the target classifications. The pre-trained three-layer BERT model performs inference on agent speech, thereby improving the efficiency of quality inspection personnel in identifying illegal speech in business scenarios.
The embodiments of the present application can obtain and process the relevant data based on artificial intelligence technology. Artificial Intelligence (AI) is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine capable of responding in a manner similar to human intelligence; research in this field includes robotics, language recognition, image recognition, natural language processing and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is also a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互装置、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、机器人技术、生物识别技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction devices, mechatronics and other technologies. Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometric technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
The terminal mentioned in the embodiments of the present application may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a vehicle-mounted computer, a smart home device, a wearable electronic device, a VR (Virtual Reality)/AR (Augmented Reality) device, and the like. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms, and so on.
It should be noted that the data of the embodiments of the present application may be stored in a server. The server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks, and big data and artificial intelligence platforms.
自然语言处理是计算机科学领域与人工智能领域中的一个重要方向。它研究能实现人与计算机之间用自然语言进行有效通信的各种理论和方法。自然语言处理是一门融语言学、计算机科学、数学于一体的科学。因此,这一领域的研究将涉及自然语言,即人们日常使用的语言,所以它与语言学的研究有着密切的联系。自然语言处理技术通常包括文本处理、语义理解、机器翻译、机器人问答、知识图谱等技术。Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Therefore, research in this field will involve natural language, that is, the language that people use every day, so it is closely related to the study of linguistics. Natural language processing technology usually includes text processing, semantic understanding, machine translation, robot question answering, knowledge graph and other technologies.
机器学习(Machine Learning,ML)是一门多领域交叉学科,涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机怎样模拟或实现人类的学习行为,以获取新的知识或技能,重新组织已有的知识结构使之不断改善自身的性能。机器学习是人工智能的核心,是使计算机具有智能的根本途径,其应用遍及人工智能的各个领域。机器学习和深度学习通常包括人工神经网络、置信网络、强化学习、迁移学习、归纳学习、式教学习等技术。Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
如图1所示,图1是本申请一个实施例提供的一种坐席违规话术识别方法的流程图,该坐席违规话术识别方法,包括但不限于有以下步骤:As shown in Figure 1, Figure 1 is a flow chart of a method for identifying illegal speech by an agent provided by an embodiment of the present application. The method for identifying illegal speech by an agent includes but is not limited to the following steps:
步骤S100,获取训练用的坐席话术信息,将训练用的坐席话术信息拆分得到坐席侧单句并对坐席侧单句进行文本预处理;Step S100, obtain the agent speaking information for training, split the agent speaking information for training into single sentences on the agent side, and perform text preprocessing on the single sentences on the agent side;
步骤S200,基于预处理后的坐席侧单句,使用三层BERT模型进行训练,得到坐席违规话术识别模型;Step S200: Based on the preprocessed single sentence on the agent side, use the three-layer BERT model for training to obtain an agent illegal speech recognition model;
步骤S300,将待识别的坐席话术信息输入到坐席违规话术识别模型进行推理,得到目标分类的概率分布,其中,待识别的坐席话术信息是信用卡服销场景下的坐席话术信息;Step S300, input the agent speech information to be identified into the agent violation speech recognition model for inference, and obtain the probability distribution of the target classification, where the agent speech information to be identified is the agent speech information in the credit card service sales scenario;
步骤S400,根据目标分类的概率分布确定待识别的坐席话术信息中的违规话术。Step S400: Determine the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
The embodiments of the present application identify agent illegal speech through a three-layer BERT model. First, a three-layer BERT model is constructed and trained with a labeled training set, that is, the agent speech information for training, to obtain an agent illegal speech recognition model. Then, the agent speech information to be identified is input into the agent illegal speech recognition model to obtain a probability distribution of target classifications, where the probability distribution indicates, for each corresponding classification, the probability that the agent speech information belongs to illegal speech. Finally, the illegal speech in the agent speech information is determined based on the probability distribution of the target classifications.
BERT is a pre-trained language representation model that uses a masked language model (MLM) to pre-train bidirectional Transformers and thereby generate deep bidirectional language representations. After pre-training, only an additional output layer needs to be added for fine-tuning to achieve state-of-the-art performance on a wide variety of downstream tasks. This process does not require task-specific structural modifications to BERT, so it ultimately generates deep bidirectional language representations that fuse left and right context information. However, if the traditional BERT model, or a model evolved from BERT in the related art, is applied directly to agent illegal speech, there is the problem that either the number of parameters is too large, or the number of parameters is small but the effect is poor. Therefore, the embodiments of the present application improve the traditional BERT model. The specific structure is shown in Figure 5, and the improved three-layer BERT model includes:
a first-layer BERT model, a second-layer BERT model, a third-layer BERT model, a fully connected layer, a convolution layer and a classification layer. The first-layer BERT model, the second-layer BERT model and the third-layer BERT model are stacked, and the hidden layers of the first-layer, second-layer and third-layer BERT models each output a CLS vector to the fully connected layer. The fully connected layer, the convolution layer and the classification layer are connected in sequence, and the output of the classification layer serves as the output of the three-layer BERT model. Each CLS vector represents the information contained in each sentence after the preprocessed agent-side single sentence passes through one of the layers of the BERT model for information extraction.
The fully connected layer is used to concatenate the three CLS vectors and output a text information vector that integrates the longitudinal dimension; the convolution layer is used to perform convolution operations on this vector through multiple convolution kernels and output a text information vector that integrates the horizontal span to the classification layer; after receiving this vector, the classification layer obtains the probability distribution of the required classifications through classification processing.
具体来说,参照图2,上述步骤S100中将训练用的坐席话术信息拆分得到坐席侧单句,可以通过以下步骤实现:Specifically, referring to Figure 2, in the above-mentioned step S100, splitting the agent speech information for training into single sentences on the agent side can be achieved through the following steps:
步骤S110,对训练用的坐席话术信息进行标注;Step S110, label the agent speech information for training;
步骤S120,根据标注对坐席话术信息进行句子拆分,拆分得到多个语音单句;Step S120: Split the agent's speech information into sentences according to the annotations, and split the speech information into multiple single speech sentences;
步骤S130,对多个语音单句进行文字转换,得到以文字方式表示的坐席侧单句。Step S130: Perform text conversion on multiple voice sentences to obtain seat-side sentences expressed in text.
The agent speech information is voice data. When constructing the training set, the agent speech needs to be annotated so that the voice data can be divided into multiple sentences and the training set can be built in single-sentence form. This is because recognition based on the BERT model performs the MLM operation on a per-sentence basis, and multiple single voice sentences can be obtained through annotation. At the same time, because the MLM mechanism applies random masking to sentences, each single voice sentence needs to be converted into text, that is, into an agent-side single sentence. It can be understood that the BERT model was originally proposed for English, where a sentence consists of multiple English words separated by spaces, so the three-layer BERT model of the embodiments of the present application can certainly be applied to English agent speech scenarios. A Chinese sentence, however, consists of multiple consecutive Chinese characters, and a "word" is also composed of several Chinese characters; that is, English writing encodes sound while Chinese writing encodes meaning. Therefore, certain processing is required when applying the BERT model for MLM. Several processing methods are available: common Chinese word segmentation methods include classic mechanical segmentation (such as forward/backward maximum matching and bidirectional maximum matching), statistical segmentation methods with better performance (such as Hidden Markov Models (HMM) and Conditional Random Fields (CRF)), as well as deep-neural-network methods that have emerged in recent years, such as RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory), which are not limited here.
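By way of illustration only, the following is a minimal sketch of the forward maximum matching segmentation mentioned above; the vocabulary, sentence and window size are assumptions for demonstration and do not form part of the original disclosure.

```python
def forward_max_match(sentence, vocab, max_word_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word that matches; fall back to a single character."""
    tokens, i = [], 0
    while i < len(sentence):
        matched = None
        # Try the longest window first, then shrink it
        for length in range(min(max_word_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in vocab or length == 1:
                matched = candidate
                break
        tokens.append(matched)
        i += len(matched)
    return tokens

# Hypothetical vocabulary and sentence, for illustration only
vocab = {"信用卡", "额度", "提升", "可以"}
print(forward_max_match("信用卡额度可以提升", vocab))
# -> ['信用卡', '额度', '可以', '提升']
```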
其中,标注方法可以根据训练用的坐席话术信息中的语音差别和停顿节奏,对训练用的坐席话术信息进行标注。Among them, the annotation method can label the agent speech information for training based on the speech differences and pause rhythm in the agent speech information for training.
Of course, after annotation the agent-side single sentences still need to be preprocessed. In agent speech scenarios, sentences often contain sensitive information such as home addresses and other personally identifiable information. Therefore, to avoid leakage of user data, the agent-side single sentences need to be desensitized, and random masking needs to be applied to meet the training requirements of the three-layer BERT model. Specifically, referring to Figure 3, the desensitization and random masking include the following steps:
步骤S140,根据预设敏感字库或预设敏感字判断规则,对坐席侧单句中的敏感字进行脱敏处理;Step S140: Desensitize the sensitive words in the single sentence on the seat side according to the preset sensitive word library or the preset sensitive word judgment rules;
步骤S150,对脱敏处理后的坐席侧单句进行随机遮罩处理,得到预处理后的坐席侧单句。Step S150: Perform random masking processing on the desensitized agent-side single sentences to obtain preprocessed agent-side single sentences.
其中,脱敏处理可以将坐席侧单句中的敏感字替换成预设字符,随机遮罩处理可以包括以下步骤,如图4所示:Among them, desensitization processing can replace sensitive words in single sentences on the agent side with preset characters, and random masking processing can include the following steps, as shown in Figure 4:
步骤S151,随机选取脱敏处理后的坐席侧单句中的15%的词;Step S151: Randomly select 15% of the words in the desensitized single sentence on the seat side;
步骤S152,对所选词中的80%用[mask]代替,10%保持不变,余下10%用另一个随机词进行替换;Step S152, replace 80% of the selected words with [mask], keep 10% unchanged, and replace the remaining 10% with another random word;
步骤S153,在脱敏处理后的坐席侧单句的起始位置拼接[CLS]字符。Step S153: splice [CLS] characters at the starting position of the desensitized single sentence on the agent side.
随机遮罩处理是BERT能够不受单向语言模型所限制的原因。简单来说就是以15%的概率用mask token([mask])随机地对每一个训练序列中的token进行替换,然后预测出[mask]位置原有的单词。首先在每一个训练序列中以15%的概率随机地选中某个token位置用于预测,然后假如是第i个token被选中,则会被替换成三个token之一([mask]、随机token和原有token),分别占比是80%、10%、10%。该策略令到BERT不再只对[mask]敏感,而是对所有的token都敏感,以致能抽取出任何token的表征信息。Random mask processing is the reason why BERT is not limited by one-way language models. To put it simply, the token in each training sequence is randomly replaced with a mask token ([mask]) with a probability of 15%, and then the original word at the [mask] position is predicted. First, in each training sequence, a certain token position is randomly selected for prediction with a probability of 15%, and then if the i-th token is selected, it will be replaced with one of three tokens ([mask], random token and original token), accounting for 80%, 10%, and 10% respectively. This strategy makes BERT no longer only sensitive to [mask], but is sensitive to all tokens, so that it can extract the representation information of any token.
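The following is a minimal sketch of the desensitization and 80/10/10 random masking described above; the sensitive-word list, the placeholder character and the word-level tokenization are illustrative assumptions rather than the patented implementation.

```python
import random

SENSITIVE = {"身份证", "住址"}        # hypothetical sensitive-word list
PLACEHOLDER = "*"                     # hypothetical preset replacement character

def desensitize(tokens):
    # Replace any sensitive token with the preset placeholder
    return [PLACEHOLDER if t in SENSITIVE else t for t in tokens]

def random_mask(tokens, mask_ratio=0.15):
    """Pick 15% of the tokens; of those, 80% -> [mask], 10% unchanged,
    10% -> a random token drawn from the sentence. Prepend [CLS]."""
    tokens = list(tokens)
    labels = [None] * len(tokens)
    n_select = max(1, int(len(tokens) * mask_ratio))
    for i in random.sample(range(len(tokens)), n_select):
        labels[i] = tokens[i]                 # original token is the prediction target
        r = random.random()
        if r < 0.8:
            tokens[i] = "[mask]"
        elif r < 0.9:
            pass                              # keep the original token
        else:
            tokens[i] = random.choice(tokens)  # replace with a random token
    return ["[CLS]"] + tokens, [None] + labels

# Hypothetical segmented sentence, for illustration only
sentence = ["客户", "的", "住址", "是", "某某小区"]
masked, targets = random_mask(desensitize(sentence))
```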
总之,训练用的坐席话术信息通过标注和划分后,得到应用于三层BERT模型的训练集,将训练集输入到三层BERT模型中进行训练,得到坐席话术识别模型。In short, after the agent speech information for training is annotated and divided, a training set applied to the three-layer BERT model is obtained. The training set is input into the three-layer BERT model for training, and the agent speech recognition model is obtained.
It can be understood that, in one embodiment, the convolution layer of the three-layer BERT model includes three convolution kernels, namely a first convolution kernel, a second convolution kernel and a third convolution kernel, as shown in Figure 5. The sizes of the three kernels differ from one another: the size of the first convolution kernel is 2, the size of the second convolution kernel is 3, and the size of the third convolution kernel is 4. In Figure 5, the first-layer, second-layer and third-layer BERT models are denoted encoder1, encoder2 and encoder3, respectively; encoder1, encoder2 and encoder3 output the CLS1, CLS2 and CLS3 vectors to the fully connected layer (Liner) for concatenation, yielding a text information vector that integrates the longitudinal dimension. After this vector is processed by the three convolution kernels of the convolution layer, the concatenated output is fed into the classification layer (softmax). The activation function of the classification layer may be the tanh function.
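A minimal PyTorch sketch of this classification head is given below. The hidden size, channel count and number of output classes are assumptions for illustration, and the three CLS inputs stand in for the stacked BERT encoders described above; it is a sketch of the described head, not the exact patented model.

```python
import torch
import torch.nn as nn

class ThreeLayerBertHead(nn.Module):
    """Concatenate the [CLS] vectors from three stacked encoders, apply
    1-D convolutions with kernel sizes 2/3/4, then classify with softmax."""
    def __init__(self, hidden=768, num_classes=2):
        super().__init__()
        self.linear = nn.Linear(3 * hidden, 3 * hidden)        # fuse the three CLS vectors
        self.convs = nn.ModuleList(
            [nn.Conv1d(1, 16, kernel_size=k) for k in (2, 3, 4)]
        )
        self.classifier = nn.Linear(3 * 16, num_classes)

    def forward(self, cls1, cls2, cls3):
        x = torch.tanh(self.linear(torch.cat([cls1, cls2, cls3], dim=-1)))
        x = x.unsqueeze(1)                                      # (batch, 1, 3*hidden)
        # Each kernel scans a different horizontal span; max-pool each to one value per channel
        feats = [conv(x).max(dim=-1).values for conv in self.convs]
        logits = self.classifier(torch.cat(feats, dim=-1))      # (batch, num_classes)
        return torch.softmax(logits, dim=-1)

# Usage with dummy CLS vectors standing in for encoder1/encoder2/encoder3 outputs
head = ThreeLayerBertHead()
cls_vectors = [torch.randn(4, 768) for _ in range(3)]
probs = head(*cls_vectors)    # (4, 2) probability distribution over the two classes
```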
参照图6,上述步骤S300中,将待识别的坐席话术信息输入到坐席违规话术识别模型进行推理,得到目标分类的概率分布,可以参照以下步骤实现:Referring to Figure 6, in the above-mentioned step S300, the agent speech information to be identified is input into the agent violation speech recognition model for reasoning, and the probability distribution of the target classification is obtained. This can be achieved by referring to the following steps:
步骤S310,将待识别的坐席话术信息转换成文字信息;Step S310, convert the agent speech information to be recognized into text information;
步骤S320,对文字信息进行脱敏处理;Step S320, perform desensitization processing on the text information;
步骤S330,将脱敏处理后的文字信息输入到坐席违规话术识别模型,得到若干个目标分类的概率分布。Step S330: Input the desensitized text information into the agent illegal speech recognition model to obtain probability distributions of several target categories.
For the agent speech information to be identified, speech-to-text conversion is likewise performed, and after conversion the sensitive words in the text are desensitized; that is, compared with the agent speech information used for training, there is no need to annotate it in advance to divide it into single sentences. After the above desensitization, the text information is input into the agent illegal speech recognition model to obtain the probability distribution of the required classifications (that is, the target classifications, whose types can be preset).
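A minimal inference sketch following these steps might look as follows; the ASR callable, the tokenizer and the model object are placeholders for whatever speech-to-text and encoding components are actually deployed, and the sensitive-word list is an assumption.

```python
import torch

def identify_call(audio, asr, tokenizer, model, sensitive=("住址", "身份证")):
    """Speech-to-text -> desensitization -> model inference -> class probabilities.
    `asr`, `tokenizer` and `model` are placeholder callables for the deployed
    components and are not specified by the original text."""
    text = asr(audio)                                       # convert speech to text
    tokens = ["*" if t in sensitive else t for t in tokenizer(text)]
    with torch.no_grad():
        probs = model(tokens)                               # probability distribution
    return probs
```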
When the agent illegal speech recognition model outputs the probability distribution of the required classifications, step S400 determines how to output the identification result of agent illegal speech according to the probability distribution of the target classifications; this can be done by matching against the target classifications, for example, referring to Figure 7, through the following steps:
步骤S410,将目标分类与预设分类进行匹配,并将匹配成功的目标分类作为待识别分类;Step S410, match the target classification with the preset classification, and use the successfully matched target classification as the classification to be recognized;
步骤S420,当待识别分类对应的概率值超过对应的预设分类的概率阈值,确定待识别的坐席话术信息中包含待识别分类的违规话术。Step S420: When the probability value corresponding to the category to be identified exceeds the probability threshold of the corresponding preset category, it is determined that the agent speech information to be identified contains the illegal speech of the category to be identified.
A model trained with natural language processing (NLP) ultimately has to be matched against semantic types. The preset classifications in the embodiments of the present application are manually predefined divisions of the semantic types of agent illegal speech; the output of the pre-trained model (the three-layer BERT model) is matched against the preset classifications to determine whether the target classification corresponding to the current probability distribution belongs to a classification corresponding to illegal speech. On the other hand, when several target classifications match the preset classifications, the corresponding probabilities of these target classifications (that is, the classifications to be identified) are further examined; if a probability is higher than the set probability threshold, the current recording is considered to contain illegal speech.
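A minimal sketch of this matching-and-threshold step is shown below; the category names and threshold values are illustrative assumptions.

```python
def find_violations(probs, preset_thresholds):
    """probs: {category: probability} output by the model.
    preset_thresholds: {category: threshold} for the predefined violation categories.
    Returns the categories whose probability exceeds their threshold."""
    violations = []
    for category, p in probs.items():
        if category in preset_thresholds and p > preset_thresholds[category]:
            violations.append(category)
    return violations

# Hypothetical model output and thresholds, for illustration only
probs = {"exaggerated_benefit": 0.91, "threatening_tone": 0.12}
thresholds = {"exaggerated_benefit": 0.8, "threatening_tone": 0.8}
print(find_violations(probs, thresholds))   # -> ['exaggerated_benefit']
```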
可以理解的是,本申请实施例三层BERT模型并不能100%准确给出结果,对于三层BERT模型无法准确判断、难以区分的样本,三层BERT模型可以进行循环学习。具体来说,参照图8,本申请实施例的优化过程包括:It can be understood that the three-layer BERT model in the embodiment of this application cannot provide 100% accurate results. For samples that the three-layer BERT model cannot accurately judge and are difficult to distinguish, the three-layer BERT model can perform loop learning. Specifically, referring to Figure 8, the optimization process of the embodiment of the present application includes:
步骤S510,对匹配失败的目标分类,确定对应的坐席话术信息;Step S510: Classify the targets that failed to match and determine the corresponding agent speech information;
步骤S520,根据主动学习和边缘采样策略对匹配失败的目标分类对应的坐席话术信息进行再标注,并输入到坐席违规话术识别模型进行再训练。Step S520: Re-label the agent speech information corresponding to the failed target classification according to the active learning and edge sampling strategies, and input it into the agent violation speech technique recognition model for retraining.
Active learning refers to using machine learning methods to pick out sample data that are relatively "difficult" to classify, having them manually re-confirmed and reviewed, and then using the manually labeled data to train a supervised or semi-supervised learning model again, gradually improving the performance of the model and incorporating human experience into the machine learning model; that is, a batch of easily misclassified samples is selected, labeled manually, and then used to train the machine learning model. The margin sampling strategy is a metric learning method that introduces the idea of hard-sample mining. It is similar to least-confidence sampling, but it compares the class with the highest probability against the class with the second highest probability, that is, it checks whether the top classification holds a clear margin over the runner-up; samples with a small margin are selected for labeling. Margin sampling and least-confidence sampling are equivalent for binary classification problems. Active learning scenarios involve evaluating the informativeness of unlabeled instances, and the simplest and most commonly used query framework is uncertainty sampling. In this framework, the active learner queries the instances it is least certain how to label; when a probabilistic model is used for binary classification, uncertainty sampling simply queries the instance whose posterior probability of being positive is closest to 0.5. The present application adopts margin sampling: the samples with the smallest difference between the largest and second-largest predicted probabilities are selected for judgment.
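A minimal sketch of margin-sampling selection is given below; the batch of probability vectors and the selection size are illustrative assumptions.

```python
import numpy as np

def margin_sampling(prob_matrix, k=10):
    """Select the k samples whose top-1 / top-2 probability margin is smallest,
    i.e. the samples the model finds hardest to distinguish."""
    sorted_probs = np.sort(prob_matrix, axis=1)[:, ::-1]      # descending per row
    margins = sorted_probs[:, 0] - sorted_probs[:, 1]         # top-1 minus top-2
    return np.argsort(margins)[:k]                            # indices to re-label

# Hypothetical model outputs for 4 utterances (binary classification)
probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.51, 0.49]])
print(margin_sampling(probs, k=2))   # -> [3 1]: the two most ambiguous samples
```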
通过再次训练,可以增强模型的识别能力,从而不断优化识别结果和提高识别准确率。Through retraining, the recognition ability of the model can be enhanced, thereby continuously optimizing the recognition results and improving the recognition accuracy.
本申请所采用的Baseline模型为预训练的三层BERT模型。结合图5所示,BERT模型改进如下:The Baseline model used in this application is a pre-trained three-layer BERT model. As shown in Figure 5, the BERT model is improved as follows:
1. 将三层BERT模型的每一个隐层的CLS向量拿出,每个CLS向量代表每一次通过一层BERT进行信息抽取后,每个句子所包含的信息;1. Take out the CLS vector of each hidden layer of the three-layer BERT model. Each CLS vector represents the information contained in each sentence after each layer of BERT is used to extract information;
2. 将三个CLS向量拼接起来送入全连接层,得到综合纵向维度的文本信息向量;2. Splice the three CLS vectors and send them to the fully connected layer to obtain a comprehensive vertical dimension text information vector;
3. 分别用大小的为2,3,4的卷积核对上述向量进行卷积操作,将三个输出向量进行拼接,得到综合横向跨度的文本信息向量;3. Use convolution kernels of sizes 2, 3, and 4 to perform convolution operations on the above vectors, and splice the three output vectors to obtain a comprehensive horizontal span text information vector;
4. 将上述向量送入分类层,其中的激活函数为tanh函数,得到所需分类的概率分布。4. Send the above vector to the classification layer, where the activation function is the tanh function to obtain the probability distribution of the required classification.
The experimental task is agent illegal speech identification in a credit card service sales scenario, which is converted into a binary classification task. The training and optimization process of the agent illegal speech recognition model is as follows:
1. After the calls labeled as containing violations are obtained, they are split into agent-side single sentences. After single-sentence annotation and replacement with desensitization marks, the improved pre-trained three-layer BERT model is trained with a cross-entropy loss function to obtain the agent illegal speech recognition model (a fine-tuning sketch follows this list);
2. 模型在待识别数据上进行推理,得到分类概率分布。采用主动学习的思想及边缘采样策略,对模型难以区分的样本进行再标注。再次训练,增强模型的识别能力。2. The model performs inference on the data to be identified and obtains the classification probability distribution. The idea of active learning and edge sampling strategy are used to re-label samples that are difficult to distinguish by the model. Train again to enhance the model’s recognition capabilities.
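A minimal cross-entropy fine-tuning loop consistent with step 1 might look as follows; the dataloader, the model object that produces class logits, and the hyperparameters are illustrative assumptions rather than the patented training procedure.

```python
import torch
import torch.nn as nn

def fine_tune(model, dataloader, epochs=3, lr=2e-5):
    """Cross-entropy fine-tuning over labeled agent-side single sentences.
    `dataloader` is assumed to yield (input_tensor, label) batches and
    `model` to return class logits; both are placeholders."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for inputs, labels in dataloader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)   # logits vs. violation label
            loss.backward()
            optimizer.step()
    return model
```

Samples selected by the margin-sampling sketch above can then be re-labeled and appended to the dataloader for a further round of fine-tuning.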
The embodiments of the present application use the improved pre-trained three-layer BERT model to integrate text semantics at different levels of information extraction and over different time spans, thereby enhancing the performance of the three-layer BERT model; at the same time, based on the improved three-layer BERT model, a workflow for identifying illegal speech is proposed, which improves the efficiency of quality inspection personnel in identifying illegal speech in business scenarios and has certain promotion value.
另外,参照图9,本申请实施例提供了坐席违规话术识别装置,该装置包括:In addition, referring to Figure 9, an embodiment of the present application provides a device for identifying illegal speech skills by agents. The device includes:
预处理单元,用于获取训练用的坐席话术信息,将训练用的坐席话术信息拆分得到坐席侧单句并对坐席侧单句进行文本预处理;The preprocessing unit is used to obtain the agent speech information for training, split the agent speech information for training into single sentences on the agent side, and perform text preprocessing on the single sentences on the agent side;
训练单元,用于基于预处理后的坐席侧单句,使用三层BERT模型进行训练,得到坐席违规话术识别模型;The training unit is used to train using the three-layer BERT model based on the pre-processed agent-side single sentences to obtain an agent illegal speech recognition model;
The processing unit is used to input the agent speech information to be identified into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, where the agent speech information to be identified is agent speech information in a credit card service sales scenario;
识别单元,用于根据目标分类的概率分布确定待识别的坐席话术信息中的违规话术。The identification unit is used to determine the illegal speech in the agent speech information to be identified according to the probability distribution of the target classification.
The apparatus for identifying agent illegal speech of the embodiments of the present application identifies agent illegal speech through a three-layer BERT model. First, a three-layer BERT model is constructed and trained with a labeled training set, that is, the agent speech information for training, to obtain an agent illegal speech recognition model. Then, the agent speech information to be identified is input into the agent illegal speech recognition model, thereby obtaining a probability distribution of target classifications, where the probability distribution indicates, for each corresponding classification, the probability that the agent speech information belongs to illegal speech. Finally, the illegal speech in the agent speech information is determined based on the probability distribution of the target classifications.
The improved pre-trained three-layer BERT model is used to integrate text semantics at different levels of information extraction and over different time spans, enhancing the performance of the three-layer BERT model; at the same time, based on the improved three-layer BERT model, a workflow for identifying illegal speech is proposed, which improves the efficiency of quality inspection personnel in identifying illegal speech in business scenarios and has certain promotion value.
另外,参照图10,本申请的一个实施例还提供了一种电子设备,该电子设备2000包括:存储器2002、处理器2001及存储在存储器2002上并可在处理器2001上运行的计算机程序。In addition, referring to FIG. 10 , an embodiment of the present application also provides an electronic device. The electronic device 2000 includes: a memory 2002, a processor 2001, and a computer program stored on the memory 2002 and executable on the processor 2001.
处理器2001和存储器2002可以通过总线或者其他方式连接。The processor 2001 and the memory 2002 may be connected through a bus or other means.
The non-transitory software programs and instructions required to implement the agent illegal speech identification method of the above embodiments are stored in the memory 2002; when executed by the processor 2001, they execute the agent illegal speech identification method applied to the device in the above embodiments, for example, performing the above-described method steps S100 to S400 in Figure 1, method steps S110 to S130 in Figure 2, method steps S140 to S150 in Figure 3, method steps S151 to S153 in Figure 4, method steps S310 to S330 in Figure 6, method steps S410 to S420 in Figure 7, and method steps S510 to S520 in Figure 8.
电子设备还包括输入单元、显示单元、音频处理电路以及电源等部件。本领域技术人员可以理解,本实施例不对电子设备的结构进行唯一限定,可以包括比本实施例更多或更少的部件,或者组合某些部件,或者不同的部件布置。Electronic equipment also includes components such as input units, display units, audio processing circuits, and power supplies. Those skilled in the art can understand that this embodiment does not uniquely limit the structure of the electronic device, and may include more or fewer components than in this embodiment, or combine certain components, or arrange different components.
The input unit may be used to receive input numeric or character information and to generate key signal inputs related to the settings and function control of the computer. Specifically, the input unit may include a touch panel and other input devices. The touch panel, also called a touch screen, can collect touch operations on or near it (such as operations performed on or near the touch panel with a finger, a stylus or any other suitable object or accessory) and drive the corresponding connected device according to a preset program. Optionally, the touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position, detects the signal produced by the touch operation and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends it to the processor, and can receive and execute commands sent by the processor. In addition, the touch panel may be implemented in various types, such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch panel, the input unit may also include other input devices. Specifically, the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit may be used to display input information or provided information as well as the various menus of the electronic device. The display unit may include a display panel; optionally, the display panel may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel may cover the display panel; when the touch panel detects a touch operation on or near it, it transmits the operation to the processor to determine the type of the touch event, and the processor then provides corresponding visual output on the display panel according to the type of the touch event. Although the touch panel and the display panel may be implemented as two separate components to realize the input and output functions of the electronic device, in some embodiments the touch panel and the display panel may be integrated to realize the input and output functions of the electronic device.
音频处理电路可提供音频接口。音频处理电路可将接收到的音频数据转换后的电信号,传输到扬声器,由扬声器转换为声音信号输出;另一方面,传声器将收集的声音信号转换为电信号,由音频处理电路接收后转换为音频数据,再将音频数据输出处理器处理后,经无线电路以发送给比如另一计算机,或者将音频数据输出至存储器以便进一步处理。Audio processing circuitry provides an audio interface. The audio processing circuit can transmit the electrical signal converted from the received audio data to the speaker, which converts it into a sound signal and outputs it; on the other hand, the microphone converts the collected sound signal into an electrical signal, which is received and converted by the audio processing circuit. The audio data is then output to the processor for processing and then sent to, for example, another computer via a wireless circuit, or the audio data is output to a memory for further processing.
以上所描述的装置实施例仅仅是示意性的,其中作为分离部件说明的单元可以是或者也可以不是物理上分开的, 即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。The device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separate, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
计算机还包括给各个部件供电的电源(比如电池),优选的,电源可以通过电源管理系统与处理器逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。The computer also includes a power supply (such as a battery) that supplies power to various components. Preferably, the power supply can be logically connected to the processor through a power management system, so that functions such as charging, discharging, and power consumption management can be implemented through the power management system.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor or controller, for example by a processor in the above electronic device embodiment, it can cause the processor to execute the agent illegal speech identification method of the above embodiments, for example, performing the above-described method steps S100 to S400 in Figure 1, method steps S110 to S130 in Figure 2, method steps S140 to S150 in Figure 3, method steps S151 to S153 in Figure 4, method steps S310 to S330 in Figure 6, method steps S410 to S420 in Figure 7, and method steps S510 to S520 in Figure 8. The computer-readable storage medium may be non-volatile or volatile.
Those of ordinary skill in the art can understand that all or some of the steps and devices in the methods disclosed above may be implemented as software, firmware, hardware or appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory storage media) and communication storage media (or transitory storage media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable storage media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other storage medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication storage media typically contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery storage media.
The present application may be used in numerous general-purpose or special-purpose computer apparatus environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor apparatuses, microprocessor-based apparatuses, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above apparatuses or devices, and the like. The present application may be described in the general context of computer programs executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions and operations of possible implementations of apparatuses, methods and computer program products according to various embodiments of the present application. Each block in a flowchart or block diagram may represent a module, a program segment or a part of code, and the module, program segment or part of code contains one or more programs for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented by a dedicated hardware-based apparatus that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
描述于本申请实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现,所描述的单元也可以设置在处理器中。其中,这些单元的名称在某种情况下并不构成对该单元本身的限定。The units involved in the embodiments of this application can be implemented in software or hardware, and the described units can also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.
应当注意,尽管在上文详细描述中提及了用于动作执行的设备的若干模块或者单元,但是这种划分并非强制性的。实际上,根据本申请的实施方式,上文描述的两个或更多模块或者单元的特征和功能可以在一个模块或者单元中具体化。反之,上文描述的一个模块或者单元的特征和功能可以进一步划分为由多个模块或者单元来具体化。It should be noted that although several modules or units of equipment for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into being embodied by multiple modules or units.
通过以上的实施方式的描述,本领域的技术人员易于理解,这里描述的示例实施方式可以通过软件实现,也可以通过软件结合必要的硬件的方式来实现。因此,根据本申请实施方式的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中或网络上,包括若干指令以使得一台计算设备(可以是个人计算机、服务器、触控终端、或者网络设备等)执行根据本申请实施方式的方法。Through the above description of the embodiments, those skilled in the art can easily understand that the example embodiments described here can be implemented by software, or can be implemented by software combined with necessary hardware. Therefore, the technical solution according to the embodiment of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, U disk, mobile hard disk, etc.) or on the network , including several instructions to cause a computing device (which can be a personal computer, server, touch terminal, or network device, etc.) to execute the method according to the embodiment of the present application.
Those skilled in the art will readily conceive of other embodiments of the present application after considering the specification and practicing the embodiments disclosed herein. The present application is intended to cover any variations, uses or adaptations of the present application that follow the general principles of the present application and include common knowledge or customary technical means in the technical field not disclosed in the present application.
应当理解的是,本申请并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本申请的范围仅由所附的权利要求来限制。It is to be understood that the present application is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
以上是对本申请的较佳实施进行了具体说明,但本申请并不局限于上述实施方式,熟悉本领域的技术人员在不违背本申请精神的前提下还可作出种种的等同变形或替换,这些等同的变形或替换均包含在本申请权利要求所限定的范围内。The above is a detailed description of the preferred implementation of the present application, but the present application is not limited to the above-mentioned embodiments. Those skilled in the art can also make various equivalent modifications or substitutions without violating the spirit of the present application. Equivalent modifications or substitutions are included within the scope defined by the claims of this application.

Claims (20)

  1. A method for identifying agent illegal speech, comprising:
    获取训练用的坐席话术信息,将所述训练用的坐席话术信息拆分得到坐席侧单句并对所述坐席侧单句进行文本预处理;Obtain the agent's speech information for training, split the agent's speech information for training into single sentences on the agent's side, and perform text preprocessing on the single sentences on the agent's side;
    基于预处理后的坐席侧单句,使用三层BERT模型进行训练,得到坐席违规话术识别模型;Based on the preprocessed agent-side single sentences, a three-layer BERT model is used for training to obtain an agent illegal speech recognition model;
    The agent speech information to be identified is input into the agent illegal speech recognition model for inference to obtain the probability distribution of the target classification, wherein the agent speech information to be identified is agent speech information in a credit card service sales scenario;
    根据所述目标分类的概率分布确定所述待识别的坐席话术信息中的违规话术。The illegal speech in the agent speech information to be identified is determined according to the probability distribution of the target classification.
  2. The method for identifying inappropriate agent language according to claim 1, wherein splitting the agent speech information for training to obtain agent-side single sentences comprises:
    annotating the agent speech information for training;
    splitting the agent speech information into sentences according to the annotations to obtain a plurality of single speech sentences;
    performing text conversion on the plurality of single speech sentences to obtain agent-side single sentences expressed as text.
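A minimal sketch of the splitting and text-conversion steps of claim 2, under the assumption that the annotations are simply a list of boundary timestamps on the agent-side audio; the speech-to-text step is delegated to whatever ASR engine is available and appears here only as a hypothetical `transcribe` callable.

```python
import numpy as np

def split_by_annotations(waveform: np.ndarray, sample_rate: int,
                         boundaries_sec: list) -> list:
    """Cut one agent-side recording into single-sentence segments.

    `boundaries_sec` are the annotated sentence boundaries (see claim 3);
    the stretches between consecutive boundaries are the single speech sentences.
    """
    edges = [0.0] + sorted(boundaries_sec) + [len(waveform) / sample_rate]
    return [waveform[int(a * sample_rate):int(b * sample_rate)]
            for a, b in zip(edges, edges[1:])]

def segments_to_text(segments, transcribe) -> list:
    """Text-conversion step; `transcribe` is any ASR callable (not defined here)."""
    return [transcribe(seg) for seg in segments]

# usage with dummy audio and a dummy transcriber
sr = 16000
audio = np.random.randn(sr * 6).astype(np.float32)        # 6 s of fake agent audio
segments = split_by_annotations(audio, sr, [2.0, 4.5])     # three single sentences
texts = segments_to_text(segments, transcribe=lambda seg: f"<{len(seg)} samples>")
print(texts)
```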
  3. The method for identifying inappropriate agent language according to claim 2, wherein annotating the agent speech information for training comprises:
    annotating the agent speech information for training according to the voice differences and pause rhythm in the agent speech information for training.
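One plausible way to derive such annotations, assuming the pause rhythm can be approximated by frame-energy silence detection; the frame length, energy threshold and minimum pause duration below are illustrative values, not parameters given in the application.

```python
import numpy as np

def pause_boundaries(waveform: np.ndarray, sample_rate: int,
                     frame_ms: int = 25, energy_thresh: float = 0.01,
                     min_pause_ms: int = 300) -> list:
    """Return candidate sentence boundaries (in seconds) at long, quiet pauses."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(waveform) // frame_len
    energy = np.array([np.mean(waveform[i * frame_len:(i + 1) * frame_len] ** 2)
                       for i in range(n_frames)])
    silent = energy < energy_thresh
    boundaries, run_start = [], None
    for i, s in enumerate(silent):
        if s and run_start is None:
            run_start = i                         # a pause begins
        elif not s and run_start is not None:
            if (i - run_start) * frame_ms >= min_pause_ms:
                mid = (run_start + i) / 2 * frame_ms / 1000
                boundaries.append(mid)            # annotate the middle of the pause
            run_start = None
    return boundaries

sr = 16000
sig = np.concatenate([np.random.randn(sr), np.zeros(sr // 2), np.random.randn(sr)])
print(pause_boundaries(sig.astype(np.float32), sr))   # [1.25] — middle of the half-second pause
```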
  4. The method for identifying inappropriate agent language according to claim 1, wherein performing text preprocessing on the agent-side single sentences comprises:
    desensitizing the sensitive words in the agent-side single sentences according to a preset sensitive-word library or preset sensitive-word judgment rules;
    performing random mask processing on the desensitized agent-side single sentences to obtain the preprocessed agent-side single sentences.
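A minimal sketch of the two preprocessing operations of claim 4. The regular-expression rules stand in for a preset sensitive-word library, and the 15% masking rate merely echoes common BERT masked-language-model practice; none of these values are specified by the application.

```python
import random
import re

# Assumed sensitive-word judgment rules (regular expressions); a real system
# could instead look words up in the preset sensitive-word library of claim 4.
SENSITIVE_PATTERNS = [
    (re.compile(r"\d{15,19}"), "[CARD]"),   # card-number-like digit runs
    (re.compile(r"1\d{10}"), "[PHONE]"),    # mainland mobile numbers
]

def desensitize(sentence: str) -> str:
    """Replace sensitive character spans with neutral placeholders."""
    for pattern, placeholder in SENSITIVE_PATTERNS:
        sentence = pattern.sub(placeholder, sentence)
    return sentence

def random_mask(tokens, mask_token="[MASK]", mask_prob=0.15, seed=None):
    """Randomly mask tokens, BERT-MLM style; the 15% rate is an assumption."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_prob else t for t in tokens]

sentence = "您的卡号6222021234567890123已激活，联系电话13800000000。"
clean = desensitize(sentence)
print(clean)                              # 您的卡号[CARD]已激活，联系电话[PHONE]。
print(random_mask(list(clean), seed=0))
```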
  5. The method for identifying inappropriate agent language according to claim 1, wherein the three-layer BERT model comprises a first-layer BERT model, a second-layer BERT model, a third-layer BERT model, a fully connected layer, a convolution layer and a classification layer; the first-layer BERT model, the second-layer BERT model and the third-layer BERT model are stacked; the hidden layers of the first-layer BERT model, the second-layer BERT model and the third-layer BERT model each output a CLS vector to the fully connected layer; the fully connected layer, the convolution layer and the classification layer are connected in sequence, and the output of the classification layer serves as the output of the three-layer BERT model; wherein each CLS vector represents the information contained in each sentence after the preprocessed agent-side single sentence passes through one of the BERT model layers for information extraction.
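One plausible PyTorch reading of the architecture in claim 5, using the Hugging Face transformers library. It assumes the three stacked BERT models can be realized as a single BERT encoder configured with three transformer layers, that the three CLS vectors are concatenated before the fully connected layer, and that the convolution layer is a 1-D convolution over the projected representation; hidden sizes, kernel size and the number of classes are illustrative.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class ThreeLayerBertClassifier(nn.Module):
    """Sketch of claim 5: three stacked BERT layers each contribute a CLS vector,
    followed by a fully connected layer, a convolution layer and a classification layer."""

    def __init__(self, num_classes: int, hidden: int = 768):
        super().__init__()
        config = BertConfig(num_hidden_layers=3, hidden_size=hidden,
                            num_attention_heads=12, intermediate_size=4 * hidden)
        self.bert = BertModel(config)
        self.fc = nn.Linear(3 * hidden, hidden)                 # fully connected layer
        self.conv = nn.Conv1d(1, 8, kernel_size=3, padding=1)   # convolution layer
        self.classifier = nn.Linear(8 * hidden, num_classes)    # classification layer

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask,
                        output_hidden_states=True)
        # hidden_states[0] is the embedding output; [1], [2], [3] are the three layers
        cls_vectors = [out.hidden_states[i][:, 0, :] for i in (1, 2, 3)]
        x = self.fc(torch.cat(cls_vectors, dim=-1))             # (batch, hidden)
        x = torch.relu(self.conv(x.unsqueeze(1)))               # (batch, 8, hidden)
        return self.classifier(x.flatten(1))                    # logits over target classes

model = ThreeLayerBertClassifier(num_classes=5)
dummy_ids = torch.randint(0, 21128, (2, 32))       # fake token ids (bert-base-chinese vocab is 21128)
logits = model(dummy_ids, attention_mask=torch.ones_like(dummy_ids))
print(logits.shape)                                # torch.Size([2, 5])
```

Other readings are equally possible, for example three separately pre-trained BERT encoders chained through their hidden states; the claim itself does not fix these details.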
  6. The method for identifying inappropriate agent language according to claim 1, wherein inputting the agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes comprises:
    converting the agent speech information to be identified into text information;
    desensitizing the text information;
    inputting the desensitized text information into the inappropriate agent language identification model to obtain probability distributions of a number of target classes.
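The inference path of claim 6 might then look as follows, assuming a model with the interface of the claim-5 sketch above, a Hugging Face tokenizer, and a desensitization step like the one shown under claim 4; the class labels are illustrative.

```python
import re
import torch

LABELS = ["guaranteed_return", "induced_fee", "misleading_claim", "normal"]  # illustrative classes

def desensitize(text: str) -> str:
    # same idea as the claim-4 sketch: mask long digit runs before inference
    return re.sub(r"\d{11,19}", "[NUM]", text)

@torch.no_grad()
def classify(texts, model, tokenizer, max_len=64):
    """Desensitize ASR transcripts and return one label->probability dict per sentence."""
    clean = [desensitize(t) for t in texts]           # text conversion is assumed done upstream by ASR
    batch = tokenizer(clean, padding=True, truncation=True,
                      max_length=max_len, return_tensors="pt")
    logits = model(batch["input_ids"], batch["attention_mask"])   # claim-5 style model
    probs = torch.softmax(logits, dim=-1)             # probability distribution over target classes
    return [dict(zip(LABELS, p.tolist())) for p in probs]
```

With the `ThreeLayerBertClassifier` sketched after claim 5 (built with `num_classes=len(LABELS)`) and a tokenizer such as `BertTokenizerFast.from_pretrained("bert-base-chinese")`, `classify(["这款产品保证收益百分之十"], model, tokenizer)` would return one label-to-probability dictionary for that sentence.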
  7. The method for identifying inappropriate agent language according to claim 1, wherein determining the inappropriate language in the agent speech information to be identified according to the probability distribution of the target classes comprises:
    matching the target classes against preset classes, and taking a successfully matched target class as a class to be identified;
    when the probability value corresponding to the class to be identified exceeds the probability threshold of the corresponding preset class, determining that the agent speech information to be identified contains inappropriate language of the class to be identified.
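A sketch of the decision rule in claim 7, applied to per-class probability dictionaries like those produced by the claim-6 sketch; the preset classes and thresholds are invented for illustration.

```python
# Illustrative preset classes and per-class probability thresholds (claim 7);
# the application does not specify concrete classes or values.
PRESET_THRESHOLDS = {
    "guaranteed_return": 0.80,
    "induced_fee": 0.85,
    "misleading_claim": 0.70,
}

def detect_violations(prob_dist: dict) -> list:
    """Return the classes whose probability exceeds their preset threshold."""
    hits = []
    for target_class, prob in prob_dist.items():
        threshold = PRESET_THRESHOLDS.get(target_class)    # match against preset classes
        if threshold is not None and prob > threshold:     # matched class becomes the class to be identified
            hits.append((target_class, prob))
    return hits

print(detect_violations({"guaranteed_return": 0.91, "induced_fee": 0.12, "normal": 0.40}))
# [('guaranteed_return', 0.91)]
```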
  8. An apparatus for identifying inappropriate agent language, comprising:
    a preprocessing unit, configured to obtain agent speech information for training, split the agent speech information for training to obtain agent-side single sentences, and perform text preprocessing on the agent-side single sentences;
    a training unit, configured to train a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an inappropriate agent language identification model;
    a processing unit, configured to input agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes, wherein the agent speech information to be identified is agent speech information from a credit card service and sales scenario;
    an identification unit, configured to determine the inappropriate language in the agent speech information to be identified according to the probability distribution of the target classes.
  9. An electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements a method for identifying inappropriate agent language, the method comprising:
    obtaining agent speech information for training, splitting the agent speech information for training to obtain agent-side single sentences, and performing text preprocessing on the agent-side single sentences;
    training a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an inappropriate agent language identification model;
    inputting agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes, wherein the agent speech information to be identified is agent speech information from a credit card service and sales scenario;
    determining the inappropriate language in the agent speech information to be identified according to the probability distribution of the target classes.
  10. The electronic device according to claim 9, wherein splitting the agent speech information for training to obtain agent-side single sentences comprises:
    annotating the agent speech information for training;
    splitting the agent speech information into sentences according to the annotations to obtain a plurality of single speech sentences;
    performing text conversion on the plurality of single speech sentences to obtain agent-side single sentences expressed as text.
  11. The electronic device according to claim 10, wherein annotating the agent speech information for training comprises:
    annotating the agent speech information for training according to the voice differences and pause rhythm in the agent speech information for training.
  12. The electronic device according to claim 9, wherein performing text preprocessing on the agent-side single sentences comprises:
    desensitizing the sensitive words in the agent-side single sentences according to a preset sensitive-word library or preset sensitive-word judgment rules;
    performing random mask processing on the desensitized agent-side single sentences to obtain the preprocessed agent-side single sentences.
  13. The electronic device according to claim 9, wherein the three-layer BERT model comprises a first-layer BERT model, a second-layer BERT model, a third-layer BERT model, a fully connected layer, a convolution layer and a classification layer; the first-layer BERT model, the second-layer BERT model and the third-layer BERT model are stacked; the hidden layers of the first-layer BERT model, the second-layer BERT model and the third-layer BERT model each output a CLS vector to the fully connected layer; the fully connected layer, the convolution layer and the classification layer are connected in sequence, and the output of the classification layer serves as the output of the three-layer BERT model; wherein each CLS vector represents the information contained in each sentence after the preprocessed agent-side single sentence passes through one of the BERT model layers for information extraction.
  14. The electronic device according to claim 9, wherein inputting the agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes comprises:
    converting the agent speech information to be identified into text information;
    desensitizing the text information;
    inputting the desensitized text information into the inappropriate agent language identification model to obtain probability distributions of a number of target classes.
  15. A computer-readable storage medium storing a computer program, wherein the computer program is used to execute a method for identifying inappropriate agent language, the method comprising:
    obtaining agent speech information for training, splitting the agent speech information for training to obtain agent-side single sentences, and performing text preprocessing on the agent-side single sentences;
    training a three-layer BERT model based on the preprocessed agent-side single sentences to obtain an inappropriate agent language identification model;
    inputting agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes, wherein the agent speech information to be identified is agent speech information from a credit card service and sales scenario;
    determining the inappropriate language in the agent speech information to be identified according to the probability distribution of the target classes.
  16. The computer-readable storage medium according to claim 15, wherein splitting the agent speech information for training to obtain agent-side single sentences comprises:
    annotating the agent speech information for training;
    splitting the agent speech information into sentences according to the annotations to obtain a plurality of single speech sentences;
    performing text conversion on the plurality of single speech sentences to obtain agent-side single sentences expressed as text.
  17. The computer-readable storage medium according to claim 16, wherein annotating the agent speech information for training comprises:
    annotating the agent speech information for training according to the voice differences and pause rhythm in the agent speech information for training.
  18. The computer-readable storage medium according to claim 15, wherein performing text preprocessing on the agent-side single sentences comprises:
    desensitizing the sensitive words in the agent-side single sentences according to a preset sensitive-word library or preset sensitive-word judgment rules;
    performing random mask processing on the desensitized agent-side single sentences to obtain the preprocessed agent-side single sentences.
  19. The computer-readable storage medium according to claim 15, wherein the three-layer BERT model comprises a first-layer BERT model, a second-layer BERT model, a third-layer BERT model, a fully connected layer, a convolution layer and a classification layer; the first-layer BERT model, the second-layer BERT model and the third-layer BERT model are stacked; the hidden layers of the first-layer BERT model, the second-layer BERT model and the third-layer BERT model each output a CLS vector to the fully connected layer; the fully connected layer, the convolution layer and the classification layer are connected in sequence, and the output of the classification layer serves as the output of the three-layer BERT model; wherein each CLS vector represents the information contained in each sentence after the preprocessed agent-side single sentence passes through one of the BERT model layers for information extraction.
  20. The computer-readable storage medium according to claim 15, wherein inputting the agent speech information to be identified into the inappropriate agent language identification model for inference to obtain a probability distribution of target classes comprises:
    converting the agent speech information to be identified into text information;
    desensitizing the text information;
    inputting the desensitized text information into the inappropriate agent language identification model to obtain probability distributions of a number of target classes.
PCT/CN2022/090717 2022-03-15 2022-04-29 Inappropriate agent language identification method and apparatus, electronic device and storage medium WO2023173554A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210252453.8A CN114610887A (en) 2022-03-15 2022-03-15 Seat illegal speech recognition method and device, electronic equipment and storage medium
CN202210252453.8 2022-03-15

Publications (1)

Publication Number Publication Date
WO2023173554A1 true WO2023173554A1 (en) 2023-09-21

Family

ID=81862285

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090717 WO2023173554A1 (en) 2022-03-15 2022-04-29 Inappropriate agent language identification method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN114610887A (en)
WO (1) WO2023173554A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117079640A (en) * 2023-10-12 2023-11-17 深圳依时货拉拉科技有限公司 Voice monitoring method, device, computer equipment and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288192A (en) * 2019-05-23 2019-09-27 平安科技(深圳)有限公司 Quality detecting method, device, equipment and storage medium based on multiple Checking models
CN111191030A (en) * 2019-12-20 2020-05-22 北京淇瑀信息科技有限公司 Single sentence intention identification method, device and system based on classification
CN111641757A (en) * 2020-05-15 2020-09-08 北京青牛技术股份有限公司 Real-time quality inspection and auxiliary speech pushing method for seat call
WO2022048173A1 (en) * 2020-09-04 2022-03-10 平安科技(深圳)有限公司 Artificial intelligence-based customer intent identification method and apparatus, device, and medium
CN112671985A (en) * 2020-12-22 2021-04-16 平安普惠企业管理有限公司 Agent quality inspection method, device, equipment and storage medium based on deep learning

Also Published As

Publication number Publication date
CN114610887A (en) 2022-06-10

Similar Documents

Publication Publication Date Title
US20230016365A1 (en) Method and apparatus for training text classification model
US11893345B2 (en) Inducing rich interaction structures between words for document-level event argument extraction
WO2021121198A1 (en) Semantic similarity-based entity relation extraction method and apparatus, device and medium
WO2021042904A1 (en) Conversation intention recognition method, apparatus, computer device, and storage medium
CN110727779A (en) Question-answering method and system based on multi-model fusion
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
CN113268609B (en) Knowledge graph-based dialogue content recommendation method, device, equipment and medium
US11720759B2 (en) Electronic apparatus, controlling method of thereof and non-transitory computer readable recording medium
WO2018045646A1 (en) Artificial intelligence-based method and device for human-machine interaction
CN110502610A (en) Intelligent sound endorsement method, device and medium based on text semantic similarity
CN111563158B (en) Text ranking method, ranking apparatus, server and computer-readable storage medium
CN111222330B (en) Chinese event detection method and system
CN112926308B (en) Method, device, equipment, storage medium and program product for matching text
WO2021129411A1 (en) Text processing method and device
CN113761190A (en) Text recognition method and device, computer readable medium and electronic equipment
Li et al. Intention understanding in human–robot interaction based on visual-NLP semantics
CN113705191A (en) Method, device and equipment for generating sample statement and storage medium
CN110781666A (en) Natural language processing text modeling based on generative countermeasure networks
CN112036186A (en) Corpus labeling method and device, computer storage medium and electronic equipment
CN116821307B (en) Content interaction method, device, electronic equipment and storage medium
WO2023173554A1 (en) Inappropriate agent language identification method and apparatus, electronic device and storage medium
CN111767720B (en) Title generation method, computer and readable storage medium
CN116757195B (en) Implicit emotion recognition method based on prompt learning
CN113705207A (en) Grammar error recognition method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22931581

Country of ref document: EP

Kind code of ref document: A1