CN112995414A - Behavior quality inspection method, device, equipment and storage medium based on voice call - Google Patents


Info

Publication number
CN112995414A
CN112995414A
Authority
CN
China
Prior art keywords
feature
text
audio
weight
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110220308.7A
Other languages
Chinese (zh)
Other versions
CN112995414B (en)
Inventor
龚天伟
Current Assignee
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN202110220308.7A
Publication of CN112995414A
Application granted
Publication of CN112995414B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 3/00 Automatic or semi-automatic exchanges
    • H04M 3/22 Arrangements for supervision, monitoring or testing
    • H04M 3/2227 Quality of service monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The application discloses a behavior quality inspection method, device, equipment and storage medium based on voice call, and belongs to the technical field of artificial intelligence. In addition, the application also relates to blockchain technology: the call record can be stored in a blockchain. The method can achieve behavior quality inspection of a voice call from the two dimensions of intention and emotion, and can promptly and efficiently identify violations by customer service staff and potential customer complaints.

Description

Behavior quality inspection method, device, equipment and storage medium based on voice call
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a behavior quality inspection method, device, equipment and storage medium based on voice call.
Background
Customer service embodies a customer-satisfaction-oriented value view: it integrates and manages, at a preset optimum cost, all elements of the customer interface in the service mix. In a broad sense, anything that improves customer satisfaction falls within the scope of customer service. Speech recognition technology aims to convert the vocabulary content of human speech into computer-readable input; it draws on signal processing, pattern recognition, probability theory and information theory, the mechanisms of sound production and hearing, artificial intelligence, and other fields. As a key technology of human-computer interaction, speech recognition is widely applied in call centers, telecom value-added services, enterprise informatization systems, intelligent robots, intelligent outbound calling, intelligent vehicle-mounted systems, and other application systems. A quality inspection analysis engine performs quality inspection and control on the voice of customer service agents in a telephone customer service center or call center.
Existing customer service call centers still rely on manual quality inspection, in which staff manually inspect and control the voice of customer service agents. Manual inspection is prone to subjective judgment, its repetitive work fatigues inspectors, and its operating efficiency is relatively low. It is not conducive to timely analysis and mining of the information users reveal in their speech, so the real-time performance and accuracy of quality inspection are poor; violations by customer service personnel and potential customer complaints are difficult to discover within a short time, which in turn makes it harder to uncover the commercial value hidden in user information.
Disclosure of Invention
The embodiment of the application aims to provide a behavior quality inspection method, device, computer equipment and storage medium based on voice call, so as to solve the technical problem that the existing manual quality inspection of customer service behavior has poor real-time performance and accuracy.
In order to solve the above technical problem, an embodiment of the present application provides a behavior quality inspection method based on voice call, which adopts the following technical scheme:
a behavior quality inspection method based on voice call comprises the following steps:
acquiring a training sample from a preset database, and respectively acquiring text information and audio information of the training sample;
extracting text features from the text information of the training samples, and extracting audio features from the audio information of the training samples;
calculating the feature weights of the text features and the audio features based on a preset feature weight algorithm, and combining the feature weights based on a preset combination strategy to obtain a feature weight combination;
the text feature, the audio feature and the feature weight combination are led into a preset initial detection model for training to obtain a behavior detection model;
and receiving the behavior detection instruction, acquiring a call record corresponding to the behavior detection instruction, importing the call record into the trained behavior detection model, and outputting a behavior detection result.
Further, the step of calculating the feature weights of the text features and the audio features based on a preset feature weight algorithm, and combining the feature weights based on a preset combination strategy to obtain a feature weight combination, specifically comprises the following steps:
assigning the same initial weight to the text feature and the audio feature;
calculating the feature weight of the text feature based on a preset feature weight algorithm to obtain a text feature weight, and calculating the feature weight of the audio feature based on a preset feature weight algorithm to obtain an audio feature weight;
and combining the text characteristic weight and the audio characteristic weight based on a preset combination strategy to obtain a characteristic weight combination.
Further, the preset feature weight algorithm is a Relief algorithm, and the step of calculating the feature weight of the text feature based on the preset feature weight algorithm to obtain the text feature weight specifically includes:
classifying the text features given with the initial weight to obtain a plurality of text feature combinations;
calculating the similarity of the text features in the same text feature combination to obtain a first similarity;
calculating the similarity of text features among different text feature combinations to obtain a second similarity;
and adjusting the initial weight of the text features based on the first similarity and the second similarity to obtain the feature weight of each text feature.
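The Relief-style adjustment above can be sketched as follows. This is a minimal illustration, assuming features are numeric vectors with known class labels; the function name, distance measure, and update rule are illustrative rather than the patent's exact formulation. A feature's weight shrinks by its difference to the nearest same-class sample (the "first similarity") and grows by its difference to the nearest other-class sample (the "second similarity").

```python
import numpy as np

def relief_weights(X, y, n_samples=None, seed=0):
    """Relief-style feature weighting sketch: features that vary little
    within a class but a lot between classes end up with higher weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    w = np.zeros(d)                 # identical initial weight for every feature
    m = n_samples or n
    rng = np.random.default_rng(seed)
    for _ in range(m):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf            # never pick the sample itself
        same = np.where(y == y[i])[0]
        same = same[same != i]
        other = np.where(y != y[i])[0]
        hit = same[np.argmin(dist[same])]      # nearest hit (same class)
        miss = other[np.argmin(dist[other])]   # nearest miss (other class)
        w -= np.abs(X[i] - X[hit]) / m         # similar within class: penalise
        w += np.abs(X[i] - X[miss]) / m        # different across classes: reward
    return w
```

Features whose final weight falls below a chosen threshold would then be discarded, as the Relief description in this document suggests.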
Further, the preset feature weight algorithm is a Relief algorithm, and the step of calculating the feature weight of the audio feature based on the preset feature weight algorithm to obtain the audio feature weight specifically includes:
classifying the audio features given with the initial weight to obtain a plurality of audio feature combinations;
calculating the similarity of the audio features in the same audio feature combination to obtain a third similarity;
calculating the similarity of the audio features among different audio feature combinations to obtain a fourth similarity;
and adjusting the initial weight of the audio features based on the third similarity and the fourth similarity to obtain the feature weight of each audio feature.
Further, the initial detection model comprises an input layer, a convolutional layer and an output layer, and the step of combining and importing the text features, the audio features and the feature weights into a preset initial detection model for training to obtain the behavior detection model specifically comprises the following steps:
respectively importing the text features and the audio features into an input layer of a preset initial detection model, and importing the feature weight combination into an output layer of the preset initial detection model;
acquiring initial feature vectors of text features and audio features through an input layer, performing convolution operation on the initial feature vectors through a convolution layer to obtain an initial feature matrix, and integrating the initial feature matrix based on feature weight combination in an output layer to obtain a feature matrix;
and adjusting the parameters of a preset initial detection model based on the characteristic matrix to obtain a behavior detection model.
Further, the step of obtaining the initial feature vectors of the text features and the audio features through the input layer, performing a convolution operation on the initial feature vectors through the convolutional layer to obtain initial feature matrices, and integrating the initial feature matrices based on the feature weight combination in the output layer to obtain the feature matrix, specifically comprises the following steps:
respectively carrying out vector feature conversion on the text features and the audio features through an input layer to obtain initial feature vectors of the text features and the initial feature vectors of the audio features;
performing convolution operation on the initial characteristic vector of the text characteristic and the initial characteristic vector of the audio characteristic through the convolution layer to obtain an initial characteristic matrix of the text characteristic and an initial characteristic matrix of the audio characteristic;
and carrying out matrix splicing on the initial feature matrix of the text features and the initial feature matrix of the audio features in an output layer based on the feature weight combination to obtain a feature matrix.
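The weighted matrix splicing in the output layer might look like the following sketch, assuming the feature weight combination is supplied as one weight vector per modality; the function name and shapes are illustrative assumptions, not the patent's actual implementation.

```python
import numpy as np

def splice_feature_matrices(text_mat, audio_mat, w_text, w_audio):
    """Scale each modality's initial feature matrix column-wise by its
    feature weights, then concatenate ("splice") them into one matrix."""
    text_scaled = np.asarray(text_mat, dtype=float) * np.asarray(w_text)
    audio_scaled = np.asarray(audio_mat, dtype=float) * np.asarray(w_audio)
    return np.concatenate([text_scaled, audio_scaled], axis=1)
```

With this layout, each row of the result carries one sample's weighted text features followed by its weighted audio features.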
Further, after the step of adjusting the parameters of the preset initial detection model based on the feature matrix to obtain the behavior detection model, the method further comprises:
obtaining a verification sample from a preset database, importing the verification sample into a behavior detection model, and outputting a verification result;
fitting by using a back propagation algorithm based on the verification result and a preset standard result to obtain a detection error;
comparing the detection error with a preset error threshold, and if the detection error is larger than the preset error threshold, iteratively updating the behavior detection model until the detection error is smaller than or equal to the preset error threshold;
and outputting the behavior detection model with the detection error smaller than or equal to a preset error threshold.
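The verify-compare-iterate loop above can be sketched generically as follows; `train_fn` and `validate_fn` are hypothetical placeholders standing in for one round of iterative updating and for computing the back-propagated detection error on the verification samples.

```python
def train_until_converged(model, train_fn, validate_fn, err_threshold, max_rounds=50):
    """Iteratively update the model until the validation (detection) error
    falls to the preset error threshold or below."""
    for _ in range(max_rounds):
        error = validate_fn(model)
        if error <= err_threshold:          # converged: output this model
            return model, error
        model = train_fn(model)             # otherwise run another update round
    return model, validate_fn(model)        # give up after max_rounds
```

The `max_rounds` cap is an added safeguard not stated in the document; without it, a model that never reaches the threshold would loop forever.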
In order to solve the above technical problem, an embodiment of the present application further provides a behavior quality inspection device based on voice call, which adopts the following technical scheme:
a behavior quality inspection device based on voice call comprises:
the information extraction module is used for acquiring training samples from a preset database and respectively acquiring text information and audio information of the training samples;
the feature extraction module is used for extracting text features from the text information of the training samples and extracting audio features from the audio information of the training samples;
the weight calculation module is used for calculating the feature weights of the text features and the audio features based on a preset feature weight algorithm, and combining the feature weights based on a preset combination strategy to obtain a feature weight combination;
the model training module is used for leading the combination of the text characteristics, the audio characteristics and the characteristic weights into a preset initial detection model for training to obtain a behavior detection model;
and the behavior detection module is used for receiving the behavior detection instruction, acquiring the call record corresponding to the behavior detection instruction, importing the call record into the trained behavior detection model, and outputting a behavior detection result.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
an apparatus comprising a memory having computer readable instructions stored therein and a processor that when executed implements the steps of the voice call based behavioral quality inspection method as described above.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the voice call based behavioral quality inspection method as described above.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
the application discloses a behavior quality inspection method, a device, equipment and a storage medium based on voice call, belonging to the technical field of artificial intelligence, by extracting text characteristics of training samples and extracting audio characteristics of the training samples, calculating the feature weights of the text features and the audio features based on a preset feature weight algorithm to obtain a feature weight combination, training a behavior detection model by using the text features, the audio features and the feature weight combination, the behavior detection model comprises a call intention recognition module and an emotion recognition module, can respectively recognize the intention and emotion of both parties of a call, the intention and emotion of the customer service and the client are identified in real time through the trained behavior detection model, and judging whether the customer service personnel have illegal behaviors and whether the customers have potential complaints according to the intention recognition result and the emotion recognition result. The technical scheme of the proposal can realize the behavior quality inspection of the voice call from two dimensions of intention and emotion, and can judge the violation behaviors of the customer service staff and potential customer complaints in time and efficiently.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 illustrates an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 illustrates a flow diagram of one embodiment of a voice call based behavioral quality inspection method in accordance with the present application;
FIG. 3 is a schematic diagram illustrating an embodiment of a voice call based performance quality testing apparatus according to the present application;
FIG. 4 shows a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the voice call based behavior quality inspection method provided in the embodiment of the present application is generally executed by a server, and accordingly, the voice call based behavior quality inspection apparatus is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to FIG. 2, a flow diagram of one embodiment of a method for voice call based behavioral quality inspection in accordance with the present application is shown. The behavior quality inspection method based on the voice call comprises the following steps:
s201, training samples are obtained from a preset database, and text information and audio information of the training samples are obtained respectively.
Specifically, a training sample is obtained from a preset database, and text information and audio information of the training sample are respectively obtained. The preset database is used for storing training samples, and the training samples are historical call records of customer service and customers. The method comprises the steps of obtaining training samples from a preset database, labeling the training samples, randomly combining the labeled training samples to obtain a training sample set and a verification sample set, storing the training sample set and the verification sample set into the preset database, wherein the training sample set is used for model training, and the verification sample set is used for model verification.
S202, extracting text features from the text information of the training samples, and extracting audio features from the audio information of the training samples.
Specifically, the training samples are historical call records between customer service agents and customers. Speech-to-text processing is performed on the training samples to obtain their text information, and the text information is preprocessed, where text preprocessing includes duplicate checking, error correction, punctuation removal, and the like; word segmentation and serialization are then performed on the preprocessed text information to obtain the text features corresponding to the training samples. The audio information of the training samples is extracted and preprocessed, where audio preprocessing includes noise reduction, framing, windowing, and the like; serialization is performed on the preprocessed audio information to obtain the audio features of the training samples. In a specific embodiment of the application, the behavior detection model comprises two modules, an intention recognition module and an emotion recognition module; the text features are used to train the intention recognition module, and the audio features are used to train the emotion recognition module.
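The framing and windowing steps of the audio preprocessing can be sketched as follows. The frame length and hop size are illustrative values (25 ms frames with a 10 ms hop at 16 kHz), and the signal is assumed to be at least one frame long; the document does not specify these parameters.

```python
import numpy as np

def frame_and_window(signal, frame_len=400, hop=160):
    """Split an audio signal into overlapping frames and apply a Hamming
    window to each frame, a standard step before feature extraction.
    Assumes len(signal) >= frame_len."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + max(0, len(signal) - frame_len) // hop
    window = np.hamming(frame_len)           # tapers frame edges to reduce leakage
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return frames
```

Each row of the result is one windowed frame, ready for downstream serialization or spectral feature extraction.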
S203, calculating the feature weights of the text feature and the audio feature based on a preset feature weight algorithm, and combining the feature weights based on a preset combination strategy to obtain a feature weight combination.
The preset feature weight algorithm may be the Relief algorithm, a feature weighting algorithm that assigns different weights to features according to the relevance of each feature to the category and removes features whose weights fall below a certain threshold. In the Relief algorithm, the relevance of a feature to a category is based on the feature's ability to distinguish between nearby samples. The running time of the Relief algorithm increases linearly with the number of sampled instances and the number of original features, so its running efficiency is very high.
Specifically, feature weights of the text features and the audio features are calculated based on a preset feature weight algorithm, and the feature weights are combined based on a preset combination strategy to obtain a feature weight combination. The feature weight of a certain text feature reflects the influence degree of the text feature on the intention recognition result, and in the intention recognition process, the contribution size made by each text feature can be determined through the feature weight combination of the text feature. Correspondingly, the characteristic weight of one audio characteristic reflects the influence degree of the audio characteristic on the emotion recognition result, and the contribution of each audio characteristic can be determined through the characteristic weight combination of the audio characteristic in the emotion recognition process. In a specific embodiment of the present application, combining the feature weights based on a preset combination strategy specifically includes sorting the feature weights, and combining the feature weights based on a sorting result of the feature weights to obtain a feature weight combination of the training sample. It should be noted that, in another specific embodiment of the present application, a combination policy may also be set according to actual requirements to combine the feature weights, and the present application is not limited herein.
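The sorting-based combination strategy described in this embodiment can be sketched minimally as follows; representing the weights as a feature-name-to-weight mapping is an assumption made for illustration.

```python
def combine_weights(named_weights):
    """Sort features by weight in descending order and return the ordered
    (feature, weight) pairs as the feature weight combination."""
    return sorted(named_weights.items(), key=lambda kv: kv[1], reverse=True)
```

The ordering makes each feature's relative contribution to intention or emotion recognition explicit, with the most influential features first.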
And S204, combining and importing the text features, the audio features and the feature weights into a preset initial detection model for training to obtain a behavior detection model.
The preset initial detection model adopts a deep convolutional neural network. A Convolutional Neural Network (CNN) is a feedforward neural network that contains convolution calculations and has a deep structure, and is one of the representative algorithms of deep learning. Convolutional neural networks have a feature learning (representation learning) capability and can perform shift-invariant classification of input information according to their hierarchical structure, for which reason they are also called "shift-invariant artificial neural networks". The convolutional neural network is constructed by imitating the visual perception mechanism of living beings, can perform both supervised and unsupervised learning, and delivers stable results without additional feature engineering requirements on the data; convolution kernel parameter sharing within a convolutional layer and the sparsity of inter-layer connections enable a convolutional neural network to learn grid-like topology features (such as pixels and audio) with a small amount of calculation.
Specifically, the text features, the audio features and the feature weight combinations are led into a preset initial detection model for training to obtain a behavior detection model. The behavior detection model can adopt a double-model structure, namely an intention recognition model and an emotion recognition model, at the moment, the intention recognition model and the emotion recognition model are respectively provided with a three-layer structure, namely an input layer, a convolution layer and an output layer, and the input layer, the convolution layer and the output layer of the intention recognition model and the emotion recognition model are mutually independent. The behavior detection model can also adopt a single model structure, and at the moment, two types of convolution kernels are arranged in a convolution layer of the behavior detection model, namely a text convolution kernel for performing convolution operation on the text characteristic vector and an audio convolution kernel for performing convolution operation on the audio characteristic vector.
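The single-model variant with two convolution kernel types can be illustrated with a plain 1-D convolution; the kernels and vectors below are toy values rather than trained parameters, and the function names are invented for the sketch.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid-mode 1-D convolution (cross-correlation) of a feature vector."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def dual_kernel_layer(text_vec, audio_vec, text_kernel, audio_kernel):
    """Single-model variant: one convolutional layer holding two kernel
    types, one applied to text feature vectors and one to audio feature
    vectors; the two outputs are later spliced in the output layer."""
    return conv1d(text_vec, text_kernel), conv1d(audio_vec, audio_kernel)
```

In the dual-model variant the same convolution would instead live in two independent models, one per modality, each with its own input, convolutional, and output layers.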
S205, receiving the behavior detection instruction, obtaining a call record corresponding to the behavior detection instruction, importing the call record into a trained behavior detection model, and outputting a behavior detection result.
Specifically, when the behavior detection instruction is received, a call record corresponding to the behavior detection instruction is obtained in real time, the call record is led into a trained behavior detection model, an intention recognition result and an emotion recognition result of the call record are obtained, whether violation behaviors exist in customer service staff and whether potential complaint behaviors exist in customers are judged based on the intention recognition result and the emotion recognition result, the judgment result is used as a behavior detection result, and the behavior detection result is output.
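The final judgment step can be illustrated as a simple mapping from recognition results to flags. The label sets below are invented for illustration and are not the patent's actual intention or emotion taxonomy.

```python
def behavior_detection_result(intent_label, emotion_label):
    """Illustrative decision step: map the model's intention and emotion
    recognition results for one call to violation / complaint flags."""
    # hypothetical labels that would indicate an agent violation
    violation = intent_label in {"threaten", "mislead", "unauthorized_promise"}
    # hypothetical labels that would indicate a potential customer complaint
    potential_complaint = emotion_label in {"angry", "dissatisfied"}
    return {"agent_violation": violation,
            "potential_complaint": potential_complaint}
```

A production system would derive these flags from the model's actual output classes and business rules rather than a fixed lookup.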
In this embodiment, the electronic device (for example, the server shown in fig. 1) on which the voice call based behavior quality inspection method operates may receive the behavior detection instruction through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a ZigBee connection, a UWB (Ultra Wideband) connection, and other wireless connection means now known or developed in the future.
The application discloses a behavior quality inspection method based on voice calls, which belongs to the technical field of artificial intelligence. The method extracts text features and audio features from training samples, calculates the feature weights of the text features and the audio features based on a preset feature weight algorithm to obtain a feature weight combination, and trains a behavior detection model using the text features, the audio features and the feature weight combination. The behavior detection model comprises a call intention recognition module and an emotion recognition module, so the intentions and emotions of both parties of a call can be recognized separately; the trained model recognizes the intention and emotion of the customer service staff and the customer in real time, and judges whether the customer service staff has violation behaviors and whether the customer shows potential complaint behaviors according to the intention and emotion recognition results. This technical scheme inspects the quality of voice-call behavior from the two dimensions of intention and emotion, and can identify violation behaviors of customer service staff and potential customer complaints in a timely and efficient manner.
Further, the step of calculating the feature weights of the text features and the audio features based on a preset feature weight algorithm, and combining the feature weights based on a preset combination strategy to obtain a feature weight combination, specifically comprises the following steps:
assigning the same initial weight to the text feature and the audio feature;
calculating the feature weight of the text feature based on a preset feature weight algorithm to obtain a text feature weight, and calculating the feature weight of the audio feature based on a preset feature weight algorithm to obtain an audio feature weight;
and combining the text characteristic weight and the audio characteristic weight based on a preset combination strategy to obtain a characteristic weight combination.
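The three steps above can be sketched as follows, assuming (since the text leaves the combination strategy open) that the preset combination strategy simply orders the named weights in descending order; the feature names and weight values are hypothetical.

```python
def combine_feature_weights(text_weights, audio_weights):
    """Merge two name->weight dicts and order them by weight, descending."""
    merged = {**text_weights, **audio_weights}
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# hypothetical weights produced by the feature weight algorithm
text_w = {"t_keyword": 0.6, "t_length": 0.3}
audio_w = {"a_pitch": 0.5, "a_energy": 0.4}

combo = combine_feature_weights(text_w, audio_w)
print(combo[0])  # highest-weight feature first
```

Any strategy that deterministically orders or groups the weights would serve the same role; ranking is used here only because the later embodiment mentions sorting the weights before combining them.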
The preset feature weight algorithm may be the Relief algorithm. The Relief algorithm assigns different weights to features according to the relevance of each feature to each category, and removes features whose weight falls below a certain threshold; in the Relief algorithm, the relevance of a feature to a category is based on the feature's ability to distinguish close-range samples.
Specifically, the same initial weight is first assigned to every text feature and every audio feature. The initial weights are then adjusted by the Relief algorithm to obtain the text feature weights and the audio feature weights. Finally, the text feature weights and the audio feature weights are ranked, and combined based on the ranking result to obtain the feature weight combination.
In this embodiment, the feature weight of each text feature and each audio feature is calculated by the Relief algorithm. Since the running time of the Relief algorithm increases only linearly with the number of sampling iterations and the number of original features, it runs very efficiently: the feature weight corresponding to each text feature and each audio feature can be calculated quickly, which facilitates subsequent processing.
Further, based on a preset feature weight algorithm, calculating a feature weight of the text feature to obtain a text feature weight, specifically including:
classifying the text features assigned the initial weight to obtain a plurality of text feature combinations;
calculating the similarity of the text features in the same text feature combination to obtain a first similarity;
calculating the similarity of text features among different text feature combinations to obtain a second similarity;
and adjusting the initial weight of the text features based on the first similarity and the second similarity to obtain the feature weight of each text feature.
Further, based on a preset feature weight algorithm, calculating a feature weight of the audio feature to obtain an audio feature weight, specifically including:
classifying the audio features assigned the initial weight to obtain a plurality of audio feature combinations;
calculating the similarity of the audio features in the same audio feature combination to obtain a third similarity;
calculating the similarity of the audio features among different audio feature combinations to obtain a fourth similarity;
and adjusting the initial weight of the audio features based on the third similarity and the fourth similarity to obtain the feature weight of each audio feature.
The Relief algorithm randomly selects a sample R from a feature combination D, finds the sample H nearest to R within D (called the near-hit), and finds the sample M nearest to R among the other feature combinations (called the near-miss). It then updates the weight of each feature according to the following rule: if the distance between R and H on a feature (the distance here measuring the similarity of the two samples on that feature) is smaller than the distance between R and M, the feature helps distinguish nearest neighbors of the same class from those of different classes, and its weight is increased; conversely, if the distance between R and H on a feature is greater than that between R and M, the feature hinders this distinction, and its weight is reduced. Repeating this process m times yields the average weight of each feature: the larger a feature's weight, the stronger its classification ability, and vice versa. Since the running time of the Relief algorithm grows linearly with the number of sampling iterations m and the number of original features N, it runs very efficiently.
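The update rule just described can be sketched in a compact form, using the per-feature difference |R − M| − |R − H| as the weight increment; the sample data, the Euclidean distance measure and the iteration count m are illustrative assumptions.

```python
import numpy as np

def relief_weights(X, y, m=20, seed=0):
    """Return per-feature weights after m Relief updates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)
        r, cls = X[i], y[i]
        same = [j for j in range(n) if j != i and y[j] == cls]
        diff = [j for j in range(n) if y[j] != cls]
        h = X[min(same, key=lambda j: np.linalg.norm(X[j] - r))]   # near-hit
        mm = X[min(diff, key=lambda j: np.linalg.norm(X[j] - r))]  # near-miss
        # a feature helps separate classes when |r-h| < |r-mm|: raise its weight
        w += (np.abs(r - mm) - np.abs(r - h)) / m
    return w

# toy two-class data: both features discriminate the classes here
X = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])
y = np.array([0, 0, 1, 1])
print(relief_weights(X, y))
```

On this toy data every near-hit lies close to R and every near-miss lies far from it on both features, so both weights come out positive, matching the rule that discriminative features gain weight.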
Specifically, the text features assigned the initial weight are classified into a plurality of text feature combinations; the similarity of text features within the same combination is calculated to obtain a first similarity; the similarity of text features between different combinations is calculated to obtain a second similarity; and the initial weights are adjusted based on the first and second similarities to obtain the feature weight of each text feature. For example, suppose the initial weight of text feature a is 0.5 and a belongs to text feature combination A. Within combination A, the similarity between a and every other text feature is calculated, and the minimum calculated similarity is taken as the first similarity, say 0.4. The similarity between a and the text features of the other combinations is then calculated, and the minimum is taken as the second similarity, say 0.2. For text feature a the first similarity exceeds the second similarity, i.e. a helps distinguish homogeneous features from heterogeneous features, so its weight is increased, for example from 0.5 to 0.6.
In addition, the audio features assigned the initial weight are classified to obtain a plurality of audio feature combinations; the similarity of audio features within the same combination is calculated to obtain a third similarity; the similarity of audio features between different combinations is calculated to obtain a fourth similarity; and the initial weights of the audio features are adjusted based on the third and fourth similarities to obtain the feature weight of each audio feature. The initial weights of the audio features are adjusted in the same way as those of the text features, which is not repeated here.
In the above embodiment, the feature weight of each text feature and each audio feature is obtained by calculating sample feature similarities, comparing the similarity results, and adjusting the initial weights accordingly.
Further, the initial detection model comprises an input layer, a convolution layer and an output layer, and the step of importing the text features, the audio features and the feature weight combination into a preset initial detection model for training to obtain the behavior detection model specifically comprises the following steps:
respectively importing the text features and the audio features into an input layer of a preset initial detection model, and importing the feature weight combination into an output layer of the preset initial detection model;
acquiring initial feature vectors of text features and audio features through an input layer, performing convolution operation on the initial feature vectors through a convolution layer to obtain an initial feature matrix, and integrating the initial feature matrix based on feature weight combination in an output layer to obtain a feature matrix;
and adjusting the parameters of a preset initial detection model based on the characteristic matrix to obtain a behavior detection model.
Specifically, the preset initial detection model adopts a CNN deep convolutional neural network comprising an input layer, a convolution layer and an output layer. The obtained text features and audio features are imported into the input layer, the feature weight combination is imported into the output layer, and the initial detection model is trained: initial feature vectors of the text features and the audio features are obtained through the input layer, convolution operations are performed on these vectors through the convolution layer to obtain initial feature matrices, the initial feature matrices are integrated in the output layer based on the feature weight combination to obtain the feature matrix, and the parameters of the preset initial detection model are adjusted based on the feature matrix to obtain the behavior detection model.
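A framework-free sketch of this forward pass (input-layer vectors, convolution-layer matrices, then weight-combined integration in the output layer) is shown below; a real system would use a CNN library with trainable parameters, and the vector sizes, kernels and the 0.6/0.4 weight pair here are assumptions for illustration.

```python
import numpy as np

def conv1d(x, k):
    """Valid-mode 1-D convolution."""
    return np.array([np.dot(x[i:i + len(k)], k)
                     for i in range(len(x) - len(k) + 1)])

def forward(text_vec, audio_vec, text_kernel, audio_kernel, weights):
    # convolution layer: one initial feature matrix per modality
    text_mat = conv1d(text_vec, text_kernel)
    audio_mat = conv1d(audio_vec, audio_kernel)
    # output layer: integrate the matrices using the feature weight combination
    w_text, w_audio = weights
    return np.concatenate([w_text * text_mat, w_audio * audio_mat])

rng = np.random.default_rng(1)
feat = forward(rng.normal(size=6), rng.normal(size=6),
               rng.normal(size=3), rng.normal(size=3),
               weights=(0.6, 0.4))  # hypothetical feature weight combination
print(feat.shape)  # two 4-element maps spliced together
```

The concatenation at the end stands in for the "matrix splicing" that the later embodiment describes for the output layer.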
In the above embodiment, the behavior detection model is obtained by training the initial detection model with the text features and the audio features. The behavior detection model contains two modules, an intention recognition module and an emotion recognition module: the text features are used to train the intention recognition module, and the audio features are used to train the emotion recognition module. Because the intention and emotion recognition results are integrated based on the feature weight combination of the initial detection model, the model considers intention and emotion factors simultaneously during behavior detection, so whether the customer service staff has violation behaviors and whether the customer shows potential complaint behaviors can be judged more accurately from the intention and emotion recognition results.
Further, the step of obtaining the initial feature vectors of the text features and the audio features through the input layer, performing convolution operations on the initial feature vectors through the convolution layer to obtain initial feature matrices, and integrating the initial feature matrices in the output layer based on the feature weight combination to obtain the feature matrix, specifically comprises the following steps:
respectively carrying out vector feature conversion on the text features and the audio features through an input layer to obtain initial feature vectors of the text features and the initial feature vectors of the audio features;
performing convolution operation on the initial characteristic vector of the text characteristic and the initial characteristic vector of the audio characteristic through the convolution layer to obtain an initial characteristic matrix of the text characteristic and an initial characteristic matrix of the audio characteristic;
and carrying out matrix splicing on the initial feature matrix of the text features and the initial feature matrix of the audio features in an output layer based on the feature weight combination to obtain a feature matrix.
The initial detection model comprises an input layer, a convolution layer and an output layer. The text features and the audio features are imported into the input layer of the initial detection model, and the feature weight combination is imported into the output layer. The input layer produces the initial feature vector corresponding to each text feature and each audio feature. The text convolution kernel of the convolution layer performs a convolution operation on the initial feature vector of each text feature to obtain the corresponding initial feature matrix, and the audio convolution kernel performs a convolution operation on the initial feature vector of each audio feature to obtain the corresponding initial feature matrix. In the output layer, the initial feature matrices of the text features are spliced based on the feature weight combination to obtain the text feature matrix, and the initial feature matrices of the audio features are spliced in the same way to obtain the audio feature matrix; the parameters of the preset initial detection model are then adjusted based on the text and audio feature matrices to obtain the behavior detection model.
In the above embodiment, the behavior detection model is obtained by training the initial detection model with the text feature and the audio feature, and the behavior detection model includes two modules, namely an intention recognition module and an emotion recognition module, where the text feature is used for training the intention recognition module, and the audio feature is used for training the emotion recognition module.
Further, after the step of adjusting the parameters of the preset initial detection model based on the feature matrix to obtain the behavior detection model, the method further comprises:
obtaining a verification sample from a preset database, importing the verification sample into a behavior detection model, and outputting a verification result;
fitting by using a back propagation algorithm based on the verification result and a preset standard result to obtain a detection error;
comparing the detection error with a preset error threshold, and if the detection error is larger than the preset error threshold, iteratively updating the behavior detection model until the detection error is smaller than or equal to the preset error threshold;
and outputting the behavior detection model with the detection error smaller than or equal to a preset error threshold.
The back propagation algorithm (BP algorithm) is a learning algorithm suited to multi-layer neuron networks; it is built on gradient descent and is used for error calculation in deep learning networks. The input-output relationship of a BP network is essentially a mapping: a BP neural network with n inputs and m outputs performs a continuous mapping from n-dimensional Euclidean space to a finite field in m-dimensional Euclidean space, and this mapping is highly non-linear. The learning process of the BP algorithm consists of a forward propagation pass and a backward propagation pass. In forward propagation, input information passes from the input layer through the hidden layers, is processed layer by layer, and reaches the output layer. In backward propagation, the partial derivatives of the objective function with respect to each neuron's weights are calculated layer by layer, forming the gradient of the objective function with respect to the weight vector, which serves as the basis for modifying the weights.
Specifically, a loss function of the behavior detection model is established. A verification sample is obtained from the preset database, imported into the trained behavior detection model for verification, and the verification result is output. The detection error is then calculated with the loss function based on the verification result and a preset standard result, and compared with a preset error threshold. If the detection error is larger than the threshold, the trained behavior detection model is iteratively updated by back propagation until the detection error is smaller than or equal to the threshold, yielding the verified behavior detection model. The preset standard result and the preset error threshold may be set in advance.
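The verify-compare-iterate loop can be sketched as below; the mock per-iteration error sequence and the threshold value stand in for the detection error computed by the model's loss function and for the preset error threshold.

```python
def fit_until_threshold(errors, threshold):
    """Iterate until the detection error drops to the threshold or below.

    `errors` stands in for the per-iteration detection error produced by
    validating the model after each update; returns (iterations, final_error).
    """
    for step, err in enumerate(errors, start=1):
        if err <= threshold:
            return step, err
    raise RuntimeError("did not converge within the given iterations")

# hypothetical detection errors from four successive model updates
steps, final = fit_until_threshold([0.9, 0.5, 0.2, 0.05], threshold=0.1)
print(steps, final)
```

In a real system each iteration would run a back-propagation update before re-validating, rather than reading from a precomputed list.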
In the above embodiment, the loss output by the behavior detection model is obtained through the constructed loss function, and the model is iteratively updated with the back propagation algorithm based on that loss. This yields a well-fitted behavior detection model with higher detection accuracy.
It is emphasized that, in order to further ensure the privacy and security of the call record, the call record may also be stored in a node of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware associated with computer readable instructions, which can be stored in a computer readable storage medium, and when executed, can include processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not bound to a strict order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages that are not necessarily completed at the same moment and may be performed at different times; nor must they be executed sequentially, as they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a behavior quality inspection apparatus based on voice call, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 3, the behavior quality inspection device based on voice call according to the present embodiment includes:
the information extraction module 301 is configured to obtain a training sample from a preset database, and respectively obtain text information and audio information of the training sample;
a feature extraction module 302, configured to extract text features from the text information of the training samples, and extract audio features from the audio information of the training samples;
the weight calculation module 303 is configured to calculate feature weights of the text feature and the audio feature based on a preset feature weight algorithm, and combine the feature weights based on a preset combination strategy to obtain a feature weight combination;
the model training module 304 is configured to import the text features, the audio features, and the feature weight combination into a preset initial detection model for training to obtain a behavior detection model;
the behavior detection module 305 is configured to receive a behavior detection instruction, obtain a call record corresponding to the behavior detection instruction, import the call record into a trained behavior detection model, and output a behavior detection result.
Further, the weight calculating module 303 specifically includes:
the weight assignment unit is used for assigning the same initial weight to the text characteristic and the audio characteristic;
the weight calculation unit is used for calculating the feature weight of the text feature based on a preset feature weight algorithm to obtain a text feature weight, and calculating the feature weight of the audio feature based on the preset feature weight algorithm to obtain an audio feature weight;
and the weight combination unit is used for combining the text characteristic weight and the audio characteristic weight based on a preset combination strategy to obtain a characteristic weight combination.
Further, the preset feature weight algorithm is a Relief algorithm, and the weight calculation unit specifically includes:
the text feature classification subunit is used for classifying the text features assigned the initial weight to obtain a plurality of text feature combinations;
the first similarity calculation subunit is used for calculating the similarity of the text features in the same text feature combination to obtain a first similarity;
the second similarity calculation subunit is used for calculating the similarity of the text features among different text feature combinations to obtain a second similarity;
and the text weight calculating subunit is used for adjusting the initial weight of the text features based on the first similarity and the second similarity to obtain the feature weight of each text feature.
Further, the preset feature weight algorithm is a Relief algorithm, and the weight calculation unit further includes:
the audio feature classification subunit is used for classifying the audio features assigned the initial weight to obtain a plurality of audio feature combinations;
the third similarity calculation subunit is used for calculating the similarity of the audio features in the same audio feature combination to obtain a third similarity;
the fourth similarity calculation subunit is used for calculating the similarity of the audio features among different audio feature combinations to obtain a fourth similarity;
and the audio weight calculating subunit is used for adjusting the initial weight of the audio features based on the third similarity and the fourth similarity to obtain the feature weight of each audio feature.
Further, the initial detection model includes an input layer, a convolutional layer, and an output layer, and the model training module 304 specifically includes:
the model input unit is used for respectively importing the text characteristics and the audio characteristics into an input layer of a preset initial detection model and importing the characteristic weight combination into an output layer of the preset initial detection model;
the model training unit is used for respectively acquiring initial feature vectors of text features and audio features through the input layer, performing convolution operation on the initial feature vectors through the convolution layer to obtain an initial feature matrix, and integrating the initial feature matrix based on feature weight combination in the output layer to obtain a feature matrix;
and the model adjusting unit is used for adjusting the parameters of the preset initial detection model based on the characteristic matrix to obtain the behavior detection model.
Further, the model training unit specifically includes:
the vector conversion subunit is used for respectively carrying out vector feature conversion on the text features and the audio features through the input layer to obtain initial feature vectors of the text features and the initial feature vectors of the audio features;
the convolution operation subunit is used for respectively carrying out convolution operation on the initial characteristic vector of the text characteristic and the initial characteristic vector of the audio characteristic through the convolution layer to obtain an initial characteristic matrix of the text characteristic and an initial characteristic matrix of the audio characteristic;
and the matrix splicing subunit is used for performing matrix splicing on the initial feature matrix of the text features and the initial feature matrix of the audio features in the output layer based on the feature weight combination to obtain a feature matrix.
Further, the model training module 304 further comprises:
the model verification unit is used for acquiring a verification sample from a preset database, importing the verification sample into the behavior detection model and outputting a verification result;
the error calculation unit is used for fitting by using a back propagation algorithm based on the verification result and a preset standard result to obtain a detection error;
the model iteration unit is used for comparing the detection error with a preset error threshold value, and if the detection error is larger than the preset error threshold value, the behavior detection model is iteratively updated until the detection error is smaller than or equal to the preset error threshold value;
and the model output unit is used for outputting the behavior detection model with the detection error smaller than or equal to a preset error threshold value.
The application discloses a behavior quality inspection device based on voice calls, which belongs to the technical field of artificial intelligence. The device extracts text features and audio features from training samples, calculates the feature weights of the text features and the audio features based on a preset feature weight algorithm to obtain a feature weight combination, and trains a behavior detection model using the text features, the audio features and the feature weight combination. The behavior detection model comprises a call intention recognition module and an emotion recognition module, so the intentions and emotions of both parties of a call can be recognized separately; the trained model recognizes the intention and emotion of the customer service staff and the customer in real time, and judges whether the customer service staff has violation behaviors and whether the customer shows potential complaint behaviors according to the intention and emotion recognition results. This technical scheme inspects the quality of voice-call behavior from the two dimensions of intention and emotion, and can identify violation behaviors of customer service staff and potential customer complaints in a timely and efficient manner.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it should be understood that not all of the shown components need be implemented, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as computer readable instructions of a behavior quality inspection method based on voice call. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer readable instructions stored in the memory 41 or process data, for example, execute the computer readable instructions of the voice call based behavior quality inspection method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The application further discloses a computer device, which belongs to the technical field of artificial intelligence. Text features and audio features are extracted from training samples, and the feature weights of the text features and the audio features are calculated based on a preset feature weight algorithm to obtain a feature weight combination; a behavior detection model is then trained with the text features, the audio features and the feature weight combination. The behavior detection model comprises a call intention recognition module and an emotion recognition module, so the intentions and emotions of both parties to a call can be recognized separately. The intentions and emotions of the customer service agent and the customer are recognized in real time through the trained behavior detection model, and whether the agent has committed violations and whether the customer is likely to complain are judged from the intention recognition results and the emotion recognition results. This technical scheme performs behavior quality inspection of voice calls along the two dimensions of intention and emotion, and can identify agent violations and potential customer complaints promptly and efficiently.
In another embodiment, the present application provides a computer-readable storage medium storing computer-readable instructions that are executable by at least one processor to cause the at least one processor to perform the steps of the voice call based behavior quality inspection method as described above.
The application further discloses a storage medium, which belongs to the technical field of artificial intelligence. Text features and audio features are extracted from training samples, and the feature weights of the text features and the audio features are calculated based on a preset feature weight algorithm to obtain a feature weight combination; a behavior detection model is then trained with the text features, the audio features and the feature weight combination. The behavior detection model comprises a call intention recognition module and an emotion recognition module, so the intentions and emotions of both parties to a call can be recognized separately. The intentions and emotions of the customer service agent and the customer are recognized in real time through the trained behavior detection model, and whether the agent has committed violations and whether the customer is likely to complain are judged from the intention recognition results and the emotion recognition results. This technical scheme performs behavior quality inspection of voice calls along the two dimensions of intention and emotion, and can identify agent violations and potential customer complaints promptly and efficiently.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments illustrate only some, and not all, of the possible embodiments of the present application, and that the appended drawings show preferred embodiments without limiting the scope of the application. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thorough. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the foregoing embodiments may be modified, or some of their features may be replaced by equivalents, without departing from the application. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A behavior quality inspection method based on voice call is characterized by comprising the following steps:
acquiring a training sample from a preset database, and respectively acquiring text information and audio information of the training sample;
extracting text features from the text information of the training samples, and extracting audio features from the audio information of the training samples;
calculating the feature weights of the text features and the audio features based on a preset feature weight algorithm, and combining the feature weights based on a preset combination strategy to obtain a feature weight combination;
importing the text features, the audio features and the feature weight combination into a preset initial detection model for training to obtain a behavior detection model;
and receiving a behavior detection instruction, acquiring a call record corresponding to the behavior detection instruction, importing the call record into the trained behavior detection model, and outputting a behavior detection result.
2. The behavior quality inspection method based on voice call as claimed in claim 1, wherein the step of calculating the feature weights of the text feature and the audio feature based on a preset feature weight algorithm, and combining the feature weights based on a preset combination strategy to obtain a feature weight combination specifically comprises:
assigning the same initial weight to the text feature and the audio feature;
calculating the feature weight of the text feature based on a preset feature weight algorithm to obtain a text feature weight, and calculating the feature weight of the audio feature based on a preset feature weight algorithm to obtain an audio feature weight;
and combining the text characteristic weight and the audio characteristic weight based on a preset combination strategy to obtain the characteristic weight combination.
3. The behavior quality inspection method based on voice call as claimed in claim 2, wherein the preset feature weight algorithm is a Relief algorithm, and the step of calculating the feature weight of the text feature based on the preset feature weight algorithm to obtain the text feature weight specifically comprises:
classifying the text features given with the initial weights to obtain a plurality of text feature combinations;
calculating the similarity of the text features in the same text feature combination to obtain a first similarity;
calculating the similarity of text features among different text feature combinations to obtain a second similarity;
and adjusting the initial weight of the text features based on the first similarity and the second similarity to obtain the feature weight of each text feature.
4. The behavior quality inspection method based on voice call as claimed in claim 2, wherein the preset feature weight algorithm is a Relief algorithm, and the step of calculating the feature weight of the audio feature based on the preset feature weight algorithm to obtain the audio feature weight specifically comprises:
classifying the audio features given with the initial weights to obtain a plurality of audio feature combinations;
calculating the similarity of the audio features in the same audio feature combination to obtain a third similarity;
calculating the similarity of the audio features among different audio feature combinations to obtain a fourth similarity;
and adjusting the initial weight of the audio features based on the third similarity and the fourth similarity to obtain the feature weight of each audio feature.
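Claims 2-4 name the Relief algorithm: assign equal initial weights, then adjust each feature's weight down by its distance to the nearest same-class sample (near-hit, the within-class similarity) and up by its distance to the nearest other-class sample (near-miss, the between-class similarity). A minimal sketch of that update, assuming Manhattan distance and range-normalised feature differences (details the claims leave open):

```python
import numpy as np

def relief_weights(X, y, n_iters=None):
    """Classic Relief over feature matrix X (n samples x d features)
    with binary labels y: start from the same initial weight for every
    feature (claim 2), then update from near-hits and near-misses."""
    n, d = X.shape
    n_iters = n_iters or n
    w = np.full(d, 1.0 / d)            # equal initial weights
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0              # avoid division by zero
    for i in range(n_iters):
        idx = i % n
        x, label = X[idx], y[idx]
        dists = np.abs(X - x).sum(axis=1)
        dists[idx] = np.inf            # exclude the sample itself
        same, diff = y == label, y != label
        hit = X[same][np.argmin(dists[same])]    # nearest same-class
        miss = X[diff][np.argmin(dists[diff])]   # nearest other-class
        # Penalise features that differ within a class, reward features
        # that differ between classes.
        w += (np.abs(x - miss) - np.abs(x - hit)) / (span * n_iters)
    return w
```

With this update, a feature that separates the classes ends up with a larger weight than a noisy one.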
5. The method according to any one of claims 1 to 4, wherein the initial detection model includes an input layer, a convolutional layer, and an output layer, and the step of importing the text feature, the audio feature and the feature weight combination into the preset initial detection model for training to obtain the behavior detection model specifically includes:
respectively importing the text features and the audio features into an input layer of a preset initial detection model, and importing the feature weight combination into an output layer of the preset initial detection model;
respectively obtaining initial feature vectors of the text features and the audio features through the input layer, performing convolution operation on the initial feature vectors through the convolution layer to obtain an initial feature matrix, and integrating the initial feature matrix based on the feature weight combination in the output layer to obtain a feature matrix;
and adjusting preset parameters of the initial detection model based on the characteristic matrix to obtain the behavior detection model.
6. The behavior quality inspection method based on voice call as claimed in claim 5, wherein the steps of obtaining initial feature vectors of the text feature and the audio feature through the input layer, performing convolution operation on the initial feature vectors through the convolution layer to obtain an initial feature matrix, and integrating the initial feature matrices based on the feature weight combination in the output layer to obtain a feature matrix specifically include:
respectively carrying out vector feature conversion on the text features and the audio features through the input layer to obtain initial feature vectors of the text features and the audio features;
performing convolution operation on the initial feature vector of the text feature and the initial feature vector of the audio feature through the convolution layer to obtain an initial feature matrix of the text feature and an initial feature matrix of the audio feature;
and performing matrix splicing on the initial feature matrix of the text feature and the initial feature matrix of the audio feature in the output layer based on the feature weight combination to obtain the feature matrix.
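Claims 5-6 can be read as: convolve each modality's initial feature vector, scale the results by the feature weight combination, and splice them into one feature matrix. A toy sketch of that fusion; the smoothing kernel and the per-modality scalar weights are illustrative assumptions, not the patented parameters:

```python
import numpy as np

def conv1d(vec, kernel):
    # Valid-mode 1-D convolution, standing in for the convolutional layer.
    return np.convolve(vec, kernel, mode="valid")

def fuse(text_vec, audio_vec, text_w, audio_w,
         kernel=np.array([0.25, 0.5, 0.25])):
    """Convolve each modality's initial feature vector, weight the
    results, and splice them into one row of the feature matrix,
    mirroring claim 6's weighted matrix splicing."""
    text_feat = text_w * conv1d(text_vec, kernel)
    audio_feat = audio_w * conv1d(audio_vec, kernel)
    return np.concatenate([text_feat, audio_feat])
```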
7. The behavior quality inspection method based on voice call as claimed in claim 5, wherein after the step of adjusting the preset parameters of the initial detection model based on the feature matrix to obtain the behavior detection model, the method further comprises:
obtaining a verification sample from a preset database, importing the verification sample into the behavior detection model, and outputting a verification result;
fitting by using a back propagation algorithm based on the verification result and a preset standard result to obtain a detection error;
comparing the detection error with a preset error threshold, and if the detection error is greater than the preset error threshold, iteratively updating the behavior detection model until the detection error is less than or equal to the preset error threshold;
and outputting a behavior detection model with the detection error smaller than or equal to a preset error threshold.
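The verification loop of claim 7 — validate, compare the detection error against a preset threshold, and iteratively update the model until the error falls to or below it — can be sketched generically. `model_step` and `validate` are caller-supplied stand-ins (hypothetical names) for one back-propagation update round and the validation pass:

```python
def train_until_converged(model_step, validate,
                          max_rounds=100, err_threshold=0.05):
    """Iteratively update the model until the validation error is at or
    below the preset threshold, as in claim 7. Raises if the budget of
    update rounds is exhausted first."""
    for _ in range(max_rounds):
        error = validate()          # detection error on the verification sample
        if error <= err_threshold:
            return error            # model accepted; output it
        model_step()                # iterative update (e.g. one back-prop epoch)
    raise RuntimeError("did not reach the preset error threshold")
```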
8. A behavior quality inspection device based on voice call is characterized by comprising:
the information extraction module is used for acquiring training samples from a preset database and respectively acquiring text information and audio information of the training samples;
the feature extraction module is used for extracting text features from the text information of the training samples and extracting audio features from the audio information of the training samples;
the weight calculation module is used for calculating the feature weights of the text features and the audio features based on a preset feature weight algorithm, and combining the feature weights based on a preset combination strategy to obtain a feature weight combination;
the model training module is used for importing the text feature, the audio feature and the feature weight combination into a preset initial detection model for training to obtain a behavior detection model;
and the behavior detection module is used for receiving a behavior detection instruction, acquiring a call record corresponding to the behavior detection instruction, importing the call record into the trained behavior detection model, and outputting a behavior detection result.
9. An apparatus comprising a memory and a processor, the memory having computer readable instructions stored therein, wherein the processor, when executing the computer readable instructions, performs the steps of the voice call based behavior quality inspection method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, implement the steps of the voice call based behavior quality inspection method according to any one of claims 1 to 7.
CN202110220308.7A 2021-02-26 2021-02-26 Behavior quality inspection method, device, equipment and storage medium based on voice call Active CN112995414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110220308.7A CN112995414B (en) 2021-02-26 2021-02-26 Behavior quality inspection method, device, equipment and storage medium based on voice call


Publications (2)

Publication Number Publication Date
CN112995414A true CN112995414A (en) 2021-06-18
CN112995414B CN112995414B (en) 2022-10-25

Family

ID=76351299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110220308.7A Active CN112995414B (en) 2021-02-26 2021-02-26 Behavior quality inspection method, device, equipment and storage medium based on voice call

Country Status (1)

Country Link
CN (1) CN112995414B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020002464A1 (en) * 1999-08-31 2002-01-03 Valery A. Petrushin System and method for a telephonic emotion detection that provides operator feedback
CN103188410A (en) * 2011-12-29 2013-07-03 上海博泰悦臻电子设备制造有限公司 Voice auto-answer cloud server, voice auto-answer system and voice auto-answer method
CN109784414A (en) * 2019-01-24 2019-05-21 出门问问信息科技有限公司 Customer anger detection method, device and electronic equipment in a kind of phone customer service
CN111209970A (en) * 2020-01-08 2020-05-29 Oppo(重庆)智能科技有限公司 Video classification method and device, storage medium and server


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468296A (en) * 2021-09-02 2021-10-01 杭州实在智能科技有限公司 Model self-iteration type intelligent customer service quality inspection system and method capable of configuring business logic
CN113468296B (en) * 2021-09-02 2022-05-03 杭州实在智能科技有限公司 Model self-iteration type intelligent customer service quality inspection system and method capable of configuring business logic
CN113903358A (en) * 2021-10-15 2022-01-07 北京房江湖科技有限公司 Voice quality inspection method, readable storage medium and computer program product

Also Published As

Publication number Publication date
CN112995414B (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN112732911B (en) Semantic recognition-based speaking recommendation method, device, equipment and storage medium
CN110569377B (en) Media file processing method and device
WO2021120677A1 (en) Warehousing model training method and device, computer device and storage medium
CN112085565B (en) Deep learning-based information recommendation method, device, equipment and storage medium
CN110825956A (en) Information flow recommendation method and device, computer equipment and storage medium
CN112418059A (en) Emotion recognition method and device, computer equipment and storage medium
CN112995414B (en) Behavior quality inspection method, device, equipment and storage medium based on voice call
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN112863683A (en) Medical record quality control method and device based on artificial intelligence, computer equipment and storage medium
CN115130711A (en) Data processing method and device, computer and readable storage medium
CN111368551A (en) Method and device for determining event subject
WO2019227629A1 (en) Text information generation method and apparatus, computer device and storage medium
CN115619448A (en) User loss prediction method and device, computer equipment and storage medium
CN115730597A (en) Multi-level semantic intention recognition method and related equipment thereof
CN110532448B (en) Document classification method, device, equipment and storage medium based on neural network
CN114398466A (en) Complaint analysis method and device based on semantic recognition, computer equipment and medium
CN113936677A (en) Tone conversion method, device, computer equipment and storage medium
CN113420869A (en) Translation method based on omnidirectional attention and related equipment thereof
CN116911304B (en) Text recommendation method and device
CN113792342B (en) Desensitization data reduction method, device, computer equipment and storage medium
CN117172632B (en) Enterprise abnormal behavior detection method, device, equipment and storage medium
CN113792163B (en) Multimedia recommendation method and device, electronic equipment and storage medium
CN110992067B (en) Message pushing method, device, computer equipment and storage medium
CN113361629A (en) Training sample generation method and device, computer equipment and storage medium
CN117391812A (en) Recommendation method, device, equipment and medium based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant