WO2022116438A1 - Customer service violation quality inspection method, apparatus, computer device and storage medium - Google Patents

Customer service violation quality inspection method, apparatus, computer device and storage medium

Info

Publication number
WO2022116438A1
Authority
WO
WIPO (PCT)
Prior art keywords
customer service
tested
vector
dialogue text
service dialogue
Prior art date
Application number
PCT/CN2021/083795
Other languages
English (en)
French (fr)
Inventor
颜泽龙
王健宗
吴天博
程宁
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022116438A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06398 Performance of employee with respect to a job function
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to a method, apparatus, computer device and storage medium for quality inspection of customer service violations.
  • Quality inspection refers to checking the quality of a service.
  • Quality inspection is essential. On the one hand, it requires customer service agents to provide services as required, which ensures service quality; on the other hand, it checks whether the service is compliant.
  • The applicant and customer service need to conduct multiple rounds of communication to confirm whether the applicant has the corresponding qualifications and the corresponding amount. Violations sometimes occur: to increase the order rate, customer service agents may guide applicants to provide false information, resulting in bad debts.
  • Embodiments of the present application provide a method, device, computer equipment, and storage medium for customer service violation quality inspection, which aim to solve the problem of low efficiency and poor accuracy in existing manual quality inspection.
  • An embodiment of the present application provides a method for quality inspection of customer service violations, which includes:
  • wherein each violation category includes at least one violation situation;
  • word-level interaction and sentence-level interaction are performed respectively between the violating customer service dialogue text to be tested and the feature vector of the violation category that was looked up in advance, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, wherein each violation category has a corresponding feature vector;
  • the final vector is input into a pre-trained third linear classifier, and the violation situation of the customer service dialogue text to be tested is predicted by the third linear classifier.
  • the embodiments of the present application also provide a customer service violation quality inspection device, which includes:
  • an encoding unit configured to receive the customer service dialogue text to be tested, and encode the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
  • a judgment unit configured to input the basic vector into a pre-trained first linear classifier, and judge whether the customer service dialogue text to be tested violates the rules through the first linear classifier;
  • a first prediction unit configured to input the basic vector into a pre-trained second linear classifier if the customer service dialogue text to be tested violates the rules, and output the violation category of the customer service dialogue text to be tested, wherein each violation category includes at least one violation situation;
  • an interaction unit configured to perform, according to the attention mechanism, word-level interaction and sentence-level interaction between the violating customer service dialogue text to be tested and the feature vector of the violation category that was looked up in advance, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, wherein each violation category has a corresponding feature vector;
  • a merging unit configured to merge the basic vector of the violating customer service dialogue text to be tested and the feature discrimination vector to obtain a final vector;
  • a second prediction unit configured to input the final vector into a pre-trained third linear classifier, and use the third linear classifier to predict the violation situation of the customer service dialogue text to be tested.
  • An embodiment of the present application further provides a computer device. The computer device includes a memory and a processor, the memory stores a computer program, and the processor is configured to run the computer program to perform the following steps:
  • word-level interaction and sentence-level interaction are performed respectively between the violating customer service dialogue text to be tested and the feature vector searched for the violation category in advance, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested;
  • the final vector is input into a pre-trained third linear classifier, and the violation situation of the customer service dialogue text to be tested is predicted by the third linear classifier.
  • Embodiments of the present application further provide a computer-readable storage medium in which a computer program is stored, wherein, when the computer program is executed by a processor, the processor executes the following steps:
  • word-level interaction and sentence-level interaction are performed respectively between the violating customer service dialogue text to be tested and the feature vector searched for the violation category in advance, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested;
  • the final vector is input into a pre-trained third linear classifier, and the violation situation of the customer service dialogue text to be tested is predicted by the third linear classifier.
  • The embodiments of the present application provide a method, apparatus, computer device and storage medium for quality inspection of customer service violations, which can realize automatic quality inspection of the customer service dialogue text to be tested and are more efficient and more accurate than manual quality inspection.
  • FIG. 1 is a schematic flowchart of a customer service violation quality inspection method provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a method for calculating a basic vector of a customer service dialogue text to be tested according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of a sub-flow of a customer service violation quality inspection method provided by an embodiment of the present application
  • FIG. 4 is a schematic diagram of a sub-flow of a customer service violation quality inspection method provided by an embodiment of the present application
  • FIG. 5 is a schematic diagram of a sub-flow of the method for quality inspection of customer service violations provided by the embodiment of the present application
  • FIG. 6 is a schematic block diagram of a customer service violation quality inspection device provided by an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • The term “if” may be contextually interpreted as “when” or “once” or “in response to determining” or “in response to detecting”.
  • The phrases “if it is determined” or “if the [described condition or event] is detected” may be interpreted, depending on the context, to mean “once it is determined” or “in response to the determination” or “once the [described condition or event] is detected” or “in response to detection of the [described condition or event]”.
  • FIG. 1 is a schematic flowchart of a method for quality inspection of customer service violations provided by an embodiment of the present application.
  • This application can be applied to scenarios of smart government affairs/smart city management/smart community/smart security/smart logistics/smart medical care/smart education/smart environmental protection/smart transportation, so as to promote the construction of smart cities.
  • the method includes the following steps:
  • the customer service dialogue text to be tested is obtained.
  • Bidirectional Recurrent Neural Networks are composed of two RNNs superimposed on top of each other, and their output is jointly determined by the states of these two RNNs. Compared with the RNN network, the bidirectional RNN network can more accurately extract the features in the customer service dialogue text to be tested.
  • step S1 specifically includes the following steps:
  • S11 Perform word segmentation on the customer service dialogue text to be tested to obtain a word segmentation set of the customer service dialogue text to be tested.
  • Word segmentation processing refers to dividing a Chinese character sequence into individual words. Word segmentation is the process of recombining a continuous character sequence into a word sequence according to certain specifications.
  • step S11 specifically includes the following steps:
  • S111 Divide the customer service dialogue text to be tested into a plurality of words by using a preset word segmentation tool to obtain an initial word segmentation set.
  • A commonly used word segmentation tool is jieba (结巴).
  • The customer service dialogue text to be tested is divided into a plurality of words by the jieba word segmentation tool, and these words constitute the initial word segmentation set.
  • Stop words are often prepositions, adverbs or conjunctions.
  • Stop words have no actual meaning and can cause interference, so in practical applications they need to be deleted.
  • The stop words contained in the initial word segmentation set are deleted to obtain the word segmentation set.
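The two sub-steps above (splitting into an initial word set, then deleting stop words) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `build_word_set`, the English sample sentence, the tiny stop-word list, and the use of whitespace splitting are all assumptions made so the example runs without extra dependencies. For Chinese text, a real segmenter such as jieba's `lcut` function could be passed as the `tokenize` argument instead.

```python
from typing import Callable, Iterable, List, Set

def build_word_set(text: str,
                   tokenize: Callable[[str], Iterable[str]],
                   stop_words: Set[str]) -> List[str]:
    """Split text into an initial word set, then delete stop words."""
    initial = [w.strip() for w in tokenize(text)]   # initial word segmentation set
    return [w for w in initial if w and w not in stop_words]

# Whitespace splitting stands in for a real Chinese segmenter here (assumption).
STOP_WORDS = {"the", "a", "of", "and", "to"}        # illustrative list only
words = build_word_set("please provide the proof of income", str.split, STOP_WORDS)
print(words)  # ['please', 'provide', 'proof', 'income']
```

Passing the tokenizer as a parameter keeps the stop-word filtering independent of whichever segmentation tool is preset.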
  • S12 Perform word vector training on the words in the word segmentation set to obtain word vectors of the words in the word segmentation set.
  • word2vec is used to perform word vector training on the words in the word segmentation set.
  • word2vec is a natural language processing tool whose function is to convert words in natural language into word vectors that computers can understand.
  • word2vec is used in this embodiment to obtain word vectors, so that the similarity between words can be reflected by the distance between their vectors.
  • Other word vector tools may also be used for word vector training, which is not specifically limited in this application.
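To make the idea of "distance between vectors reflects similarity" concrete, the sketch below compares toy word vectors with cosine similarity. The three-dimensional vectors and the words chosen are invented for illustration; real word2vec vectors are learned from a corpus and typically have far more dimensions, and cosine similarity is only one common choice of measure.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hand-written toy vectors (assumption), not trained embeddings.
toy_vectors = {
    "loan":    [0.9, 0.1, 0.3],
    "credit":  [0.8, 0.2, 0.4],
    "weather": [0.0, 0.9, 0.1],
}

sim_related = cosine_similarity(toy_vectors["loan"], toy_vectors["credit"])
sim_unrelated = cosine_similarity(toy_vectors["loan"], toy_vectors["weather"])
print(sim_related > sim_unrelated)  # True
```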
  • the bidirectional RNN network encodes the word vectors of the words in the word segmentation set, and the output of the bidirectional RNN network is the basic vector of the customer service dialogue text to be tested.
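A minimal sketch of bidirectional encoding is given below. It assumes plain tanh recurrent cells, randomly initialised (untrained) weights, and that the basic vector is the concatenation of the final forward and final backward hidden states; the text does not state how the two directions are combined, so that choice and all names here are hypothetical.

```python
import math
import random

def rnn_last_state(inputs, W_x, W_h, b):
    """Run a plain tanh RNN over `inputs` and return its final hidden state."""
    hidden = [0.0] * len(b)
    for x in inputs:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_x[k], x))
                + sum(w * hj for w, hj in zip(W_h[k], hidden))
                + b[k]
            )
            for k in range(len(b))
        ]
    return hidden

def random_params(embed_dim, hidden_dim, rng):
    """Untrained, randomly initialised weights (illustration only)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return mat(hidden_dim, embed_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim

def encode_bidirectional(word_vectors, fwd_params, bwd_params):
    """Concatenate the final states of a forward and a backward pass (assumption)."""
    forward = rnn_last_state(word_vectors, *fwd_params)
    backward = rnn_last_state(word_vectors[::-1], *bwd_params)
    return forward + backward

rng = random.Random(0)
fwd_params = random_params(3, 4, rng)
bwd_params = random_params(3, 4, rng)
word_vectors = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4], [0.0, 0.9, 0.1]]
basic_vector = encode_bidirectional(word_vectors, fwd_params, bwd_params)
print(len(basic_vector))  # 8
```

Because the backward pass reads the words in reverse order, each half of the result summarises the text from a different direction.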
  • the basic vector is input into a pre-trained first linear classifier, and the first linear classifier is used to determine whether the customer service dialogue text to be tested violates the rules.
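  • The mathematical expression of the first linear classifier is $\hat{y}_1 = \mathrm{softmax}(W_1 v + b_1)$, where $v$ denotes the basic vector, $\hat{y}_1$ denotes the classifier output, and softmax refers to normalization processing.
  • $W_1$ represents the weight of the linear classifier, and $b_1$ represents the bias value, both of which can be obtained through training.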
  • the first linear classifier is pre-trained with a large number of labeled samples, so that the first linear classifier has the function of identifying whether the customer service dialogue text to be tested violates the rules.
  • The basic vector is input into the first linear classifier, which predicts whether the customer service dialogue text to be tested violates the rules.
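The linear classifier described here can be sketched as a weighted sum followed by softmax normalization. The weights, bias, input vector, and the convention that index 1 means "violation" are all invented for illustration; in the application these parameters come from training on labeled samples.

```python
import math

def softmax(scores):
    """Normalise raw scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear_classifier(vector, W, b):
    """Compute softmax(W * vector + b) for a weight matrix W and bias b."""
    scores = [sum(w * x for w, x in zip(row, vector)) + bias
              for row, bias in zip(W, b)]
    return softmax(scores)

# Hypothetical trained parameters: row 0 = "compliant", row 1 = "violation".
W1 = [[0.5, -0.2, 0.1],
      [-0.3, 0.8, 0.4]]
b1 = [0.0, 0.1]

probs = linear_classifier([1.0, 2.0, 0.5], W1, b1)
is_violation = probs[1] > probs[0]
print(is_violation)  # True
```

The second and third classifiers in the method have the same form and differ only in their parameters, their input vector, and the number of output classes.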
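  • The mathematical expression of the second linear classifier is $\hat{y}_2 = \mathrm{softmax}(W_2 v + b_2)$, where $v$ denotes the basic vector and $\hat{y}_2$ denotes the classifier output.
  • softmax refers to normalization processing.
  • $W_2$ represents the weight of the linear classifier, and $b_2$ represents the bias value, both of which can be obtained through training.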
  • the second linear classifier is pre-trained through a large number of labeled samples, so that the second linear classifier has the function of identifying the violation category of the customer service dialogue text to be tested.
  • the basic vector is input into the second linear classifier, so that the second linear classifier can predict the violation category to which the customer service dialogue text to be tested belongs.
  • The violation category predicted by the second linear classifier is taken as the violation category of the customer service dialogue text to be tested.
  • each violation category includes at least one violation situation.
  • For example, a violation category may include three violation situations.
  • each violation category is represented by a feature vector.
  • the feature vector of each violation category is predetermined.
  • a plurality of violation categories are preset in this application, and a feature vector is preset for each.
  • the violation category output by the second linear classifier belongs to one of a plurality of preset categories. After the violation category output by the second linear classifier is determined, a feature vector having a mapping relationship with the violation category is searched in the database.
  • the feature vector of each violation category only needs to be calculated once, and the calculation process is performed before step S1.
  • the calculation process of the feature vector of each violation category includes the following steps:
  • a violation situation graph G is first constructed according to the type of the violation situation, and each node of the violation situation graph G represents a violation situation. If “providing false information” and “failure to clearly explain the loan" are two of the violations, they can be represented by two corresponding nodes in the violation graph G. By constructing a violation situation graph, the violation cases can be exhausted, and the number of violation cases is not limited.
  • S102 Divide the violation situation graph into multiple subgraphs according to the distances between nodes in the violation situation graph, wherein the distance between nodes in the same subgraph is less than a preset distance threshold, the distance between nodes in different subgraphs is greater than the preset distance threshold, and each subgraph corresponds to a violation category.
  • Each violation situation has a corresponding description text, and the distances between all nodes are calculated from these description texts using the TF-IDF algorithm. Given an appropriate distance threshold, if the distance between two nodes is less than the threshold, consistent with step S102, the edge between the two nodes is retained and its weight is set to the distance between them; otherwise, no direct edge is placed between the two nodes. With an appropriately chosen distance threshold, the violation situation graph consists of multiple subgraphs that are not connected to each other.
  • A, B, and C are connected by edges, and D, E, and F are connected by edges, but there are no edge connections between {A, B, C} and {D, E, F}. Accordingly, A, B, and C belong to the same violation category, and D, E, and F belong to another violation category.
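The graph construction and splitting can be sketched as below. The text names only the TF-IDF algorithm, so the specific choices here are assumptions: a smoothed IDF, cosine distance between TF-IDF vectors, whitespace tokenization, invented English description texts, and a threshold of 0.6. Following step S102, two nodes are linked when their distance is below the threshold, and each connected component is treated as one subgraph (one violation category).

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per description text."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    return [
        {t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, c in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine_distance(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0

def split_into_subgraphs(descriptions, threshold):
    """Link nodes closer than `threshold`, then return connected components."""
    vecs = tfidf_vectors(descriptions)
    n = len(vecs)
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine_distance(vecs[i], vecs[j]) < threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:                      # depth-first traversal of one subgraph
            node = stack.pop()
            component.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(component))
    return components

# Hypothetical description texts, one per violation situation (node).
descriptions = [
    "guide applicant to provide false income information",
    "guide applicant to provide false employment information",
    "fail to explain loan interest clearly",
    "fail to explain loan repayment term clearly",
]
subgraphs = split_into_subgraphs(descriptions, threshold=0.6)
print(subgraphs)  # [[0, 1], [2, 3]]
```

With this toy data, the two "false information" situations form one category and the two "failure to explain" situations form another.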
  • S103 Calculate, by using a preset graph neural network, a feature vector of a violation category corresponding to each subgraph of the violation situation graph.
  • the graph neural network may adopt a graph convolutional neural network (Graph Convolutional Network, GCN).
  • Ni represents the set of all nodes connected to the node by edges
  • Both represent weights, which can be obtained through training
  • b l is the bias value
  • Maximum pooling and minimum pooling operations are then used to obtain a distinguishing feature vector $d_i$ of the violation category corresponding to each subgraph.
  • The specific formula is as follows:
  • MaP and MiP represent the maximum pooling and minimum pooling operations, respectively. It can be understood that by aggregating the information of all nodes in the entire subgraph $g_i$, the feature vector $d_i$ of the violation category corresponding to the subgraph $g_i$ can be obtained.
  • The feature vector $d_i$ can be understood as a mathematical expression of the violation category, so that the computer can identify the violation category.
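Since the propagation and pooling formulas did not survive in this text, the sketch below only illustrates the general technique under stated assumptions: one graph-convolution step that combines each node's own state with the mean of its neighbours' states through two weight matrices and a bias, followed by ReLU, and a category vector formed by concatenating max pooling and min pooling over the subgraph's nodes. The function names, the mean aggregation, the ReLU, and the concatenation are all hypothetical choices.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gcn_layer(features, neighbours, W_self, W_nb, bias):
    """One propagation step: ReLU(W_self h_i + W_nb mean_{j in N_i} h_j + bias)."""
    updated = []
    for i, h in enumerate(features):
        nbs = neighbours[i]
        if nbs:
            agg = [sum(features[j][k] for j in nbs) / len(nbs) for k in range(len(h))]
        else:
            agg = [0.0] * len(h)
        own, nb = matvec(W_self, h), matvec(W_nb, agg)
        updated.append([max(0.0, o + n + c) for o, n, c in zip(own, nb, bias)])
    return updated

def category_vector(node_states):
    """Concatenate element-wise max pooling and min pooling over all nodes."""
    dims = range(len(node_states[0]))
    max_pool = [max(h[k] for h in node_states) for k in dims]
    min_pool = [min(h[k] for h in node_states) for k in dims]
    return max_pool + min_pool

# A toy subgraph of three mutually connected violation situations.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
identity = [[1.0, 0.0], [0.0, 1.0]]       # placeholder weights, not trained values

states = gcn_layer(features, neighbours, identity, identity, [0.0, 0.0])
d_i = category_vector(states)
print(d_i)  # [1.5, 1.5, 1.0, 1.0]
```

Pooling over all nodes yields a fixed-length vector regardless of how many violation situations a subgraph contains.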
  • step S4 specifically includes the following steps:
  • the customer service dialogue text to be tested can be expressed by the following mathematical expression:
  • $s$ represents the customer service dialogue text to be tested
  • $s_i$ represents the i-th sentence
  • $w_{n,m}$ represents the m-th word of the n-th sentence.
  • the attention mechanism is used to perform word-level interaction, and the sentence vector of each sentence is obtained.
  • the specific calculation method is as follows:
  • $a_{i,j}$ is the interactive word vector
  • $h_{i,j}$ is the hidden state vector of the word
  • $W_w$ is the weight of the hidden state vector of the word
  • $d_o$ is the feature vector of the violation category
  • $W_g$ is the weight of the feature vector of the violation category
  • $v_{s_i}$ is the sentence vector.
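The exact attention formulas are not preserved in this text, so the following is a sketch of one common form, additive attention conditioned on the category vector, and should not be read as the applicant's formula. Each hidden state is mixed with the projected category vector, scored, normalised with softmax, and the hidden states are summed with those weights. The scoring vector `u` and all numeric values are hypothetical.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def category_attention(hidden_states, d_o, W_h, W_g, u):
    """Weight hidden states by their relevance to the category vector d_o."""
    projected_category = matvec(W_g, d_o)
    scores = []
    for h in hidden_states:
        mixed = [math.tanh(a + b) for a, b in zip(matvec(W_h, h), projected_category)]
        scores.append(sum(ui * mi for ui, mi in zip(u, mixed)))
    weights = softmax(scores)
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden_states))
              for k in range(len(hidden_states[0]))]
    return weights, pooled

identity = [[1.0, 0.0], [0.0, 1.0]]        # placeholder weights, not trained values
word_states = [[1.0, 0.0], [0.0, 1.0]]     # hidden states of two words
weights, sentence_vector = category_attention(
    word_states, d_o=[1.0, 0.0], W_h=identity, W_g=identity, u=[1.0, 0.0])
print(weights[0] > weights[1])  # True
```

The same function shape could serve the sentence-level interaction by passing sentence hidden states instead of word hidden states, in which case the pooled output would play the role of the feature discrimination vector.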
  • S43 Encode the sentence vector of each sentence of the customer service dialogue text to be tested by using a preset bidirectional GRU to obtain a hidden state vector of each sentence of the customer service dialogue text to be tested.
  • The mathematical expression for calculating the hidden state vector $h_i$ of each sentence of the customer service dialogue text to be tested by bidirectional GRU encoding is as follows:
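As the expression itself is missing here, the sketch below shows the standard GRU update (update gate, reset gate, candidate state) run in both directions, with each sentence's hidden state taken as the concatenation of its forward and backward states. Bias terms are omitted for brevity, the weights are random and untrained, and the concatenation is an assumption about how the two directions are combined.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, x, U, h):
    """Return W x + U h as a list."""
    return [sum(w * xi for w, xi in zip(wr, x)) + sum(u * hi for u, hi in zip(ur, h))
            for wr, ur in zip(W, U)]

def gru_step(x, h, p):
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h)]      # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h)]      # reset gate
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def run_gru(seq, p, hidden_dim):
    h, states = [0.0] * hidden_dim, []
    for x in seq:
        h = gru_step(x, h, p)
        states.append(h)
    return states

def bidirectional_gru(seq, fwd, bwd, hidden_dim):
    forward = run_gru(seq, fwd, hidden_dim)
    backward = run_gru(seq[::-1], bwd, hidden_dim)[::-1]   # realign to input order
    return [f + b for f, b in zip(forward, backward)]

def init_params(in_dim, hidden_dim, rng):
    """Random untrained weights; W* act on the input, U* on the hidden state."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {name: mat(hidden_dim, in_dim if name[0] == "W" else hidden_dim)
            for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}

rng = random.Random(0)
fwd, bwd = init_params(2, 3, rng), init_params(2, 3, rng)
sentence_vectors = [[0.2, 0.7], [0.9, 0.1], [0.4, 0.4]]
sentence_states = bidirectional_gru(sentence_vectors, fwd, bwd, hidden_dim=3)
print(len(sentence_states), len(sentence_states[0]))  # 3 6
```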
  • the attention mechanism is used to perform sentence-level interaction, and the feature discrimination vector of the customer service dialogue text to be tested is obtained.
  • the specific calculation method is as follows:
  • $a_i$ is the interactive sentence vector
  • $h_i$ is the hidden state vector of the sentence
  • $W_s$ is the weight of the hidden state vector of the sentence
  • $d_o$ is the feature vector of the violation category
  • $W_g$ is the weight of the feature vector of the violation category
  • The final vector is obtained by merging the basic vector and the feature discrimination vector.
  • the final vector is input into a pre-trained third linear classifier, and the violation situation of the customer service dialogue text to be tested is predicted by the third linear classifier.
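  • The mathematical expression of the third linear classifier is $\hat{y}_3 = \mathrm{softmax}(W_3 u + b_3)$, where $u$ denotes the final vector and $\hat{y}_3$ denotes the classifier output.
  • softmax refers to normalization processing.
  • $W_3$ represents the weight of the linear classifier, and $b_3$ represents the bias value, both of which can be obtained through training.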
  • The third linear classifier is pre-trained through a large number of labeled samples, so that the third linear classifier has the function of identifying the violation situation of the customer service dialogue text to be tested.
  • the final vector is input into the third linear classifier, so that the third linear classifier can predict the violation situation to which the customer service dialogue text to be tested belongs.
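The overall control flow of the method, a three-stage cascade with an early exit for compliant text, can be sketched as follows. Every component is passed in as a callable so the sketch stays self-contained; the lambdas, category name, and situation name are placeholders invented for illustration, and merging by concatenation is an assumption.

```python
def inspect_dialogue(basic_vector, is_violation, predict_category,
                     category_vectors, interact, predict_situation):
    """Return (category, situation) for violating text, or None if compliant."""
    if not is_violation(basic_vector):            # first linear classifier
        return None
    category = predict_category(basic_vector)     # second linear classifier
    d_o = category_vectors[category]              # pre-computed feature vector lookup
    discrimination = interact(d_o)                # word- and sentence-level interaction
    final_vector = basic_vector + discrimination  # merge by concatenation (assumption)
    return category, predict_situation(final_vector)  # third linear classifier

# Stub components standing in for the trained models (all hypothetical).
stubs = dict(
    is_violation=lambda v: v[1] > 0.5,
    predict_category=lambda v: "misleading_guidance",
    category_vectors={"misleading_guidance": [1.0, 0.0]},
    interact=lambda d_o: [0.5 * x for x in d_o],
    predict_situation=lambda v: "providing_false_information" if len(v) == 4 else "unknown",
)

flagged = inspect_dialogue([0.2, 0.9], **stubs)
clean = inspect_dialogue([0.2, 0.1], **stubs)
print(flagged)  # ('misleading_guidance', 'providing_false_information')
print(clean)    # None
```

Running the coarse classifiers first means the attention-based interaction is only computed for text already judged to be violating.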
  • the technical solution of the present application can realize the automatic quality inspection of the customer service dialogue text to be tested, and has the advantages of high efficiency and high accuracy compared with the manual quality inspection method.
  • FIG. 6 is a schematic block diagram of a customer service violation quality inspection apparatus 70 provided by an embodiment of the present application.
  • The present application further provides a customer service violation quality inspection device 70.
  • The customer service violation quality inspection device 70 includes units for executing the above customer service violation quality inspection method, and the customer service violation quality inspection device 70 can be configured in a desktop computer, a tablet computer, a laptop computer, or other terminals.
  • The customer service violation quality inspection device 70 includes an encoding unit 71, a judgment unit 72, a first prediction unit 73, an interaction unit 74, a merging unit 75 and a second prediction unit 76.
  • the encoding unit 71 is configured to receive the customer service dialogue text to be tested, and encode the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
  • the judgment unit 72 is configured to input the basic vector into the pre-trained first linear classifier, and determine whether the customer service dialogue text to be tested violates the rules through the first linear classifier;
  • the first prediction unit 73 is configured to input the basic vector into the pre-trained second linear classifier if the customer service dialogue text to be tested violates the rules, and output the violation category of the customer service dialogue text to be tested, wherein each violation category includes at least one violation situation;
  • the interaction unit 74 is configured to perform, according to the attention mechanism, word-level interaction and sentence-level interaction between the violating customer service dialogue text to be tested and the feature vector of the violation category that was looked up in advance, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, wherein each violation category has a corresponding feature vector;
  • the merging unit 75 is configured to merge the basic vector of the violating customer service dialogue text to be tested and the feature discrimination vector to obtain a final vector;
  • the second prediction unit 76 is configured to input the final vector into a pre-trained third linear classifier, and use the third linear classifier to predict the violation situation of the customer service dialogue text to be tested.
  • the encoding of the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested includes:
  • the word vector of the words in the word segmentation set is input into the bidirectional RNN network and the basic vector of the customer service dialogue text to be tested is output.
  • performing word segmentation processing on the customer service dialogue text to be tested to obtain a word segmentation set of the customer service dialogue text to be tested including:
  • the stop words in the initial word segmentation set are deleted to obtain the word segmentation set.
  • performing word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector of the violation category that was looked up in advance, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, including:
  • the hidden state vector of the words of each sentence perform word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain sentence vectors of each sentence of the customer service dialogue text to be tested;
  • a sentence-level interaction is performed between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a feature discrimination vector of the customer service dialogue text to be tested.
  • sentence vector including:
  • $a_{i,j}$ is the interactive word vector
  • $h_{i,j}$ is the hidden state vector of the word
  • $W_w$ is the weight of the hidden state vector of the word
  • $d_o$ is the feature vector of the violation category
  • $W_g$ is the weight of the feature vector of the violation category
  • $v_{s_i}$ is the sentence vector.
  • performing sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a feature discrimination vector of the customer service dialogue text to be tested include:
  • $a_i$ is the interactive sentence vector
  • $h_i$ is the hidden state vector of the sentence
  • $W_s$ is the weight of the hidden state vector of the sentence
  • $d_o$ is the feature vector of the violation category
  • $W_g$ is the weight of the feature vector of the violation category
  • the apparatus 70 for quality inspection of customer service violations proposed in the present application is based on the above-mentioned embodiment and adds a construction unit, a division unit, and a calculation unit.
  • the construction unit is used for constructing a violation situation graph, wherein the nodes of the violation situation graph are the violation cases.
  • a dividing unit configured to divide the violation situation graph into a plurality of subgraphs according to the distances between the nodes of the violation situation graph, wherein the distance between nodes of the same subgraph is less than a preset distance threshold, the distance between nodes of different subgraphs is greater than the preset distance threshold, and each subgraph corresponds to a violation category.
  • the computing unit is configured to calculate, through a preset graph neural network, the feature vector of the violation category corresponding to each sub-graph of the violation situation graph.
  • the above-mentioned customer service violation quality inspection apparatus 70 may be implemented in the form of a computer program, and the computer program may be executed on a computer device as shown in FIG. 7 .
  • the computer device 500 may be a terminal.
  • the terminal may be an electronic device with a communication function, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device.
  • The computer device 500 includes a processor 502, a memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • The computer program 5032 can cause the processor 502 to execute a customer service violation quality inspection method.
  • The processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503.
  • the processor 502 can execute a customer service violation quality inspection method.
  • the network interface 505 is used for network communication with other devices.
  • Those skilled in the art can understand that the above structure is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied. A specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
  • processor 502 is used to run the computer program 5032 stored in the memory to realize the following steps:
  • wherein each violation category includes at least one violation situation;
  • word-level interaction and sentence-level interaction are performed respectively between the violating customer service dialogue text to be tested and the feature vector of the violation category that was looked up in advance, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, wherein each violation category has a corresponding feature vector;
  • the final vector is input into a pre-trained third linear classifier, and the violation situation of the customer service dialogue text to be tested is predicted by the third linear classifier.
  • before the customer service dialogue text to be tested is received and encoded by a preset bidirectional RNN network to obtain the basic vector of the customer service dialogue text to be tested, the processor 502 also implements the following steps:
  • the violation situation graph is divided into a plurality of subgraphs according to the distances between the nodes of the violation situation graph, wherein the distance between nodes of the same subgraph is less than a preset distance threshold, the distance between nodes of different subgraphs is greater than the preset distance threshold, and each subgraph corresponds to a violation category;
  • the feature vector of the violation category corresponding to each sub-graph of the violation situation graph is calculated by using a preset graph neural network.
  • the encoding of the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested includes:
  • the word vector of the words in the word segmentation set is input into the bidirectional RNN network and the basic vector of the customer service dialogue text to be tested is output.
  • performing word segmentation processing on the customer service dialogue text to be tested to obtain a word segmentation set of the customer service dialogue text to be tested including:
  • the stop words in the initial word segmentation set are deleted to obtain the word segmentation set.
  • word-level interaction and sentence-level interaction are performed respectively between the violating customer service dialogue text to be tested and the feature vector searched for the violation category in advance, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, including:
  • according to the hidden state vectors of the words of each sentence, word-level interaction is performed between the customer service dialogue text to be tested and the feature vector of the violation category to obtain sentence vectors of each sentence of the customer service dialogue text to be tested;
  • a sentence-level interaction is performed between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a feature discrimination vector of the customer service dialogue text to be tested.
  • sentence vector including:
  • $a_{i,j}$ is the interactive word vector
  • $h_{i,j}$ is the hidden state vector of a word
  • $W_w$ is the weight of the hidden state vector of the word
  • $d_o$ is the feature vector of the violation category
  • $W_g$ is the weight of the feature vector of the violation category
  • $v_{s_i}$ is the sentence vector.
  • performing sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a feature discrimination vector of the customer service dialogue text to be tested includes:
  • $a_i$ is the interactive sentence vector
  • $h_i$ is the hidden state vector of a sentence
  • $W_s$ is the weight of the hidden state vector of the sentence
  • $d_o$ is the feature vector of the violation category
  • $W_g$ is the weight of the feature vector of the violation category
  • the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor or the like.
  • the computer program can be stored in a storage medium, which is a computer-readable storage medium.
  • the computer program is executed by at least one processor in the computer system to implement the flow steps of the above-described method embodiments.
  • the present application also provides a storage medium.
  • the storage medium may be a computer-readable storage medium.
  • the storage medium stores a computer program.
  • the computer program when executed by the processor, causes the processor to perform the following steps:
  • each violation category includes at least one violation situation
  • word-level interaction and sentence-level interaction are performed respectively between the violating customer service dialogue text to be tested and the feature vector searched for the violation category in advance, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, wherein each violation category has a corresponding feature vector;
  • the final vector is input into a pre-trained third linear classifier, and the violation situation of the customer service dialogue text to be tested is predicted by the third linear classifier.
  • before the customer service dialogue text to be tested is received and encoded by a preset bidirectional RNN network to obtain the basic vector of the customer service dialogue text to be tested, the computer program, when executed by the processor, further causes the processor to perform the following steps:
  • the violation situation graph is divided into a plurality of subgraphs according to the distances between the nodes of the violation situation graph, wherein the distance between nodes of the same subgraph is less than a preset distance threshold, the distance between nodes of different subgraphs is greater than the preset distance threshold, and each subgraph corresponds to a violation category;
  • the feature vector of the violation category corresponding to each sub-graph of the violation situation graph is calculated by using a preset graph neural network.
  • the encoding of the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested includes:
  • the word vector of the words in the word segmentation set is input into the bidirectional RNN network and the basic vector of the customer service dialogue text to be tested is output.
  • performing word segmentation processing on the customer service dialogue text to be tested to obtain a word segmentation set of the customer service dialogue text to be tested including:
  • the stop words in the initial word segmentation set are deleted to obtain the word segmentation set.
  • word-level interaction and sentence-level interaction are performed respectively between the violating customer service dialogue text to be tested and the feature vector searched for the violation category in advance, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, including:
  • according to the hidden state vectors of the words of each sentence, word-level interaction is performed between the customer service dialogue text to be tested and the feature vector of the violation category to obtain sentence vectors of each sentence of the customer service dialogue text to be tested;
  • a sentence-level interaction is performed between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a feature discrimination vector of the customer service dialogue text to be tested.
  • sentence vector including:
  • $a_{i,j}$ is the interactive word vector
  • $h_{i,j}$ is the hidden state vector of a word
  • $W_w$ is the weight of the hidden state vector of the word
  • $d_o$ is the feature vector of the violation category
  • $W_g$ is the weight of the feature vector of the violation category
  • $v_{s_i}$ is the sentence vector.
  • performing sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a feature discrimination vector of the customer service dialogue text to be tested includes:
  • $a_i$ is the interactive sentence vector
  • $h_i$ is the hidden state vector of a sentence
  • $W_s$ is the weight of the hidden state vector of the sentence
  • $d_o$ is the feature vector of the violation category
  • $W_g$ is the weight of the feature vector of the violation category
  • the storage medium is a physical, non-transitory storage medium, such as a U disk, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disk, or any other physical storage medium that can store program code.
  • the computer-readable storage medium may be non-volatile or volatile.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a storage medium.
  • the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a terminal, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Biomedical Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Computer Interaction (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

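A customer service violation quality inspection method and apparatus, a computer device, and a storage medium, relating to the technical field of artificial intelligence. The method comprises: encoding a customer service dialogue text to be tested through a bidirectional RNN network to obtain its basic vector; inputting the basic vector into a first linear classifier to determine whether the customer service dialogue text to be tested is in violation; if so, inputting the basic vector into a second linear classifier to predict the violation category of the customer service dialogue text to be tested; according to an attention mechanism, performing word-level interaction and sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a feature discrimination vector; merging the basic vector and the feature discrimination vector to obtain a final vector; and inputting the final vector into a third linear classifier to predict the violation situation of the customer service dialogue text to be tested.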

Description

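Customer service violation quality inspection method and apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 202011387369.4, entitled "Customer service violation quality inspection method and apparatus, computer device, and storage medium", filed with the China Patent Office on December 1, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a customer service violation quality inspection method and apparatus, a computer device, and a storage medium.
Background
Quality inspection means checking quality. It is indispensable in any customer service operation: on the one hand, it requires agents to provide service as specified, which safeguards service quality; on the other hand, it makes it possible to check whether the service is compliant. During a loan application, the applicant and the customer service agent need several rounds of communication to confirm whether the applicant has the required qualifications and what credit limit applies. Violations sometimes occur: to raise the deal closing rate, an agent may guide the applicant to provide false information, which eventually leads to bad debts.
The inventors found that traditional quality inspection systems rely on manual review. Manual processing is inefficient, so the huge volume of dialogue data that needs inspection can only be spot-checked. In addition, because of human limitations, it is hard to ensure that every one of the intricate quality inspection rules is taken into account.
Summary
Embodiments of the present application provide a customer service violation quality inspection method and apparatus, a computer device, and a storage medium, aiming to solve the low efficiency and poor accuracy of existing manual quality inspection.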
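In a first aspect, an embodiment of the present application provides a customer service violation quality inspection method, which includes:
receiving a customer service dialogue text to be tested, and encoding the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
inputting the basic vector into a pre-trained first linear classifier, and determining through the first linear classifier whether the customer service dialogue text to be tested is in violation;
if the customer service dialogue text to be tested is in violation, inputting the basic vector into a pre-trained second linear classifier, and outputting the violation category of the customer service dialogue text to be tested, wherein each violation category includes at least one violation situation;
according to an attention mechanism, performing word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain a feature discrimination vector of the customer service dialogue text to be tested, wherein each violation category has a corresponding feature vector;
merging the basic vector of the violating customer service dialogue text to be tested with the feature discrimination vector to obtain a final vector;
inputting the final vector into a pre-trained third linear classifier, and predicting the violation situation of the customer service dialogue text to be tested through the third linear classifier.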
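In a second aspect, an embodiment of the present application further provides a customer service violation quality inspection apparatus, which includes:
an encoding unit, configured to receive a customer service dialogue text to be tested, and encode the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
a judging unit, configured to input the basic vector into a pre-trained first linear classifier, and determine through the first linear classifier whether the customer service dialogue text to be tested is in violation;
a first prediction unit, configured to, if the customer service dialogue text to be tested is in violation, input the basic vector into a pre-trained second linear classifier and output the violation category of the customer service dialogue text to be tested, wherein each violation category includes at least one violation situation;
an interaction unit, configured to perform, according to an attention mechanism, word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain a feature discrimination vector of the customer service dialogue text to be tested, wherein each violation category has a corresponding feature vector;
a merging unit, configured to merge the basic vector of the violating customer service dialogue text to be tested with the feature discrimination vector to obtain a final vector;
a second prediction unit, configured to input the final vector into a pre-trained third linear classifier, and predict the violation situation of the customer service dialogue text to be tested through the third linear classifier.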
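In a third aspect, an embodiment of the present application further provides a computer device, which includes a memory and a processor, the memory storing a computer program, and the processor being configured to run the computer program to perform the following steps:
receiving a customer service dialogue text to be tested, and encoding the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
inputting the basic vector into a pre-trained first linear classifier, and determining through the first linear classifier whether the customer service dialogue text to be tested is in violation;
if the customer service dialogue text to be tested is in violation, inputting the basic vector into a pre-trained second linear classifier, and outputting the violation category of the customer service dialogue text to be tested;
according to an attention mechanism, performing word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain a feature discrimination vector of the customer service dialogue text to be tested;
merging the basic vector of the violating customer service dialogue text to be tested with the feature discrimination vector to obtain a final vector;
inputting the final vector into a pre-trained third linear classifier, and predicting the violation situation of the customer service dialogue text to be tested through the third linear classifier.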
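In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the following steps:
receiving a customer service dialogue text to be tested, and encoding the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
inputting the basic vector into a pre-trained first linear classifier, and determining through the first linear classifier whether the customer service dialogue text to be tested is in violation;
if the customer service dialogue text to be tested is in violation, inputting the basic vector into a pre-trained second linear classifier, and outputting the violation category of the customer service dialogue text to be tested;
according to an attention mechanism, performing word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain a feature discrimination vector of the customer service dialogue text to be tested;
merging the basic vector of the violating customer service dialogue text to be tested with the feature discrimination vector to obtain a final vector;
inputting the final vector into a pre-trained third linear classifier, and predicting the violation situation of the customer service dialogue text to be tested through the third linear classifier.
Embodiments of the present application provide a customer service violation quality inspection method and apparatus, a computer device, and a storage medium that can automatically perform quality inspection on a customer service dialogue text to be tested, and that are more efficient and more accurate than manual quality inspection.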
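Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the customer service violation quality inspection method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of the method for calculating the basic vector of the customer service dialogue text to be tested provided by an embodiment of the present application;
FIG. 3 is a schematic sub-flowchart of the customer service violation quality inspection method provided by an embodiment of the present application;
FIG. 4 is a schematic sub-flowchart of the customer service violation quality inspection method provided by an embodiment of the present application;
FIG. 5 is a schematic sub-flowchart of the customer service violation quality inspection method provided by an embodiment of the present application;
FIG. 6 is a schematic block diagram of the customer service violation quality inspection apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of the computer device provided by an embodiment of the present application.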
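Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".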
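Referring to FIG. 1, FIG. 1 is a schematic flowchart of the customer service violation quality inspection method provided by an embodiment of the present application. The present application can be applied in smart government, smart city management, smart community, smart security, smart logistics, smart healthcare, smart education, smart environmental protection, and smart transportation scenarios, thereby promoting the construction of smart cities. As shown in the figure, the method includes the following steps:
S1: receiving a customer service dialogue text to be tested, and encoding the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested.
In a specific implementation, the customer service dialogue text to be tested is obtained by converting the recording of a call between a customer service agent and a customer into text.
A bidirectional RNN network (Bidirectional Recurrent Neural Network) consists of two RNNs stacked one on top of the other, and its output is determined jointly by the states of the two RNNs. Compared with a plain RNN network, a bidirectional RNN network can extract the features of the customer service dialogue text to be tested more accurately.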
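Referring to FIG. 2, in an embodiment, step S1 above specifically includes the following steps:
S11: performing word segmentation on the customer service dialogue text to be tested to obtain a word segmentation set of the customer service dialogue text to be tested.
In a specific implementation, word segmentation refers to splitting a sequence of Chinese characters into individual words, that is, recombining a continuous character sequence into a word sequence according to a given specification.
Referring to FIG. 3, in an embodiment, step S11 above specifically includes the following steps:
S111: dividing the customer service dialogue text to be tested into a plurality of words through a preset word segmentation tool to obtain an initial word segmentation set.
In a specific implementation, a commonly used word segmentation tool is the Jieba (结巴) word segmentation tool. The Jieba tool divides the customer service dialogue text to be tested into a plurality of words, and these words make up the initial word segmentation set.
S112: deleting the stop words in the initial word segmentation set to obtain the word segmentation set.
In a specific implementation, stop words are usually prepositions, adverbs, conjunctions, and the like. For example, "在" (at), "里面" (inside), "也" (also), "的" (of), "它" (it), and "为" (for) are all stop words. Stop words carry no real meaning and introduce noise, so they need to be deleted in practice.
If the initial word segmentation set contains stop words, the stop words it contains are deleted to obtain the word segmentation set.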
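Step S112 can be illustrated with a minimal Python sketch. The stop-word set below contains only the six examples named in the text, and the token list stands in for the output of a segmentation tool; both are illustrative assumptions rather than the data used in the application.

```python
# Hypothetical stop-word set: only the examples listed in the description.
STOP_WORDS = {"在", "里面", "也", "的", "它", "为"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not stop words, preserving their order."""
    return [tok for tok in tokens if tok not in stop_words]


# Made-up initial word segmentation set, as a segmentation tool might return it.
initial_tokens = ["客户", "的", "收入", "证明", "也", "在", "系统", "里面"]
segmentation_set = remove_stop_words(initial_tokens)
print(segmentation_set)  # ['客户', '收入', '证明', '系统']
```

Keeping the result as an ordered list rather than a set matters here, because the later recurrent encoder depends on word order.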
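S12: performing word vector training on the words of the word segmentation set to obtain word vectors of the words of the word segmentation set.
In a specific implementation, word2vec is used to perform word vector training on the words of the word segmentation set. word2vec is a natural language processing tool that converts the words of natural language into word vectors that a computer can work with.
Traditional word vectors are prone to the curse of dimensionality, and any two words are isolated from each other, so the relationship between words cannot be reflected. This embodiment therefore uses word2vec to obtain word vectors, with which the similarity between words can be reflected by calculating the distance between vectors.
Alternatively, in other embodiments, other word vector tools may be used for word vector training, which is not specifically limited in the present application.
S13: inputting the word vectors of the words of the word segmentation set into the bidirectional RNN network and outputting the basic vector of the customer service dialogue text to be tested.
In a specific implementation, the bidirectional RNN network encodes the word vectors of the words in the word segmentation set, and the output of the bidirectional RNN network is the basic vector of the customer service dialogue text to be tested.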
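To make the idea of bidirectional encoding concrete, here is a deliberately simplified Python sketch. It is not the network of the application: it uses two hypothetical scalar weights (`w_in`, `w_rec`) instead of trained weight matrices, and it assumes the basic vector is the concatenation of the final forward state and the final backward state, which the text does not specify.

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Run a toy element-wise tanh RNN over the inputs and return the last state."""
    state = [0.0] * len(inputs[0])
    for x in inputs:
        state = [math.tanh(w_in * xi + w_rec * si) for xi, si in zip(x, state)]
    return state


def encode_bidirectional(word_vectors, w_in=0.5, w_rec=0.3):
    """Concatenate the final states of a forward pass and a backward pass."""
    forward = rnn_pass(word_vectors, w_in, w_rec)
    backward = rnn_pass(word_vectors[::-1], w_in, w_rec)
    return forward + backward


# Made-up 2-dimensional word vectors for a three-word text.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
basic_vector = encode_bidirectional(word_vectors)
print(len(basic_vector))  # 4: two forward dimensions plus two backward dimensions
```

The forward and backward halves differ because each pass sees the words in a different order, which is the property that lets a bidirectional encoder use context from both sides of a word.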
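S2: inputting the basic vector into a pre-trained first linear classifier, and determining through the first linear classifier whether the customer service dialogue text to be tested is in violation.
In a specific implementation, the mathematical expression of the first linear classifier is
Figure PCTCN2021083795-appb-000001
softmax refers to normalization. $W_1$ denotes the weight of the linear classifier and $b_1$ denotes the bias; both can be obtained through training.
It can be understood that the first linear classifier is trained in advance on a large number of labeled samples, so that it is able to identify whether a customer service dialogue text to be tested is in violation.
After training, the basic vector is input into the first linear classifier, which predicts whether there is a violation; if so, the customer service dialogue text to be tested is determined to be in violation.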
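A linear classifier followed by softmax normalization can be sketched in a few lines of Python. The weights, bias, and input below are toy values chosen for illustration, and the convention that class index 1 means "in violation" is an assumption; the application obtains the real parameters through training.

```python
import math


def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_classifier(weights, bias, vector):
    """Compute softmax(W v + b), with W given as a list of rows."""
    scores = [sum(w * x for w, x in zip(row, vector)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)


# Toy parameters: 2 classes, 3-dimensional basic vector (illustrative only).
W1 = [[0.5, -0.2, 0.1],
      [-0.3, 0.8, 0.4]]
b1 = [0.0, 0.1]
basic_vector = [1.0, 2.0, 0.5]

probs = linear_classifier(W1, b1, basic_vector)
is_violation = probs.index(max(probs)) == 1  # assumption: index 1 = "in violation"
```

The second and third classifiers have the same form; only the number of output classes and the input vector change.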
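S3: if the customer service dialogue text to be tested is in violation, inputting the basic vector into a pre-trained second linear classifier, and outputting the violation category of the customer service dialogue text to be tested.
In a specific implementation, the mathematical expression of the second linear classifier is
Figure PCTCN2021083795-appb-000002
softmax refers to normalization. $W_2$ denotes the weight of the linear classifier and $b_2$ denotes the bias; both can be obtained through training.
It can be understood that the second linear classifier is trained in advance on a large number of labeled samples, so that it is able to identify the violation category of a customer service dialogue text to be tested.
After training, the basic vector is input into the second linear classifier, which predicts the violation category to which the customer service dialogue text to be tested belongs. In the present application, the violation category predicted by the second linear classifier is taken as the violation category of the customer service dialogue text.
It should be noted that each violation category includes at least one violation situation. For example, in an embodiment, a violation category includes three violation situations.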
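S4: according to an attention mechanism (attention), performing word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain a feature discrimination vector of the customer service dialogue text to be tested, wherein each violation category has a corresponding feature vector.
In a specific implementation, each violation category is represented by a feature vector. In the embodiments of the present application, the feature vector of each violation category is determined in advance.
Further, a plurality of violation categories are preset in the present application, and a feature vector is preset for each of them. The violation category output by the second linear classifier is one of the preset categories. After the violation category output by the second linear classifier is determined, the feature vector that has a mapping relationship with that violation category is looked up in a database.
It can be understood that the feature vector of each violation category needs to be calculated only once, and this calculation is performed before step S1.
Specifically, referring to FIG. 4, the calculation of the feature vector of each violation category includes the following steps:
S101: constructing a violation situation graph, the nodes of which are the violation situations.
In a specific implementation, a violation situation graph G is first constructed according to the types of violation situations, and each node of G represents one violation situation. If "providing false information" and "failing to explain the loan clearly" are two of the violation situations, they can be represented by two corresponding nodes in G. Constructing the violation situation graph makes it possible to enumerate the violation situations exhaustively, and there is no limit on their number.
S102: dividing the violation situation graph into a plurality of subgraphs according to the distances between the nodes of the violation situation graph, wherein the distance between nodes of the same subgraph is less than a preset distance threshold, the distance between nodes of different subgraphs is greater than the preset distance threshold, and each subgraph corresponds to one violation category.
In a specific implementation, each violation situation has a corresponding description text. From these description texts, the pairwise distances between all nodes are calculated with the TF-IDF algorithm. Given a suitable distance threshold, if the distance between two nodes is greater than the threshold, the edge between the two nodes is kept and its weight is set to the distance between them; otherwise, the two nodes are not directly connected by an edge. With a suitable distance threshold, the violation situation graph can be made to consist of several subgraphs that are not connected to one another.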
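For example, suppose there are six violation situations in total, A, B, C, D, E, and F, where A, B, and C are close to one another and D, E, and F are close to one another.
With a suitable threshold, the corresponding violation situation graph is G = {g1, g2},
where g1 = {A, B, C} and g2 = {D, E, F}. In the resulting violation situation graph G, A, B, and C are connected to one another by edges, and D, E, and F are connected to one another by edges, but there is no edge between {A, B, C} and {D, E, F}. Accordingly, A, B, and C belong to one violation category, and D, E, and F belong to another.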
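The partition in this example amounts to thresholding pairwise scores and taking connected components. The sketch below follows the edge-keeping rule as literally stated in the implementation text (an edge is kept when the pairwise value exceeds the threshold), so it treats the TF-IDF value as a closeness score in which larger means more related; that reading, the `score` dictionary, and the numeric values are assumptions made for illustration.

```python
def split_into_subgraphs(nodes, score, threshold):
    """Keep an edge when score > threshold, then return the connected components."""
    adjacency = {n: set() for n in nodes}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if score[frozenset((u, v))] > threshold:
                adjacency[u].add(v)
                adjacency[v].add(u)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Made-up scores: 0.8 inside a group, 0.1 across groups.
nodes = ["A", "B", "C", "D", "E", "F"]
group = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
score = {frozenset((u, v)): (0.8 if group[u] == group[v] else 0.1)
         for i, u in enumerate(nodes) for v in nodes[i + 1:]}

subgraphs = split_into_subgraphs(nodes, score, threshold=0.5)
print(subgraphs)  # two components: {A, B, C} and {D, E, F}
```

Each returned component plays the role of one subgraph, and therefore one violation category.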
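S103: calculating, through a preset graph neural network, the feature vector of the violation category corresponding to each subgraph of the violation situation graph.
In a specific implementation, the graph neural network may be a graph convolutional network (Graph Convolutional Network, GCN). The violation situation graph is input into the graph neural network, which yields a vector representation of every node of the graph. At each layer, the vector representation of a given node is computed from that node's vector representation at the previous layer, the vector representations of its neighbor nodes, and the corresponding bias. The calculation formula is as follows:
Figure PCTCN2021083795-appb-000003
where
Figure PCTCN2021083795-appb-000004
denotes the vector representation of the node at layer l+1, $N_i$ denotes the set of all nodes connected to this node by an edge,
Figure PCTCN2021083795-appb-000005
Figure PCTCN2021083795-appb-000006
both denote weights, which can be obtained through training, and $b^l$ is the bias.
Further, pooling operations, namely max pooling and min pooling, are used to obtain a discriminative feature vector $d_i$ of the violation category corresponding to each subgraph. The formula is as follows:
Figure PCTCN2021083795-appb-000007
where MaP and MiP denote the max pooling and min pooling operations, respectively. This can be understood as follows: by aggregating the information of all nodes in the whole subgraph $g_i$, the feature vector $d_i$ of the violation category corresponding to $g_i$ is obtained.
The feature vector $d_i$ can be understood as the mathematical representation of the violation category, which enables a computer to identify the violation category.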
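The pooling step can be sketched as follows. Because the exact formula appears only in a figure that is not reproduced here, the way the two pooled results are combined is an assumption: this sketch simply concatenates the element-wise maximum and the element-wise minimum over the node vectors of one subgraph. The node vectors are made-up values standing in for graph neural network outputs.

```python
def category_vector(node_vectors):
    """Concatenate element-wise max pooling and min pooling over node vectors."""
    columns = list(zip(*node_vectors))       # one tuple per vector dimension
    max_pooled = [max(col) for col in columns]
    min_pooled = [min(col) for col in columns]
    return max_pooled + min_pooled           # assumed combination: concatenation


# Made-up node vectors for the three nodes of one subgraph.
g1_nodes = [[0.2, 0.9], [0.5, 0.1], [0.4, 0.3]]
d1 = category_vector(g1_nodes)
print(d1)  # [0.5, 0.9, 0.2, 0.1]
```

Pooling gives a vector whose size does not depend on how many violation situations a category contains, which fits the statement that the number of situations is not limited.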
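Referring to FIG. 5, in an embodiment, step S4 above specifically includes the following steps:
S41: encoding each sentence of the customer service dialogue text to be tested through a preset bidirectional GRU (Bidirectional Gated Recurrent Unit) to obtain hidden state vectors of the words of each sentence of the customer service dialogue text to be tested.
In a specific implementation, the customer service dialogue text to be tested can be expressed as:
$S = [s_1, s_2, \ldots, s_n] = [w_{1,1}, w_{1,2}, \ldots, w_{n,m}]$
where $S$ denotes the customer service dialogue text to be tested, $s_i$ denotes the i-th sentence, and $w_{n,m}$ denotes the m-th word of the n-th sentence. Each sentence is encoded with the bidirectional GRU to obtain the hidden state vector $h_{i,j}$ of each word of each sentence. The mathematical expression of this process is as follows:
Figure PCTCN2021083795-appb-000008
S42: according to the hidden state vectors of the words of each sentence, performing word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a sentence vector of each sentence of the customer service dialogue text to be tested.
In a specific implementation, word-level interaction is performed with the attention mechanism to obtain the sentence vector of each sentence
Figure PCTCN2021083795-appb-000009
The specific calculation is as follows:
through the following formulas
Figure PCTCN2021083795-appb-000010
and
Figure PCTCN2021083795-appb-000011
the sentence vector of each sentence of the customer service dialogue text to be tested is calculated,
where $a_{i,j}$ is the interactive word vector, $h_{i,j}$ is the hidden state vector of a word, $W_w$ is the weight of the hidden state vector of the word, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and $v_{s_i}$ is the sentence vector.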
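The word-level interaction can be illustrated with a generic attention sketch in Python. The exact scoring formula of the application is given only in figures that are not reproduced here, so the scoring function below (the sum of tanh over the combined projections), the identity weight matrices, and the sample vectors are all assumptions; only the overall pattern, in which hidden states are weighted by their interaction with the category vector and then summed, follows the text.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def attend(hidden_states, category_vec, W_h, W_g):
    """Weight hidden states by their interaction with a category vector."""
    projected_cat = matvec(W_g, category_vec)
    scores = []
    for h in hidden_states:
        projected_h = matvec(W_h, h)
        # Assumed scoring function, not taken from the application.
        scores.append(sum(math.tanh(p + q)
                          for p, q in zip(projected_h, projected_cat)))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]          # attention weights, sum to 1
    dim = len(hidden_states[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden_states))
              for k in range(dim)]
    return weights, pooled


identity = [[1.0, 0.0], [0.0, 1.0]]
word_states = [[1.0, 0.0], [0.0, 1.0]]           # made-up hidden states h_{i,j}
d_o = [1.0, 0.0]                                 # made-up category feature vector
weights, sentence_vector = attend(word_states, d_o, identity, identity)
```

The same function can be reused for the sentence-level interaction by passing sentence hidden states instead of word hidden states, which is why it takes the states as a plain list.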
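S43: encoding the sentence vectors of the sentences of the customer service dialogue text to be tested through the preset bidirectional GRU to obtain a hidden state vector of each sentence of the customer service dialogue text to be tested.
In a specific implementation, the mathematical expression for calculating the hidden state vector $h_i$ of each sentence of the customer service dialogue text to be tested through bidirectional GRU encoding is as follows:
Figure PCTCN2021083795-appb-000012
S44: according to the hidden state vectors of the sentences, performing sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the feature discrimination vector of the customer service dialogue text to be tested.
In a specific implementation, sentence-level interaction is performed with the attention mechanism to obtain the feature discrimination vector of the customer service dialogue text to be tested. The specific calculation is as follows:
through the following formulas
Figure PCTCN2021083795-appb-000013
and
Figure PCTCN2021083795-appb-000014
the feature discrimination vector of the customer service dialogue text to be tested is calculated,
where $a_i$ is the interactive sentence vector, $h_i$ is the hidden state vector of a sentence, $W_s$ is the weight of the hidden state vector of the sentence, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and
Figure PCTCN2021083795-appb-000015
is the feature discrimination vector.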
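S5: merging the basic vector of the violating customer service dialogue text to be tested with the feature discrimination vector to obtain a final vector.
In a specific implementation, the final vector is
Figure PCTCN2021083795-appb-000016
where
Figure PCTCN2021083795-appb-000017
is the basic vector, and
Figure PCTCN2021083795-appb-000018
is the feature discrimination vector.
S6: inputting the final vector into a pre-trained third linear classifier, and predicting the violation situation of the customer service dialogue text to be tested through the third linear classifier.
In a specific implementation, the mathematical expression of the third linear classifier is:
$y_3 = \mathrm{softmax}(W_3 V_f + b_3)$
softmax refers to normalization. $W_3$ denotes the weight of the linear classifier and $b_3$ denotes the bias; both can be obtained through training.
It can be understood that the third linear classifier is trained in advance on a large number of labeled samples, so that it is able to identify the violation situation of a customer service dialogue text to be tested.
After training, the final vector is input into the third linear classifier, which predicts the violation situation to which the customer service dialogue text to be tested belongs.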
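Steps S2 to S6 form a cascade, and its control flow can be sketched in Python as below. Every callable passed in is a hypothetical stand-in for a trained component (the three classifiers and the attention interaction), and merging is assumed to be plain concatenation; the sketch shows only the order of the steps and the early exit for compliant dialogues.

```python
def inspect_dialogue(basic_vector, is_violating, predict_category,
                     category_vectors, discriminate, predict_case):
    """Run the three-stage cascade; return None when no violation is found."""
    if not is_violating(basic_vector):              # first classifier (S2)
        return None
    category = predict_category(basic_vector)       # second classifier (S3)
    d_o = category_vectors[category]                # looked-up category vector
    feature_vec = discriminate(d_o)                 # attention interaction (S4), stubbed
    final_vector = basic_vector + feature_vec       # merge by concatenation (S5)
    return category, predict_case(final_vector)     # third classifier (S6)


# Stub components with made-up behaviour, for illustration only.
result = inspect_dialogue(
    basic_vector=[0.3, 0.7],
    is_violating=lambda v: v[1] > 0.5,
    predict_category=lambda v: "g1",
    category_vectors={"g1": [1.0, 0.0], "g2": [0.0, 1.0]},
    discriminate=lambda d: [x * 0.5 for x in d],
    predict_case=lambda v: "A" if len(v) == 4 else "unknown",
)
print(result)  # ('g1', 'A')
```

Passing the components as arguments keeps the sketch independent of any particular model library and makes the early return for non-violating text explicit.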
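The technical solution of the present application can automatically perform quality inspection on the customer service dialogue text to be tested and, compared with manual quality inspection, is more efficient and more accurate.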
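Referring to FIG. 6, FIG. 6 is a schematic block diagram of a customer service violation quality inspection apparatus 70 provided by an embodiment of the present application. Corresponding to the above customer service violation quality inspection method, the present application further provides a customer service violation quality inspection apparatus 70. The apparatus 70 includes units for performing the above method and may be configured in a terminal such as a desktop computer, a tablet computer, or a laptop computer. Specifically, the apparatus 70 includes an encoding unit 71, a judging unit 72, a first prediction unit 73, an interaction unit 74, a merging unit 75, and a second prediction unit 76.
The encoding unit 71 is configured to receive a customer service dialogue text to be tested, and encode the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
the judging unit 72 is configured to input the basic vector into a pre-trained first linear classifier, and determine through the first linear classifier whether the customer service dialogue text to be tested is in violation;
the first prediction unit 73 is configured to, if the customer service dialogue text to be tested is in violation, input the basic vector into a pre-trained second linear classifier and output the violation category of the customer service dialogue text to be tested, wherein each violation category includes at least one violation situation;
the interaction unit 74 is configured to perform, according to an attention mechanism, word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain a feature discrimination vector of the customer service dialogue text to be tested, wherein each violation category has a corresponding feature vector;
the merging unit 75 is configured to merge the basic vector of the violating customer service dialogue text to be tested with the feature discrimination vector to obtain a final vector;
the second prediction unit 76 is configured to input the final vector into a pre-trained third linear classifier, and predict the violation situation of the customer service dialogue text to be tested through the third linear classifier.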
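In an embodiment, encoding the customer service dialogue text to be tested through the preset bidirectional RNN network to obtain the basic vector of the customer service dialogue text to be tested includes:
performing word segmentation on the customer service dialogue text to be tested to obtain a word segmentation set of the customer service dialogue text to be tested;
performing word vector training on the words of the word segmentation set to obtain word vectors of the words of the word segmentation set;
inputting the word vectors of the words of the word segmentation set into the bidirectional RNN network and outputting the basic vector of the customer service dialogue text to be tested.
In an embodiment, performing word segmentation on the customer service dialogue text to be tested to obtain the word segmentation set of the customer service dialogue text to be tested includes:
dividing the customer service dialogue text to be tested into a plurality of words through a preset word segmentation tool to obtain an initial word segmentation set;
deleting the stop words in the initial word segmentation set to obtain the word segmentation set.
In an embodiment, performing, according to the attention mechanism, word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, includes:
encoding each sentence of the customer service dialogue text to be tested through a preset bidirectional GRU to obtain hidden state vectors of the words of each sentence of the customer service dialogue text to be tested;
according to the hidden state vectors of the words of each sentence, performing word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a sentence vector of each sentence of the customer service dialogue text to be tested;
encoding the sentence vectors of the sentences of the customer service dialogue text to be tested through the preset bidirectional GRU to obtain a hidden state vector of each sentence of the customer service dialogue text to be tested;
according to the hidden state vectors of the sentences, performing sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the feature discrimination vector of the customer service dialogue text to be tested.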
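In an embodiment, performing, according to the hidden state vectors of the words of each sentence, word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the sentence vector of each sentence of the customer service dialogue text to be tested includes:
through the following formulas
Figure PCTCN2021083795-appb-000019
and
Figure PCTCN2021083795-appb-000020
calculating the sentence vector of each sentence of the customer service dialogue text to be tested, where $a_{i,j}$ is the interactive word vector, $h_{i,j}$ is the hidden state vector of a word, $W_w$ is the weight of the hidden state vector of the word, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and $v_{s_i}$ is the sentence vector.
In an embodiment, performing, according to the hidden state vectors of the sentences, sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the feature discrimination vector of the customer service dialogue text to be tested includes:
through the following formulas
Figure PCTCN2021083795-appb-000021
and
Figure PCTCN2021083795-appb-000022
calculating the feature discrimination vector of the customer service dialogue text to be tested, where $a_i$ is the interactive sentence vector, $h_i$ is the hidden state vector of a sentence, $W_s$ is the weight of the hidden state vector of the sentence, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and
Figure PCTCN2021083795-appb-000023
is the feature discrimination vector.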
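In an embodiment, the customer service violation quality inspection apparatus 70 proposed in the present application adds a construction unit, a division unit, and a calculation unit to the above embodiment.
The construction unit is configured to construct a violation situation graph, the nodes of which are violation situations.
The division unit is configured to divide the violation situation graph into a plurality of subgraphs according to the distances between the nodes of the violation situation graph, wherein the distance between nodes of the same subgraph is less than a preset distance threshold, the distance between nodes of different subgraphs is greater than the preset distance threshold, and each subgraph corresponds to one violation category.
The calculation unit is configured to calculate, through a preset graph neural network, the feature vector of the violation category corresponding to each subgraph of the violation situation graph.
It should be noted that a person skilled in the art can clearly understand that, for the specific implementation of the above customer service violation quality inspection apparatus 70 and its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
The above customer service violation quality inspection apparatus 70 may be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 7.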
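Referring to FIG. 7, FIG. 7 is a schematic block diagram of a computer device provided by an embodiment of the present application. The computer device 500 may be a terminal, where the terminal may be an electronic device with a communication function, such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, or a wearable device.
The computer device 500 includes a processor 502, a memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 causes the processor 502 to perform a customer service violation quality inspection method.
The processor 502 provides computing and control capabilities to support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when executed by the processor 502, the computer program 5032 causes the processor 502 to perform a customer service violation quality inspection method.
The network interface 505 is used for network communication with other devices. A person skilled in the art can understand that the above structure is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps: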
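receiving a customer service dialogue text to be tested, and encoding the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
inputting the basic vector into a pre-trained first linear classifier, and determining through the first linear classifier whether the customer service dialogue text to be tested is in violation;
if the customer service dialogue text to be tested is in violation, inputting the basic vector into a pre-trained second linear classifier, and outputting the violation category of the customer service dialogue text to be tested, wherein each violation category includes at least one violation situation;
according to an attention mechanism, performing word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain a feature discrimination vector of the customer service dialogue text to be tested, wherein each violation category has a corresponding feature vector;
merging the basic vector of the violating customer service dialogue text to be tested with the feature discrimination vector to obtain a final vector;
inputting the final vector into a pre-trained third linear classifier, and predicting the violation situation of the customer service dialogue text to be tested through the third linear classifier.
In an embodiment, before receiving the customer service dialogue text to be tested and encoding it through the preset bidirectional RNN network to obtain the basic vector of the customer service dialogue text to be tested, the processor 502 further implements the following steps:
constructing a violation situation graph, the nodes of which are violation situations;
dividing the violation situation graph into a plurality of subgraphs according to the distances between the nodes of the violation situation graph, wherein the distance between nodes of the same subgraph is less than a preset distance threshold, the distance between nodes of different subgraphs is greater than the preset distance threshold, and each subgraph corresponds to one violation category;
calculating, through a preset graph neural network, the feature vector of the violation category corresponding to each subgraph of the violation situation graph.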
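In an embodiment, encoding the customer service dialogue text to be tested through the preset bidirectional RNN network to obtain the basic vector of the customer service dialogue text to be tested includes:
performing word segmentation on the customer service dialogue text to be tested to obtain a word segmentation set of the customer service dialogue text to be tested;
performing word vector training on the words of the word segmentation set to obtain word vectors of the words of the word segmentation set;
inputting the word vectors of the words of the word segmentation set into the bidirectional RNN network and outputting the basic vector of the customer service dialogue text to be tested.
In an embodiment, performing word segmentation on the customer service dialogue text to be tested to obtain the word segmentation set of the customer service dialogue text to be tested includes:
dividing the customer service dialogue text to be tested into a plurality of words through a preset word segmentation tool to obtain an initial word segmentation set;
deleting the stop words in the initial word segmentation set to obtain the word segmentation set.
In an embodiment, performing, according to the attention mechanism, word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, includes:
encoding each sentence of the customer service dialogue text to be tested through a preset bidirectional GRU to obtain hidden state vectors of the words of each sentence of the customer service dialogue text to be tested;
according to the hidden state vectors of the words of each sentence, performing word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a sentence vector of each sentence of the customer service dialogue text to be tested;
encoding the sentence vectors of the sentences of the customer service dialogue text to be tested through the preset bidirectional GRU to obtain a hidden state vector of each sentence of the customer service dialogue text to be tested;
according to the hidden state vectors of the sentences, performing sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the feature discrimination vector of the customer service dialogue text to be tested.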
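In an embodiment, performing, according to the hidden state vectors of the words of each sentence, word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the sentence vector of each sentence of the customer service dialogue text to be tested includes:
through the following formulas
Figure PCTCN2021083795-appb-000024
and
Figure PCTCN2021083795-appb-000025
calculating the sentence vector of each sentence of the customer service dialogue text to be tested, where $a_{i,j}$ is the interactive word vector, $h_{i,j}$ is the hidden state vector of a word, $W_w$ is the weight of the hidden state vector of the word, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and $v_{s_i}$ is the sentence vector.
In an embodiment, performing, according to the hidden state vectors of the sentences, sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the feature discrimination vector of the customer service dialogue text to be tested includes:
through the following formulas
Figure PCTCN2021083795-appb-000026
and
Figure PCTCN2021083795-appb-000027
calculating the feature discrimination vector of the customer service dialogue text to be tested, where $a_i$ is the interactive sentence vector, $h_i$ is the hidden state vector of a sentence, $W_s$ is the weight of the hidden state vector of the sentence, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and
Figure PCTCN2021083795-appb-000028
is the feature discrimination vector.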
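It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the above method embodiments.
Therefore, the present application further provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program. When executed by a processor, the computer program causes the processor to perform the following steps: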
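receiving a customer service dialogue text to be tested, and encoding the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
inputting the basic vector into a pre-trained first linear classifier, and determining through the first linear classifier whether the customer service dialogue text to be tested is in violation;
if the customer service dialogue text to be tested is in violation, inputting the basic vector into a pre-trained second linear classifier, and outputting the violation category of the customer service dialogue text to be tested, wherein each violation category includes at least one violation situation;
according to an attention mechanism, performing word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain a feature discrimination vector of the customer service dialogue text to be tested, wherein each violation category has a corresponding feature vector;
merging the basic vector of the violating customer service dialogue text to be tested with the feature discrimination vector to obtain a final vector;
inputting the final vector into a pre-trained third linear classifier, and predicting the violation situation of the customer service dialogue text to be tested through the third linear classifier.
In an embodiment, before receiving the customer service dialogue text to be tested and encoding it through the preset bidirectional RNN network to obtain the basic vector of the customer service dialogue text to be tested, the computer program, when executed by the processor, further causes the processor to perform the following steps:
constructing a violation situation graph, the nodes of which are violation situations;
dividing the violation situation graph into a plurality of subgraphs according to the distances between the nodes of the violation situation graph, wherein the distance between nodes of the same subgraph is less than a preset distance threshold, the distance between nodes of different subgraphs is greater than the preset distance threshold, and each subgraph corresponds to one violation category;
calculating, through a preset graph neural network, the feature vector of the violation category corresponding to each subgraph of the violation situation graph.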
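In an embodiment, encoding the customer service dialogue text to be tested through the preset bidirectional RNN network to obtain the basic vector of the customer service dialogue text to be tested includes:
performing word segmentation on the customer service dialogue text to be tested to obtain a word segmentation set of the customer service dialogue text to be tested;
performing word vector training on the words of the word segmentation set to obtain word vectors of the words of the word segmentation set;
inputting the word vectors of the words of the word segmentation set into the bidirectional RNN network and outputting the basic vector of the customer service dialogue text to be tested.
In an embodiment, performing word segmentation on the customer service dialogue text to be tested to obtain the word segmentation set of the customer service dialogue text to be tested includes:
dividing the customer service dialogue text to be tested into a plurality of words through a preset word segmentation tool to obtain an initial word segmentation set;
deleting the stop words in the initial word segmentation set to obtain the word segmentation set.
In an embodiment, performing, according to the attention mechanism, word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, includes:
encoding each sentence of the customer service dialogue text to be tested through a preset bidirectional GRU to obtain hidden state vectors of the words of each sentence of the customer service dialogue text to be tested;
according to the hidden state vectors of the words of each sentence, performing word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a sentence vector of each sentence of the customer service dialogue text to be tested;
encoding the sentence vectors of the sentences of the customer service dialogue text to be tested through the preset bidirectional GRU to obtain a hidden state vector of each sentence of the customer service dialogue text to be tested;
according to the hidden state vectors of the sentences, performing sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the feature discrimination vector of the customer service dialogue text to be tested.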
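In an embodiment, performing, according to the hidden state vectors of the words of each sentence, word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the sentence vector of each sentence of the customer service dialogue text to be tested includes:
through the following formulas
Figure PCTCN2021083795-appb-000029
and
Figure PCTCN2021083795-appb-000030
calculating the sentence vector of each sentence of the customer service dialogue text to be tested, where $a_{i,j}$ is the interactive word vector, $h_{i,j}$ is the hidden state vector of a word, $W_w$ is the weight of the hidden state vector of the word, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and $v_{s_i}$ is the sentence vector.
In an embodiment, performing, according to the hidden state vectors of the sentences, sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the feature discrimination vector of the customer service dialogue text to be tested includes:
through the following formulas
Figure PCTCN2021083795-appb-000031
and
Figure PCTCN2021083795-appb-000032
calculating the feature discrimination vector of the customer service dialogue text to be tested, where $a_i$ is the interactive sentence vector, $h_i$ is the hidden state vector of a sentence, $W_s$ is the weight of the hidden state vector of the sentence, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and
Figure PCTCN2021083795-appb-000033
is the feature discrimination vector.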
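The storage medium is a physical, non-transitory storage medium, for example a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disc, or any other physical storage medium that can store program code. The computer-readable storage medium may be non-volatile or volatile.
A person of ordinary skill in the art can appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The steps in the methods of the embodiments of the present application may be reordered, combined, and deleted according to actual needs. The units in the apparatuses of the embodiments may be combined, divided, and deleted according to actual needs. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
Obviously, a person skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. Thus, provided that these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person familiar with the technical field can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.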

Claims (20)

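  1. A customer service violation quality inspection method, comprising:
    receiving a customer service dialogue text to be tested, and encoding the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
    inputting the basic vector into a pre-trained first linear classifier, and determining through the first linear classifier whether the customer service dialogue text to be tested is in violation;
    if the customer service dialogue text to be tested is in violation, inputting the basic vector into a pre-trained second linear classifier, and outputting the violation category of the customer service dialogue text to be tested;
    according to an attention mechanism, performing word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain a feature discrimination vector of the customer service dialogue text to be tested;
    merging the basic vector of the violating customer service dialogue text to be tested with the feature discrimination vector to obtain a final vector;
    inputting the final vector into a pre-trained third linear classifier, and predicting the violation situation of the customer service dialogue text to be tested through the third linear classifier.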
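  2. The customer service violation quality inspection method according to claim 1, wherein before receiving the customer service dialogue text to be tested and encoding it through the preset bidirectional RNN network to obtain the basic vector of the customer service dialogue text to be tested, the method further comprises:
    constructing a violation situation graph, the nodes of which are violation situations;
    dividing the violation situation graph into a plurality of subgraphs according to the distances between the nodes of the violation situation graph, wherein the distance between nodes of the same subgraph is less than a preset distance threshold, the distance between nodes of different subgraphs is greater than the preset distance threshold, and each subgraph corresponds to one violation category;
    calculating, through a preset graph neural network, the feature vector of the violation category corresponding to each subgraph of the violation situation graph.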
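  3. The customer service violation quality inspection method according to claim 1, wherein encoding the customer service dialogue text to be tested through the preset bidirectional RNN network to obtain the basic vector of the customer service dialogue text to be tested comprises:
    performing word segmentation on the customer service dialogue text to be tested to obtain a word segmentation set of the customer service dialogue text to be tested;
    performing word vector training on the words of the word segmentation set to obtain word vectors of the words of the word segmentation set;
    inputting the word vectors of the words of the word segmentation set into the bidirectional RNN network and outputting the basic vector of the customer service dialogue text to be tested.
  4. The customer service violation quality inspection method according to claim 3, wherein performing word segmentation on the customer service dialogue text to be tested to obtain the word segmentation set of the customer service dialogue text to be tested comprises:
    dividing the customer service dialogue text to be tested into a plurality of words through a preset word segmentation tool to obtain an initial word segmentation set;
    deleting the stop words in the initial word segmentation set to obtain the word segmentation set.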
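  5. The customer service violation quality inspection method according to claim 1, wherein performing, according to the attention mechanism, word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, comprises:
    encoding each sentence of the customer service dialogue text to be tested through a preset bidirectional GRU to obtain hidden state vectors of the words of each sentence of the customer service dialogue text to be tested;
    according to the hidden state vectors of the words of each sentence, performing word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a sentence vector of each sentence of the customer service dialogue text to be tested;
    encoding the sentence vectors of the sentences of the customer service dialogue text to be tested through the preset bidirectional GRU to obtain a hidden state vector of each sentence of the customer service dialogue text to be tested;
    according to the hidden state vectors of the sentences, performing sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the feature discrimination vector of the customer service dialogue text to be tested.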
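  6. The customer service violation quality inspection method according to claim 5, wherein performing, according to the hidden state vectors of the words of each sentence, word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the sentence vector of each sentence of the customer service dialogue text to be tested comprises:
    through the following formulas
    Figure PCTCN2021083795-appb-100001
    and
    Figure PCTCN2021083795-appb-100002
    calculating the sentence vector of each sentence of the customer service dialogue text to be tested, where $a_{i,j}$ is the interactive word vector, $h_{i,j}$ is the hidden state vector of a word, $W_w$ is the weight of the hidden state vector of the word, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and $v_{s_i}$ is the sentence vector.
  7. The customer service violation quality inspection method according to claim 5, wherein performing, according to the hidden state vectors of the sentences, sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the feature discrimination vector of the customer service dialogue text to be tested comprises:
    through the following formulas
    Figure PCTCN2021083795-appb-100003
    and
    Figure PCTCN2021083795-appb-100004
    calculating the feature discrimination vector of the customer service dialogue text to be tested, where $a_i$ is the interactive sentence vector, $h_i$ is the hidden state vector of a sentence, $W_s$ is the weight of the hidden state vector of the sentence, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and
    Figure PCTCN2021083795-appb-100005
    is the feature discrimination vector.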
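  8. The customer service violation quality inspection method according to claim 3, wherein performing word vector training on the words of the word segmentation set to obtain word vectors of the words of the word segmentation set comprises:
    using word2vec to perform word vector training on the words of the word segmentation set.
  9. The customer service violation quality inspection method according to claim 1, wherein before inputting the basic vector into the pre-trained first linear classifier and determining through the first linear classifier whether the customer service dialogue text to be tested is in violation, the method further comprises:
    training the first linear classifier on labeled samples.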
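  10. A customer service violation quality inspection apparatus, comprising:
    an encoding unit, configured to receive a customer service dialogue text to be tested, and encode the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
    a judging unit, configured to input the basic vector into a pre-trained first linear classifier, and determine through the first linear classifier whether the customer service dialogue text to be tested is in violation;
    a first prediction unit, configured to, if the customer service dialogue text to be tested is in violation, input the basic vector into a pre-trained second linear classifier and output the violation category of the customer service dialogue text to be tested, wherein each violation category includes at least one violation situation;
    an interaction unit, configured to perform, according to an attention mechanism, word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain a feature discrimination vector of the customer service dialogue text to be tested;
    a merging unit, configured to merge the basic vector of the violating customer service dialogue text to be tested with the feature discrimination vector to obtain a final vector;
    a second prediction unit, configured to input the final vector into a pre-trained third linear classifier, and predict the violation situation of the customer service dialogue text to be tested through the third linear classifier.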
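  11. A computer device, comprising a memory and a processor, the memory storing a computer program, and the processor being configured to run the computer program to perform the following steps:
    receiving a customer service dialogue text to be tested, and encoding the customer service dialogue text to be tested through a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
    inputting the basic vector into a pre-trained first linear classifier, and determining through the first linear classifier whether the customer service dialogue text to be tested is in violation;
    if the customer service dialogue text to be tested is in violation, inputting the basic vector into a pre-trained second linear classifier, and outputting the violation category of the customer service dialogue text to be tested;
    according to an attention mechanism, performing word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain a feature discrimination vector of the customer service dialogue text to be tested;
    merging the basic vector of the violating customer service dialogue text to be tested with the feature discrimination vector to obtain a final vector;
    inputting the final vector into a pre-trained third linear classifier, and predicting the violation situation of the customer service dialogue text to be tested through the third linear classifier.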
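  12. The computer device according to claim 11, wherein before receiving the customer service dialogue text to be tested and encoding it through the preset bidirectional RNN network to obtain the basic vector of the customer service dialogue text to be tested, the processor further performs the following steps:
    constructing a violation situation graph, the nodes of which are violation situations;
    dividing the violation situation graph into a plurality of subgraphs according to the distances between the nodes of the violation situation graph, wherein the distance between nodes of the same subgraph is less than a preset distance threshold, the distance between nodes of different subgraphs is greater than the preset distance threshold, and each subgraph corresponds to one violation category;
    calculating, through a preset graph neural network, the feature vector of the violation category corresponding to each subgraph of the violation situation graph.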
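  13. The computer device according to claim 11, wherein encoding the customer service dialogue text to be tested through the preset bidirectional RNN network to obtain the basic vector of the customer service dialogue text to be tested comprises:
    performing word segmentation on the customer service dialogue text to be tested to obtain a word segmentation set of the customer service dialogue text to be tested;
    performing word vector training on the words of the word segmentation set to obtain word vectors of the words of the word segmentation set;
    inputting the word vectors of the words of the word segmentation set into the bidirectional RNN network and outputting the basic vector of the customer service dialogue text to be tested.
  14. The computer device according to claim 13, wherein performing word segmentation on the customer service dialogue text to be tested to obtain the word segmentation set of the customer service dialogue text to be tested comprises:
    dividing the customer service dialogue text to be tested into a plurality of words through a preset word segmentation tool to obtain an initial word segmentation set;
    deleting the stop words in the initial word segmentation set to obtain the word segmentation set.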
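  15. The computer device according to claim 11, wherein performing, according to the attention mechanism, word-level interaction and sentence-level interaction respectively between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, so as to obtain the feature discrimination vector of the customer service dialogue text to be tested, comprises:
    encoding each sentence of the customer service dialogue text to be tested through a preset bidirectional GRU to obtain hidden state vectors of the words of each sentence of the customer service dialogue text to be tested;
    according to the hidden state vectors of the words of each sentence, performing word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain a sentence vector of each sentence of the customer service dialogue text to be tested;
    encoding the sentence vectors of the sentences of the customer service dialogue text to be tested through the preset bidirectional GRU to obtain a hidden state vector of each sentence of the customer service dialogue text to be tested;
    according to the hidden state vectors of the sentences, performing sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the feature discrimination vector of the customer service dialogue text to be tested.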
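  16. The computer device according to claim 15, wherein performing, according to the hidden state vectors of the words of each sentence, word-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the sentence vector of each sentence of the customer service dialogue text to be tested comprises:
    through the following formulas
    Figure PCTCN2021083795-appb-100006
    and
    Figure PCTCN2021083795-appb-100007
    calculating the sentence vector of each sentence of the customer service dialogue text to be tested, where $a_{i,j}$ is the interactive word vector, $h_{i,j}$ is the hidden state vector of a word, $W_w$ is the weight of the hidden state vector of the word, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and $v_{s_i}$ is the sentence vector.
  17. The computer device according to claim 15, wherein performing, according to the hidden state vectors of the sentences, sentence-level interaction between the customer service dialogue text to be tested and the feature vector of the violation category to obtain the feature discrimination vector of the customer service dialogue text to be tested comprises:
    through the following formulas
    Figure PCTCN2021083795-appb-100008
    and
    Figure PCTCN2021083795-appb-100009
    calculating the feature discrimination vector of the customer service dialogue text to be tested, where $a_i$ is the interactive sentence vector, $h_i$ is the hidden state vector of a sentence, $W_s$ is the weight of the hidden state vector of the sentence, $d_o$ is the feature vector of the violation category, $W_g$ is the weight of the feature vector of the violation category, and
    Figure PCTCN2021083795-appb-100010
    is the feature discrimination vector.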
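  18. The computer device according to claim 13, wherein the performing of word vector training on the words of the word segmentation set to obtain the word vectors of the words of the word segmentation set comprises:
    performing word vector training on the words of the word segmentation set using word2vec.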
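  19. The computer device according to claim 11, wherein, before the inputting of the basic vector into the pre-trained first linear classifier and the determining, by means of the first linear classifier, of whether the customer service dialogue text to be tested is in violation, the processor further performs the following step:
    training the first linear classifier with labeled samples.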
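Claim 19 only states that the first linear classifier is trained on labeled samples. A minimal sketch, assuming a logistic-regression objective optimized with stochastic gradient descent (the claim names neither), is shown below; `train_linear_classifier`, `predict`, and the hyperparameter defaults are hypothetical.

```python
import math
from typing import Sequence


def train_linear_classifier(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 200,
    lr: float = 0.5,
) -> tuple[list[float], float]:
    """Fit weights and bias of a binary linear classifier on labeled vectors."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(w * v for w, v in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "violation"
            err = p - y                      # gradient of the log loss w.r.t. z
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 (violation) if the linear score is positive, else 0."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0
```

Here each sample would be the basic vector of an annotated dialogue and each label would mark whether that dialogue is a violation.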
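  20. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the following steps:
    receiving a customer service dialogue text to be tested, and encoding the customer service dialogue text to be tested with a preset bidirectional RNN network to obtain a basic vector of the customer service dialogue text to be tested;
    inputting the basic vector into a pre-trained first linear classifier, and determining, by means of the first linear classifier, whether the customer service dialogue text to be tested is in violation;
    if the customer service dialogue text to be tested is in violation, inputting the basic vector into a pre-trained second linear classifier, and outputting the violation category of the customer service dialogue text to be tested;
    performing, according to an attention mechanism, word-level interaction and sentence-level interaction between the violating customer service dialogue text to be tested and the feature vector looked up in advance for the violation category, to obtain a feature distinguishing vector of the customer service dialogue text to be tested;
    merging the basic vector of the violating customer service dialogue text to be tested with the feature distinguishing vector to obtain a final vector;
    inputting the final vector into a pre-trained third linear classifier, and predicting, by means of the third linear classifier, the violation situation of the customer service dialogue text to be tested.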
PCT/CN2021/083795 2020-12-01 2021-03-30 Customer service violation quality inspection method and apparatus, computer device, and storage medium WO2022116438A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011387369.4 2020-12-01
CN202011387369.4A CN112507121B (zh) 2020-12-01 2020-12-01 Customer service violation quality inspection method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022116438A1 (zh) 2022-06-09

Family

ID=74969185

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083795 WO2022116438A1 (zh) 2020-12-01 2021-03-30 Customer service violation quality inspection method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112507121B (zh)
WO (1) WO2022116438A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
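CN112507121B (zh) * 2020-12-01 2023-06-30 平安科技(深圳)有限公司 Customer service violation quality inspection method and apparatus, computer device, and storage medium
CN114418542A (zh) * 2022-01-20 2022-04-29 京东科技信息技术有限公司 Method, apparatus, device, and storage medium for verifying business operation processes
CN116823069B (zh) * 2023-08-30 2023-12-05 北京中关村科金技术有限公司 Text-analysis-based intelligent customer service quality inspection method and related device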

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
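CN106909654A (zh) * 2017-02-24 2017-06-30 北京时间股份有限公司 Multi-level classification system and method based on news text information
US20190266325A1 (en) * 2018-02-28 2019-08-29 Microsoft Technology Licensing, Llc Automatic malicious session detection
CN110288192A (zh) * 2019-05-23 2019-09-27 平安科技(深圳)有限公司 Quality inspection method, apparatus, device, and storage medium based on multiple quality inspection models
CN110310663A (zh) * 2019-05-16 2019-10-08 平安科技(深圳)有限公司 Violating speech script detection method, apparatus, device, and computer-readable storage medium
CN112507121A (zh) * 2020-12-01 2021-03-16 平安科技(深圳)有限公司 Customer service violation quality inspection method and apparatus, computer device, and storage medium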

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8727056B2 (en) * 2011-04-01 2014-05-20 Navman Wireless North America Ltd. Systems and methods for generating and using moving violation alerts
US20130110748A1 (en) * 2011-08-30 2013-05-02 Google Inc. Policy Violation Checker
CN107507442A (zh) * 2017-06-29 2017-12-22 百度在线网络技术(北京)有限公司 Vehicle traffic-violation early-warning method and apparatus, computer device, and readable medium

Also Published As

Publication number Publication date
CN112507121A (zh) 2021-03-16
CN112507121B (zh) 2023-06-30

Similar Documents

Publication Publication Date Title
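WO2022116438A1 (zh) Customer service violation quality inspection method and apparatus, computer device, and storage medium
US11227118B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
US10984340B2 (en) Composite machine-learning system for label prediction and training data collection
US11645470B2 (en) Automated testing of dialog systems
CN107644011B (zh) System and method for fine-grained medical entity extraction
US7606784B2 (en) Uncertainty management in a decision-making system
WO2017067153A1 (zh) Text-analysis-based credit risk assessment method and apparatus, and storage medium
US20160232630A1 (en) System and method in support of digital document analysis
WO2022218186A1 (zh) Method and apparatus for generating a personalized knowledge graph, and computer device
US20190121842A1 (en) Content adjustment and display augmentation for communication
WO2020082734A1 (zh) Text sentiment recognition method and apparatus, electronic device, and non-volatile computer-readable storage medium
WO2021208727A1 (zh) Artificial-intelligence-based text error detection method and apparatus, and computer device
CN113407677B (zh) Method, apparatus, device, and storage medium for evaluating consultation dialogue quality
Argamon Computational forensic authorship analysis: Promises and pitfalls
US11080313B2 (en) Recommendation technique using automatic conversation
CN112100374A (zh) Text clustering method and apparatus, electronic device, and storage medium
CN113392218A (zh) Training method for a text quality evaluation model and method for determining text quality
KR20200041199A (ko) Method, apparatus, and computer-readable medium for operating a chatbot
CN116127001A (zh) Sensitive word detection method and apparatus, computer device, and storage medium
WO2021147404A1 (zh) Dependency relation classification method and related device
CN116975400B (zh) Data classification and grading method and apparatus, electronic device, and storage medium
CN116402166B (zh) Training method and apparatus for a prediction model, electronic device, and storage medium
WO2021072864A1 (zh) Text similarity acquisition method and apparatus, electronic device, and computer-readable storage medium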
EP3876228A1 (en) Automated assessment of the quality of a dialogue system in real time
US10296585B2 (en) Assisted free form decision definition using rules vocabulary

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21899482; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21899482; Country of ref document: EP; Kind code of ref document: A1)