CN114676247A - Complaint prediction method, model building method and device thereof and related equipment - Google Patents


Info

Publication number
CN114676247A
Authority
CN
China
Prior art keywords
word
sequence
prediction model
customer service
complaint prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210107767.9A
Other languages
Chinese (zh)
Inventor
王子奕
鞠剑勋
李健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhilv Information Technology Co ltd
Original Assignee
Shanghai Zhilv Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhilv Information Technology Co ltd filed Critical Shanghai Zhilv Information Technology Co ltd
Priority to CN202210107767.9A priority Critical patent/CN114676247A/en
Publication of CN114676247A publication Critical patent/CN114676247A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282Rating or review of business operators or products
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/14Travel agencies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Primary Health Care (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a complaint prediction method, a method and a device for establishing its model, and related equipment, wherein the method comprises the following steps: setting a predicted tag set of the complaint prediction model; acquiring historical customer service dialog text and its label; segmenting the historical customer service dialog text with the LTP language processing tool; generating a word token id sequence, a word position id sequence, a word segment id sequence and a word mask sequence with a BERT tokenizer; inputting these into the model's encoding layer to obtain a character feature sequence; obtaining a word feature sequence; inputting the word feature sequence into the graph neural network of the syntax layer to perceive grammatical features; outputting a sentence representation of the historical customer service dialog text; performing affine transformation and normalization on the sentence representation to obtain a label probability distribution; inputting the sentence representation into a label confusion layer and calculating a pseudo label probability distribution; calculating a loss function; and iteratively training the complaint prediction model. The invention realizes complaint prediction for an online travel agency.

Description

Complaint prediction method, model building method and device thereof and related equipment
Technical Field
The invention relates to the technical field of computer application, in particular to a complaint prediction method, a complaint prediction model building method, a complaint prediction device and related equipment.
Background
With the rapid development of deep learning, neural network models play an increasingly important role in Natural Language Processing (NLP). Compared with traditional machine learning algorithms such as naive Bayes, support vector machines and N-gram models, neural networks have great advantages in automatically extracting features and building high-level abstractions, which effectively overcomes the limitation that manual feature engineering is time-consuming, labor-intensive and dependent on expert experience.
Identifying customer complaints in the online travel agency (OTA) industry is a main application of text classification in travel scenarios. The difficulty of the task lies in the following:
1. Chinese has a large number of synonyms and polysemous words, and the informal, ambiguous and varied ways users express themselves in Instant Messaging (IM) pose great challenges for classification;
2. The amount of labeled data is usually small and labeling is costly, so it is difficult to learn a classifier with sufficient generalization ability;
3. The boundaries between early-warning labels are not always clear; some content can even be assigned semantically to several categories, so labeling noise is severe.
Compared with Computer Vision (CV), supervised data sets in the NLP domain are often very small, which makes deep learning models prone to overfitting. NLP, however, has the advantage of abundant unsupervised corpora; if such data can be fully exploited through unsupervised and self-supervised learning, model performance on downstream tasks can potentially be improved, which is the motivation for pre-training language models. Since the BERT model proposed by Google in 2018 refreshed the state of the art on various natural language understanding tasks, research on pre-trained language models has surged. BERT uses the Transformer encoder as its main structure, replaces Recurrent Neural Networks (RNNs) in sequence processing with the parallel computation of the multi-head self-attention mechanism, and provides a new paradigm for encoding text representations.
When applied to Chinese text processing, the WordPiece segmentation algorithm adopted by BERT generally just converts sentences into character sequences, which ignores word-level information and the dependency relationships among Chinese words that can independently serve as sentence components. In addition, because the assumption behind one-hot label encoding is too strong, the overlap between labels is ignored and a large amount of semantic information contained in the labels is easily lost, so the model cannot handle label confusion or noise.
Therefore, how to capture word-level associations, and how to avoid losing the semantic information contained in the labels so that the model can handle label confusion and noise, in order to realize complaint prediction for an online travel agency, are technical problems to be solved in this field.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a complaint prediction method, a complaint prediction model building method and device, and related equipment, which can capture word-level associations and avoid the situation where the model cannot handle label confusion or noise because a large amount of semantic information contained in the labels is lost, thereby realizing complaint prediction for an online travel agency.
According to an aspect of the present invention, there is provided a complaint prediction model building method including:
Setting a set of predicted tags for the complaint prediction model;
obtaining a historical customer service conversation text and a label of the historical customer service conversation text;
using an LTP language processing tool to perform word segmentation on the historical customer service dialogue text to generate a word mask sequence and a syntactic dependency relationship graph;
generating a word token id sequence, a word position id sequence, a word segment id sequence and a word mask sequence based on the historical customer service dialog text using a BERT tokenizer;
inputting the word token id sequence, the word position id sequence and the word segment id sequence into the model encoding layer to obtain the character feature sequence, in the character dimension, of the user content in the historical customer service dialog text;
converting the character feature sequence and the word mask sequence into set sizes, and obtaining a word feature sequence based on the character feature sequence and the word mask sequence;
inputting the word feature sequence into a graph neural network of a grammar layer so as to sense grammatical features of the word feature sequence;
outputting sentence representation of the historical customer service dialogue text according to the grammatical feature of the word feature sequence and the word mask sequence;
performing affine transformation and normalization processing on the sentence representation to obtain label probability distribution of labels output by the complaint prediction model;
Inputting the sentence representation into a label confusion layer, and calculating the probability distribution of pseudo labels;
calculating a loss function of the complaint prediction model according to the pseudo label probability distribution and the label probability distribution;
iteratively training the complaint prediction model so that the calculated loss function meets a set condition.
In some embodiments of the present invention, the obtaining the historical customer service dialog text and the tag of the historical customer service dialog text further comprises:
setting the maximum word quantity of the historical customer service dialogue text;
and setting the maximum character number of the words in the historical customer service dialogue text.
In some embodiments of the invention, the character feature sequence is obtained using only the first 6 encoder blocks of the BERT pre-trained model in the encoding layer, into which the embedded vector sequence is input.
In some embodiments of the present invention, the converting the word feature sequence and the word mask sequence into a set size and obtaining the word feature sequence based on the word feature sequence and the word mask sequence includes:
and taking the word mask of the word mask sequence as a weight, and averaging the word character dimensionality to the characteristics of each character of the word to obtain a word characteristic sequence.
In some embodiments of the invention, the graph neural network of the syntax layer includes two multi-relational graph neural network layers.
In some embodiments of the invention, the loss function of the complaint prediction model is KL divergence.
According to still another aspect of the present invention, there is also provided a complaint prediction method including:
receiving dialog text input by a user;
generating a word token id sequence, a word position id sequence, a word segment id sequence, a word mask sequence and a syntactic dependency graph according to the dialog text;
inputting the dialog text and the generated word token id sequence, word position id sequence, word segment id sequence, word mask sequence and syntactic dependency graph into the complaint prediction model established by the above complaint prediction model building method;
obtaining label probability distribution of labels predicted by the complaint prediction model;
and taking the label with the highest probability in the label probability distribution as a prediction label of the dialog text.
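The final step above, taking the most probable label, reduces to an argmax over the model's output distribution. A minimal sketch in Python; the English label names are illustrative stand-ins for the seven tags, not the patent's wording:

```python
# Illustrative English stand-ins for the seven predicted tags (assumed names).
LABELS = ["non-complaint", "poor customer experience", "complaint/exposure",
          "denial", "unresolved problem", "four-to category", "personal injury"]

def predict_label(probs, labels=LABELS):
    """Return the label whose predicted probability is highest."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]
```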
According to another aspect of the present invention, there is also provided a complaint prediction model creation apparatus, including:
a setting module for setting a set of predicted labels for the complaint prediction model;
the acquisition module is used for acquiring the historical customer service conversation text and the label of the historical customer service conversation text;
The word segmentation module is used for segmenting words of the historical customer service dialogue text by using an LTP language processing tool to generate a word mask sequence and a syntactic dependency relationship graph;
the embedding module is used for generating a word token id sequence, a word position id sequence, a word segment id sequence and a word mask sequence based on the historical customer service dialog text using a BERT tokenizer;
the encoding module is used for inputting the word token id sequence, the word position id sequence and the word segment id sequence into the model encoding layer to obtain the character feature sequence, in the character dimension, of the user content in the historical customer service dialog text;
the conversion module is used for converting the character feature sequence and the word mask sequence into set sizes and acquiring a word feature sequence based on the character feature sequence and the word mask sequence;
the grammar module is used for inputting the word feature sequence into a graph neural network of a grammar layer so as to sense the grammar features of the word feature sequence;
a sentence representation module for outputting sentence representation of the history customer service dialogue text according to the grammatical feature of the word feature sequence and the word mask sequence;
the probability distribution module is used for performing affine transformation and normalization on the sentence representation to obtain the label probability distribution of the labels output by the complaint prediction model;
The label confusion module is used for inputting the sentence representation into a label confusion layer and calculating the probability distribution of the pseudo labels;
a loss calculation module, configured to calculate a loss function of the complaint prediction model according to the pseudo label probability distribution and the label probability distribution;
and the iterative training module is used for iteratively training the complaint prediction model so as to enable the calculated loss function to accord with the set condition.
According to still another aspect of the present invention, there is also provided an electronic apparatus, including: a processor; a storage medium having stored thereon a computer program which, when executed by the processor, performs the steps of the complaint prediction model building method described above.
According to yet another aspect of the present invention, there is also provided a storage medium having stored thereon a computer program which, when being executed by a processor, performs the steps of the complaint prediction model building method described above.
Compared with the prior art, the invention has the advantages that:
the invention applies frontier deep learning technology to the problem of customer complaint early warning in the tourism industry, improving the early warning system's recognition rate of abnormal messages and reducing manual intervention. It helps staff discover and locate shortcomings of products and services in time, understand user needs and feedback, make corresponding improvements promptly, and increase user retention. To capture word-level associations, the invention converts the syntactic dependency tree into a heterogeneous graph structure and uses the multi-relational graph neural network CompGCN to perform message passing and aggregation on the graph, thereby learning sentence-level syntactic features. Furthermore, the method introduces a Label Confusion Model (LCM) to model the latent relations between labels and sentences, automatically learning for each sample a soft distribution that replaces the original one-hot distribution in the loss calculation, so as to avoid the situation where the model cannot handle label confusion or noise because a large amount of semantic information contained in the labels is lost.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 shows a flow chart of a complaint prediction model building method according to an embodiment of the invention.
FIG. 2 is a schematic diagram illustrating a complaint prediction model building method according to an embodiment of the invention.
FIG. 3 is a block diagram of a complaint prediction model building apparatus according to an embodiment of the present invention.
Fig. 4 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the disclosure.
Fig. 5 schematically illustrates an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
To remedy the defects of the prior art, to capture word-level associations, and to avoid the situation where the model cannot handle label confusion or noise because a large amount of semantic information contained in the labels is lost, thereby realizing complaint prediction for an online travel agency, the invention provides a complaint prediction method, a complaint prediction model building method and device, and related equipment.
Referring first to Figs. 1 and 2, FIG. 1 is a flow chart of a complaint prediction model building method according to an embodiment of the invention, and FIG. 2 is a schematic diagram of the method. The complaint prediction model building method comprises the following steps:
step S101: setting a predicted tag set for the complaint prediction model.
Specifically, the model output space can be set to a set of seven tags: non-complaint conversation, poor customer experience, complaint/exposure, denial, unresolved problem, the four-to category (store-to-store, vehicle-to-field/ticket, vehicle-to-field, ticket-to-field), and personal injury.
Step S102: and acquiring a historical customer service conversation text and a label of the historical customer service conversation text.
Specifically, customer service dialog texts from a certain period of time can be taken from a Hive (data warehouse tool) table and annotated with these types of labels by human annotators.
Further, a single historical customer service dialog text may be limited to at most 40 words, each containing at most 4 characters, i.e., a maximum character-sequence length of 160.
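The 40-word / 4-character limit can be enforced with a simple clipping step before tokenization. A minimal sketch, assuming over-long words are truncated rather than split (the function name is illustrative):

```python
def clip_dialog(words, max_words=40, max_chars=4):
    """Truncate a segmented dialog to at most max_words words of at most
    max_chars characters each, so the flattened character sequence never
    exceeds max_words * max_chars (= 160 with the defaults)."""
    clipped = [w[:max_chars] for w in words[:max_words]]
    return clipped, sum(len(w) for w in clipped)
```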
Step S103: and segmenting words of the historical customer service dialog text by using an LTP language processing tool to generate a word mask sequence and a syntactic dependency relationship graph.
Specifically, the LTP language processing tool may be used to perform syntactic dependency analysis on the segmented original text, and the resulting syntactic tree may be regarded as a kind of multi-relationship graph.
LTP defines 15 dependency relation types, including the subject-verb relation (SBV), verb-object relation (VOB) and indirect-object relation (IOB). The HEAD relation, in which a virtual root node points to the core word of the sentence, is not considered; the remaining 14 relation types form the set R. For the relation represented by each edge, the relation type of its reverse edge is also introduced, denoted R_inv, and a self-loop relation τ is added for each node in the graph, giving a total of 14 + 14 + 1 = 29 relation types.
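Building the 29-entry relation vocabulary (14 forward relations, their 14 inverses, plus the self-loop τ) can be sketched as follows. The 14 LTP tag names listed are an assumption based on LTP's documented dependency label set with HED excluded:

```python
# The 14 LTP dependency labels after removing HED (assumed from LTP's documented tag set).
BASE_RELATIONS = ["SBV", "VOB", "IOB", "FOB", "DBL", "ATT", "ADV",
                  "CMP", "COO", "POB", "LAD", "RAD", "IS", "WP"]

def build_relation_vocab(base):
    """14 forward relations + 14 inverse relations + 1 self-loop = 29 types."""
    rels = list(base)                        # R: forward relations
    rels += [r + "_inv" for r in base]       # R_inv: reverse-edge relations
    rels.append("SELF")                      # tau: self-loop relation
    return rels
```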
Step S104: generating a word token id sequence, a word position id sequence, a word fragment id sequence, and a word mask sequence based on the historical customer service conversation text using a BERT tokenizer.
Specifically, a BERT tokenizer may be used as the embedding layer 201, so that the historical customer service dialog text is input to the embedding layer 201 as a character sequence. The embedding layer 201 first converts the input character sequence into an index id sequence over the token vocabulary and applies one-hot encoding, giving

x_t = {x_{t,1}, x_{t,2}, ..., x_{t,s_t}}

the character set of the t-th word in the sentence, of length s_t, where x_{t,i} is the one-hot vector of the i-th character of the t-th word. After transformation by the token embedding matrix W^{token}, one has:

e^{token}_{t,i} = W^{token} x_{t,i}

Similarly, the same operation is applied to the word position id sequence and word segment id sequence to obtain the dense position and segment vectors e^{pos}_{t,i} and e^{seg}_{t,i}. The three embedded features are summed and layer-normalized to obtain the output:

e_{t,i} = LayerNorm(e^{token}_{t,i} + e^{pos}_{t,i} + e^{seg}_{t,i})
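The embedding-layer computation (sum of token, position and segment embeddings followed by layer normalization) can be sketched in pure Python; the vector dimension and epsilon are illustrative toy values:

```python
import math

def layer_norm(v, eps=1e-12):
    """Standard layer normalization over one vector."""
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]

def embed_char(token_e, pos_e, seg_e):
    """e = LayerNorm(token + position + segment), as in the BERT embedding layer."""
    return layer_norm([a + b + c for a, b, c in zip(token_e, pos_e, seg_e)])
```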
step S105: inputting the word token id sequence, the word position id sequence and the word segment id sequence into the model encoding layer to obtain the character feature sequence, in the character dimension, of the user content in the historical customer service dialog text.
Specifically, the encoding layer 202 may be the encoding layer of BERT, whose basic structure consists of a multi-head self-attention layer and a fully connected feed-forward layer. Let the sequence feature matrix produced by the embedding layer 201 be E^{(0)}. A total of L encoder blocks (EncoderBlock) are used; for l = 1, 2, ..., L:

In the self-attention layer, N heads are used; with input vector dimension h, each head has dimension d = h/N. The query, key, value and projection (query, key, value, proj) weights of the n-th head at layer l are W^{(l,n)}_Q, W^{(l,n)}_K, W^{(l,n)}_V and W^{(l)}_{proj} respectively:

Q^{(l,n)} = E^{(l-1)} W^{(l,n)}_Q,  K^{(l,n)} = E^{(l-1)} W^{(l,n)}_K,  V^{(l,n)} = E^{(l-1)} W^{(l,n)}_V

head^{(l,n)} = softmax(Q^{(l,n)} (K^{(l,n)})^T / √d) V^{(l,n)}

A^{(l)} = LayerNorm(E^{(l-1)} + Concat(head^{(l,1)}, ..., head^{(l,N)}) W^{(l)}_{proj})

where Q^{(l,n)}, K^{(l,n)}, V^{(l,n)} are the query, key and value matrices of the n-th attention head at layer l, and LayerNorm is the layer normalization operation.

In the fully connected layer, the input is first affine-transformed by W^{(l)}_1, passed through a nonlinear activation function, and then projected back to the original dimension by W^{(l)}_2, i.e.:

E^{(l)} = LayerNorm(A^{(l)} + gelu(A^{(l)} W^{(l)}_1 + b^{(l)}_1) W^{(l)}_2 + b^{(l)}_2)

where gelu is the Gaussian error linear unit.
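The scaled dot-product attention at the heart of each encoder block can be sketched for a single head in pure Python, with toy dimensions and no learned weights:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d)) V,
    where Q, K, V are lists of row vectors."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(w[j] * V[j][i] for j in range(len(V)))
                    for i in range(len(V[0]))])
    return out
```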
Specifically, for model performance reasons, only the first 6 blocks of the BERT pre-trained model may be used in the encoding layer 202; the embedded vector sequence e_{t,i} from the previous step is input to obtain the encoded character sequence representation.
Step S106: and converting the character feature sequence and the character mask sequence into set sizes, and acquiring a word feature sequence based on the character feature sequence and the character mask sequence.
Specifically, step S106 may reshape the character feature sequence to a normalized size batch_size × 40 × 4 × h, reshape the word mask sequence to batch_size × 40 × 4, and average the features over the word's character dimension with the mask as the weight to obtain the word feature sequence.
For example, given the character feature sequence H = {h_{t,i}} produced by the encoding layer 202, the word feature sequence is obtained by averaging the hidden vectors of the characters of each word as the representation of that word, i.e.:

w_t = (1/s_t) Σ_{i=1}^{s_t} h_{t,i}
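The character-to-word pooling can be sketched for a single word: the mask zeroes out padding positions and the remaining character vectors are averaged. All names are illustrative:

```python
def word_feature(char_feats, char_mask):
    """Mask-weighted average of a word's character vectors into one word vector.
    char_feats: list of character vectors; char_mask: 1 for real chars, 0 for padding."""
    total = sum(char_mask)
    dim = len(char_feats[0])
    return [sum(m * f[i] for m, f in zip(char_mask, char_feats)) / total
            for i in range(dim)]
```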
Step S107: the word feature sequence is input to a neural network of a grammar layer 203 to perceive grammatical features of the word feature sequence.
Step S108: and outputting sentence representation of the historical customer service dialogue text according to the grammatical features of the word feature sequence and the word mask sequence.
In particular, the graph neural network of the syntax layer comprises two multi-relational graph convolution (CompGCN) layers.
CompGCN is a graph neural network framework that can take multiple kinds of relation information into account. It learns representations of nodes and relations simultaneously, and uses a basis-vector decomposition to effectively alleviate the parameter overload problem. First, a set of basis vectors {v_1, v_2, ..., v_B} and a coefficient matrix α are initialized, where α_{b,r} is the coefficient of relation r on the basis vector v_b, so the initial representation z_r of relation r is a linear combination of the basis vectors:

z_r = Σ_{b=1}^{B} α_{b,r} v_b
in order to perform vector operations on the features of nodes and edges, it is necessary to project a relational representation from the edge space to the node space, that is:
h_r = W_rel z_r

where W_rel is the projection matrix; the transformed h_r is the relation representation in the node hidden space.
Furthermore, according to the direction of an edge, the following 3 convolution kernels can be defined:

W_{dir(r)} = W_O if r is an original (forward) relation; W_I if r is an inverse relation; W_S if r is a self-loop

where W_O, W_I and W_S respectively denote the convolution parameters used when the edge is forward, inverse, or a self-loop.
At the k-th layer, CompGCN aggregates neighborhood features to update the node representations as follows:

x^{(k+1)}_v = f( Σ_{(u,r) ∈ N(v)} W^{(k)}_{dir(r)} φ(x^{(k)}_u, z^{(k)}_r) )
the phi function is used for integrating the common influence of the neighbor node u and the edge type r, and the following selection modes are mainly adopted:
φ(xu,zr)=xu-zr
φ(xu,zr)=xu*zr
φ(xu,zr)=xu★zr
x in the formulauAnd zrIs a vector representation of node u and edge type r.
When multiple CompGCN layers are stacked, the basis-vector combination occurs only in the first layer; afterwards, the relation representations are updated by the linear transformation:

z^{(k+1)}_r = W^{(k)}_rel z^{(k)}_r
assume that a K-layer graph convolution is used to obtain a word sequence characterized as
Figure RE-GDA0003622635800000094
And finally, aggregating the hidden vectors of all nodes by adopting a read-out function to obtain a whole sentence, wherein m is represented as:
m=Readout(HK)
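A heavily simplified sketch of one CompGCN-style message-passing step, using the subtraction composition φ(x_u, z_r) = x_u − z_r and scalar direction weights in place of the W_O/W_I/W_S matrices; every name here is a toy illustration, not the patent's implementation:

```python
def compgcn_step(x, z, edges, w):
    """One simplified CompGCN update: for each edge (u, r, v, direction),
    node v accumulates w[direction] * phi(x_u, z_r), with phi = subtraction.
    x: node vectors, z: relation vectors, w: per-direction scalar weights."""
    new_x = {v: [0.0] * len(vec) for v, vec in x.items()}
    for u, r, v, direction in edges:
        comp = [a - b for a, b in zip(x[u], z[r])]   # phi(x_u, z_r) = x_u - z_r
        for i, c in enumerate(comp):
            new_x[v][i] += w[direction] * c
    return new_x
```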
step S109: and performing affine transformation and normalization processing on the sentence representation to obtain the label probability distribution of the labels output by the complaint prediction model.
Specifically, assuming the tag set is $C$, the sentence vector obtained from the previous layer is projected to $|C|$ dimensions and then softmax-normalized to give the label probability distribution predicted by the model:

$$y^{(p)} = \mathrm{softmax}(W_{output}\, m + b_{output})$$

where $W_{output}$ and $b_{output}$ are the weights and biases of the model output layer.
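The affine transformation and softmax normalization of the output layer can be sketched as follows (parameter shapes are assumptions for illustration):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: shift by the max before exponentiating
    e = np.exp(x - np.max(x))
    return e / e.sum()

def predict_distribution(m, W_output, b_output):
    # Project the sentence vector m to |C| logits, then normalize
    return softmax(W_output @ m + b_output)
```

The result is a valid probability distribution over the label set: non-negative entries summing to one.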
Step S110: the sentence representation is input to the label confusion layer 204, which computes a pseudo-label probability distribution.

Specifically, a label embedding matrix $E_{label} \in \mathbb{R}^{|C|\times d}$ may first be initialized, where $d$ is the embedding dimension.
It is first passed through a DNN feed-forward network $g$, which learns a label representation mapped to the same dimension as the sentence representation $m$:

$$V_{label} = g(E_{label})$$
The label matrix $V_{label}$ then interacts with the sentence representation $m$, and the result is affine-transformed and normalized to give the label confusion distribution:

$$y^{(c)} = \mathrm{softmax}\big(W_c (V_{label}\, m) + b_c\big)$$

where $W_c$ and $b_c$ are the weight and bias of the label confusion layer.
Assuming that the true label of the sample is $y$, the pseudo-label distribution $y^{(s)}$ is defined as:

$$y^{(s)} = \mathrm{softmax}(\alpha y + y^{(c)})$$

where the smoothing parameter $\alpha$ controls the relative weight of the true distribution against the confusion distribution. The smoothing parameter $\alpha$ may be set to 0.8, for example, though the invention is not limited thereto.
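A hedged sketch of the label confusion layer and pseudo-label construction (the exact form of the label–sentence interaction is an assumption based on the description above; a dot product between each label representation and the sentence vector is used here):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def confusion_distribution(m, V_label, W_c, b_c):
    # Interact each label representation with the sentence vector m,
    # then apply an affine transform and softmax over the |C| labels
    sim = V_label @ m              # (|C|,) label-sentence similarities
    return softmax(W_c @ sim + b_c)

def pseudo_label(y_onehot, y_conf, alpha=0.8):
    # y^(s) = softmax(alpha * y + y^(c)):
    # larger alpha shifts weight toward the true one-hot label
    return softmax(alpha * y_onehot + y_conf)
```

During training, `pseudo_label` replaces the one-hot target in the loss; at inference time this layer is skipped entirely.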
Step S111: a loss function of the complaint prediction model is calculated from the pseudo-label probability distribution and the label probability distribution.

Specifically, the model uses as its loss function the KL divergence of the pseudo-label distribution with respect to the predicted label distribution, that is:

$$\mathcal{L} = \mathrm{KL}\big(y^{(s)} \,\|\, y^{(p)}\big) = \sum_{i} y_i^{(s)} \log \frac{y_i^{(s)}}{y_i^{(p)}}$$
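The KL-divergence loss can be sketched as below (a NumPy illustration; in practice a deep learning framework's built-in KL loss would normally be used):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i),
    # with clipping for numerical stability near zero
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))
```

KL divergence is zero only when the two distributions coincide, so minimizing it pulls the predicted distribution toward the soft pseudo-label target.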
Step S112: the complaint prediction model is trained iteratively until the calculated loss function meets a set condition.

Specifically, the AdamW optimization algorithm can be used to minimize the loss, and the final model is obtained through continued iterative training.
In the OTA customer service scenario, the model provided by this text classification method classifies the customer's IM messages at the syntactic and semantic level and decides whether an early warning is needed. It reaches a prediction accuracy of 92% and, compared with traditional machine learning methods such as SVM and naive Bayes, or simple CNN and RNN network modules, achieves higher performance while eliminating a large amount of feature engineering.
In the complaint prediction model building method provided by the invention, frontier deep learning techniques are applied to the problem of customer complaint early warning in the travel industry, improving the early warning system's recognition rate for abnormal messages and reducing manual intervention. This helps staff promptly discover and locate shortcomings in products and services, understand user demands and feedback, make timely improvements, and raise user retention. To capture word-level associations, the invention converts the syntactic dependency tree into a heterogeneous graph structure and learns sentence-level syntactic features by performing message passing and aggregation on the graph with the multi-relational graph neural network CompGCN. Furthermore, the method introduces a Label Confusion Model (LCM) to model the latent relations between labels and sentences, automatically learning a soft distribution for each sample to replace the original one-hot distribution in the loss calculation, so that the rich semantic information contained in the labels is not discarded and the model can handle label confusion and noise.
Further, the invention also provides a complaint prediction method, comprising the following steps:
receiving a dialog text input by a user;
generating a word token id sequence, a word position id sequence, a word segment id sequence, a word mask sequence and a syntactic dependency graph from the dialog text;
inputting the dialog text and the generated word token id sequence, word position id sequence, word segment id sequence, word mask sequence and syntactic dependency graph into the complaint prediction model built by the complaint prediction model building method of any one of claims 1 to 6;
obtaining the label probability distribution of the labels predicted by the complaint prediction model;
taking the label with the highest probability in the label probability distribution as the predicted label of the dialog text.
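In code, the final prediction step reduces to an argmax over the predicted distribution (the label names here are illustrative assumptions, not the patent's actual tag set):

```python
import numpy as np

def predict_label(prob_dist, labels):
    """Return the label with the highest predicted probability."""
    return labels[int(np.argmax(prob_dist))]
```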
In the inference stage, the label probability distribution predicted by the model is used directly as the output, without passing through the label confusion layer; the text is assigned the label class with the highest probability, and the system then decides whether human intervention is needed according to the prediction result.
In the complaint prediction method provided by the invention, frontier deep learning techniques are applied to the problem of customer complaint early warning in the travel industry, improving the early warning system's recognition rate for abnormal messages and reducing manual intervention. This helps staff promptly discover and locate shortcomings in products and services, understand user demands and feedback, make timely improvements, and raise user retention. To capture word-level associations, the invention converts the syntactic dependency tree into a heterogeneous graph structure and learns sentence-level syntactic features by performing message passing and aggregation on the graph with the multi-relational graph neural network CompGCN. Furthermore, the method introduces a Label Confusion Model (LCM) to model the latent relations between labels and sentences, automatically learning a soft distribution for each sample to replace the original one-hot distribution in the loss calculation, so that the rich semantic information contained in the labels is not discarded and the model can handle label confusion and noise.
The above is merely an exemplary description of several implementations of the present invention, which is not intended to be limiting.
The invention also provides a complaint prediction model building device. Fig. 3 shows a schematic diagram of the device according to an embodiment of the invention. The complaint prediction model building device 300 comprises a setting module 301, an obtaining module 302, a word segmentation module 303, an embedding module 304, an encoding module 305, a conversion module 306, a grammar module 307, a sentence representation module 308, a probability distribution module 309, a tag confusion module 310, a loss calculation module 311 and an iterative training module 312.
the setting module 301 is configured to set the predicted tag set of the complaint prediction model;
the obtaining module 302 is configured to obtain a historical customer service dialog text and a tag of the historical customer service dialog text;
the word segmentation module 303 is configured to perform word segmentation on the historical customer service dialog text by using an LTP language processing tool, and generate a word mask sequence and a syntactic dependency relationship diagram;
the embedding module 304 is used for generating a word token id sequence, a word position id sequence, a word fragment id sequence and a word mask sequence based on the historical customer service dialogue text by using a BERT word segmenter;
the encoding module 305 is configured to input the word token id sequence, the word position id sequence and the word segment id sequence into a model encoding layer, obtaining a character feature sequence, in the character dimension, of the user content identified in the historical customer service dialog text;
the conversion module 306 is configured to convert the character feature sequence and the word mask sequence to a set size, and obtain a word feature sequence based on the character feature sequence and the word mask sequence;
the grammar module 307 is configured to input the word feature sequence into the graph neural network of the grammar layer so as to perceive the grammatical features of the word feature sequence;
the sentence representation module 308 is configured to output a sentence representation of the historical customer service dialog text according to the grammatical features of the word feature sequence and the word mask sequence;
the probability distribution module 309 is configured to perform affine transformation and normalization processing on the sentence representation to obtain a label probability distribution of a label output by the complaint prediction model;
the tag confusion module 310 is configured to input the sentence representation to a tag confusion layer, and calculate a pseudo tag probability distribution;
the loss calculating module 311 is configured to calculate a loss function of the complaint prediction model according to the pseudo label probability distribution and the label probability distribution;
the iterative training module 312 is configured to iteratively train the complaint prediction model so that the calculated loss function meets a set condition.
In the complaint prediction model building device provided by the invention, frontier deep learning techniques are applied to the problem of customer complaint early warning in the travel industry, improving the early warning system's recognition rate for abnormal messages and reducing manual intervention. This helps staff promptly discover and locate shortcomings in products and services, understand user demands and feedback, make timely improvements, and raise user retention. To capture word-level associations, the invention converts the syntactic dependency tree into a heterogeneous graph structure and learns sentence-level syntactic features by performing message passing and aggregation on the graph with the multi-relational graph neural network CompGCN. Furthermore, the device introduces a Label Confusion Model (LCM) to model the latent relations between labels and sentences, automatically learning a soft distribution for each sample to replace the original one-hot distribution in the loss calculation, so that the rich semantic information contained in the labels is not discarded and the model can handle label confusion and noise.
Fig. 3 is merely a schematic illustration of the complaint prediction model building device provided by the invention; splitting, merging and adding modules without departing from the inventive concept all fall within the protection scope of the invention. The complaint prediction model building device provided by the invention can be implemented by software, hardware, firmware, plug-ins, or any combination thereof, and the invention is not limited in this respect.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is further provided, on which a computer program is stored, which when executed by, for example, a processor, may implement the steps of the complaint prediction method and/or the prediction model building method described in any of the above embodiments. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the complaint prediction method and/or prediction model building method sections of the description above, when said program product is run on the terminal device.
Referring to fig. 4, a program product 400 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer-readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" language. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In scenarios involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In an exemplary embodiment of the present disclosure, there is also provided an electronic device that may include a processor, and a memory for storing executable instructions of the processor. Wherein the processor is configured to perform the steps of the complaint prediction method and/or the prediction model building method of any of the above embodiments via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Accordingly, various aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 5. The electronic device 600 shown in fig. 5 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
Wherein the storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention described in the complaint prediction methods and/or prediction model building method sections of the present description above. For example, the processing unit 610 may perform the steps as shown in fig. 1 to 3.
The storage unit 620 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM)6201 and/or a cache storage unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, and may also be implemented by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the complaint prediction method and/or the prediction model building method according to the embodiments of the present disclosure.
Compared with the prior art, the invention has the advantages that:
The invention applies frontier deep learning techniques to the problem of customer complaint early warning in the travel industry, improving the early warning system's recognition rate for abnormal messages and reducing manual intervention. This helps staff promptly discover and locate shortcomings in products and services, understand user demands and feedback, make timely improvements, and raise user retention. To capture word-level associations, the invention converts the syntactic dependency tree into a heterogeneous graph structure and learns sentence-level syntactic features by performing message passing and aggregation on the graph with the multi-relational graph neural network CompGCN. Furthermore, the method introduces a Label Confusion Model (LCM) to model the latent relations between labels and sentences, automatically learning a soft distribution for each sample to replace the original one-hot distribution in the loss calculation, so that the rich semantic information contained in the labels is not discarded and the model can handle label confusion and noise.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A complaint prediction model building method is characterized by comprising the following steps:
setting a predicted tag set of the complaint prediction model;
acquiring a historical customer service conversation text and a label of the historical customer service conversation text;
using an LTP language processing tool to perform word segmentation on the historical customer service dialogue text to generate a word mask sequence and a syntactic dependency relationship graph;
generating a word token id sequence, a word position id sequence, a word fragment id sequence and a word mask sequence based on the historical customer service conversation text by using a BERT word segmentation device;
inputting the word token id sequence, the word position id sequence and the word fragment id sequence into a model encoding layer to obtain a character feature sequence, in the character dimension, of the user content identified in the historical customer service dialog text;
converting the character feature sequence and the word mask sequence to a set size, and obtaining a word feature sequence based on the character feature sequence and the word mask sequence;
inputting the word feature sequence into a graph neural network of a grammar layer so as to sense grammar features of the word feature sequence;
outputting sentence representation of the historical customer service dialogue text according to the grammatical feature of the word feature sequence and the word mask sequence;
performing affine transformation and normalization processing on the sentence representation to obtain label probability distribution of labels output by the complaint prediction model;
inputting the sentence representation into a label confusion layer, and calculating the probability distribution of pseudo labels;
calculating a loss function of the complaint prediction model according to the pseudo label probability distribution and the label probability distribution;
iteratively training the complaint prediction model so that the calculated loss function meets a set condition.
2. The complaint prediction model building method of claim 1, wherein obtaining historical customer service dialog text and labels for the historical customer service dialog text further comprises:
setting the maximum word number of the historical customer service dialogue text;
setting a maximum number of characters of a word in the historical customer service dialog text.
3. The complaint prediction model building method of claim 1, wherein the model encoding layer uses only the first 6 encoder layers of the BERT model, and the character feature sequence is the sequence output by this encoding layer.
4. The complaint prediction model building method of claim 1, wherein the converting the word feature sequence and the word mask sequence to a set size and obtaining a word feature sequence based on the word feature sequence and the word mask sequence comprises:
averaging the character features of each word over the character dimension, using the word masks of the word mask sequence as weights, to obtain the word feature sequence.
5. The complaint prediction model building method of claim 1, wherein the graph neural network of the grammar layer includes two multi-relational graph neural network layers.
6. The method of claim 5, wherein the loss function of the complaint prediction model is KL divergence.
7. A complaint prediction method, comprising:
receiving dialog text input by a user;
generating a word token id sequence, a word position id sequence, a word segmentation id sequence, a word mask sequence and a syntactic dependency relationship graph according to the dialog text;
Inputting the dialog text, generated word token id sequence, word position id sequence, word segmentation id sequence, word mask sequence and syntactic dependency graph into the complaint prediction model built by the complaint prediction model building method of any one of claims 1 to 6;
obtaining label probability distribution of labels predicted by the complaint prediction model;
and taking the label with the highest probability in the label probability distribution as a prediction label of the dialog text.
8. A complaint prediction model creation device, comprising:
a setting module for setting a set of predicted labels for the complaint prediction model;
the acquisition module is used for acquiring the historical customer service conversation text and the label of the historical customer service conversation text;
the word segmentation module is used for segmenting words of the historical customer service dialogue text by using an LTP language processing tool to generate a word mask sequence and a syntactic dependency relationship graph;
the embedded module is used for generating a word token id sequence, a word position id sequence, a word fragment id sequence and a word mask sequence based on the historical customer service conversation text by using a BERT word segmentation device;
the coding module is used for inputting the word token id sequence, the word position id sequence and the word segment id sequence into a model encoding layer to obtain a character feature sequence, in the character dimension, of the user content identified in the historical customer service dialog text;
the conversion module is used for converting the character feature sequence and the word mask sequence to a set size and obtaining a word feature sequence based on the character feature sequence and the word mask sequence;
the grammar module is used for inputting the word feature sequence into the graph neural network of the grammar layer so as to perceive the grammatical features of the word feature sequence;
a sentence representation module for outputting sentence representation of the historical customer service dialogue text according to the grammatical feature of the word feature sequence and the word mask sequence;
the probability distribution module is used for executing affine transformation and normalization processing on the sentence expression to obtain label probability distribution of labels output by the complaint prediction model;
the label confusion module is used for inputting the sentence representation into a label confusion layer and calculating the probability distribution of the pseudo labels;
a loss calculation module, configured to calculate a loss function of the complaint prediction model according to the pseudo label probability distribution and the label probability distribution;
and the iterative training module is used for iteratively training the complaint prediction model so as to enable the calculated loss function to accord with the set condition.
9. An electronic device, characterized in that the electronic device comprises:
A processor;
storage medium having stored thereon a computer program which, when being executed by the processor, carries out the complaint prediction model building method as claimed in any one of claims 1 to 7.
10. A storage medium having stored thereon a computer program for executing the complaint prediction model building method according to any one of claims 1-7 when the computer program is executed by a processor.
CN202210107767.9A 2022-01-28 2022-01-28 Complaint prediction method, model building method and device thereof and related equipment Pending CN114676247A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210107767.9A CN114676247A (en) 2022-01-28 2022-01-28 Complaint prediction method, model building method and device thereof and related equipment


Publications (1)

Publication Number Publication Date
CN114676247A true CN114676247A (en) 2022-06-28

Family

ID=82071552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210107767.9A Pending CN114676247A (en) 2022-01-28 2022-01-28 Complaint prediction method, model building method and device thereof and related equipment

Country Status (1)

Country Link
CN (1) CN114676247A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117828082A (en) * 2024-01-03 2024-04-05 文华智典(武汉)科技有限公司 File security identification method and system based on semantic learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination