CN109376226A - Classification model, construction method, system, classification method and system for complaint text - Google Patents

Classification model, construction method, system, classification method and system for complaint text

Info

Publication number
CN109376226A
Authority
CN
China
Prior art keywords
evidence
classification
class
text
complaint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811324875.1A
Other languages
Chinese (zh)
Inventor
杨颖
周海芹
王珺
陈杨楠
余本功
曹雨蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN201811324875.1A priority Critical patent/CN109376226A/en
Publication of CN109376226A publication Critical patent/CN109376226A/en
Pending legal-status Critical Current


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a classification model for complaint text, together with a construction method, a construction system, a classification method, and a classification system, belonging to the technical field of text classification. The classification model includes: a preprocessing module for reading the complaint text and preprocessing it; a BTM module for processing the complaint text to generate a topic vector; a Doc2vec module for processing the complaint text to generate a word vector, the topic vector and the word vector being concatenated to generate a feature vector; and an ER classifier for classifying the complaint text according to the feature vector to generate a classification result.

Description

Complaint text classification model, construction method and system, and classification method and system
Technical Field
The invention relates to the technical field of text classification, and in particular to a classification model for complaint text, a construction method and system therefor, and a classification method and system.
Background
At present, mobile communication operators handle complaints mainly by building a customer-oriented complaint management system, optimizing the complaint handling process, adding customer service channels, or adopting online customer service. After receiving a complaint work order, the technical support department has experienced technical experts diagnose it, analyze the cause of the complaint, and give corresponding handling opinions, which are delivered to the relevant network construction or maintenance department for handling and, at the same time, fed back to the customer service center in the form of a work order reply. Thus the analysis and diagnosis of complaints about mobile communication quality is handled manually and depends mainly on the experience and knowledge of technical experts.
To improve this situation, a telecommunications enterprise needs to pre-classify the complaint content before handling the complaint: judge whether the problem is caused by a service reason and, if so, improve it in time; if it is caused by the user's own behavior, remind the user in time so that the user can find the real cause of the problem. However, this classification places high demands on complaint acceptors; since many acceptors have not personally practiced the problem-handling procedure, it is difficult to determine the problem category from the user's description alone, and a misclassification increases the burden on the problem handler.
In recent years, artificial intelligence methods have found some application in handling customer complaints, and a few documents propose building complaint identification systems with text mining and artificial intelligence algorithms to classify complaint hotspots intelligently, so that complaints are routed to the correct complaint navigation in a short time. Existing short-text classification methods mainly enrich the text content with an external corpus or additional information to handle the sparsity problem. For short complaint texts, however, it is difficult to expand the text through external corpora, and because customer complaint texts are short and numerous, they place demands on the dimensionality of the text representation. In previous research, text feature extraction generally uses the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm or the LDA (Latent Dirichlet Allocation) topic model, and text classification generally uses an SVM; an SVM classifier whose input vectors are constructed with the TF-IDF algorithm suffers from excessively high vector dimensionality, low classification efficiency, and the like.
Disclosure of Invention
The embodiments of the invention aim to provide a classification model for complaint text, together with a construction method, a construction system, a classification method, and a classification system: the classification model can improve the accuracy of complaint text classification; the construction method and system can build a classification model with higher classification accuracy; and the classification method and system can classify complaint texts more accurately.
In order to achieve the above object, an embodiment of the present invention provides a classification model of a complaint text, including:
the preprocessing module is used for reading the complaint text and preprocessing the complaint text;
the BTM module is used for processing the complaint text to generate a subject vector;
a Doc2vec module, configured to process the complaint text to generate a word vector;
concatenating the topic vector and the word vector to generate a feature vector; and
an ER classifier, configured to classify the complaint text according to the feature vector to generate a classification result.
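As an illustration of this module layout, the following stdlib-Python sketch mocks the data flow through the four modules; the stub functions `preprocess`, `btm_topic_vector`, and `doc2vec_word_vector`, and the dimensions N1 = 4 and N2 = 6, are invented placeholders and not part of the patent.

```python
# Minimal sketch of the module layout described above (hypothetical stubs;
# the patent itself uses a BTM model, a Doc2vec model and an ER classifier,
# which are only mocked here to show the data flow).

def preprocess(text):
    # Placeholder for text screening, desensitization, stop-word removal, etc.
    return text.strip()

def btm_topic_vector(text, n1=4):
    # Stand-in for the BTM module: returns an N1-dimensional topic vector.
    return [0.25] * n1

def doc2vec_word_vector(text, n2=6):
    # Stand-in for the Doc2vec module: returns an N2-dimensional word vector.
    return [0.0] * n2

def feature_vector(text):
    text = preprocess(text)
    topic = btm_topic_vector(text)
    word = doc2vec_word_vector(text)
    # The two representations are concatenated into one N = N1 + N2 vector,
    # which the ER classifier then consumes.
    return topic + word

vec = feature_vector("用户反映主叫不可用")
print(len(vec))  # N = N1 + N2 = 10
```

The key structural point is that the topic and word representations are computed independently and only joined by concatenation before classification.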
Another aspect of the present invention provides a method for constructing a classification model of a complaint text, which is used for constructing the above classification model, and the method includes:
initializing a classification model;
obtaining a complaint text and a real classification result of the complaint text;
preprocessing the complaint text;
processing the complaint text with a BTM (Biterm Topic Model) to generate a topic vector, wherein the topic vector has N1 dimensions;
processing the complaint text with a Doc2vec model to generate a word vector, wherein the word vector has N2 dimensions;
concatenating the topic vector and the word vector to generate a feature vector of dimension N, where N = N1 + N2;
Obtaining evidence of the complaint text by a Bayesian method;
calculating a weight of the evidence;
adopting an ER classifier of the classification model to classify the complaint text according to the evidence and the weight so as to generate a classification result;
comparing the classification result with the real classification result to calculate a classification error;
judging whether the variation value of the classification error is smaller than a preset value or not;
under the condition that the change value of the classification error is judged to be smaller than the preset value, outputting the classification model;
and under the condition that the change value of the classification error is judged to be larger than or equal to the preset value, optimizing the parameters of the ER classifier to update the classification model, classifying the complaint text by adopting the ER classifier again, and executing the construction method until the change value of the classification error is smaller than the preset value.
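The outer loop of the construction method above can be sketched as follows. The classification and optimization steps are mocked with a synthetic, geometrically shrinking error, so the sketch illustrates only the stopping rule based on the change of the classification error.

```python
# Sketch of the outer training loop of the construction method. The classifier,
# the error computation and the optimizer are mocked: each "optimization" step
# simply halves a synthetic classification error, so only the stopping rule
# (change in error below a preset value) is illustrated, not real training.

def train(preset_value=1e-3, max_rounds=100):
    error = 1.0
    prev_error = None
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        # ... classify the complaint texts and compute the classification error ...
        if prev_error is not None and abs(prev_error - error) < preset_value:
            break  # change in error small enough: output the model
        prev_error = error
        error *= 0.5  # stand-in for optimizing the ER classifier parameters
    return error, rounds

final_error, rounds = train()
print(rounds)
```

Because only the change in error is tested, the loop stops once successive errors are close, even if the error itself is not zero.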
Optionally, the preprocessing comprises at least one of text screening, desensitization, stop-word removal, sensitive-word filtering, and custom-dictionary creation.
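A stdlib-Python sketch of a few of these optional steps (desensitization, stop-word removal, sensitive-word filtering) might look like this; the stop-word and sensitive-word lists and the phone-number pattern are invented examples, not taken from the patent.

```python
import re

# Stdlib-only sketch of some optional preprocessing steps. The word lists and
# the phone-number pattern below are illustrative assumptions.

STOP_WORDS = {"the", "is", "and", "a"}
SENSITIVE_WORDS = {"badword"}
PHONE_RE = re.compile(r"\b1\d{10}\b")  # mainland-style 11-digit mobile number

def preprocess(text):
    text = PHONE_RE.sub("<PHONE>", text)                      # desensitization
    tokens = text.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]       # stop words
    tokens = [t for t in tokens if t not in SENSITIVE_WORDS]  # sensitive words
    return " ".join(tokens)

print(preprocess("The signal is bad and 13812345678 complains"))
# -> "signal bad <phone> complains"
```

A real system for Chinese complaint text would additionally segment words (e.g. with a custom dictionary), which is omitted here.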
Optionally, the obtaining the evidence of the complaint text by the Bayesian method includes:
setting reference values for each feature value in the feature vector;
converting the correspondence between the feature value and a preset class into a correspondence between the reference values and the class to calculate a likelihood;
and obtaining the evidence between the feature value and the class from the likelihood by a Bayesian probability statistics method.
Optionally, the obtaining the evidence of the complaint text by the Bayesian method includes:
taking any one feature value from the feature vector as the ith feature value;
calculating the likelihood of the ith feature value according to equation (1),

c_{s,j} = N_{s,j} / Σ_{j=1}^{L} N_{s,j},  s = 1, 2  (1)

wherein A_i^j is the jth reference value of the ith feature value, θ_s is the sth class, L is the number of reference values corresponding to each feature value, N_{s,j} is the number of training samples of the class θ_s whose ith feature value corresponds to the reference value A_i^j, and c_{s,j} is the likelihood calculated from the correspondence between the jth reference value and the class θ_s;
calculating the probability of the evidence obtained from the ith feature value according to equation (2),

p_{θ_s,e_j} = c_{s,j} / Σ_{k=1}^{S} c_{k,j}  (2)

wherein p_{θ_s,e_j} is the probability that the evidence of the jth reference value corresponding to the ith feature value points to the class θ_s, θ_s is the sth class, and c_{s,j} is the likelihood calculated from the correspondence between the reference value of the ith feature value and the class θ_s;
obtaining the evidence of the jth reference value corresponding to the ith feature value according to equation (3),

e_j = { (θ_s, p_{θ_s,e_j}), s = 1, …, S }  (3)

wherein e_j is the evidence of the jth reference value corresponding to the ith feature value, (θ_s, p_{θ_s,e_j}) expresses that the evidence e_j supports the class θ_s with the probability p_{θ_s,e_j}, Θ is the set of the classes, θ_S is the Sth class, and L is the number of reference values corresponding to each feature value;
traversing each feature value in the feature vector to calculate the likelihood, the probability p_{θ_s,e_i}, and the evidence e_i of each feature value, and expressing the evidence e_i by equation (4),

e_i = { (θ_s, p_{θ_s,e_i}), s = 1, …, S },  i = 1, …, N  (4)

wherein e_i is the evidence obtained from the ith feature value, (θ_s, p_{θ_s,e_i}) expresses that the evidence e_i supports the class θ_s with the probability p_{θ_s,e_i}, Θ is the set of the classes, θ_S is the Sth class, and N is the dimension of the feature vector.
Optionally, the calculating the weight of the evidence comprises:
calculating the weight of the evidence according to equation (5),

w_i = d_{iu} / max_{1≤k≤N} d_{ku},  d_{iu} = sqrt( Σ_{j=1}^{L} Σ_{s=1}^{S} ( p_{θ_s,e_j} − p_u )² ),  p_u = 1/U  (5)

wherein w_i is the weight of the ith evidence e_i, d_{iu} is the Euclidean distance between the probabilities p_{θ_s,e_j} calculated for the ith feature value and the preset value p_u, e_u denotes the evidence with the predetermined uniform probability distribution, θ_S is the Sth class, p_{θ_s,e_i} is the probability that the evidence obtained from the ith feature value supports the class θ_S, U is the product of the number of reference values and the total number of classes, and N is the dimension of the feature vector.
Optionally, the classifying, by the ER classifier of the classification model, the complaint text according to the evidence and the weight to generate a classification result includes:
defining a weighted belief distribution according to equation (6),

m_i = { (θ_s, m_{θ_s,i}), s = 1, …, S; (P(Θ), m_{P(Θ),i}) },  m_{θ_s,i} = w_i · p_{θ_s,e_i},  m_{P(Θ),i} = 1 − w_i  (6)

wherein m_{θ_s,i} is the weighted belief of the evidence e_i for the class θ_s, w_i is the weight of the ith evidence e_i, p_{θ_s,e_i} is the probability that the evidence obtained from the ith feature value supports the class θ_s, and Θ is the identification framework, i.e. the set of all classes;
calculating the possible classification results using the evidence reasoning rule in combination with the weighted belief distributions, the recursion being expressed by equations (7), (8) and (9),

p_{θ_s,e(i)} = m̂_{θ_s,e(i)} / Σ_{D⊆Θ, D≠∅} m̂_{D,e(i)}  (7)

m̂_{D,e(i)} = [ (1 − r_i) · m_{D,e(i−1)} + m_{P(Θ),e(i−1)} · m_{D,i} ] + Σ_{B∩C=D} m_{B,e(i−1)} · m_{C,i}  (8)

m̂_{P(Θ),e(i)} = (1 − r_i) · m_{P(Θ),e(i−1)}  (9)

wherein p_{θ_s,e(i)} is the probability that the classification result predicted after combining the first i pieces of evidence is the class θ_s, m̂_{θ_s,e(i)} is the degree of support for θ_s after combining the first i pieces of evidence, m_{θ_s,e(i−1)} is the weighted belief that the classification result predicted after combining the first i−1 pieces of evidence is θ_s, m_{P(Θ),e(i−1)} is the weighted belief of P(Θ) after combining the first i−1 pieces of evidence, m̂_{P(Θ),e(i)} is the degree of support for P(Θ) after combining the first i pieces of evidence, m̂_{D,e(i)} is the degree of support for D after combining the first i pieces of evidence, Θ is the identification framework, D is a subset of the identification framework, r_i is the reliability of the ith evidence e_i, w_i is the weight of the ith evidence e_i with r_i = w_i, P(Θ) is the power set of the identification framework, B and C are subsets of the power set, m_{B,e(i−1)} is the weighted belief of the subset B after combining the first i−1 pieces of evidence, and m_{C,i} is the weighted belief of the ith evidence for the subset C;
generating a plurality of the possible classification results according to equation (10),

y_m = { (θ_s, p_{θ_s,e(N)}), s = 1, …, S },  m = 1, …, M  (10)

wherein y_m is the classification result of the mth complaint text, θ_s is a class, p_{θ_s,e(N)} is the probability that the classification result predicted after combining the first N pieces of evidence is the class θ_s, N is the dimension of the feature vector, and M is the number of complaint texts;
selecting, from the plurality of possible classification results, the one with the largest probability p_{θ_s,e(N)} as the generated classification result.
Optionally, the optimizing the parameters of the ER classifier to update the classification model comprises:
optimizing the parameters of the ER classifier according to the optimization model of equation (11),

min ξ(r) = (1/M) · Σ_{m=1}^{M} d(v_m, v̂_m),  s.t. 0 ≤ r_i ≤ 1, i = 1, …, N  (11)

wherein M is the number of complaint texts, y_m is the classification result of the mth complaint text, p_{θ_s,e(N)} is the probability that the classification result predicted after combining the first N pieces of evidence is the class θ_s, v_m is the vector representation of the probabilities p_{θ_s,e(N)} in the classification result y_m, v̂_m is the vector representation of the true classification result ŷ_m, d(v_m, v̂_m) is the Euclidean distance between v_m and v̂_m, and r_i is the reliability of the ith evidence e_i.
In another aspect of the present invention, a system for constructing a classification model of a complaint text is further provided, where the system includes a processor configured to execute the construction method described above.
The invention further provides a method for classifying the complaint texts, which comprises the step of classifying the complaint texts by adopting the classification model established by the method.
In still another aspect of the present invention, a system for classifying a complaint text is provided, where the system includes a processor configured to execute the classification method described above.
Through the above technical solutions, the classification model, its construction method and system, and the classification method and system for complaint text provided by the invention use the BTM topic model to reduce the dimensionality of the complaint text, converting it into a vector composed of several topics, which is better suited to topic extraction from short texts; they use the Doc2vec model to convert the complaint text into word vectors, so that short texts are modeled simultaneously at the granularity of individual words and of text topics, alleviating the problems of sparse short-text features and poor topic focus; they further use the evidence reasoning rule to handle ambiguity and uncertainty in the data, which is more effective than the SVM model of the prior art, so that a reasonable diagnosis can be given even when the text information provided by a customer is incomplete or inaccurate. In addition, the construction method provided by the invention can further make the constructed classification model more accurate by expanding the training text.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:
FIG. 1 is a flow diagram of a method of constructing a classification model of complaint text according to one embodiment of the invention;
FIG. 2 is a partial flow diagram of a method for constructing a classification model of complaint texts according to an embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
One embodiment of the present invention provides a classification model for complaint text. The classification model may include a pre-processing module, a BTM module, a Doc2vec module, and an ER classifier.
In this embodiment of the present invention, the collection of the complaint text can include, but is not limited to, the following steps:
1. integrating, in batches or individually, the mobile communication customer complaint work orders transferred by the customer service department into a data set;
2. extracting key information from the complaint work orders (or the integrated data set). In one example of the invention, the extracted key information may be, for example, the customer's cell phone number, the complaint time, and the complaint location;
3. determining the state information of the feature elements from the complaint work order. When a complaint work order is processed, for example, the state values of the feature elements corresponding to the key information of the work order may be obtained from the feature element acquisition module through a cloud database access interface. In one example of the present invention, the complaint text can be, for example:
text 1: the user indicates that the calling is unavailable, the network is busy, and indicates that the calling is unavailable at about 7 am every day, and the calling can be called and the network is full;
text 2: the user responds to the user that the signal of the position is not good, the signal is not good at present after the previous response, and the user asks for processing and replies to the user;
text 3: when the user indicates to surf the internet, the speed is very slow, the signal is often unstable, and the signal is required to be processed as soon as possible, and the speed is three grids.
The preprocessing module can be used for reading the complaint texts to be classified and preprocessing the complaint texts. In one example of the invention, the pre-processing may be a process including, but not limited to, at least one of text filtering, desensitization processing, removal of stop words, filtering of sensitive words, and creation of custom dictionaries.
The BTM module may be used to process the complaint text to generate a topic vector. In this embodiment, the topic vector may have N1 dimensions, and N1 may be determined by the perplexity of the complaint text set.
The Doc2vec module may be used to process the complaint text to generate a word vector. In this embodiment, the word vector may have N2 dimensions, and N2 may be determined according to the number of complaint texts.
The ER classifier is used for concatenating the topic vector and the word vector to generate a feature vector, and further classifying the complaint text according to the feature vector based on the principle of evidence reasoning to generate a classification result. In this embodiment, the ER classifier may be a classifier based on the evidence reasoning (ER) rule.
As shown in FIG. 1, another aspect of the present invention further provides a method for constructing a classification model of a complaint text. The construction method can be used for constructing the classification model of the complaint text. In fig. 1, the construction method may include:
in step S100, a classification model is initialized. The structure of the classification model may be the classification model described above (the parameters of the classification model may be unoptimized or modified). Accordingly, the functions of each part of the classification model have been described in detail in the above description, and thus are not described in detail herein.
In step S110, a preset complaint text and a real classification result of the complaint text are obtained.
In step S120, the complaint text is preprocessed. In this embodiment, the complaint text may be preprocessed with the preprocessing module of the classification model, where the preprocessing may include, but is not limited to, at least one of text screening, desensitization, stop-word removal, sensitive-word filtering, and custom-dictionary creation. In one example of the invention, the preprocessing module may be one known to those skilled in the art.
In step S130, the complaint text is processed using the BTM model to generate a topic vector. In this embodiment, the topic vector may have N1 dimensions, and N1 may be determined by the perplexity of the complaint text.
In step S140, the complaint text is processed using the Doc2vec model to generate a word vector. In this embodiment, the word vector may have N2 dimensions, and N2 may be determined according to the number of complaint texts.
In step S150, the topic vector and the word vector are concatenated to generate a feature vector. Since the topic vector has N1 dimensions and the word vector has N2 dimensions, the feature vector has N1 + N2 dimensions. In this embodiment, this step may be implemented, for example, by using the ER classifier to concatenate the topic vector and the word vector to generate the feature vector.
In step S160, evidence of the complaint text is obtained by a bayesian method. Optionally, in an embodiment of the present invention, the step S160 may further include a step as shown in fig. 2. In fig. 2, the step S160 may further include:
In step S161, reference values are set for each feature value in the concatenated feature vector. In one example of the invention, x_i may represent the ith feature value in the feature vector, and A_i^j may represent the jth reference value of the ith feature value.
In step S162, the correspondence between the feature value x_i and a preset class θ_s is converted into the correspondence between the reference value A_i^j and the class θ_s to calculate the likelihood c_{s,j}.
In step S163, a Bayesian probability statistics method (the Bayesian method) is employed to acquire the evidence between the feature value and the class from the calculated likelihood.
Taking the above example, the step S160 may also specifically be:
1. taking any one feature value from the feature vector as the ith feature value; the likelihood of the ith feature value is calculated according to equation (1),

c_{s,j} = N_{s,j} / Σ_{j=1}^{L} N_{s,j},  s = 1, 2  (1)

wherein A_i^j is the jth reference value of the ith feature value, θ_s is the sth class, L is the number of reference values corresponding to each feature value, N_{s,j} is the number of training samples of the class θ_s whose ith feature value corresponds to the reference value A_i^j, and c_{s,j} is the likelihood calculated from the correspondence between the jth reference value and the class θ_s. As can be seen from equation (1), the number of classes in this example is 2. However, this value merely illustrates the invention and does not limit its scope in any way; under the same technical conception, those skilled in the art will understand that other numbers of classes are equally applicable to the invention;
2. calculating the probability of the evidence obtained from the ith feature value x_i according to equation (2),

p_{θ_s,e_j} = c_{s,j} / Σ_{k=1}^{S} c_{k,j}  (2)

wherein p_{θ_s,e_j} is the probability that the evidence of the jth reference value A_i^j corresponding to the ith feature value x_i points to the class θ_s, θ_s is the sth class, and c_{s,j} is the likelihood calculated from the correspondence between the reference value of the ith feature value and the class θ_s;
3. obtaining the evidence of the jth reference value A_i^j corresponding to the ith feature value x_i according to equation (3),

e_j = { (θ_s, p_{θ_s,e_j}), s = 1, …, S }  (3)

wherein e_j is the evidence of the jth reference value corresponding to the ith feature value, (θ_s, p_{θ_s,e_j}) expresses that the evidence e_j supports the class θ_s with the probability p_{θ_s,e_j}, Θ is the set of all classes, θ_S is the Sth class, and L is the number of reference values corresponding to each feature value;
4. traversing each feature value in the feature vector to calculate the likelihood, the probability p_{θ_s,e_i}, and the evidence e_i of each feature value, and expressing the evidence e_i by equation (4),

e_i = { (θ_s, p_{θ_s,e_i}), s = 1, …, S },  i = 1, …, N  (4)

wherein e_i is the evidence obtained from the ith feature value, (θ_s, p_{θ_s,e_i}) expresses that the evidence e_i supports the class θ_s with the probability p_{θ_s,e_i}, Θ is the set of the classes, θ_S is the Sth class, and N is the dimension of the feature vector.
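The evidence-acquisition step can be illustrated with the following stdlib-Python sketch, which follows one plausible reading of equations (1) and (2): likelihoods are estimated from per-class counts over the reference values and then normalized across classes into evidence probabilities. The toy counts are invented.

```python
# Stdlib sketch of evidence acquisition for one feature: count how often each
# reference value occurs in each class (the likelihood, one plausible reading
# of equation (1)), then normalize across classes into the evidence
# probabilities of equation (2). The toy counts below are invented.

def likelihoods(counts_per_class):
    # counts_per_class[s][j]: samples of class s whose feature hits reference j
    return [[n / sum(row) for n in row] for row in counts_per_class]

def evidence(c):
    # p[j][s]: probability that reference value j supports class s (eq. (2))
    n_classes, n_refs = len(c), len(c[0])
    p = []
    for j in range(n_refs):
        col_sum = sum(c[s][j] for s in range(n_classes))
        p.append([c[s][j] / col_sum for s in range(n_classes)])
    return p

counts = [[8, 2],   # class 1: 8 samples on reference value 1, 2 on value 2
          [2, 8]]   # class 2
c = likelihoods(counts)
p = evidence(c)
print(p[0])  # evidence of reference value 1: strongly supports class 1
```

Each row of `p` is one piece of evidence in the sense of equation (3): a probability distribution over classes attached to a reference value.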
In step S170, the weight of the evidence is calculated. In an example of the present invention, the step S170 may specifically be:
the weight of the evidence is calculated according to equation (5),

w_i = d_{iu} / max_{1≤k≤N} d_{ku},  d_{iu} = sqrt( Σ_{j=1}^{L} Σ_{s=1}^{S} ( p_{θ_s,e_j} − p_u )² ),  p_u = 1/U  (5)

wherein w_i is the weight of the ith evidence e_i, d_{iu} is the Euclidean distance between the probabilities p_{θ_s,e_j} calculated for the ith feature value and the preset value p_u, e_u denotes the evidence with the predetermined uniform probability distribution, θ_S is the Sth class, p_{θ_s,e_i} is the probability that the evidence obtained from the ith feature value supports the class θ_S, U is the product of the number of reference values and the total number of classes, and N is the dimension of the feature vector.
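The weighting step can be illustrated as follows: evidence far from the uniform distribution is decisive and receives a large weight, while evidence close to uniform says little about the class and is down-weighted. The max-normalization is one plausible reading of equation (5), and the probabilities are invented toy values.

```python
import math

# Sketch of the evidence-weighting step: the farther a piece of evidence lies
# from the uniform distribution, the more decisive it is and the larger its
# weight. Max-normalization is one plausible reading of equation (5).

def weights(evidence_probs):
    # evidence_probs[i][s]: probability that evidence i supports class s
    dists = []
    for probs in evidence_probs:
        p_u = 1.0 / len(probs)  # uniform reference distribution
        dists.append(math.sqrt(sum((p - p_u) ** 2 for p in probs)))
    d_max = max(dists)
    return [d / d_max for d in dists]

w = weights([[0.9, 0.1],   # sharp evidence -> weight 1.0
             [0.6, 0.4]])  # vague evidence -> smaller weight
print(w)
```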
In step S180, the ER classifier of the classification model classifies the complaint text according to the evidence and the weights to generate a classification result. In this embodiment, the step S180 may specifically be:
1. a weighted belief distribution is defined according to equation (6),

m_{θ_s,i} = w_i · p_{θ_s,e_i}  (6)

wherein m_{θ_s,i} is the weighted belief of the evidence e_i for the class θ_s, w_i is the weight of the ith evidence e_i, p_{θ_s,e_i} is the probability that the evidence obtained from the ith feature value supports the class θ_s, and Θ is the identification framework, i.e. the set of all classes; further, equation (6) can be expressed by equation (6A),

m_i = { (θ_s, m_{θ_s,i}), s = 1, …, S; (P(Θ), m_{P(Θ),i}) }  (6A)

wherein m_i is the weighted belief distribution of the ith evidence, m_{θ_s,i} is the weighted belief of the ith evidence for θ_s, and m_{P(Θ),i} = 1 − w_i represents the belief determined by the weight of the evidence;
2. the possible classification results are calculated using the evidence reasoning rule in combination with the weighted belief distributions, the recursion being expressed by equations (7), (8) and (9),

p_{θ_s,e(i)} = m̂_{θ_s,e(i)} / Σ_{D⊆Θ, D≠∅} m̂_{D,e(i)}  (7)

m̂_{D,e(i)} = [ (1 − r_i) · m_{D,e(i−1)} + m_{P(Θ),e(i−1)} · m_{D,i} ] + Σ_{B∩C=D} m_{B,e(i−1)} · m_{C,i}  (8)

m̂_{P(Θ),e(i)} = (1 − r_i) · m_{P(Θ),e(i−1)}  (9)

wherein p_{θ_s,e(i)} is the probability that the classification result predicted after combining the first i pieces of evidence is the class θ_s, m̂_{θ_s,e(i)} is the degree of support for θ_s after combining the first i pieces of evidence, m_{θ_s,e(i−1)} is the weighted belief that the classification result predicted after combining the first i−1 pieces of evidence is θ_s, m_{P(Θ),e(i−1)} is the weighted belief of P(Θ) after combining the first i−1 pieces of evidence, m̂_{P(Θ),e(i)} is the degree of support for P(Θ) after combining the first i pieces of evidence, m̂_{D,e(i)} is the degree of support for D after combining the first i pieces of evidence, Θ is the identification framework, D is a subset of the identification framework, r_i is the reliability of the ith evidence e_i, w_i is the weight of the ith evidence e_i with r_i = w_i, P(Θ) is the power set of the identification framework, B and C are subsets of the power set, m_{B,e(i−1)} is the weighted belief of the subset B after combining the first i−1 pieces of evidence, and m_{C,i} is the weighted belief of the ith evidence for the subset C;
3. a plurality of possible classification results are generated according to equation (10),

y_m = { (θ_s, p_{θ_s,e(N)}), s = 1, …, S },  m = 1, …, M  (10)

wherein y_m is the classification result of the mth complaint text, θ_s is a class, p_{θ_s,e(N)} is the probability that the classification result predicted after combining the first N pieces of evidence is the class θ_s, N is the dimension of the feature vector, and M is the number of complaint texts;
4. the possible classification result with the largest probability p_{θ_s,e(N)} is selected as the generated classification result.
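The recursive combination of step S180 can be sketched as below for the simple case in which every piece of evidence assigns belief only to singleton classes, with the reliability taken equal to the weight (r_i = w_i) as stated above. The probabilities and weights are invented toy values, and the sketch follows the evidence reasoning rule of equations (6)-(9) under these assumptions.

```python
# Stdlib sketch of the recursive evidence-reasoning combination for the case
# where each evidence assigns belief only to singleton classes and r_i = w_i.
# Toy probabilities and weights; not the patent's full implementation.

def er_combine(evidence_probs, w):
    n_classes = len(evidence_probs[0])
    # Weighted belief distribution of the first evidence (equation (6)).
    m = [w[0] * p for p in evidence_probs[0]]
    m_pt = 1.0 - w[0]  # belief assigned to the power set P(Theta)
    for i in range(1, len(evidence_probs)):
        r = w[i]  # reliability taken equal to the weight
        mi = [w[i] * p for p in evidence_probs[i]]
        # Equations (8) and (9): unnormalized combined beliefs. With r = w,
        # the (1 - r) * m[s] term covers the cross term with P(Theta).
        m_hat = [(1 - r) * m[s] + m_pt * mi[s] + m[s] * mi[s]
                 for s in range(n_classes)]
        m_pt_hat = (1 - r) * m_pt
        # Renormalize so the beliefs again sum to one before the next step.
        total = sum(m_hat) + m_pt_hat
        m = [x / total for x in m_hat]
        m_pt = m_pt_hat / total
    # Equation (7): final class probabilities ignore the P(Theta) mass.
    total = sum(m)
    return [x / total for x in m]

p = er_combine([[0.8, 0.2], [0.7, 0.3]], w=[1.0, 0.5])
print(p)  # both pieces of evidence favour class 1
```

The final distribution is sharper than either input: agreeing evidence reinforces itself, while the low-weight second evidence contributes less.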
In step S190, the classification result is compared with the real classification result to calculate a classification error. In this embodiment, the manner of calculating the classification error may be a calculation manner known to those skilled in the art, and therefore, the description thereof is omitted here.
In step S200, it is determined whether the variation value of the classification error is smaller than a preset value. The preset value can be set according to the precision required of the classification model. The variation value may be, for example, the difference between the previously calculated classification error and the currently calculated classification error. In addition, before the classification model has been optimized (when the classification error is calculated for the first time), there is no previously calculated classification error and the variation value cannot be calculated; in this case step S200 may be skipped and step S210 performed directly.
In step S220, in case that it is judged that the variation value of the classification error is smaller than the preset value, the classification model is output. Since the variation value is smaller than the preset value, it can be considered that the classification error of the classification model tends to converge, and the benefit of continuously updating the classification model is reduced. No further updates to the classification model may be required.
In step S210, in the case that the variation value of the classification error is judged to be greater than or equal to the preset value, the parameters of the ER classifier are optimized to update the classification model, the ER classifier is used again to classify the complaint text, and the corresponding steps of the construction method are executed (from step S180) until the variation value of the classification error is smaller than the preset value. In step S210, the classification model may be optimized, for example, as follows:
the parameters of the ER classifier are optimized according to the optimization model of equation (11),
wherein $M$ is the number of complaint texts, $y_m$ is the classification result of the $m$-th complaint text, $p_{\theta_s,e(i)}$ is the probability that the predicted classification result after synthesizing the first $i$ pieces of evidence is the class $\theta_s$, $v_m$ is the vector representation of the probabilities $p_{\theta_s,e(i)}$ under the classification result $y_m$, $\hat{v}_m$ is the vector representation of the true classification result $\hat{y}_m$, $d(v_m, \hat{v}_m)$ is the Euclidean distance between $v_m$ and $\hat{v}_m$, and $r_i$ is the reliability of the $i$-th evidence $e_i$.
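The objective behind equation (11) sums, over all training texts, the Euclidean distance between the predicted class-probability vector $v_m$ and the vector $\hat{v}_m$ of the true class. A sketch under the assumption that the true classification result is one-hot encoded (the function name is illustrative):

```python
import numpy as np

def er_loss(pred_probs, true_labels, num_classes):
    """Sum of Euclidean distances d(v_m, v_hat_m) between each predicted
    class-probability vector and the one-hot encoding of the true class."""
    total = 0.0
    for probs, label in zip(pred_probs, true_labels):
        v_m = np.asarray(probs, dtype=float)
        v_hat = np.zeros(num_classes)
        v_hat[label] = 1.0          # one-hot true classification result
        total += np.linalg.norm(v_m - v_hat)
    return total
```

An optimizer would then minimize this quantity over the evidence reliabilities $r_i$.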
In another aspect of the present invention, a system for constructing a classification model of a complaint text is provided, and the system may include a processor configured to execute the construction method described above.
In another aspect of the present invention, a method for classifying a complaint text is provided, where the classification method includes classifying the complaint text by using the classification model constructed as described above.
In yet another aspect of the present invention, a classification system for a complaint text is provided, which includes a processor for executing the classification method described above.
Further, the processors described above may each be a general-purpose processor, a special-purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) circuit, any other type of integrated circuit (IC), a state machine, a system on a chip (SoC), or the like.
By the above technical scheme, the classification model, its construction method and system, and the classification method and system for complaint texts provided by the invention adopt the BTM topic model to reduce the dimensionality of the complaint text, converting it into a vector composed of a plurality of topics, which is better suited to topic extraction from short texts. The Doc2vec model is adopted to convert the complaint text into word vectors, so that short texts are modeled simultaneously at the granularity of individual words and at the granularity of the text's topics, alleviating the problems of sparse short-text features and poor topic focus. The evidence reasoning rule is further adopted to handle ambiguity and uncertainty in the data, and is more effective than the prior-art SVM model, so that a reasonable diagnosis result can be given even when the text information provided by a customer is incomplete or inaccurate. In addition, by expanding the training text, the construction method provided by the invention can further improve the accuracy of the constructed classification model.
Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the embodiments of the present invention are not limited to the details of the above embodiments, and various simple modifications can be made to the technical solution of the embodiments of the present invention within the technical idea of the embodiments of the present invention, and the simple modifications all belong to the protection scope of the embodiments of the present invention.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, the embodiments of the present invention will not be described separately for the various possible combinations.
Those skilled in the art can understand that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In addition, various different embodiments of the present invention may be arbitrarily combined with each other, and the embodiments of the present invention should be considered as disclosed in the disclosure of the embodiments of the present invention as long as the embodiments do not depart from the spirit of the embodiments of the present invention.

Claims (10)

1. A classification model for complaint text, the classification model comprising:
the preprocessing module is used for reading the complaint text and preprocessing the complaint text;
the BTM module is used for processing the complaint text to generate a topic vector;
a Doc2vec module, configured to process the complaint text to generate a word vector;
wherein the topic vector and the word vector are spliced to generate a feature vector;
an ER classifier, configured to classify the complaint text according to the feature vector to generate a classification result.
2. A method for constructing a classification model of a complaint text, which is used for constructing the classification model of claim 1, and which comprises:
initializing a classification model;
obtaining a complaint text and a real classification result of the complaint text;
preprocessing the complaint text;
processing the complaint text by adopting a BTM topic model to generate a topic vector, wherein the dimension of the topic vector is $N_1$;
processing the complaint text by adopting a Doc2vec model to generate a word vector, wherein the dimension of the word vector is $N_2$;
stitching the topic vector and the word vector to generate a feature vector with dimension $N$, wherein $N = N_1 + N_2$;
obtaining evidence of the complaint text by a Bayesian method;
calculating a weight of the evidence;
adopting an ER classifier of the classification model to classify the complaint text according to the evidence and the weight so as to generate a classification result;
comparing the classification result with the real classification result to calculate a classification error;
judging whether the variation value of the classification error is smaller than a preset value or not;
under the condition that the change value of the classification error is judged to be smaller than the preset value, outputting the classification model;
and under the condition that the change value of the classification error is judged to be larger than or equal to the preset value, optimizing the parameters of the ER classifier to update the classification model, classifying the complaint text by adopting the ER classifier again, and executing the construction method until the change value of the classification error is smaller than the preset value.
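The feature construction in the claim above — splicing an $N_1$-dimensional topic vector with an $N_2$-dimensional word vector into an $N$-dimensional feature vector — amounts to a simple concatenation. A sketch with hypothetical dimensions (the values of N1 and N2 and the random placeholder vectors are assumptions for illustration):

```python
import numpy as np

N1, N2 = 10, 100                      # hypothetical dimensions
topic_vec = np.random.rand(N1)        # stands in for the BTM topic vector
word_vec = np.random.rand(N2)         # stands in for the Doc2vec word vector

feature_vec = np.concatenate([topic_vec, word_vec])
assert feature_vec.shape == (N1 + N2,)   # N = N1 + N2
```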
3. The method of constructing a classification model according to claim 2, wherein the preprocessing includes at least one of text screening, desensitization, stop word removal, sensitive word filtering, and custom dictionary creation.
4. The method for constructing a classification model according to claim 2, wherein the obtaining evidence of the complaint text by the bayesian method comprises:
setting a reference value for each eigenvalue in the eigenvector;
converting the corresponding relation between the characteristic value and a preset class into the corresponding relation between the reference value and the class to calculate the likelihood;
and obtaining the evidence between the characteristic value and the class according to the likelihood by adopting a Bayesian probability statistic method.
5. The method for constructing a classification model according to claim 2, wherein the obtaining evidence of the complaint text by the bayesian method comprises:
taking any one characteristic value from the feature vector as the $i$-th characteristic value;
calculating a likelihood of the $i$-th characteristic value according to equation (1),
wherein $A_{i,j}$ is the $j$-th reference value of the $i$-th characteristic value, $\theta_s$ is the $s$-th class, $L$ is the number of reference values corresponding to each characteristic value, and $c_{j,s}$ is the likelihood calculated from the correspondence between the $j$-th reference value and the class $\theta_s$;
calculating the probability of the evidence obtained from the $i$-th characteristic value according to equation (2),
wherein $p_{\theta_s,j}$ is the probability that the evidence of the $j$-th reference value corresponding to the $i$-th characteristic value supports the class $\theta_s$, $\theta_s$ is the $s$-th class, and $c_{j,s}$ is the likelihood calculated from the correspondence between the reference value of the $i$-th characteristic value and the class $\theta_s$;
obtaining evidence of a jth reference value corresponding to the ith characteristic value according to formula (3),
wherein $e_j$ is the evidence of the $j$-th reference value corresponding to the $i$-th characteristic value, $(\theta_s, p_{\theta_s,j})$ expresses that the evidence $e_j$ supports the class $\theta_s$ with probability $p_{\theta_s,j}$, $\Theta$ is the set of said classes, $\theta_s$ is the $s$-th class, and $L$ is the number of reference values corresponding to each characteristic value;
traversing each characteristic value in the feature vector to calculate the likelihood, the probability, and the evidence $e_i$ for each characteristic value, and expressing the evidence $e_i$ by formula (4),
wherein $e_i$ is the evidence obtained from the $i$-th characteristic value, $(\theta_s, p_{\theta_s,i})$ expresses that the evidence $e_i$ supports the class $\theta_s$ with probability $p_{\theta_s,i}$, $\Theta$ is the set of said classes, $\theta_s$ is the $s$-th class, and $N$ is the dimension of the feature vector.
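The evidence acquisition of claim 5 can be sketched as a row-wise normalization of the likelihoods: for each reference value, the likelihoods over the classes are rescaled into a probability distribution. Since equations (1)–(4) themselves are not reproduced in the text, the normalization below is an assumed, standard Bayesian form, not the patent's exact formula:

```python
import numpy as np

def evidence_from_likelihoods(likelihoods):
    """Given an (L, S) array of likelihoods c[j, s] (reference value j,
    class s), normalize each row so that the evidence for reference
    value j is a probability distribution over the S classes."""
    c = np.asarray(likelihoods, dtype=float)
    return c / c.sum(axis=1, keepdims=True)
```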
6. The construction method according to claim 5, wherein the calculating the weight of the evidence comprises:
the weight of the evidence is calculated according to equation (5),
wherein $w_i$ is the weight of the $i$-th evidence $e_i$, $d_{iu}$ is the Euclidean distance between the calculated probabilities $p_{\theta_s,i}$ and the preset values $p_u$, $(e_i, e_u)$ denotes the pair formed by the evidence $e_i$ and a predetermined uniform probability distribution $e_u$, $\theta_s$ is the $s$-th class, $p_{\theta_s,i}$ is the probability with which the evidence obtained from the $i$-th characteristic value supports the class $\theta_s$, $U$ is the product of the number of reference values and the total number of classes, and $N$ is the dimension of the feature vector.
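Claim 6 weights each piece of evidence by its distance $d_{iu}$ from a uniform probability distribution: evidence whose class distribution is close to uniform carries little discriminating information and receives a small weight. The sketch below assumes the distances are normalized by their sum, which the claim (equation (5) is not reproduced) does not confirm:

```python
import numpy as np

def evidence_weights(evidence):
    """Weight each piece of evidence by the Euclidean distance between
    its class-probability distribution and the uniform distribution,
    normalized here so the weights sum to one (an assumption)."""
    p = np.asarray(evidence, dtype=float)        # shape (N, S)
    uniform = np.full(p.shape[1], 1.0 / p.shape[1])
    d = np.linalg.norm(p - uniform, axis=1)      # d_iu for each evidence
    return d / d.sum()
```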
7. The method of claim 6, wherein the adopting the ER classifier of the classification model to classify the complaint text according to the evidence and the weight so as to generate a classification result comprises:
a weighted confidence distribution is defined according to equation (6),
wherein $m_{\theta_s,i}$ is the weighted confidence of the evidence $e_i$ for the class $\theta_s$, $w_i$ is the weight of the $i$-th evidence $e_i$, $p_{\theta_s,i}$ is the probability with which the evidence obtained from the $i$-th characteristic value supports the class $\theta_s$, and $\Theta$ is the identification frame, i.e., the set of all classes;
calculating possible classification results using evidence reasoning rules in combination with the weighted belief distributions, and further representing the possible classification results using formula (7), formula (8), and formula (9),
wherein $p_{\theta_s,e(i)}$ is the probability that the predicted classification result after synthesizing the first $i$ pieces of evidence is the class $\theta_s$, $\hat{m}_{\theta_s,e(i)}$ is the degree of support for the class $\theta_s$ after synthesizing the first $i$ pieces of evidence, $m_{\theta_s,e(i-1)}$ is the weighted confidence that the predicted classification result after synthesizing the first $i-1$ pieces of evidence is the class $\theta_s$, $m_{P(\Theta),e(i-1)}$ is the weighted confidence for $P(\Theta)$ after synthesizing the first $i-1$ pieces of evidence, $\hat{m}_{P(\Theta),e(i)}$ is the degree of support for $P(\Theta)$ after synthesizing the first $i$ pieces of evidence, $\hat{m}_{D,e(i)}$ is the degree of support for the subset $D$ after synthesizing the first $i$ pieces of evidence, $\Theta$ is the identification frame, $D$ is a subset of the identification frame, $r_i$ is the reliability of the $i$-th evidence $e_i$, $w_i$ is the weight of the $i$-th evidence $e_i$ with $r_i = w_i$, $P(\Theta)$ is the power set of the identification frame, $B$ and $C$ are subsets of the power set, $m_{B,e(i-1)}$ is the weighted confidence for the subset $B$ after synthesizing the first $i-1$ pieces of evidence, and $m_{C,i}$ is the weighted confidence of the $i$-th evidence for the subset $C$;
generating a plurality of said possible classification results according to equation (10),
wherein $y_m$ is the classification result of the $m$-th complaint text, $\theta_s$ is the class, $p_{\theta_s,e(i)}$ is the probability that the predicted classification result after synthesizing the first $i$ pieces of evidence is the class $\theta_s$, $N$ is the dimension of the feature vector, and $M$ is the number of complaint texts;
selecting, from the plurality of possible classification results, the possible classification result with the largest probability value as the generated classification result;
the optimizing parameters of the ER classifier to update the classification model comprises:
optimizing the parameters of the ER classifier according to an optimization model of equation (11),
wherein $M$ is the number of complaint texts, $y_m$ is the classification result of the $m$-th complaint text, $p_{\theta_s,e(i)}$ is the probability that the predicted classification result after synthesizing the first $i$ pieces of evidence is the class $\theta_s$, $v_m$ is the vector representation of the probabilities $p_{\theta_s,e(i)}$ under the classification result $y_m$, $\hat{v}_m$ is the vector representation of the true classification result $\hat{y}_m$, $d(v_m, \hat{v}_m)$ is the Euclidean distance between $v_m$ and $\hat{v}_m$, and $r_i$ is the reliability of the $i$-th evidence $e_i$.
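As an informal illustration of claim 7's combination-and-selection steps, the sketch below lets each piece of evidence vote with its weight, normalizes the weighted supports into class probabilities, and picks the class with the largest probability. The full ER rule of equations (6)–(9) additionally tracks reliabilities and power-set masses, which this simplified stand-in deliberately omits:

```python
import numpy as np

def classify(evidence, weights):
    """Simplified weighted combination of evidence, followed by
    selection of the class with the largest combined probability."""
    p = np.asarray(evidence, dtype=float)   # (N, S) evidence probabilities
    w = np.asarray(weights, dtype=float)    # (N,) evidence weights
    support = w @ p                         # weighted support per class
    probs = support / support.sum()         # normalize into probabilities
    return int(np.argmax(probs)), probs
```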
8. A system for constructing a classification model of a complaint text, comprising a processor for executing the construction method of any one of claims 2 to 7.
9. A method for classifying a complaint text, comprising classifying the complaint text using the classification model constructed according to any one of claims 2 to 7.
10. A classification system for complaint texts, characterized in that it comprises a processor for carrying out the classification method according to claim 9.
CN201811324875.1A 2018-11-08 2018-11-08 Complain disaggregated model, construction method, system, classification method and the system of text Pending CN109376226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811324875.1A CN109376226A (en) 2018-11-08 2018-11-08 Complain disaggregated model, construction method, system, classification method and the system of text

Publications (1)

Publication Number Publication Date
CN109376226A true CN109376226A (en) 2019-02-22

Family

ID=65383840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811324875.1A Pending CN109376226A (en) 2018-11-08 2018-11-08 Complain disaggregated model, construction method, system, classification method and the system of text

Country Status (1)

Country Link
CN (1) CN109376226A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101770454A (en) * 2010-02-13 2010-07-07 武汉理工大学 Method for expanding feature space of short text
CN105516499A (en) * 2015-12-14 2016-04-20 北京奇虎科技有限公司 Method and device for classifying short messages, communication terminal and server
CN106909537A (en) * 2017-02-07 2017-06-30 中山大学 A kind of polysemy analysis method based on topic model and vector space
CN108241741A (en) * 2017-12-29 2018-07-03 深圳市金立通信设备有限公司 A kind of file classification method, server and computer readable storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YING YANG et al.: "An evidential reasoning-based decision support system for handling customer complaints in mobile telecommunications", Knowledge-Based Systems *
ZHANG Xiaochuan, YU Linfeng, SANG Ruiting, ZHANG Yihao: "Research on short text classification fusing CNN and LDA", Software Engineering *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427959A (en) * 2019-06-14 2019-11-08 合肥工业大学 Complain classification method, system and the storage medium of text
CN111159335A (en) * 2019-12-12 2020-05-15 中国电子科技集团公司第七研究所 Short text classification method based on pyramid pooling and LDA topic model
CN112288446A (en) * 2020-10-28 2021-01-29 中国联合网络通信集团有限公司 Method and device for calculating complaint and claim
CN112288446B (en) * 2020-10-28 2023-06-06 中国联合网络通信集团有限公司 Calculation method and device for complaint and claim payment
CN112860893A (en) * 2021-02-08 2021-05-28 国网河北省电力有限公司营销服务中心 Short text classification method and terminal equipment
CN112860893B (en) * 2021-02-08 2023-02-28 国网河北省电力有限公司营销服务中心 Short text classification method and terminal equipment
CN113094567A (en) * 2021-03-31 2021-07-09 四川新网银行股份有限公司 Malicious complaint identification method and system based on text clustering
CN113591473A (en) * 2021-07-21 2021-11-02 西北工业大学 Text similarity calculation method based on BTM topic model and Doc2vec
CN113591473B (en) * 2021-07-21 2024-03-12 西北工业大学 Text similarity calculation method based on BTM topic model and Doc2vec

Similar Documents

Publication Publication Date Title
CN109376226A (en) Complain disaggregated model, construction method, system, classification method and the system of text
CN110598206B (en) Text semantic recognition method and device, computer equipment and storage medium
JP7225395B2 (en) Dynamic Reconfiguration Training Computer Architecture
US20230281445A1 (en) Population based training of neural networks
US11734510B2 (en) Natural language processing of encoded question tokens and encoded table schema based on similarity
US11610064B2 (en) Clarification of natural language requests using neural networks
WO2020140073A1 (en) Neural architecture search through a graph search space
CN113220886A (en) Text classification method, text classification model training method and related equipment
US11183174B2 (en) Speech recognition apparatus and method
CN110377733B (en) Text-based emotion recognition method, terminal equipment and medium
CN113449084A (en) Relationship extraction method based on graph convolution
CN113254615A (en) Text processing method, device, equipment and medium
CN114511023B (en) Classification model training method and classification method
CN116127060A (en) Text classification method and system based on prompt words
CN115659244A (en) Fault prediction method, device and storage medium
US20230078246A1 (en) Centralized Management of Distributed Data Sources
US11875128B2 (en) Method and system for generating an intent classifier
CN111046177A (en) Automatic arbitration case prejudging method and device
CN116662555B (en) Request text processing method and device, electronic equipment and storage medium
CN117076946A (en) Short text similarity determination method, device and terminal
CN111382265B (en) Searching method, device, equipment and medium
JP2015038709A (en) Model parameter estimation method, device, and program
US20240160196A1 (en) Hybrid model creation method, hybrid model creation device, and recording medium
CN107729509B (en) Discourse similarity determination method based on recessive high-dimensional distributed feature representation
CN114841588A (en) Information processing method, device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190222