CN109376226A - Classification model, construction method, system, classification method and system for complaint text - Google Patents

Classification model, construction method, system, classification method and system for complaint text

Info

Publication number
CN109376226A
Authority
CN
China
Prior art keywords
evidence
classification
class
text
complaint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811324875.1A
Other languages
Chinese (zh)
Inventor
杨颖
周海芹
王珺
陈杨楠
余本功
曹雨蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN201811324875.1A priority Critical patent/CN109376226A/en
Publication of CN109376226A publication Critical patent/CN109376226A/en
Pending legal-status Critical Current


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a classification model for complaint text, together with a construction method, a construction system, a classification method, and a classification system, belonging to the technical field of text classification. The classification model includes: a preprocessing module for reading the complaint text and preprocessing it; a BTM module for processing the complaint text to generate a topic vector; a Doc2vec module for processing the complaint text to generate a word vector, the topic vector and the word vector being concatenated to generate a feature vector; and an ER classifier for classifying the complaint text according to the feature vector to generate a classification result.

Description

Complaint text classification model, construction method and system, and classification method and system
Technical Field
The invention relates to the technical field of text classification, and in particular to a classification model for complaint text, a construction method and system therefor, and a classification method and system.
Background
At present, mobile communication operators handle complaints mainly by building a customer-oriented complaint management system, optimizing the complaint handling process, adding customer service channels, or adopting online customer service. After receiving a complaint work order, the technical support department has experienced technical experts diagnose it, analyze the cause of the complaint, and give corresponding handling opinions, which are delivered to the relevant network construction or maintenance department for handling and, at the same time, fed back to the customer service center in the form of a work order reply. Thus the analysis and diagnosis of complaints about mobile communication quality is handled manually and depends mainly on the experience and knowledge of technical experts.
To improve this situation, a telecommunications enterprise needs to pre-classify the complaint content before handling the complaint: judge whether the problem is caused by a service reason and, if so, improve it in time; if it is caused by the user's own behavior, remind the user in time so that the user can find the real cause of the problem. However, this classification places high demands on complaint acceptors; since many acceptors have not personally practiced the problem-handling procedure, it is difficult to determine the problem category from the user's description alone, and a misclassification increases the burden on the problem handler.
In recent years, artificial intelligence methods have found some application in handling customer complaints, and a few documents propose building complaint identification systems with text mining and artificial intelligence algorithms to classify complaint hotspots intelligently, so that complaints are routed to the correct complaint navigation in a short time. Existing short-text classification methods mainly enrich the text content with an external corpus or additional information to handle the sparsity problem. For short complaint texts, however, it is difficult to expand the text through external corpora, and because customer complaint texts are short and numerous, they place demands on the dimensionality of the text representation. In previous research, text feature extraction generally uses the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm or the LDA (Latent Dirichlet Allocation) topic model, and text classification generally uses an SVM; an SVM classifier whose input vectors are constructed with the TF-IDF algorithm suffers from excessively high vector dimensionality, low classification efficiency, and the like.
Disclosure of Invention
The embodiments of the invention aim to provide a classification model for complaint text, together with a construction method, a construction system, a classification method, and a classification system: the classification model can improve the accuracy of complaint text classification; the construction method and system can build a classification model with higher classification accuracy; and the classification method and system can classify complaint texts more accurately.
In order to achieve the above object, an embodiment of the present invention provides a classification model of a complaint text, including:
the preprocessing module is used for reading the complaint text and preprocessing the complaint text;
the BTM module is used for processing the complaint text to generate a subject vector;
a Doc2vec module, configured to process the complaint text to generate a word vector;
concatenating the topic vector and the word vector to generate a feature vector; and
an ER classifier, configured to classify the complaint text according to the feature vector to generate a classification result.
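As an illustration of this module layout, the following stdlib-Python sketch mocks the data flow through the four modules; the stub functions `preprocess`, `btm_topic_vector`, and `doc2vec_word_vector`, and the dimensions N1 = 4 and N2 = 6, are invented placeholders and not part of the patent.

```python
# Minimal sketch of the module layout described above (hypothetical stubs;
# the patent itself uses a BTM model, a Doc2vec model and an ER classifier,
# which are only mocked here to show the data flow).

def preprocess(text):
    # Placeholder for text screening, desensitization, stop-word removal, etc.
    return text.strip()

def btm_topic_vector(text, n1=4):
    # Stand-in for the BTM module: returns an N1-dimensional topic vector.
    return [0.25] * n1

def doc2vec_word_vector(text, n2=6):
    # Stand-in for the Doc2vec module: returns an N2-dimensional word vector.
    return [0.0] * n2

def feature_vector(text):
    text = preprocess(text)
    topic = btm_topic_vector(text)
    word = doc2vec_word_vector(text)
    # The two representations are concatenated into one N = N1 + N2 vector,
    # which the ER classifier then consumes.
    return topic + word

vec = feature_vector("用户反映主叫不可用")
print(len(vec))  # N = N1 + N2 = 10
```

The key structural point is that the topic and word representations are computed independently and only joined by concatenation before classification.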
Another aspect of the present invention provides a method for constructing a classification model of a complaint text, which is used for constructing the above classification model, and the method includes:
initializing a classification model;
obtaining a complaint text and a real classification result of the complaint text;
preprocessing the complaint text;
processing the complaint text with a BTM (Biterm Topic Model) to generate a topic vector, wherein the topic vector has N1 dimensions;
processing the complaint text with a Doc2vec model to generate a word vector, wherein the word vector has N2 dimensions;
concatenating the topic vector and the word vector to generate a feature vector of dimension N, where N = N1 + N2;
Obtaining evidence of the complaint text by a Bayesian method;
calculating a weight of the evidence;
adopting an ER classifier of the classification model to classify the complaint text according to the evidence and the weight so as to generate a classification result;
comparing the classification result with the real classification result to calculate a classification error;
judging whether the variation value of the classification error is smaller than a preset value or not;
under the condition that the change value of the classification error is judged to be smaller than the preset value, outputting the classification model;
and under the condition that the change value of the classification error is judged to be larger than or equal to the preset value, optimizing the parameters of the ER classifier to update the classification model, classifying the complaint text by adopting the ER classifier again, and executing the construction method until the change value of the classification error is smaller than the preset value.
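The outer loop of the construction method above can be sketched as follows. The classification and optimization steps are mocked with a synthetic, geometrically shrinking error, so the sketch illustrates only the stopping rule based on the change of the classification error.

```python
# Sketch of the outer training loop of the construction method. The classifier,
# the error computation and the optimizer are mocked: each "optimization" step
# simply halves a synthetic classification error, so only the stopping rule
# (change in error below a preset value) is illustrated, not real training.

def train(preset_value=1e-3, max_rounds=100):
    error = 1.0
    prev_error = None
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        # ... classify the complaint texts and compute the classification error ...
        if prev_error is not None and abs(prev_error - error) < preset_value:
            break  # change in error small enough: output the model
        prev_error = error
        error *= 0.5  # stand-in for optimizing the ER classifier parameters
    return error, rounds

final_error, rounds = train()
print(rounds)
```

Because only the change in error is tested, the loop stops once successive errors are close, even if the error itself is not zero.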
Optionally, the preprocessing comprises at least one of text screening, desensitization, stop-word removal, sensitive-word filtering, and custom-dictionary creation.
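A stdlib-Python sketch of a few of these optional steps (desensitization, stop-word removal, sensitive-word filtering) might look like this; the stop-word and sensitive-word lists and the phone-number pattern are invented examples, not taken from the patent.

```python
import re

# Stdlib-only sketch of some optional preprocessing steps. The word lists and
# the phone-number pattern below are illustrative assumptions.

STOP_WORDS = {"the", "is", "and", "a"}
SENSITIVE_WORDS = {"badword"}
PHONE_RE = re.compile(r"\b1\d{10}\b")  # mainland-style 11-digit mobile number

def preprocess(text):
    text = PHONE_RE.sub("<PHONE>", text)                      # desensitization
    tokens = text.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]       # stop words
    tokens = [t for t in tokens if t not in SENSITIVE_WORDS]  # sensitive words
    return " ".join(tokens)

print(preprocess("The signal is bad and 13812345678 complains"))
# -> "signal bad <phone> complains"
```

A real system for Chinese complaint text would additionally segment words (e.g. with a custom dictionary), which is omitted here.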
Optionally, the obtaining the evidence of the complaint text by the Bayesian method includes:
setting reference values for each feature value in the feature vector;
converting the correspondence between the feature value and a preset class into a correspondence between the reference values and the class to calculate a likelihood;
and obtaining the evidence between the feature value and the class from the likelihood by a Bayesian probability statistics method.
Optionally, the obtaining the evidence of the complaint text by the Bayesian method includes:
taking any one feature value from the feature vector as the ith feature value;
calculating the likelihood of the ith feature value according to equation (1),

c_{s,j} = N_{s,j} / Σ_{j=1}^{L} N_{s,j},  s = 1, 2  (1)

wherein A_i^j is the jth reference value of the ith feature value, θ_s is the sth class, L is the number of reference values corresponding to each feature value, N_{s,j} is the number of training samples of the class θ_s whose ith feature value corresponds to the reference value A_i^j, and c_{s,j} is the likelihood calculated from the correspondence between the jth reference value and the class θ_s;
calculating the probability of the evidence obtained from the ith feature value according to equation (2),

p_{θ_s,e_j} = c_{s,j} / Σ_{k=1}^{S} c_{k,j}  (2)

wherein p_{θ_s,e_j} is the probability that the evidence of the jth reference value corresponding to the ith feature value points to the class θ_s, θ_s is the sth class, and c_{s,j} is the likelihood calculated from the correspondence between the reference value of the ith feature value and the class θ_s;
obtaining the evidence of the jth reference value corresponding to the ith feature value according to equation (3),

e_j = { (θ_s, p_{θ_s,e_j}), s = 1, …, S }  (3)

wherein e_j is the evidence of the jth reference value corresponding to the ith feature value, (θ_s, p_{θ_s,e_j}) expresses that the evidence e_j supports the class θ_s with the probability p_{θ_s,e_j}, Θ is the set of the classes, θ_S is the Sth class, and L is the number of reference values corresponding to each feature value;
traversing each feature value in the feature vector to calculate the likelihood, the probability p_{θ_s,e_i}, and the evidence e_i of each feature value, and expressing the evidence e_i by equation (4),

e_i = { (θ_s, p_{θ_s,e_i}), s = 1, …, S },  i = 1, …, N  (4)

wherein e_i is the evidence obtained from the ith feature value, (θ_s, p_{θ_s,e_i}) expresses that the evidence e_i supports the class θ_s with the probability p_{θ_s,e_i}, Θ is the set of the classes, θ_S is the Sth class, and N is the dimension of the feature vector.
Optionally, the calculating the weight of the evidence comprises:
calculating the weight of the evidence according to equation (5),

w_i = d_{iu} / max_{1≤k≤N} d_{ku},  d_{iu} = sqrt( Σ_{j=1}^{L} Σ_{s=1}^{S} ( p_{θ_s,e_j} − p_u )² ),  p_u = 1/U  (5)

wherein w_i is the weight of the ith evidence e_i, d_{iu} is the Euclidean distance between the probabilities p_{θ_s,e_j} calculated for the ith feature value and the preset value p_u, e_u denotes the evidence with the predetermined uniform probability distribution, θ_S is the Sth class, p_{θ_s,e_i} is the probability that the evidence obtained from the ith feature value supports the class θ_S, U is the product of the number of reference values and the total number of classes, and N is the dimension of the feature vector.
Optionally, the classifying, by the ER classifier of the classification model, the complaint text according to the evidence and the weight to generate a classification result includes:
defining a weighted belief distribution according to equation (6),

m_i = { (θ_s, m_{θ_s,i}), s = 1, …, S; (P(Θ), m_{P(Θ),i}) },  m_{θ_s,i} = w_i · p_{θ_s,e_i},  m_{P(Θ),i} = 1 − w_i  (6)

wherein m_{θ_s,i} is the weighted belief of the evidence e_i for the class θ_s, w_i is the weight of the ith evidence e_i, p_{θ_s,e_i} is the probability that the evidence obtained from the ith feature value supports the class θ_s, and Θ is the identification framework, i.e. the set of all classes;
calculating the possible classification results using the evidence reasoning rule in combination with the weighted belief distributions, the recursion being expressed by equations (7), (8) and (9),

p_{θ_s,e(i)} = m̂_{θ_s,e(i)} / Σ_{D⊆Θ, D≠∅} m̂_{D,e(i)}  (7)

m̂_{D,e(i)} = [ (1 − r_i) · m_{D,e(i−1)} + m_{P(Θ),e(i−1)} · m_{D,i} ] + Σ_{B∩C=D} m_{B,e(i−1)} · m_{C,i}  (8)

m̂_{P(Θ),e(i)} = (1 − r_i) · m_{P(Θ),e(i−1)}  (9)

wherein p_{θ_s,e(i)} is the probability that the classification result predicted after combining the first i pieces of evidence is the class θ_s, m̂_{θ_s,e(i)} is the degree of support for θ_s after combining the first i pieces of evidence, m_{θ_s,e(i−1)} is the weighted belief that the classification result predicted after combining the first i−1 pieces of evidence is θ_s, m_{P(Θ),e(i−1)} is the weighted belief of P(Θ) after combining the first i−1 pieces of evidence, m̂_{P(Θ),e(i)} is the degree of support for P(Θ) after combining the first i pieces of evidence, m̂_{D,e(i)} is the degree of support for D after combining the first i pieces of evidence, Θ is the identification framework, D is a subset of the identification framework, r_i is the reliability of the ith evidence e_i, w_i is the weight of the ith evidence e_i with r_i = w_i, P(Θ) is the power set of the identification framework, B and C are subsets of the power set, m_{B,e(i−1)} is the weighted belief of the subset B after combining the first i−1 pieces of evidence, and m_{C,i} is the weighted belief of the ith evidence for the subset C;
generating a plurality of the possible classification results according to equation (10),

y_m = { (θ_s, p_{θ_s,e(N)}), s = 1, …, S },  m = 1, …, M  (10)

wherein y_m is the classification result of the mth complaint text, θ_s is a class, p_{θ_s,e(N)} is the probability that the classification result predicted after combining the first N pieces of evidence is the class θ_s, N is the dimension of the feature vector, and M is the number of complaint texts;
selecting, from the plurality of possible classification results, the one with the largest probability p_{θ_s,e(N)} as the generated classification result.
Optionally, the optimizing the parameters of the ER classifier to update the classification model comprises:
optimizing the parameters of the ER classifier according to the optimization model of equation (11),

min ξ(r) = (1/M) · Σ_{m=1}^{M} d(v_m, v̂_m),  s.t. 0 ≤ r_i ≤ 1, i = 1, …, N  (11)

wherein M is the number of complaint texts, y_m is the classification result of the mth complaint text, p_{θ_s,e(N)} is the probability that the classification result predicted after combining the first N pieces of evidence is the class θ_s, v_m is the vector representation of the probabilities p_{θ_s,e(N)} in the classification result y_m, v̂_m is the vector representation of the true classification result ŷ_m, d(v_m, v̂_m) is the Euclidean distance between v_m and v̂_m, and r_i is the reliability of the ith evidence e_i.
In another aspect of the present invention, a system for constructing a classification model of a complaint text is further provided, where the system includes a processor configured to execute the construction method described above.
The invention further provides a method for classifying the complaint texts, which comprises the step of classifying the complaint texts by adopting the classification model established by the method.
In still another aspect of the present invention, a system for classifying a complaint text is provided, where the system includes a processor configured to execute the classification method described above.
Through the above technical solutions, the classification model, its construction method and system, and the classification method and system for complaint text provided by the invention use the BTM topic model to reduce the dimensionality of the complaint text, converting it into a vector composed of several topics, which is better suited to topic extraction from short texts; they use the Doc2vec model to convert the complaint text into word vectors, so that short texts are modeled simultaneously at the granularity of individual words and of text topics, alleviating the problems of sparse short-text features and poor topic focus; they further use the evidence reasoning rule to handle ambiguity and uncertainty in the data, which is more effective than the SVM model of the prior art, so that a reasonable diagnosis can be given even when the text information provided by a customer is incomplete or inaccurate. In addition, the construction method provided by the invention can further make the constructed classification model more accurate by expanding the training text.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:
FIG. 1 is a flow diagram of a method of constructing a classification model of complaint text according to one embodiment of the invention;
FIG. 2 is a partial flow diagram of a method for constructing a classification model of complaint texts according to an embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
One embodiment of the present invention provides a classification model for complaint text. The classification model may include a pre-processing module, a BTM module, a Doc2vec module, and an ER classifier.
In this embodiment of the present invention, the collection of the complaint text can include, but is not limited to, the following steps:
1. integrating, in batches or individually, the mobile communication customer complaint work orders transferred by the customer service department into a data set;
2. extracting key information from the complaint work orders (or the integrated data set). In one example of the invention, the extracted key information may be, for example, the customer's cell phone number, the complaint time, and the complaint location;
3. determining the state information of the feature elements from the complaint work order. When a complaint work order is processed, for example, the state values of the feature elements corresponding to the key information of the work order may be obtained from the feature element acquisition module through a cloud database access interface. In one example of the present invention, the complaint text can be, for example:
text 1: the user indicates that the calling is unavailable, the network is busy, and indicates that the calling is unavailable at about 7 am every day, and the calling can be called and the network is full;
text 2: the user responds to the user that the signal of the position is not good, the signal is not good at present after the previous response, and the user asks for processing and replies to the user;
text 3: when the user indicates to surf the internet, the speed is very slow, the signal is often unstable, and the signal is required to be processed as soon as possible, and the speed is three grids.
The preprocessing module can be used for reading the complaint texts to be classified and preprocessing the complaint texts. In one example of the invention, the pre-processing may be a process including, but not limited to, at least one of text filtering, desensitization processing, removal of stop words, filtering of sensitive words, and creation of custom dictionaries.
The BTM module may be used to process the complaint text to generate a topic vector. In this embodiment, the topic vector may have N1 dimensions, and N1 may be determined by the perplexity of the complaint text set.
The Doc2vec module may be used to process the complaint text to generate a word vector. In this embodiment, the word vector may have N2 dimensions, and N2 may be determined according to the number of complaint texts.
The ER classifier is used for concatenating the topic vector and the word vector to generate a feature vector, and further classifying the complaint text according to the feature vector based on the principle of evidence reasoning to generate a classification result. In this embodiment, the ER classifier may be a classifier based on the evidence reasoning (ER) rule.
As shown in FIG. 1, another aspect of the present invention further provides a method for constructing a classification model of a complaint text. The construction method can be used for constructing the classification model of the complaint text. In fig. 1, the construction method may include:
in step S100, a classification model is initialized. The structure of the classification model may be the classification model described above (the parameters of the classification model may be unoptimized or modified). Accordingly, the functions of each part of the classification model have been described in detail in the above description, and thus are not described in detail herein.
In step S110, a preset complaint text and a real classification result of the complaint text are obtained.
In step S120, the complaint text is preprocessed. In this embodiment, the complaint text may be preprocessed with the preprocessing module of the classification model, where the preprocessing may include, but is not limited to, at least one of text screening, desensitization, stop-word removal, sensitive-word filtering, and custom-dictionary creation. In one example of the invention, the preprocessing module may be one known to those skilled in the art.
In step S130, the complaint text is processed using the BTM model to generate a topic vector. In this embodiment, the topic vector may have N1 dimensions, and N1 may be determined by the perplexity of the complaint text.
In step S140, the complaint text is processed using the Doc2vec model to generate a word vector. In this embodiment, the word vector may have N2 dimensions, and N2 may be determined according to the number of complaint texts.
In step S150, the topic vector and the word vector are concatenated to generate a feature vector. Since the topic vector has N1 dimensions and the word vector has N2 dimensions, the feature vector has N1 + N2 dimensions. In this embodiment, this step may be implemented, for example, by using the ER classifier to concatenate the topic vector and the word vector to generate the feature vector.
In step S160, evidence of the complaint text is obtained by a bayesian method. Optionally, in an embodiment of the present invention, the step S160 may further include a step as shown in fig. 2. In fig. 2, the step S160 may further include:
In step S161, reference values are set for each feature value in the concatenated feature vector. In one example of the invention, x_i may represent the ith feature value in the feature vector, and A_i^j may represent the jth reference value of the ith feature value.
In step S162, the correspondence between the feature value x_i and a preset class θ_s is converted into the correspondence between the reference value A_i^j and the class θ_s to calculate the likelihood c_{s,j}.
In step S163, a Bayesian probability statistics method (the Bayesian method) is employed to acquire the evidence between the feature value and the class from the calculated likelihood.
Taking the above example, the step S160 may also specifically be:
1. taking any one feature value from the feature vector as the ith feature value; the likelihood of the ith feature value is calculated according to equation (1),

c_{s,j} = N_{s,j} / Σ_{j=1}^{L} N_{s,j},  s = 1, 2  (1)

wherein A_i^j is the jth reference value of the ith feature value, θ_s is the sth class, L is the number of reference values corresponding to each feature value, N_{s,j} is the number of training samples of the class θ_s whose ith feature value corresponds to the reference value A_i^j, and c_{s,j} is the likelihood calculated from the correspondence between the jth reference value and the class θ_s. As can be seen from equation (1), the number of classes in this example is 2. However, this value merely illustrates the invention and does not limit its scope in any way; under the same technical conception, those skilled in the art will understand that other numbers of classes are equally applicable to the invention;
2. calculating the probability of the evidence obtained from the ith feature value x_i according to equation (2),

p_{θ_s,e_j} = c_{s,j} / Σ_{k=1}^{S} c_{k,j}  (2)

wherein p_{θ_s,e_j} is the probability that the evidence of the jth reference value A_i^j corresponding to the ith feature value x_i points to the class θ_s, θ_s is the sth class, and c_{s,j} is the likelihood calculated from the correspondence between the reference value of the ith feature value and the class θ_s;
3. obtaining the evidence of the jth reference value A_i^j corresponding to the ith feature value x_i according to equation (3),

e_j = { (θ_s, p_{θ_s,e_j}), s = 1, …, S }  (3)

wherein e_j is the evidence of the jth reference value corresponding to the ith feature value, (θ_s, p_{θ_s,e_j}) expresses that the evidence e_j supports the class θ_s with the probability p_{θ_s,e_j}, Θ is the set of all classes, θ_S is the Sth class, and L is the number of reference values corresponding to each feature value;
4. traversing each feature value in the feature vector to calculate the likelihood, the probability p_{θ_s,e_i}, and the evidence e_i of each feature value, and expressing the evidence e_i by equation (4),

e_i = { (θ_s, p_{θ_s,e_i}), s = 1, …, S },  i = 1, …, N  (4)

wherein e_i is the evidence obtained from the ith feature value, (θ_s, p_{θ_s,e_i}) expresses that the evidence e_i supports the class θ_s with the probability p_{θ_s,e_i}, Θ is the set of the classes, θ_S is the Sth class, and N is the dimension of the feature vector.
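The evidence-acquisition step can be illustrated with the following stdlib-Python sketch, which follows one plausible reading of equations (1) and (2): likelihoods are estimated from per-class counts over the reference values and then normalized across classes into evidence probabilities. The toy counts are invented.

```python
# Stdlib sketch of evidence acquisition for one feature: count how often each
# reference value occurs in each class (the likelihood, one plausible reading
# of equation (1)), then normalize across classes into the evidence
# probabilities of equation (2). The toy counts below are invented.

def likelihoods(counts_per_class):
    # counts_per_class[s][j]: samples of class s whose feature hits reference j
    return [[n / sum(row) for n in row] for row in counts_per_class]

def evidence(c):
    # p[j][s]: probability that reference value j supports class s (eq. (2))
    n_classes, n_refs = len(c), len(c[0])
    p = []
    for j in range(n_refs):
        col_sum = sum(c[s][j] for s in range(n_classes))
        p.append([c[s][j] / col_sum for s in range(n_classes)])
    return p

counts = [[8, 2],   # class 1: 8 samples on reference value 1, 2 on value 2
          [2, 8]]   # class 2
c = likelihoods(counts)
p = evidence(c)
print(p[0])  # evidence of reference value 1: strongly supports class 1
```

Each row of `p` is one piece of evidence in the sense of equation (3): a probability distribution over classes attached to a reference value.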
In step S170, the weight of the evidence is calculated. In an example of the present invention, the step S170 may specifically be:
the weight of the evidence is calculated according to equation (5),

w_i = d_{iu} / max_{1≤k≤N} d_{ku},  d_{iu} = sqrt( Σ_{j=1}^{L} Σ_{s=1}^{S} ( p_{θ_s,e_j} − p_u )² ),  p_u = 1/U  (5)

wherein w_i is the weight of the ith evidence e_i, d_{iu} is the Euclidean distance between the probabilities p_{θ_s,e_j} calculated for the ith feature value and the preset value p_u, e_u denotes the evidence with the predetermined uniform probability distribution, θ_S is the Sth class, p_{θ_s,e_i} is the probability that the evidence obtained from the ith feature value supports the class θ_S, U is the product of the number of reference values and the total number of classes, and N is the dimension of the feature vector.
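The weighting step can be illustrated as follows: evidence far from the uniform distribution is decisive and receives a large weight, while evidence close to uniform says little about the class and is down-weighted. The max-normalization is one plausible reading of equation (5), and the probabilities are invented toy values.

```python
import math

# Sketch of the evidence-weighting step: the farther a piece of evidence lies
# from the uniform distribution, the more decisive it is and the larger its
# weight. Max-normalization is one plausible reading of equation (5).

def weights(evidence_probs):
    # evidence_probs[i][s]: probability that evidence i supports class s
    dists = []
    for probs in evidence_probs:
        p_u = 1.0 / len(probs)  # uniform reference distribution
        dists.append(math.sqrt(sum((p - p_u) ** 2 for p in probs)))
    d_max = max(dists)
    return [d / d_max for d in dists]

w = weights([[0.9, 0.1],   # sharp evidence -> weight 1.0
             [0.6, 0.4]])  # vague evidence -> smaller weight
print(w)
```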
In step S180, the ER classifier of the classification model classifies the complaint text according to the evidence and the weights to generate a classification result. In this embodiment, the step S180 may specifically be:
1. a weighted belief distribution is defined according to equation (6),

m_{θ_s,i} = w_i · p_{θ_s,e_i}  (6)

wherein m_{θ_s,i} is the weighted belief of the evidence e_i for the class θ_s, w_i is the weight of the ith evidence e_i, p_{θ_s,e_i} is the probability that the evidence obtained from the ith feature value supports the class θ_s, and Θ is the identification framework, i.e. the set of all classes; further, equation (6) can be expressed by equation (6A),

m_i = { (θ_s, m_{θ_s,i}), s = 1, …, S; (P(Θ), m_{P(Θ),i}) }  (6A)

wherein m_i is the weighted belief distribution of the ith evidence, m_{θ_s,i} is the weighted belief of the ith evidence for θ_s, and m_{P(Θ),i} = 1 − w_i represents the belief determined by the weight of the evidence;
2. the possible classification results are calculated using the evidence reasoning rule in combination with the weighted belief distributions, the recursion being expressed by equations (7), (8) and (9),

p_{θ_s,e(i)} = m̂_{θ_s,e(i)} / Σ_{D⊆Θ, D≠∅} m̂_{D,e(i)}  (7)

m̂_{D,e(i)} = [ (1 − r_i) · m_{D,e(i−1)} + m_{P(Θ),e(i−1)} · m_{D,i} ] + Σ_{B∩C=D} m_{B,e(i−1)} · m_{C,i}  (8)

m̂_{P(Θ),e(i)} = (1 − r_i) · m_{P(Θ),e(i−1)}  (9)

wherein p_{θ_s,e(i)} is the probability that the classification result predicted after combining the first i pieces of evidence is the class θ_s, m̂_{θ_s,e(i)} is the degree of support for θ_s after combining the first i pieces of evidence, m_{θ_s,e(i−1)} is the weighted belief that the classification result predicted after combining the first i−1 pieces of evidence is θ_s, m_{P(Θ),e(i−1)} is the weighted belief of P(Θ) after combining the first i−1 pieces of evidence, m̂_{P(Θ),e(i)} is the degree of support for P(Θ) after combining the first i pieces of evidence, m̂_{D,e(i)} is the degree of support for D after combining the first i pieces of evidence, Θ is the identification framework, D is a subset of the identification framework, r_i is the reliability of the ith evidence e_i, w_i is the weight of the ith evidence e_i with r_i = w_i, P(Θ) is the power set of the identification framework, B and C are subsets of the power set, m_{B,e(i−1)} is the weighted belief of the subset B after combining the first i−1 pieces of evidence, and m_{C,i} is the weighted belief of the ith evidence for the subset C;
3. a plurality of possible classification results are generated according to equation (10),

y_m = { (θ_s, p_{θ_s,e(N)}), s = 1, …, S },  m = 1, …, M  (10)

wherein y_m is the classification result of the mth complaint text, θ_s is a class, p_{θ_s,e(N)} is the probability that the classification result predicted after combining the first N pieces of evidence is the class θ_s, N is the dimension of the feature vector, and M is the number of complaint texts;
4. the possible classification result with the largest probability p_{θ_s,e(N)} is selected as the generated classification result.
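The recursive combination of step S180 can be sketched as below for the simple case in which every piece of evidence assigns belief only to singleton classes, with the reliability taken equal to the weight (r_i = w_i) as stated above. The probabilities and weights are invented toy values, and the sketch follows the evidence reasoning rule of equations (6)-(9) under these assumptions.

```python
# Stdlib sketch of the recursive evidence-reasoning combination for the case
# where each evidence assigns belief only to singleton classes and r_i = w_i.
# Toy probabilities and weights; not the patent's full implementation.

def er_combine(evidence_probs, w):
    n_classes = len(evidence_probs[0])
    # Weighted belief distribution of the first evidence (equation (6)).
    m = [w[0] * p for p in evidence_probs[0]]
    m_pt = 1.0 - w[0]  # belief assigned to the power set P(Theta)
    for i in range(1, len(evidence_probs)):
        r = w[i]  # reliability taken equal to the weight
        mi = [w[i] * p for p in evidence_probs[i]]
        # Equations (8) and (9): unnormalized combined beliefs. With r = w,
        # the (1 - r) * m[s] term covers the cross term with P(Theta).
        m_hat = [(1 - r) * m[s] + m_pt * mi[s] + m[s] * mi[s]
                 for s in range(n_classes)]
        m_pt_hat = (1 - r) * m_pt
        # Renormalize so the beliefs again sum to one before the next step.
        total = sum(m_hat) + m_pt_hat
        m = [x / total for x in m_hat]
        m_pt = m_pt_hat / total
    # Equation (7): final class probabilities ignore the P(Theta) mass.
    total = sum(m)
    return [x / total for x in m]

p = er_combine([[0.8, 0.2], [0.7, 0.3]], w=[1.0, 0.5])
print(p)  # both pieces of evidence favour class 1
```

The final distribution is sharper than either input: agreeing evidence reinforces itself, while the low-weight second evidence contributes less.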
In step S190, the classification result is compared with the real classification result to calculate a classification error. In this embodiment, the manner of calculating the classification error may be a calculation manner known to those skilled in the art, and therefore, the description thereof is omitted here.
In step S200, it is determined whether the variation value of the classification error is smaller than a preset value. The preset value can be set according to the precision required of the classification model. The variation value may be, for example, the difference between the previously calculated classification error and the currently calculated classification error. In addition, before the classification model has been optimized (when the classification error is calculated for the first time), there is no previously calculated classification error and the variation value cannot be calculated; in this case step S200 may be skipped and step S210 performed directly.
In step S220, in case that it is judged that the variation value of the classification error is smaller than the preset value, the classification model is output. Since the variation value is smaller than the preset value, it can be considered that the classification error of the classification model tends to converge, and the benefit of continuously updating the classification model is reduced. No further updates to the classification model may be required.
In step S210, in the case that the variation value of the classification error is judged to be greater than or equal to the preset value, the parameters of the ER classifier are optimized to update the classification model, the ER classifier is used again to classify the complaint text, and the corresponding steps of the construction method are executed (from step S180) until the variation value of the classification error is smaller than the preset value. In step S210, the classification model may be optimized, for example, as follows:
the parameters of the ER classifier are optimized according to the optimization model of equation (11),
wherein $M$ is the number of complaint texts, $y_m$ is the classification result of the $m$-th complaint text, $p_{\theta_s,e(i)}$ is the probability that the predicted classification result after synthesizing the first $i$ pieces of evidence is the class $\theta_s$, $v_m$ is the vector representation of the probabilities $p_{\theta_s,e(i)}$ under the classification result $y_m$, $\hat{v}_m$ is the vector representation of the true classification result $\hat{y}_m$, $d(v_m, \hat{v}_m)$ is the Euclidean distance between $v_m$ and $\hat{v}_m$, and $r_i$ is the reliability of the $i$-th evidence $e_i$.
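The objective behind equation (11) sums, over all training texts, the Euclidean distance between the predicted class-probability vector $v_m$ and the vector $\hat{v}_m$ of the true class. A sketch under the assumption that the true classification result is one-hot encoded (the function name is illustrative):

```python
import numpy as np

def er_loss(pred_probs, true_labels, num_classes):
    """Sum of Euclidean distances d(v_m, v_hat_m) between each predicted
    class-probability vector and the one-hot encoding of the true class."""
    total = 0.0
    for probs, label in zip(pred_probs, true_labels):
        v_m = np.asarray(probs, dtype=float)
        v_hat = np.zeros(num_classes)
        v_hat[label] = 1.0          # one-hot true classification result
        total += np.linalg.norm(v_m - v_hat)
    return total
```

An optimizer would then minimize this quantity over the evidence reliabilities $r_i$.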
In another aspect of the present invention, a system for constructing a classification model of a complaint text is provided, and the system may include a processor configured to execute the construction method described above.
In another aspect of the present invention, a method for classifying a complaint text is provided, where the classification method includes classifying the complaint text by using the classification model constructed as described above.
In yet another aspect of the present invention, a classification system for a complaint text is provided, which includes a processor for executing the classification method described above.
Further, the processors described above may each be a general-purpose processor, a special-purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) circuit, any other type of integrated circuit (IC), a state machine, a system on a chip (SoC), or the like.
By the above technical scheme, the classification model, its construction method and system, and the classification method and system for complaint texts provided by the invention adopt the BTM topic model to reduce the dimensionality of the complaint text, converting it into a vector composed of a plurality of topics, which is better suited to topic extraction from short texts. The Doc2vec model is adopted to convert the complaint text into word vectors, so that short texts are modeled simultaneously at the granularity of individual words and at the granularity of the text's topics, alleviating the problems of sparse short-text features and poor topic focus. The evidence reasoning rule is further adopted to handle ambiguity and uncertainty in the data, and is more effective than the prior-art SVM model, so that a reasonable diagnosis result can be given even when the text information provided by a customer is incomplete or inaccurate. In addition, by expanding the training text, the construction method provided by the invention can further improve the accuracy of the constructed classification model.
Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the embodiments of the present invention are not limited to the details of the above embodiments, and various simple modifications can be made to the technical solution of the embodiments of the present invention within the technical idea of the embodiments of the present invention, and the simple modifications all belong to the protection scope of the embodiments of the present invention.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, the embodiments of the present invention will not be described separately for the various possible combinations.
Those skilled in the art can understand that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In addition, various different embodiments of the present invention may be arbitrarily combined with each other, and the embodiments of the present invention should be considered as disclosed in the disclosure of the embodiments of the present invention as long as the embodiments do not depart from the spirit of the embodiments of the present invention.

Claims (10)

1. A classification model for complaint text, the classification model comprising:
the preprocessing module is used for reading the complaint text and preprocessing the complaint text;
the BTM module is used for processing the complaint text to generate a topic vector;
a Doc2vec module, configured to process the complaint text to generate a word vector;
wherein the topic vector and the word vector are spliced to generate a feature vector;
an ER classifier, configured to classify the complaint text according to the feature vector to generate a classification result.
2. A method for constructing a classification model of a complaint text, which is used for constructing the classification model of claim 1, and which comprises:
initializing a classification model;
obtaining a complaint text and a real classification result of the complaint text;
preprocessing the complaint text;
processing the complaint text by adopting a BTM topic model to generate a topic vector, wherein the dimension of the topic vector is $N_1$;
processing the complaint text by adopting a Doc2vec model to generate a word vector, wherein the dimension of the word vector is $N_2$;
stitching the topic vector and the word vector to generate a feature vector with dimension $N$, wherein $N = N_1 + N_2$;
obtaining evidence of the complaint text by a Bayesian method;
calculating a weight of the evidence;
adopting an ER classifier of the classification model to classify the complaint text according to the evidence and the weight so as to generate a classification result;
comparing the classification result with the real classification result to calculate a classification error;
judging whether the variation value of the classification error is smaller than a preset value or not;
under the condition that the change value of the classification error is judged to be smaller than the preset value, outputting the classification model;
and under the condition that the change value of the classification error is judged to be larger than or equal to the preset value, optimizing the parameters of the ER classifier to update the classification model, classifying the complaint text by adopting the ER classifier again, and executing the construction method until the change value of the classification error is smaller than the preset value.
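The feature construction in the claim above — splicing an $N_1$-dimensional topic vector with an $N_2$-dimensional word vector into an $N$-dimensional feature vector — amounts to a simple concatenation. A sketch with hypothetical dimensions (the values of N1 and N2 and the random placeholder vectors are assumptions for illustration):

```python
import numpy as np

N1, N2 = 10, 100                      # hypothetical dimensions
topic_vec = np.random.rand(N1)        # stands in for the BTM topic vector
word_vec = np.random.rand(N2)         # stands in for the Doc2vec word vector

feature_vec = np.concatenate([topic_vec, word_vec])
assert feature_vec.shape == (N1 + N2,)   # N = N1 + N2
```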
3. The method of constructing a classification model according to claim 2, wherein the preprocessing includes at least one of text screening, desensitization, stop word removal, sensitive word filtering, and custom dictionary creation.
4. The method for constructing a classification model according to claim 2, wherein the obtaining evidence of the complaint text by the bayesian method comprises:
setting a reference value for each eigenvalue in the eigenvector;
converting the corresponding relation between the characteristic value and a preset class into the corresponding relation between the reference value and the class to calculate the likelihood;
and obtaining the evidence between the characteristic value and the class according to the likelihood by adopting a Bayesian probability statistic method.
5. The method for constructing a classification model according to claim 2, wherein the obtaining evidence of the complaint text by the bayesian method comprises:
taking any one characteristic value from the feature vector as the $i$-th characteristic value;
calculating a likelihood of the $i$-th characteristic value according to equation (1),
wherein $A_{i,j}$ is the $j$-th reference value of the $i$-th characteristic value, $\theta_s$ is the $s$-th class, $L$ is the number of reference values corresponding to each characteristic value, and $c_{j,s}$ is the likelihood calculated from the correspondence between the $j$-th reference value and the class $\theta_s$;
calculating the probability of the evidence obtained from the $i$-th characteristic value according to equation (2),
wherein $p_{\theta_s,j}$ is the probability that the evidence of the $j$-th reference value corresponding to the $i$-th characteristic value supports the class $\theta_s$, $\theta_s$ is the $s$-th class, and $c_{j,s}$ is the likelihood calculated from the correspondence between the reference value of the $i$-th characteristic value and the class $\theta_s$;
obtaining evidence of a jth reference value corresponding to the ith characteristic value according to formula (3),
wherein $e_j$ is the evidence of the $j$-th reference value corresponding to the $i$-th characteristic value, $(\theta_s, p_{\theta_s,j})$ expresses that the evidence $e_j$ supports the class $\theta_s$ with probability $p_{\theta_s,j}$, $\Theta$ is the set of said classes, $\theta_s$ is the $s$-th class, and $L$ is the number of reference values corresponding to each characteristic value;
traversing each characteristic value in the feature vector to calculate the likelihood, the probability, and the evidence $e_i$ for each characteristic value, and expressing the evidence $e_i$ by formula (4),
wherein $e_i$ is the evidence obtained from the $i$-th characteristic value, $(\theta_s, p_{\theta_s,i})$ expresses that the evidence $e_i$ supports the class $\theta_s$ with probability $p_{\theta_s,i}$, $\Theta$ is the set of said classes, $\theta_s$ is the $s$-th class, and $N$ is the dimension of the feature vector.
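The evidence acquisition of claim 5 can be sketched as a row-wise normalization of the likelihoods: for each reference value, the likelihoods over the classes are rescaled into a probability distribution. Since equations (1)–(4) themselves are not reproduced in the text, the normalization below is an assumed, standard Bayesian form, not the patent's exact formula:

```python
import numpy as np

def evidence_from_likelihoods(likelihoods):
    """Given an (L, S) array of likelihoods c[j, s] (reference value j,
    class s), normalize each row so that the evidence for reference
    value j is a probability distribution over the S classes."""
    c = np.asarray(likelihoods, dtype=float)
    return c / c.sum(axis=1, keepdims=True)
```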
6. The construction method according to claim 5, wherein the calculating the weight of the evidence comprises:
the weight of the evidence is calculated according to equation (5),
wherein $w_i$ is the weight of the $i$-th evidence $e_i$, $d_{iu}$ is the Euclidean distance between the calculated probabilities $p_{\theta_s,i}$ and the preset values $p_u$, $(e_i, e_u)$ denotes the pair formed by the evidence $e_i$ and a predetermined uniform probability distribution $e_u$, $\theta_s$ is the $s$-th class, $p_{\theta_s,i}$ is the probability with which the evidence obtained from the $i$-th characteristic value supports the class $\theta_s$, $U$ is the product of the number of reference values and the total number of classes, and $N$ is the dimension of the feature vector.
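Claim 6 weights each piece of evidence by its distance $d_{iu}$ from a uniform probability distribution: evidence whose class distribution is close to uniform carries little discriminating information and receives a small weight. The sketch below assumes the distances are normalized by their sum, which the claim (equation (5) is not reproduced) does not confirm:

```python
import numpy as np

def evidence_weights(evidence):
    """Weight each piece of evidence by the Euclidean distance between
    its class-probability distribution and the uniform distribution,
    normalized here so the weights sum to one (an assumption)."""
    p = np.asarray(evidence, dtype=float)        # shape (N, S)
    uniform = np.full(p.shape[1], 1.0 / p.shape[1])
    d = np.linalg.norm(p - uniform, axis=1)      # d_iu for each evidence
    return d / d.sum()
```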
7. The method of claim 6, wherein the adopting the ER classifier of the classification model to classify the complaint text according to the evidence and the weight so as to generate a classification result comprises:
a weighted confidence distribution is defined according to equation (6),
wherein $m_{\theta_s,i}$ is the weighted confidence of the evidence $e_i$ for the class $\theta_s$, $w_i$ is the weight of the $i$-th evidence $e_i$, $p_{\theta_s,i}$ is the probability with which the evidence obtained from the $i$-th characteristic value supports the class $\theta_s$, and $\Theta$ is the identification frame, i.e., the set of all classes;
calculating possible classification results using evidence reasoning rules in combination with the weighted belief distributions, and further representing the possible classification results using formula (7), formula (8), and formula (9),
wherein $p_{\theta_s,e(i)}$ is the probability that the predicted classification result after synthesizing the first $i$ pieces of evidence is the class $\theta_s$, $\hat{m}_{\theta_s,e(i)}$ is the degree of support for the class $\theta_s$ after synthesizing the first $i$ pieces of evidence, $m_{\theta_s,e(i-1)}$ is the weighted confidence that the predicted classification result after synthesizing the first $i-1$ pieces of evidence is the class $\theta_s$, $m_{P(\Theta),e(i-1)}$ is the weighted confidence for $P(\Theta)$ after synthesizing the first $i-1$ pieces of evidence, $\hat{m}_{P(\Theta),e(i)}$ is the degree of support for $P(\Theta)$ after synthesizing the first $i$ pieces of evidence, $\hat{m}_{D,e(i)}$ is the degree of support for the subset $D$ after synthesizing the first $i$ pieces of evidence, $\Theta$ is the identification frame, $D$ is a subset of the identification frame, $r_i$ is the reliability of the $i$-th evidence $e_i$, $w_i$ is the weight of the $i$-th evidence $e_i$ with $r_i = w_i$, $P(\Theta)$ is the power set of the identification frame, $B$ and $C$ are subsets of the power set, $m_{B,e(i-1)}$ is the weighted confidence for the subset $B$ after synthesizing the first $i-1$ pieces of evidence, and $m_{C,i}$ is the weighted confidence of the $i$-th evidence for the subset $C$;
generating a plurality of said possible classification results according to equation (10),
wherein $y_m$ is the classification result of the $m$-th complaint text, $\theta_s$ is the class, $p_{\theta_s,e(i)}$ is the probability that the predicted classification result after synthesizing the first $i$ pieces of evidence is the class $\theta_s$, $N$ is the dimension of the feature vector, and $M$ is the number of complaint texts;
selecting, from the plurality of possible classification results, the possible classification result with the largest probability value as the generated classification result;
the optimizing parameters of the ER classifier to update the classification model comprises:
optimizing the parameters of the ER classifier according to an optimization model of equation (11),
wherein $M$ is the number of complaint texts, $y_m$ is the classification result of the $m$-th complaint text, $p_{\theta_s,e(i)}$ is the probability that the predicted classification result after synthesizing the first $i$ pieces of evidence is the class $\theta_s$, $v_m$ is the vector representation of the probabilities $p_{\theta_s,e(i)}$ under the classification result $y_m$, $\hat{v}_m$ is the vector representation of the true classification result $\hat{y}_m$, $d(v_m, \hat{v}_m)$ is the Euclidean distance between $v_m$ and $\hat{v}_m$, and $r_i$ is the reliability of the $i$-th evidence $e_i$.
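As an informal illustration of claim 7's combination-and-selection steps, the sketch below lets each piece of evidence vote with its weight, normalizes the weighted supports into class probabilities, and picks the class with the largest probability. The full ER rule of equations (6)–(9) additionally tracks reliabilities and power-set masses, which this simplified stand-in deliberately omits:

```python
import numpy as np

def classify(evidence, weights):
    """Simplified weighted combination of evidence, followed by
    selection of the class with the largest combined probability."""
    p = np.asarray(evidence, dtype=float)   # (N, S) evidence probabilities
    w = np.asarray(weights, dtype=float)    # (N,) evidence weights
    support = w @ p                         # weighted support per class
    probs = support / support.sum()         # normalize into probabilities
    return int(np.argmax(probs)), probs
```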
8. A system for constructing a classification model of a complaint text, comprising a processor for executing the construction method of any one of claims 2 to 7.
9. A method for classifying a complaint text, comprising classifying the complaint text using the classification model constructed according to any one of claims 2 to 7.
10. A classification system for complaint texts, characterized in that it comprises a processor for carrying out the classification method according to claim 9.
CN201811324875.1A 2018-11-08 2018-11-08 Complain disaggregated model, construction method, system, classification method and the system of text Pending CN109376226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811324875.1A CN109376226A (en) 2018-11-08 2018-11-08 Complain disaggregated model, construction method, system, classification method and the system of text

Publications (1)

Publication Number Publication Date
CN109376226A true CN109376226A (en) 2019-02-22

Family

ID=65383840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811324875.1A Pending CN109376226A (en) 2018-11-08 2018-11-08 Complain disaggregated model, construction method, system, classification method and the system of text

Country Status (1)

Country Link
CN (1) CN109376226A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101770454A (en) * 2010-02-13 2010-07-07 武汉理工大学 Method for expanding feature space of short text
CN105516499A (en) * 2015-12-14 2016-04-20 北京奇虎科技有限公司 Method and device for classifying short messages, communication terminal and server
CN106909537A (en) * 2017-02-07 2017-06-30 中山大学 A kind of polysemy analysis method based on topic model and vector space
CN108241741A (en) * 2017-12-29 2018-07-03 深圳市金立通信设备有限公司 A kind of file classification method, server and computer readable storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YING YANG et al.: "An evidential reasoning-based decision support system for handling customer complaints in mobile telecommunications", Knowledge-Based Systems *
ZHANG Xiaochuan, YU Linfeng, SANG Ruiting, ZHANG Yihao: "Research on short text classification fusing CNN and LDA", Software Engineering *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427959A (en) * 2019-06-14 2019-11-08 合肥工业大学 Complain classification method, system and the storage medium of text
CN111159335A (en) * 2019-12-12 2020-05-15 中国电子科技集团公司第七研究所 Short text classification method based on pyramid pooling and LDA topic model
CN112288446A (en) * 2020-10-28 2021-01-29 中国联合网络通信集团有限公司 Method and device for calculating complaint and claim
CN112288446B (en) * 2020-10-28 2023-06-06 中国联合网络通信集团有限公司 Calculation method and device for complaint and claim payment
CN112860893A (en) * 2021-02-08 2021-05-28 国网河北省电力有限公司营销服务中心 Short text classification method and terminal equipment
CN112860893B (en) * 2021-02-08 2023-02-28 国网河北省电力有限公司营销服务中心 Short text classification method and terminal equipment
CN113094567A (en) * 2021-03-31 2021-07-09 四川新网银行股份有限公司 Malicious complaint identification method and system based on text clustering
CN113591473A (en) * 2021-07-21 2021-11-02 西北工业大学 Text similarity calculation method based on BTM topic model and Doc2vec
CN113591473B (en) * 2021-07-21 2024-03-12 西北工业大学 Text similarity calculation method based on BTM topic model and Doc2vec

Similar Documents

Publication Publication Date Title
CN109376226A (en) Complain disaggregated model, construction method, system, classification method and the system of text
CN110598206B (en) Text semantic recognition method and device, computer equipment and storage medium
JP7225395B2 (en) Dynamic Reconfiguration Training Computer Architecture
US20230281445A1 (en) Population based training of neural networks
US11734510B2 (en) Natural language processing of encoded question tokens and encoded table schema based on similarity
US11610064B2 (en) Clarification of natural language requests using neural networks
WO2020140073A1 (en) Neural architecture search through a graph search space
CN113220886A (en) Text classification method, text classification model training method and related equipment
US11183174B2 (en) Speech recognition apparatus and method
CN110377733B (en) Text-based emotion recognition method, terminal equipment and medium
CN113449084A (en) Relationship extraction method based on graph convolution
CN113254615A (en) Text processing method, device, equipment and medium
CN114511023B (en) Classification model training method and classification method
CN116127060A (en) Text classification method and system based on prompt words
CN115659244A (en) Fault prediction method, device and storage medium
US20230078246A1 (en) Centralized Management of Distributed Data Sources
US11875128B2 (en) Method and system for generating an intent classifier
CN111046177A (en) Automatic arbitration case prejudging method and device
CN116662555B (en) Request text processing method and device, electronic equipment and storage medium
CN117076946A (en) Short text similarity determination method, device and terminal
CN111382265B (en) Searching method, device, equipment and medium
JP2015038709A (en) Model parameter estimation method, device, and program
US20240160196A1 (en) Hybrid model creation method, hybrid model creation device, and recording medium
CN107729509B (en) Discourse similarity determination method based on recessive high-dimensional distributed feature representation
CN114841588A (en) Information processing method, device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190222