CN109710766A - Complaint tendency analysis and early warning method and device for work order data - Google Patents



Publication number
CN109710766A
Authority
CN
China
Prior art keywords
work order
vector
order data
participle
complaint
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811631912.3A
Other languages
Chinese (zh)
Other versions
CN109710766B (en)
Inventor
杨政
刘柱揆
尹春林
潘侃
朱华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Yunnan Power System Ltd
Original Assignee
Electric Power Research Institute of Yunnan Power System Ltd
Application filed by Electric Power Research Institute of Yunnan Power System Ltd
Priority to CN201811631912.3A
Publication of CN109710766A
Application granted
Publication of CN109710766B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a complaint tendency analysis and early warning method and device for work order data. The method comprises: segmenting each work order data item into words and deleting stop words to obtain a second word-segment combination; generating word vectors with a word2vec model; computing the sentence vector of each work order data item; dividing the sentence vectors into three clusters with the k-means algorithm; generating a complaint tendency classification model with Softmax logistic regression; determining the complaint tendency category of new work order data with the classification model; and issuing an early warning if that category is high-risk complaint tendency or complaint tendency. The method generates the classification model from a large amount of work order data and uses it to predict the complaint tendency category of new work order data. Acting on the prediction enables timely, proactive early warning and thereby solves the inefficiency of existing manual analysis methods.

Description

Complaint tendency analysis and early warning method and device for work order data
Technical field
This application relates to the technical field of big data applications, and in particular to a complaint tendency analysis and early warning method and device for work order data.
Background
With the rapid development and widespread adoption of informatization in the power industry, the units and business departments of power companies at all levels have largely achieved full information coverage. The 95598 customer service system is an important window for communication between a power company and its customers. The system has accumulated a large amount of unstructured work order data, from whose content the power company learns customers' intentions and attitudes in order to improve service quality.
Work order data are numerous and differ in urgency. If the power company does not handle a highly urgent work order in time, it is likely to receive a customer complaint. To reduce this risk, the power company needs to analyze the work order data, grade each work order by complaint tendency, and issue early warnings for work orders with a high complaint tendency grade, so that it can respond quickly, proactively, and in a targeted manner.
Complaint tendency analysis of work order data is currently still performed manually. Manual analysis cannot process work order data in a timely and effective manner, which makes existing analysis inefficient. A method is therefore needed that analyzes work order data promptly and effectively and issues complaint tendency warnings in time.
Summary of the invention
The application provides a complaint tendency analysis and early warning method and device for work order data, to solve the inefficiency of existing manual analysis methods.
In a first aspect, the application provides a complaint tendency analysis and early warning method for work order data, comprising:
obtaining work order data, treating the work order data that share a work order number as one unit, and segmenting the work order data belonging to each unit into words to obtain a first word-segment combination;
deleting the stop words in the first word-segment combination to obtain a second word-segment combination of the work order data;
generating a word vector for each word segment in the second word-segment combination using a word2vec model;
computing the average of the word vectors corresponding to the work order data, and using the average vector as the sentence vector of the work order data;
dividing the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, namely high-risk complaint tendency, complaint tendency, and no complaint tendency;
setting a corresponding first output vector for the work order data of each complaint tendency category, using the sentence vectors of the work order data as first input vectors, and generating a complaint tendency classification model using Softmax logistic regression;
determining the complaint tendency category of new work order data using the complaint tendency classification model; and
issuing an early warning if the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency.
Optionally, generating a word vector for each second word segment in the second word-segment combination using the word2vec model comprises:
calculating the frequency with which the second word segment occurs according to the following formula, and determining whether that frequency is greater than a first preset threshold:
where P(wi) is the frequency with which the second word segment occurs, f(wi) is the frequency of occurrence of the second word segment, wi is the second word segment, i = 1, 2, 3, ..., x, x is the number of second word segments, and n is a second preset threshold;
if the frequency with which the second word segment occurs is greater than the first preset threshold, determining that the second word segment is a high-frequency word segment, removing the high-frequency word segment from the second word-segment combination, and using the second word-segment combination after removal as a third word-segment combination;
constructing a training model for the third word-segment combination using the skip-gram model in the word2vec model; and
generating, using the training model, a word vector for each word segment in the third word-segment combination.
Optionally, dividing the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, comprises:
Step 301: using the k-means algorithm, randomly select three sentence vectors as the centers of the three clusters, denoted C1, C2, and C3;
Step 302: calculate the Euclidean distance between each sentence vector and each of the three cluster centers, determine the center Ci nearest to each sentence vector, and assign the sentence vector to the cluster of Ci, where i = 1, 2, 3;
Step 303: for each cluster, calculate the mean of each dimension over all sentence vectors in the cluster, and use the vector formed by these means as the new center of the cluster;
Step 304: determine whether the new center of each cluster is consistent with the previous center of that cluster (initially, the randomly selected center); if not, return to Step 302, until the new center of every cluster is consistent with the center from the previous calculation, and then use the new centers as the target centers.
Optionally, determining the complaint tendency category of new work order data using the complaint tendency classification model comprises:
generating the sentence vector corresponding to the new work order data;
using the sentence vector corresponding to the new work order data as a second input vector of the complaint tendency classification model, and obtaining the second output vector corresponding to the second input vector;
comparing the second output vector with the first output vectors to find the first output vector that corresponds to the second output vector, and using that first output vector as the target output vector; and
determining the complaint tendency category corresponding to the target output vector, and using it as the complaint tendency category of the new work order data.
In a second aspect, the application provides a complaint tendency analysis and early warning device for work order data, comprising:
an obtaining module, configured to obtain work order data, treat the work order data that share a work order number as one unit, and segment the work order data belonging to each unit into words to obtain a first word-segment combination;
a deletion module, configured to delete the stop words in the first word-segment combination to obtain a second word-segment combination of the work order data;
a word vector generation module, configured to generate a word vector for each word segment in the second word-segment combination using a word2vec model;
a sentence vector generation module, configured to compute the average of the word vectors corresponding to the work order data and use the average vector as the sentence vector of the work order data;
a division module, configured to divide the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, namely high-risk complaint tendency, complaint tendency, and no complaint tendency;
a classification model generation module, configured to set a corresponding first output vector for the work order data of each complaint tendency category, use the sentence vectors of the work order data as first input vectors, and generate a complaint tendency classification model using Softmax logistic regression;
a judgment module, configured to determine the complaint tendency category of new work order data using the complaint tendency classification model; and
an early warning module, configured to issue an early warning when the judgment module determines that the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency.
Optionally, the word vector generation module comprises:
a first judgment unit, configured to calculate the frequency with which the second word segment occurs according to the following formula, and determine whether that frequency is greater than a first preset threshold:
where P(wi) is the frequency with which the second word segment occurs, f(wi) is the frequency of occurrence of the second word segment, wi is the second word segment, i = 1, 2, 3, ..., x, x is the number of second word segments, and n is a second preset threshold;
a removal unit, configured to, when the first judgment unit determines that the frequency with which the second word segment occurs is greater than the first preset threshold, determine that the second word segment is a high-frequency word segment, remove the high-frequency word segment from the second word-segment combination, and use the second word-segment combination after removal as a third word-segment combination;
a training model construction unit, configured to construct a training model for the third word-segment combination using the skip-gram model in the word2vec model; and
a first generation unit, configured to generate, using the training model, a word vector for each word segment in the third word-segment combination.
Optionally, the division module comprises:
a selection unit, configured to randomly select, using the k-means algorithm, three sentence vectors as the centers of the three clusters, denoted C1, C2, and C3;
a first calculation unit, configured to calculate the Euclidean distance between each sentence vector and each of the three cluster centers, determine the center Ci nearest to each sentence vector, and assign the sentence vector to the cluster of Ci, where i = 1, 2, 3;
a second calculation unit, configured to calculate, for each cluster, the mean of each dimension over all sentence vectors in the cluster, and use the vector formed by these means as the new center of the cluster; and
a second judgment unit, configured to determine whether the new center of each cluster is consistent with the previous center of that cluster (initially, the randomly selected center) and, if not, return to the operation of the first calculation unit until the new center of every cluster is consistent with the center from the previous calculation, and then use the new centers as the target centers.
Optionally, the judgment module comprises:
a second generation unit, configured to generate the sentence vector corresponding to the new work order data;
a first acquisition unit, configured to use the sentence vector corresponding to the new work order data as a second input vector of the complaint tendency classification model, and obtain the second output vector corresponding to the second input vector;
a second acquisition unit, configured to compare the second output vector with the first output vectors to find the first output vector that corresponds to the second output vector, and use that first output vector as the target output vector; and
a determination unit, configured to determine the complaint tendency category corresponding to the target output vector, and use it as the complaint tendency category of the new work order data.
As the above technical solution shows, the application provides a complaint tendency analysis and early warning method and device for work order data. The method comprises: obtaining work order data, treating the work order data that share a work order number as one unit, and segmenting the work order data belonging to each unit into words to obtain a first word-segment combination; deleting the stop words in the first word-segment combination to obtain a second word-segment combination of the work order data; generating a word vector for each word segment in the second word-segment combination using a word2vec model; computing the average of the word vectors corresponding to the work order data to obtain the sentence vector of the work order data; dividing the sentence vectors of the work order data into three clusters using the k-means algorithm; setting a corresponding first output vector for the work order data of each complaint tendency category, using the sentence vectors of the work order data as first input vectors, and generating a complaint tendency classification model using Softmax logistic regression; determining the complaint tendency category of new work order data using the classification model; and issuing an early warning if that category is high-risk complaint tendency or complaint tendency.
In the method provided by the application, a complaint tendency classification model is generated from a large amount of work order data. On the basis of this model, the complaint tendency category of new work order data is predicted, and a timely, proactive early warning is issued according to the prediction, which solves the inefficiency of existing manual analysis methods.
Brief description of the drawings
To illustrate the technical solution of the application more clearly, the drawings needed in the embodiments are briefly introduced below. A person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flow chart of a complaint tendency analysis and early warning method for work order data provided by an embodiment of the application;
Fig. 2 is a flow chart of the word vector generation method in the complaint tendency analysis and early warning method provided by an embodiment of the application;
Fig. 3 is a flow chart of the method of dividing sentence vectors into three clusters in the complaint tendency analysis and early warning method provided by an embodiment of the application;
Fig. 4 is a flow chart of the method of determining the complaint tendency category of new work order data in the complaint tendency analysis and early warning method provided by an embodiment of the application;
Fig. 5 is a structural schematic diagram of a complaint tendency analysis and early warning device for work order data provided by an embodiment of the application.
Detailed description of the embodiments
To solve the inefficiency of existing manual analysis methods, the application provides a complaint tendency analysis and early warning method and device for work order data.
Referring to the flow chart shown in Fig. 1, an embodiment of the application provides a complaint tendency analysis and early warning method for work order data, comprising the following steps:
Step 101: obtain work order data, treat the work order data that share a work order number as one unit, and segment the work order data belonging to each unit into words to obtain a first word-segment combination.
In this step, work order text is obtained from the 95598 customer service system in real-time or offline mode and used as the work order data. The work order text includes the following fields: work order number, call content, applicant, mobile phone number, handler, urgency level, station address, start time, work order creation time, work order end time, current stage, and work order type. The work order type may be one of seven types: praise, fault repair, suggestion, report, complaint, business consultation, and opinion.
In one possible implementation, with the work order data that share a work order number as one unit, the work order data are segmented in Python with the jieba toolkit in precise mode. During segmentation, an industry dictionary related to the power industry is defined; for example, the industry dictionary is defined as: stop the financial moon Power transmission is spaced apart, stops sending the words such as a little.
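As a minimal sketch of Step 101, the records that share a work order number can be merged into one unit before segmentation. The field names `order_no` and `content` and the `segment` parameter are hypothetical; the whitespace tokenizer used here is only a stand-in for the jieba precise-mode segmentation named above, which is what would be plugged in for Chinese text.

```python
from collections import defaultdict

def group_by_order_number(records):
    """Merge the text of all records that share a work order number."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["order_no"]].append(record["content"])
    return {order_no: " ".join(parts) for order_no, parts in grouped.items()}

def segment_units(units, segment=str.split):
    """Segment each unit; `segment` is a stand-in for a real tokenizer."""
    return {order_no: segment(text) for order_no, text in units.items()}

# Hypothetical records; two of them belong to the same work order.
records = [
    {"order_no": "A001", "content": "power outage"},
    {"order_no": "A002", "content": "billing question"},
    {"order_no": "A001", "content": "since morning"},
]
first_combinations = segment_units(group_by_order_number(records))
```

Passing the tokenizer in as a parameter keeps the grouping logic independent of the segmentation library.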
Step 102: delete the stop words in the first word-segment combination to obtain the second word-segment combination of the work order data.
In one possible implementation, the first word-segment combination is processed with the Harbin Institute of Technology stop-word list: the stop words in the first word-segment combination are deleted to obtain the second word-segment combination.
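The stop-word deletion in Step 102 amounts to a set-membership filter. The sketch below assumes the stop-word list has already been loaded into a Python set; the tokens and the stop-word set shown are illustrative only and are not taken from the Harbin Institute of Technology list.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not stop words, preserving their order."""
    return [token for token in tokens if token not in stop_words]

# Illustrative stop-word set; a real list would be loaded from a file.
stop_words = {"the", "about", "a"}
first_combination = ["the", "client", "asks", "about", "the", "account", "number"]
second_combination = remove_stop_words(first_combination, stop_words)
```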
Step 103: generate a word vector for each word segment in the second word-segment combination using a word2vec model.
The word2vec model is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. The network takes a word as input and predicts the words in adjacent positions; under the bag-of-words assumption in word2vec, word order is unimportant. After training, the word2vec model maps each word to a vector that represents the relationships between words; this vector is the hidden layer of the neural network. In the embodiments of the application, each word segment in the second word-segment combination is used as an input word of the word2vec model to generate its word vector.
A word vector maps a word or phrase of a vocabulary to a vector of real numbers. Conceptually, it involves a mathematical embedding from a space with one dimension per word into a continuous vector space of much lower dimension. In the embodiments of the application, the second word-segment combination is the vocabulary.
Step 104: compute the average of the word vectors corresponding to the work order data, and use the average vector as the sentence vector of the work order data.
A sentence vector maps a sentence or paragraph to a vector of real numbers. In one possible implementation, the word vectors corresponding to each work order data item are summed, and the sum is then divided by the number of word vectors to produce the sentence vector of that work order data item.
In this step, suppose the second word-segment combination of a work order data item consists of the word segments "client", "consulting", "family number", "raise-position", and "problem". If the vector of "client" is [1,0,0,0,0], the vector of "consulting" is [0,1,0,0,0], the vector of "family number" is [0,0,1,0,0], the vector of "raise-position" is [0,0,0,1,0], and the vector of "problem" is [0,0,0,0,1], then the sentence vector of this work order data item is [0.2,0.2,0.2,0.2,0.2].
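The averaging in Step 104 can be sketched in a few lines of plain Python. The function name `sentence_vector` is hypothetical; the input reuses the five one-hot word vectors from the example above.

```python
def sentence_vector(word_vectors):
    """Average equal-length word vectors dimension by dimension."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    count = len(word_vectors)
    return [sum(dimension) / count for dimension in zip(*word_vectors)]

word_vectors = [
    [1, 0, 0, 0, 0],  # client
    [0, 1, 0, 0, 0],  # consulting
    [0, 0, 1, 0, 0],  # family number
    [0, 0, 0, 1, 0],  # raise-position
    [0, 0, 0, 0, 1],  # problem
]
vector = sentence_vector(word_vectors)  # [0.2, 0.2, 0.2, 0.2, 0.2]
```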
Step 105: divide the sentence vectors of the work order data into three clusters using the k-means algorithm. The three clusters correspond to three complaint tendency categories of the work order data: high-risk complaint tendency, complaint tendency, and no complaint tendency.
The k-means algorithm takes as input the number of clusters k and a database of n data objects, and outputs k clusters that satisfy the minimum-variance criterion. Given k, the algorithm partitions the n data objects into k clusters such that objects in the same cluster are highly similar and objects in different clusters are less similar. In the embodiments of the application, the sentence vector of each work order data item is a data object, and the number of clusters k is 3.
Step 106: set a corresponding first output vector for the work order data of each complaint tendency category, use the sentence vectors of the work order data as first input vectors, and generate a complaint tendency classification model using Softmax logistic regression.
The Softmax logistic regression model generalizes the logistic regression model to multi-class problems, in which the class label y can take more than two values. Softmax logistic regression contains a parameter set. The embodiments of the application optimize the parameters in this set using gradient descent, with the learning rate learning_rate set to 0.1.
In one possible implementation, the embodiments of the application provide a method of generating the complaint tendency classification model using Softmax logistic regression, comprising the following steps:
(1) In Python, with the tensorflow toolkit, specify the hypothesis function as hypothesis = tf.nn.softmax(tf.matmul(X, W) + b), where X is the sentence vector, W is the weight, and b is the bias;
(2) In Python, with the tensorflow toolkit, use cross entropy as the objective function: cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1));
(3) In Python, with the tensorflow toolkit, optimize the weight parameters using gradient descent: optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost);
(4) Following steps (1), (2), and (3), use the work order data with first output vectors as training data to generate the complaint tendency classification model.
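The steps above rely on TensorFlow graph-mode calls. To make the underlying computation visible, the following dependency-free sketch implements the same hypothesis function, cross-entropy objective, and gradient descent update in plain Python. It is an illustration under stated assumptions, not the implementation from the source: the function names, the zero initialization, the epoch count, and the toy data are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(x, W, b):
    """hypothesis = softmax(x W + b) for a single sentence vector x."""
    scores = [sum(x[i] * W[i][k] for i in range(len(x))) + b[k]
              for k in range(len(b))]
    return softmax(scores)

def train(X, Y, learning_rate=0.1, epochs=500):
    """Batch gradient descent on the mean cross-entropy loss."""
    n_features, n_classes = len(X[0]), len(Y[0])
    W = [[0.0] * n_classes for _ in range(n_features)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * n_classes for _ in range(n_features)]
        grad_b = [0.0] * n_classes
        for x, y in zip(X, Y):
            p = predict_proba(x, W, b)
            for k in range(n_classes):
                # Gradient of the mean cross entropy w.r.t. score k.
                err = (p[k] - y[k]) / len(X)
                grad_b[k] += err
                for i in range(n_features):
                    grad_W[i][k] += err * x[i]
        for k in range(n_classes):
            b[k] -= learning_rate * grad_b[k]
            for i in range(n_features):
                W[i][k] -= learning_rate * grad_W[i][k]
    return W, b

# Toy training set: three sentence vectors, one per category (one-hot labels).
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Y = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
W, b = train(X, Y)
predicted = [max(range(3), key=lambda k: predict_proba(x, W, b)[k]) for x in X]
```

The learning rate of 0.1 matches the value given in the text; everything else would need tuning on real sentence vectors.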
Step 107: determine the complaint tendency category of new work order data using the complaint tendency classification model.
Step 108: if the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency, issue an early warning.
For example, suppose the content of a work order is: "The electricity box of a building at a certain location previously caught fire inside. The box now shows serious burn marks, and both the incoming and outgoing lines show burn marks. The customer is worried about a safety hazard and asks for it to be handled as soon as possible." If the classification model predicts high-risk complaint tendency, a complaint risk exists; an early warning is issued immediately and the relevant business department is notified to handle it.
As the above technical solution shows, in the method provided by the application, a complaint tendency classification model is generated from a large amount of work order data. On the basis of this model, the complaint tendency category of new work order data is predicted, and a timely, proactive early warning is issued according to the prediction, which solves the inefficiency of existing manual analysis methods.
Referring to the flow chart shown in Fig. 2, generating a word vector for each second word segment in the second word-segment combination using the word2vec model comprises the following steps:
Step 201: calculate the frequency with which the second word segment occurs according to the following formula, and determine whether that frequency is greater than a first preset threshold:
where P(wi) is the frequency with which the second word segment occurs, f(wi) is the frequency of occurrence of the second word segment, wi is the second word segment, i = 1, 2, 3, ..., x, x is the number of second word segments, and n is a second preset threshold.
Step 202: if the frequency with which the second word segment occurs is greater than the first preset threshold, determine that the second word segment is a high-frequency word segment, remove the high-frequency word segment from the second word-segment combination, and use the second word-segment combination after removal as a third word-segment combination.
In one possible implementation, n is 1e-5 and the first preset threshold is 0.8. In this case, when P(wi) ≥ 0.8 for a second word segment, that word segment is a high-frequency word segment and is deleted.
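The formula referenced in Step 201 is not reproduced in this text. The sketch below assumes it is the subsampling rule commonly used with word2vec, P(wi) = 1 - sqrt(n / f(wi)), with f(wi) taken as the relative frequency of the word segment in the corpus. Both the formula and that reading of f(wi) are assumptions, as are the function names.

```python
import math
from collections import Counter

def high_frequency_words(tokens, n=1e-5, threshold=0.8):
    """Return the words whose assumed P(wi) reaches the first threshold."""
    counts = Counter(tokens)
    total = len(tokens)
    flagged = set()
    for word, count in counts.items():
        f = count / total               # assumed: relative frequency
        p = 1.0 - math.sqrt(n / f)      # assumed subsampling formula
        if p >= threshold:
            flagged.add(word)
    return flagged

def third_combination(tokens, n=1e-5, threshold=0.8):
    """Drop high-frequency words from the second word-segment combination."""
    flagged = high_frequency_words(tokens, n, threshold)
    return [token for token in tokens if token not in flagged]

# Toy corpus; n is raised to 0.01 only so that the tiny example is
# illustrative. With n=1e-5, any word with f >= 2.5e-4 would be flagged.
tokens = ["power"] * 8 + ["outage", "meter"]
kept = third_combination(tokens, n=0.01)
```

Under this assumed formula, with n = 1e-5 and a threshold of 0.8, a word segment is removed once its relative frequency reaches 2.5e-4, so the rule only becomes selective on a large corpus.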
Step 203: construct a training model for the third word-segment combination using the skip-gram model in the word2vec model.
For each third word segment in the third word-segment combination, training data are constructed in the format (input word segment, output word segment). First, the context of each third word segment is found in the third word-segment combination using skip_window=2; then the training data are constructed from the context. For example, suppose the third word-segment combination of a work order data item consists of the word segments "client", "consulting", "family number", "raise-position", and "problem". If "family number" is the input word segment, the generated training data are: (family number, client); (family number, consulting); (family number, raise-position); and (family number, problem).
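The construction of (input word segment, output word segment) pairs can be sketched as follows. The function name `skip_gram_pairs` is hypothetical; the window size and the token list come from the example above.

```python
def skip_gram_pairs(tokens, skip_window=2):
    """Pair every token with each neighbor within skip_window positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - skip_window)
        stop = min(len(tokens), i + skip_window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = ["client", "consulting", "family number", "raise-position", "problem"]
pairs = skip_gram_pairs(tokens)
family_pairs = [pair for pair in pairs if pair[0] == "family number"]
```

Tokens near either end of the combination have fewer neighbors, so they produce fewer pairs than tokens in the middle.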
The training model is a three-layer fully connected neural network comprising an input layer, a hidden layer, and an output layer. The model updates its weights from the training data using negative sampling. The hidden layer contains 100 neurons.
Step 204: using the training model, generate a word vector for each word segment in the third word-segment combination.
Referring to the work flow diagram shown in Fig. 3, dividing the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, comprises the following steps:
Step 301, using the k-means algorithm, randomly select three sentence vectors as the centers of the three clusters, and denote the centers of the three clusters as C1, C2 and C3 respectively.
Step 302, calculate the Euclidean distance between each sentence vector and each of the three cluster centers, determine the center Ci nearest to each sentence vector in Euclidean distance, and assign the sentence vector to the cluster corresponding to Ci, where i = 1, 2, 3.
Step 303, calculate the mean of each dimension over all vectors in each cluster, and take the vector formed by these means as the new center of the cluster.
Step 304, judge whether the new center of the cluster is consistent with the randomly selected center of the cluster; if not, return to step 302, until the new center of each cluster is consistent with the center calculated in the previous iteration.
Step 305, take the new center of the cluster as the target center.
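Steps 301 to 305 can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: the function name `kmeans`, the `seed` and `max_iter` parameters, the handling of an empty cluster, and the toy two-dimensional vectors are all hypothetical and not taken from the patent.

```python
import math
import random


def kmeans(vectors, k=3, seed=0, max_iter=100):
    """Cluster sentence vectors into k clusters; returns (centers, assignments)."""
    rng = random.Random(seed)
    # Step 301: randomly select k sentence vectors as the initial centers.
    centers = [list(v) for v in rng.sample(vectors, k)]
    assignments = []
    for _ in range(max_iter):
        # Step 302: assign every vector to the nearest center (Euclidean distance).
        assignments = [
            min(range(k), key=lambda i: math.dist(v, centers[i])) for v in vectors
        ]
        # Step 303: the new center is the per-dimension mean of the cluster members.
        new_centers = []
        for i in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == i]
            if not members:
                new_centers.append(centers[i])  # assumption: keep an empty cluster's center
                continue
            dim = len(members[0])
            new_centers.append(
                [sum(m[d] for m in members) / len(members) for d in range(dim)]
            )
        # Step 304: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    # Step 305: the final centers are the target centers.
    return centers, assignments


# Toy 2-D "sentence vectors" for illustration only.
vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [9.0, 0.1], [9.2, 0.0]]
centers, assignments = kmeans(vectors)
```

Because the initial centers are chosen at random, k-means can settle on different partitions for different seeds; in practice the run is often repeated with several seeds and the best partition kept.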
Referring to the work flow diagram shown in Fig. 4, judging the complaint tendency category of new work order data using the complaint tendency classification model comprises the following steps:
Step 401, generate the sentence vector corresponding to the new work order data.
Step 402, take the sentence vector corresponding to the new work order data as the second input vector of the complaint tendency classification model, and obtain the second output vector corresponding to the second input vector.
Step 403, compare the second output vector with the first output vectors to obtain the first output vector corresponding to the second output vector, and take that first output vector as the target output vector.
Step 404, determine the complaint tendency category corresponding to the target output vector, and take that category as the complaint tendency category of the new work order data.
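Step 402 can be illustrated with a Softmax forward pass. The sketch below assumes a trained model is represented by a weight matrix and a bias vector; the names `second_output_vector`, `weights` and `biases` and all numeric values are hypothetical, and the training of these parameters is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def second_output_vector(sentence_vec, weights, biases):
    """Forward pass of a softmax regression model: softmax(W x + b)."""
    scores = [
        sum(w * x for w, x in zip(row, sentence_vec)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Toy parameters: 5 outputs and a 2-D sentence vector (illustrative values only).
weights = [[2.0, 0.0], [0.0, 0.0], [0.0, 2.0], [0.0, 0.0], [-2.0, -2.0]]
biases = [0.0, 0.0, 0.0, 0.0, 0.0]
output = second_output_vector([3.0, 0.5], weights, biases)
```

The output is a probability distribution over the output positions, so it can be compared position by position with the first output vectors set during training.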
In the embodiment of the present application, suppose the three first output vectors are [1,0,0,0,0], [0,0,1,0,0] and [0,0,0,0,1], where [1,0,0,0,0] represents a high-risk complaint tendency, [0,0,1,0,0] represents a complaint tendency, and [0,0,0,0,1] represents no complaint tendency.
If the second output vector corresponding to the sentence vector of the new work order data is [1,0,0,0,0], then the first output vector corresponding to this second output vector is [1,0,0,0,0], and the complaint tendency category of the new work order data is high-risk complaint tendency.
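Steps 403 and 404, together with the early warning decision, can be sketched as follows. The text only says the second output vector is compared with the first output vectors; picking the nearest first output vector by Euclidean distance is an assumption of this sketch, and the names `match_category` and `needs_warning` are hypothetical.

```python
import math

# First output vectors and their categories, as given in the embodiment.
FIRST_OUTPUT_VECTORS = {
    "high-risk complaint tendency": [1, 0, 0, 0, 0],
    "complaint tendency": [0, 0, 1, 0, 0],
    "no complaint tendency": [0, 0, 0, 0, 1],
}


def match_category(second_output):
    """Return the category whose first output vector is nearest (assumed rule)."""
    return min(
        FIRST_OUTPUT_VECTORS,
        key=lambda name: math.dist(second_output, FIRST_OUTPUT_VECTORS[name]),
    )


def needs_warning(category):
    """An early warning is issued for every category except 'no complaint tendency'."""
    return category != "no complaint tendency"


category = match_category([1, 0, 0, 0, 0])
soft_category = match_category([0.1, 0.0, 0.1, 0.0, 0.8])
```

An exact match, as in the embodiment's example, is the special case in which the distance to one of the first output vectors is zero.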
Referring to the structural schematic diagram shown in Fig. 2, an embodiment of the present application provides a complaint tendency analysis early warning device for work order data, comprising:
Obtaining module 100, configured to obtain work order data and, taking work order data with the same work order number as one unit, segment the work order data belonging to each unit to obtain a first participle combination;
Deletion module 200, configured to delete the stop words in the first participle combination to obtain a second participle combination of the work order data;
Term vector generation module 300, configured to generate the term vector of each participle in the second participle combination using a word2vec model;
Sentence vector generation module 400, configured to calculate the average vector of the term vectors corresponding to the work order data and take the average vector as the sentence vector of the work order data;
Division module 500, configured to divide the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, namely: high-risk complaint tendency, complaint tendency, and no complaint tendency;
Classification model generation module 600, configured to set a corresponding first output vector for the work order data of each complaint tendency category, take the sentence vectors of the work order data as first input vectors, and generate a complaint tendency classification model using Softmax logistic regression;
Judgment module 700, configured to judge the complaint tendency category of new work order data using the complaint tendency classification model;
Early warning module 800, configured to issue an early warning when the judgment module determines that the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency.
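The averaging performed by the sentence vector generation module can be sketched as below. The function name `sentence_vector` and the toy term vectors are hypothetical; skipping participles that have no term vector is an assumption of this sketch, not something the text specifies.

```python
def sentence_vector(participles, term_vectors):
    """Average the term vectors of a work order's participles, dimension by dimension."""
    # Assumption: participles without a term vector are ignored.
    vectors = [term_vectors[p] for p in participles if p in term_vectors]
    if not vectors:
        raise ValueError("no term vectors available for this work order")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Toy 2-D term vectors for illustration only.
term_vectors = {"client": [1.0, 2.0], "problem": [3.0, 4.0]}
sent_vec = sentence_vector(["client", "problem"], term_vectors)
```

The sentence vector has the same dimension as the term vectors, so work orders of any length map to points in the same space and can be clustered together.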
Optionally, the term vector generation module includes:
First judging unit, configured to calculate the frequency with which each second participle occurs according to the following formula, and to judge whether that frequency is greater than a first preset threshold:
Wherein, P(wi) is the frequency with which the second participle occurs, f(wi) is the frequency of occurrence of the second participle, wi is a second participle, i = 1, 2, 3, ..., x, x is the number of second participles, and n is a second preset threshold;
Culling unit, configured to, when the first judging unit determines that the frequency with which a second participle occurs is greater than the first preset threshold, determine that second participle to be a high-frequency participle, reject the high-frequency participle from the second participle combination, and take the second participle combination after rejection as the third participle combination;
Training model construction unit, configured to construct the training model of the third participle combination using the skip-gram model in the word2vec model;
First generation unit, configured to generate the term vector of each participle in the third participle combination using the training model.
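The culling unit's behaviour can be sketched as a simple filter. The sketch assumes the values P(wi) have already been computed elsewhere and are passed in as a dictionary; the function name `reject_high_frequency`, the dictionary, and the numeric values are hypothetical and only illustrate the threshold comparison.

```python
def reject_high_frequency(participles, frequency, first_threshold):
    """Drop participles whose precomputed frequency P(w_i) exceeds the threshold."""
    return [p for p in participles if frequency.get(p, 0.0) <= first_threshold]


# Illustrative precomputed frequencies; the values are made up for this sketch.
frequency = {"client": 0.40, "consulting": 0.05, "family number": 0.02}
second_combo = ["client", "consulting", "family number"]
third_combo = reject_high_frequency(second_combo, frequency, first_threshold=0.10)
```

The result plays the role of the third participle combination: the second participle combination with the high-frequency participles removed.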
Optionally, the division module includes:
Selection unit, configured to randomly select, using the k-means algorithm, three sentence vectors as the centers of the three clusters, and to denote the centers of the three clusters as C1, C2 and C3 respectively;
First computing unit, configured to calculate the Euclidean distance between each sentence vector and each of the three cluster centers, determine the center Ci nearest to each sentence vector in Euclidean distance, and assign the sentence vector to the cluster corresponding to Ci, where i = 1, 2, 3;
Second computing unit, configured to calculate the mean of each dimension over all vectors in each cluster, and take the vector formed by these means as the new center of the cluster;
Second judging unit, configured to judge whether the new center of the cluster is consistent with the randomly selected center of the cluster and, if not, to return to the operation of the first computing unit until the new center of each cluster is consistent with the center calculated in the previous iteration, and to take the new center of the cluster as the target center.
Optionally, the judgment module includes:
Second generation unit, configured to generate the sentence vector corresponding to the new work order data;
First acquisition unit, configured to take the sentence vector corresponding to the new work order data as the second input vector of the complaint tendency classification model, and obtain the second output vector corresponding to the second input vector;
Second acquisition unit, configured to compare the second output vector with the first output vectors to obtain the first output vector corresponding to the second output vector, and take that first output vector as the target output vector;
Determination unit, configured to determine the complaint tendency category corresponding to the target output vector, and take that category as the complaint tendency category of the new work order data.
Those skilled in the art can clearly understand that the technology in the embodiments of the present invention can be implemented by software plus the required general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.
Identical or similar parts of the embodiments in this specification may be referred to one another. In particular, since the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment.
The application has been described in detail above in combination with specific implementations and illustrative examples, but these descriptions should not be construed as limiting the application. Those skilled in the art will understand that, without departing from the spirit and scope of the application, various equivalent substitutions, modifications or improvements can be made to the technical solutions of the application and their embodiments, all of which fall within the scope of the application. The protection scope of the application is determined by the appended claims.

Claims (8)

1. A complaint tendency analysis early warning method for work order data, characterized by comprising:
obtaining work order data and, taking work order data with the same work order number as one unit, segmenting the work order data belonging to each unit to obtain a first participle combination;
deleting the stop words in the first participle combination to obtain a second participle combination of the work order data;
generating the term vector of each participle in the second participle combination using a word2vec model;
calculating the average vector of the term vectors corresponding to the work order data, and taking the average vector as the sentence vector of the work order data;
dividing the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, wherein the three complaint tendency categories are: high-risk complaint tendency, complaint tendency, and no complaint tendency;
setting a corresponding first output vector for the work order data of each complaint tendency category, taking the sentence vectors of the work order data as first input vectors, and generating a complaint tendency classification model using Softmax logistic regression;
judging the complaint tendency category of new work order data using the complaint tendency classification model;
issuing an early warning if the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency.
2. The method according to claim 1, characterized in that generating the term vector of each second participle in the second participle combination using the word2vec model comprises:
calculating the frequency with which each second participle occurs according to the following formula, and judging whether that frequency is greater than a first preset threshold:
wherein P(wi) is the frequency with which the second participle occurs, f(wi) is the frequency of occurrence of the second participle, wi is a second participle, i = 1, 2, 3, ..., x, x is the number of second participles, and n is a second preset threshold;
if the frequency with which a second participle occurs is greater than the first preset threshold, determining that second participle to be a high-frequency participle, rejecting the high-frequency participle from the second participle combination, and taking the second participle combination after rejection as a third participle combination;
constructing the training model of the third participle combination using the skip-gram model in the word2vec model;
generating the term vector of each participle in the third participle combination using the training model.
3. The method according to claim 1, characterized in that dividing the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, comprises:
Step 301, using the k-means algorithm, randomly selecting three sentence vectors as the centers of the three clusters, and denoting the centers of the three clusters as C1, C2 and C3 respectively;
Step 302, calculating the Euclidean distance between each sentence vector and each of the three cluster centers, determining the center Ci nearest to each sentence vector in Euclidean distance, and assigning the sentence vector to the cluster corresponding to Ci, where i = 1, 2, 3;
Step 303, calculating the mean of each dimension over all vectors in each cluster, and taking the vector formed by these means as the new center of the cluster;
Step 304, judging whether the new center of the cluster is consistent with the randomly selected center of the cluster and, if not, returning to step 302 until the new center of each cluster is consistent with the center calculated in the previous iteration, and taking the new center of the cluster as the target center.
4. The method according to claim 1, characterized in that judging the complaint tendency category of new work order data using the complaint tendency classification model comprises:
generating the sentence vector corresponding to the new work order data;
taking the sentence vector corresponding to the new work order data as the second input vector of the complaint tendency classification model, and obtaining the second output vector corresponding to the second input vector;
comparing the second output vector with the first output vectors to obtain the first output vector corresponding to the second output vector, and taking that first output vector as the target output vector;
determining the complaint tendency category corresponding to the target output vector, and taking that category as the complaint tendency category of the new work order data.
5. A complaint tendency analysis early warning device for work order data, characterized by comprising:
an obtaining module, configured to obtain work order data and, taking work order data with the same work order number as one unit, segment the work order data belonging to each unit to obtain a first participle combination;
a deletion module, configured to delete the stop words in the first participle combination to obtain a second participle combination of the work order data;
a term vector generation module, configured to generate the term vector of each participle in the second participle combination using a word2vec model;
a sentence vector generation module, configured to calculate the average vector of the term vectors corresponding to the work order data and take the average vector as the sentence vector of the work order data;
a division module, configured to divide the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, wherein the three complaint tendency categories are: high-risk complaint tendency, complaint tendency, and no complaint tendency;
a classification model generation module, configured to set a corresponding first output vector for the work order data of each complaint tendency category, take the sentence vectors of the work order data as first input vectors, and generate a complaint tendency classification model using Softmax logistic regression;
a judgment module, configured to judge the complaint tendency category of new work order data using the complaint tendency classification model;
an early warning module, configured to issue an early warning when the judgment module determines that the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency.
6. The device according to claim 5, characterized in that the term vector generation module comprises:
a first judging unit, configured to calculate the frequency with which each second participle occurs according to the following formula, and to judge whether that frequency is greater than a first preset threshold:
wherein P(wi) is the frequency with which the second participle occurs, f(wi) is the frequency of occurrence of the second participle, wi is a second participle, i = 1, 2, 3, ..., x, x is the number of second participles, and n is a second preset threshold;
a culling unit, configured to, when the first judging unit determines that the frequency with which a second participle occurs is greater than the first preset threshold, determine that second participle to be a high-frequency participle, reject the high-frequency participle from the second participle combination, and take the second participle combination after rejection as a third participle combination;
a training model construction unit, configured to construct the training model of the third participle combination using the skip-gram model in the word2vec model;
a first generation unit, configured to generate the term vector of each participle in the third participle combination using the training model.
7. The device according to claim 5, characterized in that the division module comprises:
a selection unit, configured to randomly select, using the k-means algorithm, three sentence vectors as the centers of the three clusters, and to denote the centers of the three clusters as C1, C2 and C3 respectively;
a first computing unit, configured to calculate the Euclidean distance between each sentence vector and each of the three cluster centers, determine the center Ci nearest to each sentence vector in Euclidean distance, and assign the sentence vector to the cluster corresponding to Ci, where i = 1, 2, 3;
a second computing unit, configured to calculate the mean of each dimension over all vectors in each cluster, and take the vector formed by these means as the new center of the cluster;
a second judging unit, configured to judge whether the new center of the cluster is consistent with the randomly selected center of the cluster and, if not, to return to the operation of the first computing unit until the new center of each cluster is consistent with the center calculated in the previous iteration, and to take the new center of the cluster as the target center.
8. The device according to claim 5, characterized in that the judgment module comprises:
a second generation unit, configured to generate the sentence vector corresponding to the new work order data;
a first acquisition unit, configured to take the sentence vector corresponding to the new work order data as the second input vector of the complaint tendency classification model, and obtain the second output vector corresponding to the second input vector;
a second acquisition unit, configured to compare the second output vector with the first output vectors to obtain the first output vector corresponding to the second output vector, and take that first output vector as the target output vector;
a determination unit, configured to determine the complaint tendency category corresponding to the target output vector, and take that category as the complaint tendency category of the new work order data.
CN201811631912.3A 2018-12-29 2018-12-29 Complaint tendency analysis early warning method and device for work order data Active CN109710766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811631912.3A CN109710766B (en) 2018-12-29 2018-12-29 Complaint tendency analysis early warning method and device for work order data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811631912.3A CN109710766B (en) 2018-12-29 2018-12-29 Complaint tendency analysis early warning method and device for work order data

Publications (2)

Publication Number Publication Date
CN109710766A true CN109710766A (en) 2019-05-03
CN109710766B CN109710766B (en) 2023-01-20

Family

ID=66258208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811631912.3A Active CN109710766B (en) 2018-12-29 2018-12-29 Complaint tendency analysis early warning method and device for work order data

Country Status (1)

Country Link
CN (1) CN109710766B (en)

Also Published As

Publication number Publication date
CN109710766B (en) 2023-01-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant