CN109710766A - Method and device for complaint tendency analysis and early warning based on work order data

Info

Publication number: CN109710766A
Application number: CN201811631912.3A
Authority: CN
Inventors: 杨政; 刘柱揆; 尹春林; 潘侃; 朱华
Original assignee: Electric Power Research Institute of Yunnan Power System Ltd
Current assignee: Electric Power Research Institute of Yunnan Power System Ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2019-05-03
Anticipated expiration: 2038-12-29
Also published as: CN109710766B

Abstract

The application provides a complaint tendency analysis and early-warning method and device for work order data. The method includes: segmenting each work order record into words and deleting stop words to obtain a second word-segment combination; generating word vectors using a word2vec model; computing the sentence vector corresponding to each work order record; dividing the sentence vectors into three clusters using the k-means algorithm; generating a complaint tendency classification model using Softmax logistic regression; judging the complaint tendency category of new work order data using the complaint tendency classification model; and issuing an early warning if the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency present. In the method provided by the application, a complaint tendency classification model is generated from a large amount of work order data and is then used to predict the complaint tendency category of new work order data. A timely, proactive early warning is issued according to the prediction result, which solves the low efficiency of the existing manual analysis method.

Description

Method and device for complaint tendency analysis and early warning based on work order data

Technical field

The application relates to the technical field of big data applications, and in particular to a method and device for complaint tendency analysis and early warning based on work order data.

Background

With the rapid development and widespread adoption of informatization in the power industry, the units and business departments of power enterprises at all levels have largely achieved full information coverage. The 95598 customer service system is an important channel of communication between power enterprises and their customers. A large amount of unstructured work order data has accumulated in this system, and power enterprises use the content of the work order data to understand customer intentions and attitudes and to improve service quality.

Because the volume of work order data is large and the urgency of individual work orders differs, a power enterprise that fails to handle a highly urgent work order in time is likely to receive a customer complaint. To reduce this risk, the enterprise needs to analyze the work order data, grade each work order by complaint tendency, and issue an early warning for work orders with a high complaint tendency grade, so that it can respond quickly, proactively and in a targeted manner.

At present, complaint tendency analysis of work order data is still performed manually. Manual analysis cannot process work order data in a timely and effective manner, so the existing analysis method is inefficient. A method is therefore needed that analyzes work order data in a timely and effective manner and issues complaint tendency warnings promptly.

Summary of the invention

The application provides a complaint tendency analysis and early-warning method and device for work order data, to solve the low efficiency of the existing manual analysis method.

In a first aspect, the application provides a complaint tendency analysis and early-warning method for work order data, comprising:

obtaining work order data, treating the work order data that share the same work order number as one unit, and segmenting the work order data belonging to each unit into words to obtain a first word-segment combination;

deleting the stop words in the first word-segment combination to obtain a second word-segment combination of the work order data;

generating a word vector for each word segment in the second word-segment combination using a word2vec model;

computing the average of the word vectors corresponding to the work order data, and using the average vector as the sentence vector of the work order data;

dividing the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, namely: high-risk complaint tendency, complaint tendency present, and no complaint tendency;

setting a corresponding first output vector for the work order data of each complaint tendency category, using the sentence vectors of the work order data as first input vectors, and generating a complaint tendency classification model using Softmax logistic regression;

judging the complaint tendency category of new work order data using the complaint tendency classification model; and

issuing an early warning if the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency present.

Optionally, generating a word vector for each second word segment in the second word-segment combination using a word2vec model comprises:

calculating the frequency with which each second word segment occurs according to the following formula, and judging whether that frequency is greater than a first preset threshold:

where P(wi) is the frequency with which the second word segment occurs, f(wi) is the frequency of occurrence of the second word segment, wi is a second word segment, i = 1, 2, 3, ..., x, x is the number of second word segments, and n is a second preset threshold;

if the frequency with which a second word segment occurs is greater than the first preset threshold, determining that the second word segment is a high-frequency word segment, removing the high-frequency word segment from the second word-segment combination, and using the second word-segment combination after removal as a third word-segment combination;

constructing a training model for the third word-segment combination using the skip-gram model of word2vec; and

generating a word vector for each word segment in the third word-segment combination using the training model.

Optionally, dividing the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, comprises:

Step 301: using the k-means algorithm, randomly select three sentence vectors as the centers of the three clusters, and denote the centers as C1, C2 and C3.

Step 302: calculate the Euclidean distance between each sentence vector and each of the three cluster centers, determine the center Ci with the smallest Euclidean distance to the sentence vector, and assign the sentence vector to the cluster corresponding to Ci, where i = 1, 2, 3.

Step 303: calculate the mean of each dimension over all vectors in each cluster, and use the vector formed by these means as the new center of the cluster.

Step 304: judge whether the new center of each cluster is consistent with the previously used center (initially the randomly selected one). If not, return to step 302, and repeat until the new center of each cluster is consistent with the center calculated in the previous iteration; then use the new center of each cluster as the target center.

Optionally, judging the complaint tendency category of new work order data using the complaint tendency classification model comprises:

generating a sentence vector corresponding to the new work order data;

using the sentence vector corresponding to the new work order data as a second input vector of the complaint tendency classification model, and obtaining a second output vector corresponding to the second input vector;

comparing the second output vector with the first output vectors to obtain the first output vector that corresponds to the second output vector, and taking that first output vector as the target output vector; and

determining the complaint tendency category corresponding to the target output vector, and using it as the complaint tendency category of the new work order data.

In a second aspect, the application provides a complaint tendency analysis and early-warning device for work order data, comprising:

an acquisition module, configured to obtain work order data, treat the work order data that share the same work order number as one unit, and segment the work order data belonging to each unit into words to obtain a first word-segment combination;

a deletion module, configured to delete the stop words in the first word-segment combination to obtain a second word-segment combination of the work order data;

a word vector generation module, configured to generate a word vector for each word segment in the second word-segment combination using a word2vec model;

a sentence vector generation module, configured to compute the average of the word vectors corresponding to the work order data and use the average vector as the sentence vector of the work order data;

a division module, configured to divide the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, namely: high-risk complaint tendency, complaint tendency present, and no complaint tendency;

a classification model generation module, configured to set a corresponding first output vector for the work order data of each complaint tendency category, use the sentence vectors of the work order data as first input vectors, and generate a complaint tendency classification model using Softmax logistic regression;

a judgment module, configured to judge the complaint tendency category of new work order data using the complaint tendency classification model; and

an early-warning module, configured to issue an early warning when the judgment module determines that the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency present.

Optionally, the word vector generation module includes:

a first judging unit, configured to calculate the frequency with which each second word segment occurs according to the following formula, and judge whether that frequency is greater than a first preset threshold:

a removal unit, configured to, when the first judging unit determines that the frequency with which a second word segment occurs is greater than the first preset threshold, determine that the second word segment is a high-frequency word segment, remove the high-frequency word segment from the second word-segment combination, and use the second word-segment combination after removal as a third word-segment combination;

a training model construction unit, configured to construct a training model for the third word-segment combination using the skip-gram model of word2vec; and

a first generation unit, configured to generate a word vector for each word segment in the third word-segment combination using the training model.

Optionally, the division module includes:

a selection unit, configured to use the k-means algorithm to randomly select three sentence vectors as the centers of the three clusters, the centers being denoted C1, C2 and C3;

a first computing unit, configured to calculate the Euclidean distance between each sentence vector and each of the three cluster centers, determine the center Ci with the smallest Euclidean distance to the sentence vector, and assign the sentence vector to the cluster corresponding to Ci, where i = 1, 2, 3;

a second computing unit, configured to calculate the mean of each dimension over all vectors in each cluster, and use the vector formed by these means as the new center of the cluster; and

a second judging unit, configured to judge whether the new center of each cluster is consistent with the previously used center (initially the randomly selected one), and if not, return to the operation of the first computing unit until the new center of each cluster is consistent with the center calculated in the previous iteration, and then use the new center of each cluster as the target center.

Optionally, the judgment module includes:

a second generation unit, configured to generate a sentence vector corresponding to the new work order data;

a first acquisition unit, configured to use the sentence vector corresponding to the new work order data as a second input vector of the complaint tendency classification model, and obtain a second output vector corresponding to the second input vector;

a second acquisition unit, configured to compare the second output vector with the first output vectors to obtain the first output vector that corresponds to the second output vector, and take that first output vector as the target output vector; and

a determination unit, configured to determine the complaint tendency category corresponding to the target output vector, and use it as the complaint tendency category of the new work order data.

As can be seen from the above technical solution, the application provides a complaint tendency analysis and early-warning method and device for work order data. The method includes: obtaining work order data, treating the work order data that share the same work order number as one unit, and segmenting the work order data belonging to each unit into words to obtain a first word-segment combination; deleting the stop words in the first word-segment combination to obtain a second word-segment combination of the work order data; generating a word vector for each word segment in the second word-segment combination using a word2vec model; computing the average of the word vectors corresponding to the work order data to obtain the sentence vector of the work order data; dividing the sentence vectors of the work order data into three clusters using the k-means algorithm; setting a corresponding first output vector for the work order data of each complaint tendency category, using the sentence vectors of the work order data as first input vectors, and generating a complaint tendency classification model using Softmax logistic regression; judging the complaint tendency category of new work order data using the complaint tendency classification model; and issuing an early warning if the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency present.

In the method provided by the application, a complaint tendency classification model is generated from a large amount of work order data and is then used to predict the complaint tendency category of new work order data. A timely, proactive early warning is issued according to the prediction result, which solves the low efficiency of the existing manual analysis method.

Brief description of the drawings

To describe the technical solution of the application more clearly, the drawings needed in the embodiments are briefly introduced below. A person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is a workflow diagram of a complaint tendency analysis and early-warning method for work order data provided by an embodiment of the application;

Fig. 2 is a workflow diagram of the word vector generation method in the complaint tendency analysis and early-warning method for work order data provided by an embodiment of the application;

Fig. 3 is a workflow diagram of the method of dividing sentence vectors into three clusters in the complaint tendency analysis and early-warning method for work order data provided by an embodiment of the application;

Fig. 4 is a workflow diagram of the method of judging the complaint tendency category of new work order data in the complaint tendency analysis and early-warning method for work order data provided by an embodiment of the application;

Fig. 5 is a structural schematic diagram of a complaint tendency analysis and early-warning device for work order data provided by an embodiment of the application.

Detailed description of the embodiments

To solve the low efficiency of the existing manual analysis method, the application provides a complaint tendency analysis and early-warning method and device for work order data.

Referring to the workflow diagram shown in Fig. 1, an embodiment of the application provides a complaint tendency analysis and early-warning method for work order data, comprising the following steps:

Step 101: obtain work order data, treat the work order data that share the same work order number as one unit, and segment the work order data belonging to each unit into words to obtain a first word-segment combination.

In this step, work order text is obtained from the 95598 customer service system in real time or offline and is used as the work order data. The work order text includes the following fields: work order number, call content, applicant, mobile phone number, handler, urgency level, user address, start time, work order creation time, work order end time, current stage, and work order type. The work order type may be one of seven types: praise, fault repair, suggestion, report, complaint, business consultation, and opinion.

In one possible implementation, the work order data that share the same work order number are treated as one unit and are segmented in Python using the jieba toolkit in precise mode. During segmentation, a domain dictionary for the power industry is defined; for example, the dictionary may contain terms such as "financial month", "power outage and restoration", "air circuit breaker" and "outage and restoration point".

Step 102: delete the stop words in the first word-segment combination to obtain a second word-segment combination of the work order data.

In one possible implementation, the first word-segment combination is processed with the Harbin Institute of Technology stop-word list: the stop words in the first word-segment combination are deleted to obtain the second word-segment combination.
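The stop-word deletion in step 102 can be sketched as a simple filter. This is a minimal illustration, not the patent's implementation: the stop-word set and the sample tokens below are made up, and in practice the set would be loaded from a stop-word list file.

```python
# Hypothetical stop-word set for illustration only; a real list would be
# loaded from a stop-word dictionary file (one entry per line).
STOP_WORDS = {"的", "了", "在", "，", "。"}


def remove_stop_words(first_combination, stop_words=STOP_WORDS):
    """Return the second combination: tokens that are not stop words."""
    return [token for token in first_combination if token not in stop_words]


# Made-up output of a word segmenter for one work order record.
first_combination = ["客户", "在", "咨询", "户号", "的", "问题", "。"]
second_combination = remove_stop_words(first_combination)
print(second_combination)
```

Keeping the stop words in a set makes each membership check constant time, which matters when many work orders are processed.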

Step 103: generate a word vector for each word segment in the second word-segment combination using a word2vec model.

Word2vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. The network takes a word as input and must guess the words at adjacent positions; under the bag-of-words assumption in word2vec, the order of the words is unimportant. After training, the word2vec model can map each word to a vector that represents word-to-word relationships; this vector is the hidden layer of the neural network. In the embodiment of the application, each word segment in the second word-segment combination is used as a word for the word2vec model to generate word vectors.

A word vector maps a word or phrase of the vocabulary to a vector of real numbers. Conceptually, it involves a mathematical embedding from a space with one dimension per word into a continuous vector space of much lower dimension. In the embodiment of the application, the second word-segment combination is the vocabulary.

Step 104: compute the average of the word vectors corresponding to the work order data, and use the average vector as the sentence vector of the work order data.

A sentence vector maps a sentence or paragraph to a vector of real numbers. In one possible implementation, the word vectors corresponding to each work order record are summed, and the sum is then divided by the number of word vectors to produce the sentence vector of that work order record.

In this step, suppose the second word-segment combination corresponding to one work order record is "customer / inquiry / account number / upgrade / issue". If the vector for "customer" is [1,0,0,0,0], the vector for "inquiry" is [0,1,0,0,0], the vector for "account number" is [0,0,1,0,0], the vector for "upgrade" is [0,0,0,1,0], and the vector for "issue" is [0,0,0,0,1], then the sentence vector of this work order record is [0.2,0.2,0.2,0.2,0.2].
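The averaging in step 104 can be checked with a few lines of code. The sketch below uses the five one-hot vectors from the example above; the function name is illustrative and not taken from the patent.

```python
def sentence_vector(word_vectors):
    """Average a list of equal-length word vectors element by element."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    count = len(word_vectors)
    # zip(*word_vectors) groups the values of each dimension together.
    return [sum(column) / count for column in zip(*word_vectors)]


word_vectors = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
print(sentence_vector(word_vectors))  # [0.2, 0.2, 0.2, 0.2, 0.2]
```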

Step 105: divide the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, namely: high-risk complaint tendency, complaint tendency present, and no complaint tendency.

The k-means algorithm takes the number of clusters k and a database containing n data objects as input, and outputs k clusters that satisfy the minimum-variance criterion. Given k, the algorithm divides the n data objects into k clusters such that objects in the same cluster have high similarity and objects in different clusters have low similarity. In the embodiment of the application, the sentence vector corresponding to each work order record is a data object, and the number of clusters k is 3.

Step 106: set a corresponding first output vector for the work order data of each complaint tendency category, use the sentence vectors of the work order data as first input vectors, and generate a complaint tendency classification model using Softmax logistic regression.

The Softmax logistic regression model generalizes the logistic regression model to multi-class problems, in which the class label y can take more than two values. Softmax logistic regression contains a parameter set; the embodiment of the application optimizes the parameters in this set using the gradient descent algorithm, with the learning rate learning_rate set to 0.1.
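In standard notation, Softmax regression trained by gradient descent on the cross-entropy cost can be written as follows. The symbols are assumptions introduced here for illustration: x^{(i)} is the i-th sentence vector, y^{(i)} its one-hot first output vector, K = 3 the number of categories, m the number of training records, and \alpha the learning rate.

```latex
% Hypothesis: probability that sentence vector x belongs to category k
h_k(x) = \frac{\exp\left(x W_{:,k} + b_k\right)}{\sum_{j=1}^{K} \exp\left(x W_{:,j} + b_j\right)},
\qquad k = 1, \dots, K

% Objective: mean cross entropy over the m training records
J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y^{(i)}_k \log h_k\left(x^{(i)}\right)

% Gradient descent update with learning rate 0.1
W \leftarrow W - \alpha \, \nabla_W J, \qquad
b \leftarrow b - \alpha \, \nabla_b J, \qquad \alpha = 0.1
```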

In one possible implementation, the embodiment of the application provides a method of generating the complaint tendency classification model using Softmax logistic regression, comprising the following steps:

(1) using Python with the tensorflow toolkit, specify the hypothesis function as hypothesis = tf.nn.softmax(tf.matmul(X, W) + b), where X is the sentence vector, W is the weight, and b is the bias;

(2) using Python with the tensorflow toolkit, use cross entropy as the objective function, written as: cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1));

(3) using Python with the tensorflow toolkit, optimize the weight parameters by gradient descent: optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost);

(4) through steps (1), (2) and (3) above, generate the complaint tendency classification model, using the work order data with first output vectors as the training data.
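To make steps (1) to (4) concrete without depending on a particular tensorflow version, the sketch below re-implements the same idea in plain Python: a softmax hypothesis, a mean cross-entropy objective, and batch gradient descent with a learning rate of 0.1. It is an illustration under stated assumptions, not the patent's code: the function names, the number of epochs, and the toy two-dimensional sentence vectors with one-hot labels are all made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, W, b):
    """hypothesis = softmax(x W + b) for one sentence vector x."""
    scores = [sum(x[d] * W[d][k] for d in range(len(x))) + b[k]
              for k in range(len(b))]
    return softmax(scores)


def train(X, Y, learning_rate=0.1, epochs=500):
    """Fit W and b by batch gradient descent on the mean cross entropy."""
    dims, classes, m = len(X[0]), len(Y[0]), len(X)
    W = [[0.0] * classes for _ in range(dims)]
    b = [0.0] * classes
    for _ in range(epochs):
        grad_W = [[0.0] * classes for _ in range(dims)]
        grad_b = [0.0] * classes
        for x, y in zip(X, Y):
            p = predict(x, W, b)
            for k in range(classes):
                # Gradient of the mean cross entropy w.r.t. score k.
                err = (p[k] - y[k]) / m
                grad_b[k] += err
                for d in range(dims):
                    grad_W[d][k] += err * x[d]
        for k in range(classes):
            b[k] -= learning_rate * grad_b[k]
            for d in range(dims):
                W[d][k] -= learning_rate * grad_W[d][k]
    return W, b


# Toy, made-up sentence vectors (2 dimensions) and one-hot labels (3 classes).
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
Y = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
W, b = train(X, Y)
predicted = [max(range(3), key=lambda k: predict(x, W, b)[k]) for x in X]
print(predicted)
```

The expression `(p[k] - y[k])` is the well-known gradient of the cross entropy with respect to the pre-softmax score, which is what an automatic optimizer would compute internally.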

Step 107: judge the complaint tendency category of new work order data using the complaint tendency classification model.

Step 108: if the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency present, issue an early warning.

For example, the content of one work order is as follows: "The electricity meter box of a building at a certain location previously caught fire inside the box. The box now shows severe scorch marks, and both the incoming and outgoing lines show burn marks. The customer is worried about a safety hazard and asks that it be handled as soon as possible." If the grade predicted by the classification model is high-risk complaint tendency, there is a complaint risk; an early warning is issued immediately and the relevant business department is notified to handle it.
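The decision rule of step 108 can be sketched as follows. The category labels are English glosses of the three categories, and the work order number and message format are hypothetical; how the warning is actually delivered to the business department is not specified in the text.

```python
# Categories that trigger an early warning according to step 108.
WARNING_CATEGORIES = {"high-risk complaint tendency", "complaint tendency present"}


def early_warning(work_order_number, category):
    """Return a warning message for categories that require one, else None."""
    if category in WARNING_CATEGORIES:
        return (f"Early warning: work order {work_order_number} "
                f"is classified as '{category}'.")
    return None


# "WO-001" is a made-up work order number.
print(early_warning("WO-001", "high-risk complaint tendency"))
print(early_warning("WO-001", "no complaint tendency"))
```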

As can be seen from the above technical solution, in the method provided by the application a complaint tendency classification model is generated from a large amount of work order data and is then used to predict the complaint tendency category of new work order data. A timely, proactive early warning is issued according to the prediction result, which solves the low efficiency of the existing manual analysis method.

Referring to the workflow diagram shown in Fig. 2, generating a word vector for each second word segment in the second word-segment combination using a word2vec model comprises the following steps:

Step 201: calculate the frequency with which each second word segment occurs according to the following formula, and judge whether that frequency is greater than a first preset threshold:

where P(wi) is the frequency with which the second word segment occurs, f(wi) is the frequency of occurrence of the second word segment, wi is a second word segment, i = 1, 2, 3, ..., x, x is the number of second word segments, and n is a second preset threshold.

Step 202: if the frequency with which a second word segment occurs is greater than the first preset threshold, determine that the second word segment is a high-frequency word segment, remove the high-frequency word segment from the second word-segment combination, and use the second word-segment combination after removal as a third word-segment combination.

In one possible implementation, n is 1e-5 and the first preset threshold is 0.8. In this case, when the P(wi) corresponding to a second word segment satisfies P(wi) ≥ 0.8, that word segment is a high-frequency word segment and is deleted.
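The formula referenced in step 201 is not reproduced in this text. The sketch below therefore rests on an assumption: that it has the form of the standard word2vec subsampling discard probability, P(wi) = 1 - sqrt(n / f(wi)), with f(wi) taken as the relative frequency of the word segment in the corpus. Under that assumption, n = 1e-5 and a threshold of 0.8 flag every word segment whose relative frequency is at least 2.5e-4. The corpus below is synthetic.

```python
import math
from collections import Counter


def high_frequency_tokens(tokens, n=1e-5, first_threshold=0.8):
    """Return tokens whose assumed P(w) = 1 - sqrt(n / f(w)) reaches the threshold.

    ASSUMPTION: the formula and the reading of f(w) as a relative frequency
    are not confirmed by the source text.
    """
    counts = Counter(tokens)
    total = len(tokens)
    flagged = set()
    for word, count in counts.items():
        f = count / total
        p = 1.0 - math.sqrt(n / f)
        if p >= first_threshold:
            flagged.add(word)
    return flagged


# Synthetic corpus: one token occurs 50 times, 9950 other tokens occur once.
tokens = ["power"] * 50 + [f"w{i}" for i in range(9950)]
flagged = high_frequency_tokens(tokens)
third_combination = [t for t in tokens if t not in flagged]
print(flagged, len(third_combination))
```

Note that with a small corpus every token has a relative frequency above 2.5e-4, so this rule only becomes selective once the corpus contains several thousand tokens.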

Step 203: construct a training model for the third word-segment combination using the skip-gram model of word2vec.

For each third word segment in the third word-segment combination, training data is constructed in the format (input word segment, output word segment). First, the context of each third word segment is found in the third word-segment combination using skip_window=2; second, training data is constructed from that context. For example, suppose the third word-segment combination corresponding to one work order record is "customer / inquiry / account number / upgrade / issue". If "account number" is the input word segment, the generated training data is: (account number, customer); (account number, inquiry); (account number, upgrade) and (account number, issue).
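The construction of skip-gram training pairs with skip_window=2 can be sketched as follows. The token list uses English glosses of the five word segments in the example (the exact wording of the original tokens is an assumption), and the function name is illustrative.

```python
def skip_gram_pairs(tokens, skip_window=2):
    """Build (input token, output token) pairs from a token sequence.

    Each token is paired with up to `skip_window` neighbours on each side.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - skip_window)
        stop = min(len(tokens), i + skip_window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = ["customer", "inquiry", "account number", "upgrade", "issue"]
pairs = skip_gram_pairs(tokens)
# Pairs whose input token is "account number", as in the example.
print([pair for pair in pairs if pair[0] == "account number"])
```

Tokens near the start or end of the sequence simply get fewer pairs, because the window is clipped at the sequence boundaries.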

The training model is built as a three-layer fully connected neural network comprising an input layer, a hidden layer and an output layer. It updates its weights from the training data using negative sampling. The hidden layer contains 100 neurons.

Step 204: generate a word vector for each word segment in the third word-segment combination using the training model.

Referring to the workflow diagram shown in Fig. 3, dividing the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, comprises the following steps:

Step 301: using the k-means algorithm, randomly select three sentence vectors as the centers of the three clusters, and denote the centers as C1, C2 and C3.

Step 302: calculate the Euclidean distance between each sentence vector and each of the three cluster centers, determine the center Ci with the smallest Euclidean distance to the sentence vector, and assign the sentence vector to the cluster corresponding to Ci, where i = 1, 2, 3.

Step 303: calculate the mean of each dimension over all vectors in each cluster, and use the vector formed by these means as the new center of the cluster.

Step 304: judge whether the new center of each cluster is consistent with the previously used center (initially the randomly selected one). If not, return to step 302, and repeat until the new center of each cluster is consistent with the center calculated in the previous iteration.

Step 305: use the new center of each cluster as the target center.
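Steps 301 to 305 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the seed, the iteration cap, the handling of an empty cluster, and the toy two-dimensional points are all choices made here, not details from the patent. It requires Python 3.8 or later for math.dist.

```python
import math
import random


def k_means(vectors, k=3, seed=0, max_iter=100):
    """Cluster `vectors` into k groups following steps 301 to 305."""
    rng = random.Random(seed)
    # Step 301: pick k sentence vectors at random as the initial centers.
    centers = [list(v) for v in rng.sample(vectors, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 302: assign every vector to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: math.dist(v, centers[i]))
            clusters[nearest].append(v)
        # Step 303: the new center is the per-dimension mean of the members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[i])
        # Step 304: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    # Step 305: the final centers are the target centers.
    return centers, clusters


# Made-up 2-D "sentence vectors" forming three visible groups.
points = [[0, 0], [0, 1], [1, 0],
          [10, 10], [10, 11], [11, 10],
          [20, 0], [20, 1], [21, 0]]
centers, clusters = k_means(points)
print(centers)
```

Because the initial centers are random, k-means can settle in different local optima from run to run; fixing the seed only makes a single run reproducible. The text does not state how each of the three resulting clusters is mapped to a complaint tendency category, so that mapping is left out here.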

Referring to the workflow diagram shown in Fig. 4, judging the complaint tendency category of new work order data using the complaint tendency classification model comprises the following steps:

Step 401: generate a sentence vector corresponding to the new work order data.

Step 402: use the sentence vector corresponding to the new work order data as a second input vector of the complaint tendency classification model, and obtain a second output vector corresponding to the second input vector.

Step 403: compare the second output vector with the first output vectors to obtain the first output vector that corresponds to the second output vector, and take that first output vector as the target output vector.

Step 404: determine the complaint tendency category corresponding to the target output vector, and use it as the complaint tendency category of the new work order data.

In the embodiment of the application, suppose the three first output vectors are [1,0,0,0,0], [0,0,1,0,0] and [0,0,0,0,1], where [1,0,0,0,0] represents high-risk complaint tendency, [0,0,1,0,0] represents complaint tendency present, and [0,0,0,0,1] represents no complaint tendency.

If the second output vector corresponding to the sentence vector of the new work order data is [1,0,0,0,0], then the first output vector corresponding to this second output vector is [1,0,0,0,0], and the complaint tendency category of the new work order data is high-risk complaint tendency.
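Steps 403 and 404 can be sketched as a lookup against the three first output vectors given above. The text only says the vectors are "compared"; choosing the closest first output vector by squared Euclidean distance is an assumption made here so that outputs that are not exactly one-hot can still be matched. The category labels are English glosses.

```python
# First output vectors from the example, keyed by category label.
FIRST_OUTPUT_VECTORS = {
    "high-risk complaint tendency": [1, 0, 0, 0, 0],
    "complaint tendency present": [0, 0, 1, 0, 0],
    "no complaint tendency": [0, 0, 0, 0, 1],
}


def classify(second_output_vector):
    """Return the category whose first output vector is closest.

    ASSUMPTION: "closest" means smallest squared Euclidean distance.
    """
    def distance(category):
        target = FIRST_OUTPUT_VECTORS[category]
        return sum((a - b) ** 2 for a, b in zip(second_output_vector, target))

    return min(FIRST_OUTPUT_VECTORS, key=distance)


print(classify([1, 0, 0, 0, 0]))
print(classify([0.1, 0.0, 0.7, 0.1, 0.1]))
```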

Referring to the structural schematic diagram shown in Fig. 2, an embodiment of the present application provides a complaint trend analysis and early-warning device for work order data, comprising:

An obtaining module 100, configured to obtain work order data and, taking the work order data with the same work order number as one unit, segment the work order data belonging to each unit to obtain a first participle combination;

A deleting module 200, configured to delete the stop words in the first participle combination to obtain a second participle combination of the work order data;

A term vector generation module 300, configured to generate, using a word2vec model, the term vector of each participle in the second participle combination;

A sentence vector generation module 400, configured to compute the average of the term vectors corresponding to the work order data and take this average vector as the sentence vector of the work order data;

A division module 500, configured to divide the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, where the three complaint tendency categories are: high-risk complaint tendency, complaint tendency, and no complaint tendency;

A classification model generation module 600, configured to set a corresponding first output vector for the work order data of each complaint tendency category and, taking the sentence vectors of the work order data as first input vectors, generate a complaint tendency classification model using Softmax logistic regression;

A judgment module 700, configured to judge the complaint tendency category of new work order data using the complaint tendency classification model;

An early-warning module 800, configured to issue an early warning when the judgment module determines that the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency.
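Two of the simpler modules, the sentence vector generation module 400 and the early-warning module 800, can be sketched as follows. The function names and the English category labels are illustrative assumptions; the term vectors themselves would come from the word2vec model, which is not reproduced here.

```python
def sentence_vector(term_vectors):
    """Module 400: average the term vectors of one work order per dimension."""
    if not term_vectors:
        raise ValueError("at least one term vector is required")
    count = len(term_vectors)
    # zip(*term_vectors) groups the values of each dimension together.
    return [sum(column) / count for column in zip(*term_vectors)]


# Categories that trigger an early warning according to the text.
WARNING_CATEGORIES = {"high-risk complaint tendency", "complaint tendency"}


def should_warn(category):
    """Module 800: warn only for the two categories with a complaint tendency."""
    return category in WARNING_CATEGORIES
```

For instance, averaging the term vectors [1.0, 2.0] and [3.0, 4.0] gives the sentence vector [2.0, 3.0], and a work order classified as having no complaint tendency does not trigger a warning.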

Optionally, the term vector generation module includes:

Optionally, the division module includes:

Optionally, the judgment module includes:

Those skilled in the art can clearly understand that the technology in the embodiments of the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution in the embodiments of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.

For identical or similar parts among the embodiments in this specification, reference may be made to one another. In particular, since the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description in the method embodiment.

The application has been described in detail above in conjunction with specific embodiments and illustrative examples, but these descriptions should not be construed as limiting the application. Those skilled in the art will understand that, without departing from the spirit and scope of the application, various equivalent substitutions, modifications or improvements may be made to the technical solution of the application and its embodiments, all of which fall within the scope of the application. The protection scope of the application is determined by the appended claims.

Claims

1. A complaint trend analysis and early-warning method for work order data, characterized by comprising:

obtaining work order data and, taking the work order data with the same work order number as one unit, segmenting the work order data belonging to each unit to obtain a first participle combination;

computing the average of the term vectors corresponding to the work order data, and taking this average vector as the sentence vector of the work order data;

dividing the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, where the three complaint tendency categories are: high-risk complaint tendency, complaint tendency, and no complaint tendency;

setting a corresponding first output vector for the work order data of each complaint tendency category and, taking the sentence vectors of the work order data as first input vectors, generating a complaint tendency classification model using Softmax logistic regression;

if the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency, issuing an early warning.

2. The method according to claim 1, characterized in that generating, using the word2vec model, the term vector of each second participle in the second participle combination comprises:

calculating the frequency with which each second participle occurs according to the following formula, and judging whether that frequency is greater than a first preset threshold:

where P(w_i) is the frequency with which the second participle occurs, f(w_i) is the number of occurrences of the second participle, w_i is the second participle, i = 1, 2, 3, ..., x, x is the number of second participles, and n is a second preset threshold;

if the frequency with which a second participle occurs is greater than the first preset threshold, determining that second participle to be a high-frequency participle, removing the high-frequency participles from the second participle combination, and taking the second participle combination after removal as a third participle combination;

3. The method according to claim 1, characterized in that dividing the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, comprises:

Step 301, using the k-means algorithm, randomly select three sentence vectors as the centers of three clusters, and denote the centers of the three clusters as C₁, C₂ and C₃ respectively;

Step 302, calculate the Euclidean distance between each sentence vector and the centers of the three clusters, determine the center C_i nearest to each sentence vector in Euclidean distance, and assign the sentence vector to the cluster corresponding to C_i, where i = 1, 2, 3;

Step 303, calculate the mean of each dimension over all vectors in each cluster, and take the vector formed by these means as the new center of the cluster;

Step 304, judge whether the new center of each cluster is consistent with the randomly selected center of that cluster; if not, return to step 302 and repeat until the new center of each cluster is consistent with the center calculated in the previous iteration, and take the new center of each cluster as the target center.

4. The method according to claim 1, characterized in that judging the complaint tendency category of new work order data using the complaint tendency classification model comprises:

generating the sentence vector corresponding to the new work order data;

using the sentence vector corresponding to the new work order data as the second input vector of the complaint tendency classification model, and obtaining the second output vector corresponding to the second input vector;

comparing the second output vector with the first output vectors to obtain the first output vector corresponding to the second output vector, and taking that first output vector as the target output vector;

determining the complaint tendency category corresponding to the target output vector, and taking that category as the complaint tendency category of the new work order data.

5. A complaint trend analysis and early-warning device for work order data, characterized by comprising:

an obtaining module, configured to obtain work order data and, taking the work order data with the same work order number as one unit, segment the work order data belonging to each unit to obtain a first participle combination;

a deleting module, configured to delete the stop words in the first participle combination to obtain a second participle combination of the work order data;

a term vector generation module, configured to generate, using a word2vec model, the term vector of each participle in the second participle combination;

a sentence vector generation module, configured to compute the average of the term vectors corresponding to the work order data and take this average vector as the sentence vector of the work order data;

a division module, configured to divide the sentence vectors of the work order data into three clusters using the k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, where the three complaint tendency categories are: high-risk complaint tendency, complaint tendency, and no complaint tendency;

a classification model generation module, configured to set a corresponding first output vector for the work order data of each complaint tendency category and, taking the sentence vectors of the work order data as first input vectors, generate a complaint tendency classification model using Softmax logistic regression;

an early-warning module, configured to issue an early warning when the judgment module determines that the complaint tendency category of the new work order data is high-risk complaint tendency or complaint tendency.

6. The device according to claim 5, characterized in that the term vector generation module comprises:

a first judging unit, configured to calculate the frequency with which each second participle occurs according to the following formula, and to judge whether that frequency is greater than a first preset threshold:

a removing unit, configured to, when the first judging unit determines that the frequency with which a second participle occurs is greater than the first preset threshold, determine that second participle to be a high-frequency participle, remove the high-frequency participles from the second participle combination, and take the second participle combination after removal as a third participle combination;

a training model construction unit, configured to construct a training model for the third participle combination using the skip-gram model in the word2vec model;

a first generation unit, configured to generate, using the training model, the term vector of each participle in the third participle combination.

7. The device according to claim 5, characterized in that the division module comprises:

a selection unit, configured to, using the k-means algorithm, randomly select three sentence vectors as the centers of three clusters, and denote the centers of the three clusters as C₁, C₂ and C₃ respectively;

a first computing unit, configured to calculate the Euclidean distance between each sentence vector and the centers of the three clusters, determine the center C_i nearest to each sentence vector in Euclidean distance, and assign the sentence vector to the cluster corresponding to C_i, where i = 1, 2, 3;

a second computing unit, configured to calculate the mean of each dimension over all vectors in each cluster, and take the vector formed by these means as the new center of the cluster;

a second judging unit, configured to judge whether the new center of each cluster is consistent with the randomly selected center of that cluster and, if not, return to the operation of the first computing unit until the new center of each cluster is consistent with the center calculated in the previous iteration, and to take the new center of each cluster as the target center.

8. The device according to claim 5, characterized in that the judgment module comprises:

a first acquisition unit, configured to use the sentence vector corresponding to the new work order data as the second input vector of the complaint tendency classification model and obtain the second output vector corresponding to the second input vector;

a second acquisition unit, configured to compare the second output vector with the first output vectors to obtain the first output vector corresponding to the second output vector, and to take that first output vector as the target output vector;

a determination unit, configured to determine the complaint tendency category corresponding to the target output vector and take that category as the complaint tendency category of the new work order data.