CN109710766B

CN109710766B - Complaint tendency analysis early warning method and device for work order data

Info

Publication number: CN109710766B
Application number: CN201811631912.3A
Authority: CN
Inventors: 杨政; 刘柱揆; 尹春林; 潘侃; 朱华
Original assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Current assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2023-01-20
Anticipated expiration: 2038-12-29
Also published as: CN109710766A

Abstract

The application provides a complaint tendency analysis and early warning method and device for work order data. The method comprises: performing word segmentation and stop word deletion on each piece of work order data to obtain a second word segmentation combination; generating word vectors by using a word2vec model; computing the sentence vector corresponding to each piece of work order data; dividing the sentence vectors into three clusters by using a k-means algorithm; generating a complaint tendency classification model by using Softmax logistic regression; judging the complaint tendency category of new work order data by using the complaint tendency classification model; and giving an early warning if the complaint tendency category of the new work order data is a high-risk complaint tendency or a complaint tendency. The method generates the complaint tendency classification model from a large amount of work order data, predicts the complaint tendency category of new work order data with that model, and gives timely, proactive early warnings based on the prediction, thereby solving the problem that the existing manual analysis method is inefficient.

Description

Complaint tendency analysis early warning method and device for work order data

Technical Field

The application relates to the technical field of big data application, in particular to a complaint tendency analysis and early warning method and device for work order data.

Background

With the rapid development and widespread adoption of informatization in the power industry, information systems now cover essentially all levels of units and all service departments of power enterprises. The 95598 customer service system is an important channel of communication between a power enterprise and its customers. A large amount of unstructured work order data has accumulated in the system, and the power enterprise uses the content of this data to understand the intentions and attitudes of customers and to improve service quality.

Because the volume of work order data is large and the urgency of each work order differs, a power enterprise that fails to handle an urgent work order in time may receive a customer complaint. To reduce the risk of customer complaints, the power enterprise needs to analyze the work order data, classify its complaint tendency, and give an early warning for work order data with a higher complaint tendency level, so that the enterprise can take measures quickly, with foresight and in a targeted manner according to the early warning.

Complaint tendency analysis of work order data is currently still performed manually. Manual analysis cannot process the work order data in a timely and effective manner, so the existing analysis method is inefficient.

Disclosure of Invention

The application provides a complaint tendency analysis and early warning method and device for work order data, and aims to solve the problem that an existing manual analysis method is low in efficiency.

In a first aspect of the present application, a complaint tendency analysis and early warning method for work order data is provided, which includes:

acquiring work order data, and taking the work order data with the same work order number as a unit, and performing word segmentation on the work order data belonging to the same unit to obtain a first word segmentation combination;

deleting stop words in the first word segmentation combination to obtain a second word segmentation combination of the work order data;

generating a word vector of each participle in the second word segmentation combination by using a word2vec model;

solving an average vector of word vectors corresponding to the work order data, and taking the average vector as a sentence vector of the work order data;

dividing the sentence vectors of the work order data into three clusters by using a k-means algorithm, wherein the three clusters correspond to three complaint tendency categories of the work order data, and the three complaint tendency categories are: high-risk complaint tendency, complaint tendency and no complaint tendency;

respectively setting corresponding first output vectors for the work order data of each complaint tendency category, taking the sentence vector of the work order data as a first input vector, and generating a complaint tendency classification model by utilizing Softmax logistic regression;

judging the complaint tendency category of the new work order data by using the complaint tendency classification model;

and if the complaint tendency category of the new work order data is a high-risk complaint tendency or has a complaint tendency, giving an early warning.

Optionally, the generating, by using the word2vec model, a word vector of each second participle in the second word segmentation combination includes:

calculating the frequency of the second participle according to the following formula, and judging whether the frequency of the second participle is greater than a first preset threshold value:

wherein P(wi) is the frequency of occurrence of the second participle, f(wi) is the frequency of occurrence of the second participle, wi is the second participle, i = 1, 2, 3, ..., x, x is the number of the second participles, and n is a second preset threshold;

if the frequency of the second participle is greater than a first preset threshold value, determining the second participle with the frequency greater than the first preset threshold value as a high-frequency participle, removing the high-frequency participle from the second participle combination, and taking the second participle combination after the high-frequency participle is removed as a third participle combination;

adopting a skip-gram model in a word2vec model to construct a training model of the third participle combination;

and generating word vectors of all the participles in the third participle combination by utilizing the training model.

Optionally, the dividing, by using a k-means algorithm, the sentence vector of each work order data into three clusters, where the three clusters correspond to three complaint tendency categories of the work order data, and the method includes:

step 301, randomly selecting three sentence vectors as centers of three clusters respectively by using a k-means algorithm, and recording the centers of the three clusters as C1, C2 and C3 respectively;

step 302, respectively calculating Euclidean distances between each sentence vector and the centers of the three clusters, determining Ci closest to the Euclidean distance of each sentence vector, and classifying the sentence vectors into the clusters corresponding to Ci, wherein i =1,2,3;

step 303, calculating the mean value of each dimension of all sentence vectors in each cluster, and taking the vector formed by the mean values as a new center of the cluster;

step 304, judging whether the new center of each cluster is consistent with the previously calculated center (initially, the randomly selected center); if not, returning to step 302 until the new center of each cluster is consistent with the previously calculated center, and taking the new center of the cluster as the target center.

Optionally, the method for determining a complaint tendency category of new work order data by using the complaint tendency classification model includes:

generating a sentence vector corresponding to the new work order data;

taking the sentence vector corresponding to the new work order data as a second input vector of the complaint tendency classification model, and acquiring a second output vector corresponding to the second input vector;

comparing the second output vector with the first output vector to obtain a first output vector corresponding to the second output vector, and making the first output vector corresponding to the second output vector into a target output vector;

and determining a complaint tendency category corresponding to the target output vector, and using the complaint tendency category corresponding to the target output vector as the complaint tendency category of the new work order data.

The second aspect of the application provides a complaint tendency analysis early warning device of work order data, includes:

the acquisition module is used for acquiring the work order data, and segmenting each work order data belonging to the same unit by taking the work order data with the same work order number as a unit to obtain a first segmented word combination;

the deleting module is used for deleting stop words in the first word segmentation combination to obtain a second word segmentation combination of the work order data;

a word vector generating module, configured to generate a word vector of each participle in the second word segmentation combination by using a word2vec model;

the sentence vector generation module is used for solving an average vector of word vectors corresponding to the work order data and taking the average vector as a sentence vector of the work order data;

a dividing module, configured to divide the sentence vectors of the work order data into three clusters by using a k-means algorithm, where the three clusters correspond to three complaint tendency categories of the work order data, and the three complaint tendency categories are: high-risk complaint tendency, complaint tendency and no complaint tendency;

the classification model generation module is used for setting corresponding first output vectors for the work order data of each complaint tendency category, taking the sentence vector of the work order data as a first input vector, and generating a complaint tendency classification model by utilizing Softmax logistic regression;

the judging module is used for judging the complaint tendency category of the new work order data by utilizing the complaint tendency classification model;

and the early warning module is used for giving an early warning when the judging module determines that the complaint tendency category of the new work order data is a high-risk complaint tendency or has a complaint tendency.

Optionally, the word vector generating module includes:

a first judging unit, configured to calculate a frequency of occurrence of the second participle according to the following formula, and judge whether the frequency of occurrence of the second participle is greater than a first preset threshold:

wherein P(wi) is the frequency of occurrence of the second participle, f(wi) is the frequency of occurrence of the second participle, wi is the second participle, i = 1, 2, 3, ..., x, x is the number of the second participles, and n is a second preset threshold;

the removing unit is used for determining that the second participle with the frequency higher than a first preset threshold value is a high-frequency participle under the condition that the first judging unit determines that the frequency of the second participle is higher than the first preset threshold value, removing the high-frequency participle from the second participle combination, and taking the second participle combination after the high-frequency participle is removed as a third participle combination;

the training model building unit is used for building a training model of the third participle combination by adopting a skip-gram model in a word2vec model;

and the first generating unit is used for generating word vectors of all the participles in the third participle combination by utilizing the training model.

Optionally, the dividing module includes:

the selecting unit is used for randomly selecting three sentence vectors as the centers of the three clusters respectively by using a k-means algorithm, and recording the centers of the three clusters as C1, C2 and C3 respectively;

a first calculating unit, configured to calculate euclidean distances between the sentence vectors and centers of the three clusters, respectively, determine Ci closest to the euclidean distances of the sentence vectors, and classify the sentence vectors into clusters corresponding to Ci, where i =1,2,3;

the second calculation unit is used for calculating the mean value of all dimensions of all sentence vectors in each cluster, and taking the vector formed by the mean value as a new center of the cluster;

and the second judging unit is used for judging whether the new center of the cluster is consistent with the center of the randomly selected cluster, if not, the operation of the first calculating unit is returned to be executed until the new center of each cluster is consistent with the previously calculated center, and the new center of the cluster is used as a target center.

Optionally, the determining module includes:

a second generation unit, configured to generate a sentence vector corresponding to the new work order data;

a first obtaining unit, configured to obtain a second output vector corresponding to a second input vector by using a sentence vector corresponding to the new work order data as the second input vector of the complaint tendency classification model;

a second obtaining unit, configured to compare the second output vector with the first output vector, obtain a first output vector corresponding to the second output vector, and make the first output vector corresponding to the second output vector as a target output vector;

and a determining unit, configured to determine a complaint tendency category corresponding to the target output vector, and use the complaint tendency category corresponding to the target output vector as the complaint tendency category of the new work order data.

According to the technical scheme, the application provides a complaint tendency analysis and early warning method and device for work order data, wherein the method comprises the following steps: acquiring work order data, and taking the work order data with the same work order number as a unit, and performing word segmentation on each work order data belonging to the same unit to obtain a first word segmentation combination; deleting stop words in the first word segmentation combination to obtain a second word segmentation combination of the work order data; generating a word vector of each participle in the second participle combination word by using a word2vec model; solving an average vector of word vectors corresponding to the work order data to obtain a sentence vector corresponding to the work order data; dividing sentence vectors of each work order data into three clusters by using a k-means algorithm; respectively setting corresponding first output vectors for the work order data of each complaint tendency category, taking the sentence vector of the work order data as a first input vector, and generating a complaint tendency classification model by utilizing Softmax logistic regression; judging the complaint tendency category of the new work order data by using the complaint tendency classification model; and if the complaint tendency category of the new work order data is a high-risk complaint tendency or has a complaint tendency, giving an early warning.

According to the method, a complaint tendency classification model is generated by using a large amount of work order data, the complaint tendency classification of new work order data is predicted on the basis of the complaint tendency classification model, and the purpose of timely and active early warning is achieved according to the prediction result, so that the problem that the existing manual analysis method is low in efficiency is solved.

Drawings

To explain the technical solution of the present application more clearly, the drawings used in the embodiments are briefly described below. Those skilled in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is a flowchart illustrating a method for analyzing and warning complaint tendencies of work order data according to an embodiment of the present disclosure;

fig. 2 is a flowchart of a method for generating word vectors in a method for analyzing and warning complaint tendencies of work order data according to an embodiment of the present disclosure;

fig. 3 is a flowchart of a method for dividing a sentence vector into three clusters in the method for analyzing and warning complaint tendencies of work order data according to the embodiment of the present disclosure;

fig. 4 is a flowchart of a method for determining a complaint tendency category of new work order data in a complaint tendency analysis and early warning method for work order data according to an embodiment of the present disclosure;

fig. 5 is a schematic structural diagram of a complaint tendency analysis and early warning device for work order data according to an embodiment of the present disclosure.

Detailed Description

In order to solve the problem that the existing manual analysis method is low in efficiency, the application provides a complaint tendency analysis and early warning method and device for work order data.

Referring to a work flow chart shown in fig. 1, an embodiment of the present application provides a method for analyzing and warning complaint tendencies of work order data, including the following steps:

step 101, acquiring work order data, and taking the work order data with the same work order number as a unit, performing word segmentation on each work order data belonging to the same unit to obtain a first word segmentation combination.

In this step, a work order text is acquired in real time or offline through the 95598 customer service system and used as the work order data. The work order text includes the following fields: work order number, incoming call content, applicant, mobile phone number, receiver, urgency, user address, start time, work order creation time, work order end time, current stage, and work order type. The work order type may be one of seven types: praise, troubleshooting, suggestion, reporting, complaint, business consultation, and opinion.

In one implementation, taking the work order data with the same work order number as a unit, the work order data is segmented in Python using the precise mode of the jieba toolkit. During segmentation, an industry dictionary for the power industry is defined, containing terms such as: finance month, power cut and transmission, air start, and power cut and transmission point.

Step 102, deleting stop words in the first word segmentation combination to obtain a second word segmentation combination of the work order data.

In one implementation, the first word segmentation combination is processed with the Harbin Institute of Technology (HIT) stop word list, and the stop words in the first word segmentation combination are deleted to obtain the second word segmentation combination.
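Steps 101 and 102 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of the application: the records are assumed to be segmented already (for instance by jieba), and the function names `group_by_work_order` and `remove_stop_words`, the tiny stop word set and the sample records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stop word set; a real system would load a full stop word list.
STOP_WORDS = {"the", "a", "of", "about"}


def group_by_work_order(records):
    """Merge the tokens of all records that share a work order number."""
    grouped = defaultdict(list)
    for order_no, tokens in records:
        grouped[order_no].extend(tokens)
    return dict(grouped)


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not stop words, keeping their order."""
    return [t for t in tokens if t not in stop_words]


# Each record is (work order number, already segmented tokens); assumed data.
records = [
    ("W001", ["the", "customer", "consults"]),
    ("W001", ["about", "a", "power", "outage"]),
    ("W002", ["the", "meter", "of", "the", "customer"]),
]

# First word segmentation combination: tokens grouped per work order number.
first_combinations = group_by_work_order(records)
# Second word segmentation combination: the same tokens without stop words.
second_combinations = {
    order_no: remove_stop_words(tokens)
    for order_no, tokens in first_combinations.items()
}
print(second_combinations)
```

Grouping before filtering mirrors the order of steps 101 and 102; the two operations could equally be fused into one pass.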

Step 103, generating a word vector of each participle in the second word segmentation combination by using a word2vec model.

Word2vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. The network takes a word as input and predicts the words in adjacent positions; under the bag-of-words assumption in word2vec, the order of the words is unimportant. After training is complete, the word2vec model can map each word to a vector that represents word-to-word relationships; this vector is taken from the hidden layer of the neural network. In the embodiment of the application, each participle in the second word segmentation combination is used as an input word of the word2vec model to generate a word vector.

A word vector maps a word or phrase of a vocabulary to a vector of real numbers. Conceptually, it is a mathematical embedding from a space with one dimension per word into a continuous vector space of much lower dimension. In the embodiment of the present application, the second word segmentation combination serves as the vocabulary.

Step 104, solving an average vector of the word vectors corresponding to the work order data, and taking the average vector as a sentence vector of the work order data.

A sentence vector maps a sentence or paragraph to a vector of real numbers. In one implementation, the word vectors corresponding to each piece of work order data are summed, and the sum is then averaged (divided by the number of word vectors) to generate the sentence vector of that piece of work order data.

In this step, suppose the second word segmentation combination corresponding to one piece of work order data is "customer / consult / account number / upgrade / problem". If the vector corresponding to the participle "customer" is [1,0,0,0,0], the vector corresponding to the participle "consult" is [0,1,0,0,0], the vector corresponding to the participle "account number" is [0,0,1,0,0], the vector corresponding to the participle "upgrade" is [0,0,0,1,0], and the vector corresponding to the participle "problem" is [0,0,0,0,1], then the sentence vector corresponding to this piece of work order data is [0.2,0.2,0.2,0.2,0.2].
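The averaging in step 104 can be sketched as follows. The function name `sentence_vector` is hypothetical, and the five one-hot word vectors are assumed values chosen for illustration only; real word2vec vectors are dense.

```python
def sentence_vector(word_vectors):
    """Average a list of equal-length word vectors dimension by dimension."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    count = len(word_vectors)
    # zip(*word_vectors) iterates over the dimensions of the vectors.
    return [sum(dim) / count for dim in zip(*word_vectors)]


# Illustrative one-hot vectors for five participles (assumed values).
word_vectors = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
print(sentence_vector(word_vectors))  # each component is 1/5 = 0.2
```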

Step 105, dividing the sentence vectors of the work order data into three clusters by using a k-means algorithm, wherein the three clusters correspond to three complaint tendency categories of the work order data, and the three complaint tendency categories are: high-risk complaint tendency, complaint tendency, and no complaint tendency.

The k-means algorithm takes as input the number of clusters k and a data set containing n data objects, and outputs k clusters that satisfy a minimum variance criterion. It divides the n data objects into k clusters such that objects within the same cluster are highly similar while objects in different clusters are less similar. In the embodiment of the application, the sentence vector corresponding to each piece of work order data is a data object, and the number of clusters k is 3.

Step 106, respectively setting corresponding first output vectors for the work order data of each complaint tendency category, taking the sentence vector of the work order data as a first input vector, and generating a complaint tendency classification model by utilizing Softmax logistic regression.

The Softmax logistic regression model is a generalization of the logistic regression model to multi-class problems, in which the class label y can take more than two values. Softmax logistic regression has a parameter set; in the embodiment of the present application, a gradient descent algorithm is used to optimize the parameters in the parameter set, with the learning rate (learning_rate) set to 0.1.

In one implementation, the present application provides a method for generating a complaint tendency classification model by Softmax logistic regression, including the following steps:

(1) Using the Python language with the support of the TensorFlow toolkit, specifying the hypothesis function as hypothesis = tf.nn.softmax(tf.matmul(X, W) + b), where X is a sentence vector, W is a weight, and b is a bias;

(2) Using the Python language with the support of the TensorFlow toolkit, using cross entropy as the objective function, written as: cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1));

(3) Using the Python language with the support of the TensorFlow toolkit, optimizing the weight parameters by gradient descent: optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost);

(4) Generating the complaint tendency classification model through steps (1), (2) and (3), using the work order data with first output vectors as training data.
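To make the training procedure concrete without depending on a particular TensorFlow version, the following sketch reimplements the same idea in plain Python: a softmax hypothesis, a mean cross-entropy cost, and gradient descent with a learning rate of 0.1. It is a swapped-in illustration, not the TensorFlow code of the embodiment; the function names, the number of epochs, the two-dimensional toy sentence vectors and the three-dimensional one-hot labels are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, W, b):
    """hypothesis = softmax(x W + b) for one sentence vector x."""
    scores = [
        sum(x[d] * W[d][c] for d in range(len(x))) + b[c]
        for c in range(len(b))
    ]
    return softmax(scores)


def train(X, Y, learning_rate=0.1, epochs=500):
    """Batch gradient descent on the mean cross-entropy cost."""
    n_features, n_classes = len(X[0]), len(Y[0])
    W = [[0.0] * n_classes for _ in range(n_features)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * n_classes for _ in range(n_features)]
        grad_b = [0.0] * n_classes
        for x, y in zip(X, Y):
            p = predict_proba(x, W, b)
            for c in range(n_classes):
                # Gradient of the mean cross entropy w.r.t. the class score.
                err = (p[c] - y[c]) / len(X)
                grad_b[c] += err
                for d in range(n_features):
                    grad_W[d][c] += err * x[d]
        for c in range(n_classes):
            b[c] -= learning_rate * grad_b[c]
            for d in range(n_features):
                W[d][c] -= learning_rate * grad_W[d][c]
    return W, b


# Toy sentence vectors with one-hot labels for three categories (assumed data).
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
Y = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
W, b = train(X, Y)
predictions = [max(range(3), key=lambda c: predict_proba(x, W, b)[c]) for x in X]
print(predictions)
```

In practice the labels would come from the k-means clusters of step 105, and the sentence vectors would have the dimensionality of the word2vec hidden layer.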

Step 107, judging the complaint tendency category of the new work order data by using the complaint tendency classification model.

Step 108, if the complaint tendency category of the new work order data is a high-risk complaint tendency or a complaint tendency, giving an early warning.

For example, the content of one piece of work order data is as follows: if the category predicted by the classification model is a high-risk complaint tendency, a complaint risk exists, so an early warning is given immediately and the relevant business departments are notified to handle it.

According to the technical scheme, the complaint tendency classification model is generated by using a large amount of work order data, the complaint tendency classification is predicted for the new work order data on the basis of the complaint tendency classification model, and the purpose of timely and active early warning is achieved according to the prediction result, so that the problem of low efficiency of the existing manual analysis method is solved.

Referring to the work flow chart shown in fig. 2, the generating a word vector of each second participle in the second word segmentation combination by using the word2vec model includes the following steps:

step 201, calculating the frequency of the second participle according to the following formula, and determining whether the frequency of the second participle is greater than a first preset threshold:

wherein P(wi) is the frequency of occurrence of the second participle, f(wi) is the frequency of occurrence of the second participle, wi is the second participle, i = 1, 2, 3, ..., x, x is the number of the second participles, and n is a second preset threshold.

Step 202, if the frequency of the second participle is greater than a first preset threshold, determining that the second participle with the frequency greater than the first preset threshold is a high-frequency participle, removing the high-frequency participle from the second participle combination, and taking the second participle combination after the high-frequency participle is removed as a third participle combination.

In one implementation, n is 1e-5 and the first preset threshold is 0.8. In this case, when the P(wi) corresponding to a second participle is greater than or equal to 0.8, that second participle is a high-frequency participle and is deleted.
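The formula referenced in step 201 does not appear in this text. The symbols and the value n = 1e-5 resemble the subsampling rule commonly used with word2vec, P(wi) = 1 - sqrt(n / f(wi)); the sketch below assumes that form, which is a guess and not a formula confirmed by the application. The function names and the relative frequencies are hypothetical, and the comparison uses "greater than or equal to" as in the implementation described above.

```python
import math


def discard_score(frequency, n=1e-5):
    """Assumed formula: P(w) = 1 - sqrt(n / f(w)), floored at 0."""
    return max(0.0, 1.0 - math.sqrt(n / frequency))


def remove_high_frequency(tokens, frequencies, threshold=0.8, n=1e-5):
    """Drop tokens whose score reaches the first preset threshold."""
    return [t for t in tokens if discard_score(frequencies[t], n) < threshold]


# Hypothetical relative frequencies f(w) measured over a large corpus.
frequencies = {"customer": 1e-2, "meter": 1e-4, "transformer": 1e-5}
tokens = ["customer", "meter", "transformer"]

# Third participle combination: the tokens left after removal.
third_combination = remove_high_frequency(tokens, frequencies)
print(third_combination)
```

Under this assumed formula, a threshold of 0.8 with n = 1e-5 removes every participle whose relative frequency exceeds 2.5e-4, so the rule is only meaningful on a large corpus.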

Step 203, adopting a skip-gram model in the word2vec model to construct a training model of the third participle combination.

Training data is constructed for each third participle in the third participle combination, in the format (input participle, output participle). First, the context of each third participle is found in the third participle combination using skip_window = 2; second, the training data is constructed from that context. For example, suppose the third participle combination corresponding to one piece of work order data is "customer / consult / account number / upgrade / problem". If "account number" is the input participle, the generated training data is: (account number, customer); (account number, consult); (account number, upgrade); and (account number, problem).
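The construction of (input participle, output participle) pairs with skip_window = 2 can be sketched as follows. The function name `skip_gram_pairs` is hypothetical, and the tokens are English glosses of the participles in the example.

```python
def skip_gram_pairs(tokens, skip_window=2):
    """Build (input participle, output participle) training pairs."""
    pairs = []
    for i, center in enumerate(tokens):
        # The context covers up to skip_window tokens on each side.
        start = max(0, i - skip_window)
        end = min(len(tokens), i + skip_window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = ["customer", "consult", "account number", "upgrade", "problem"]
pairs = skip_gram_pairs(tokens)
# Pairs whose input participle is "account number".
print([p for p in pairs if p[0] == "account number"])
```

Participles near either end of the combination have fewer neighbours within the window and therefore yield fewer pairs.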

A training model is constructed as a three-layer fully connected neural network comprising an input layer, a hidden layer and an output layer. The model is trained on the training data, and the weights are updated by negative sampling. The hidden layer contains 100 neurons.

Step 204, generating word vectors of all the participles in the third participle combination by using the training model.

Referring to a work flow chart shown in fig. 3, the dividing of the sentence vector of each work order data into three clusters by using a k-means algorithm, where the three clusters correspond to three complaint tendency categories of the work order data, includes the following steps:

step 301, randomly selecting three sentence vectors as centers of three clusters respectively by using a k-means algorithm, and marking the centers of the three clusters as C1, C2 and C3 respectively.

Step 302, respectively calculating Euclidean distances between each sentence vector and the centers of the three clusters, determining Ci closest to the Euclidean distance of each sentence vector, and classifying the sentence vectors into the clusters corresponding to Ci, wherein i =1,2,3.

Step 303, calculating the mean value of each dimension of all sentence vectors in each cluster, and taking the vector formed by the mean values as the new center of the cluster.

Step 304, judging whether the new center of each cluster is consistent with the previously calculated center (initially, the randomly selected center); if not, returning to step 302 until the new center of each cluster is consistent with the previously calculated center.

Step 305, the new center of the cluster is taken as the target center.
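Steps 301 to 305 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function name `kmeans`, the fixed random seed, the iteration cap, the handling of empty clusters and the toy two-dimensional sentence vectors are not part of the application. The sketch also leaves open how the resulting clusters are mapped to the three complaint tendency categories.

```python
import math
import random


def kmeans(vectors, k=3, seed=0, max_iter=100):
    """Plain k-means following steps 301-305; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 301: pick k sentence vectors at random as the initial centers.
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(max_iter):
        # Step 302: assign each vector to the nearest center (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(v, centers[c]))
            for v in vectors
        ]
        # Step 303: recompute each center as the per-dimension mean.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                new_centers.append(
                    [sum(dim) / len(members) for dim in zip(*members)]
                )
            else:
                # Assumption: keep the old center if a cluster becomes empty.
                new_centers.append(centers[c])
        # Step 304: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    # Step 305: the final centers are the target centers.
    return centers, labels


# Toy two-dimensional sentence vectors (assumed data).
vectors = [
    [0.0, 0.0], [0.1, 0.0],
    [5.0, 5.0], [5.1, 5.0],
    [10.0, 0.0], [10.1, 0.0],
]
centers, labels = kmeans(vectors)
print(centers, labels)
```

Because the initial centers are chosen at random, the result can be a local optimum; running the procedure with several seeds and keeping the lowest-variance result is a common remedy.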

Referring to the workflow diagram shown in fig. 4, the method for determining a complaint tendency category of new work order data by using the complaint tendency classification model includes the following steps:

step 401, generating a sentence vector corresponding to the new work order data.

Step 402, taking the sentence vector corresponding to the new work order data as a second input vector of the complaint tendency classification model, and obtaining a second output vector corresponding to the second input vector.

Step 403, comparing the second output vector with the first output vector, obtaining a first output vector corresponding to the second output vector, and making the first output vector corresponding to the second output vector as a target output vector.

Step 404, determining a complaint tendency category corresponding to the target output vector, and using the complaint tendency category corresponding to the target output vector as the complaint tendency category of the new work order data.

In the embodiment of the present application, suppose the three first output vectors are [1,0,0,0,0], [0,0,1,0,0] and [0,0,0,0,1], where [1,0,0,0,0] represents a high-risk complaint tendency, [0,0,1,0,0] represents a complaint tendency, and [0,0,0,0,1] represents no complaint tendency.

If the second output vector corresponding to the sentence vector of the new work order data is [1,0,0,0,0], the first output vector corresponding to the second output vector is [1,0,0,0,0], and the complaint tendency category of the new work order data is therefore a high-risk complaint tendency.
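Steps 403, 404 and 108 can be sketched as follows. The text does not state how the second output vector is compared with the first output vectors; the sketch assumes the closest first output vector by Euclidean distance is chosen. The names `judge_category` and `needs_warning` and the sample second output vector are hypothetical.

```python
import math

# First output vectors and their categories, as listed in the embodiment.
FIRST_OUTPUT_VECTORS = {
    (1, 0, 0, 0, 0): "high-risk complaint tendency",
    (0, 0, 1, 0, 0): "complaint tendency",
    (0, 0, 0, 0, 1): "no complaint tendency",
}
WARNING_CATEGORIES = {"high-risk complaint tendency", "complaint tendency"}


def judge_category(second_output_vector):
    """Pick the category whose first output vector is closest (assumed rule)."""
    target = min(
        FIRST_OUTPUT_VECTORS,
        key=lambda first: math.dist(first, second_output_vector),
    )
    return FIRST_OUTPUT_VECTORS[target]


def needs_warning(category):
    """Step 108: warn for the two complaint-prone categories."""
    return category in WARNING_CATEGORIES


# Hypothetical second output vector produced by the classification model.
category = judge_category([0.9, 0.0, 0.1, 0.0, 0.0])
print(category, needs_warning(category))
```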

Referring to the schematic structural diagram shown in fig. 5, an embodiment of the present application provides a complaint tendency analysis and early warning device for work order data, including:

the obtaining module 100 is configured to obtain work order data, and perform word segmentation on each work order data belonging to the same unit by using work order data with the same work order number as a unit to obtain a first word segmentation combination;

a deleting module 200, configured to delete stop words in the first word segmentation combination to obtain a second word segmentation combination of the work order data;

a word vector generating module 300, configured to generate a word vector for each participle in the second word segmentation combination by using a word2vec model;

a sentence vector generating module 400, configured to solve an average vector of word vectors corresponding to the work order data, and use the average vector as a sentence vector of the work order data;

a dividing module 500, configured to divide the sentence vector of each work order data into three clusters by using a k-means algorithm, where the three clusters correspond to three complaint tendency categories of the work order data, where the three complaint tendency categories are: high risk complaint tendency, complaint tendency and no complaint tendency;

the classification model generation module 600 is configured to set corresponding first output vectors for the work order data of each complaint tendency category, use the sentence vector of the work order data as a first input vector, and generate a complaint tendency classification model by Softmax logistic regression;

a judging module 700, configured to judge a complaint tendency category of the new work order data by using the complaint tendency classification model;

and the early warning module 800 is configured to make an early warning when the judging module determines that the complaint tendency category of the new work order data is a high-risk complaint tendency or has a complaint tendency.
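The sentence vector generating module 400 and the early warning module 800 can be sketched as follows. This is an illustrative sketch only: the function names `sentence_vector` and `should_warn` and the category strings are hypothetical, and the word vectors are assumed to be plain lists of floats of equal length.

```python
# Categories that trigger an early warning, per the early warning module.
WARNING_CATEGORIES = {"high-risk complaint tendency", "complaint tendency"}


def sentence_vector(word_vectors):
    """Average the word vectors component-wise to obtain the sentence vector."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    count = len(word_vectors)
    # zip(*word_vectors) groups the k-th component of every word vector.
    return [sum(components) / count for components in zip(*word_vectors)]


def should_warn(category):
    """Return True when the predicted category calls for an early warning."""
    return category in WARNING_CATEGORIES
```

For instance, averaging the two word vectors `[1.0, 2.0]` and `[3.0, 4.0]` gives the sentence vector `[2.0, 3.0]`.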

Optionally, the word vector generating module includes:

the first judging unit is used for calculating the frequency of the second participle according to the following formula and judging whether the frequency of the second participle is greater than a first preset threshold value or not:

the removing unit is used for determining that the second participle with the frequency greater than the first preset threshold is a high-frequency participle under the condition that the first judging unit determines that the frequency of the second participle is greater than the first preset threshold, removing the high-frequency participle from the second participle combination, and taking the second participle combination after the high-frequency participle is removed as a third participle combination;

and the first generation unit is used for generating word vectors of all participles in the third participle combination by using the training model.
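The removal of high-frequency participles can be sketched as below. The frequency formula is not reproduced in this text, so the sketch assumes, purely for illustration, that the frequency of a participle is its number of occurrences divided by the total number of participles. This is not necessarily the formula of the application, and the name `remove_high_frequency` is hypothetical.

```python
from collections import Counter


def remove_high_frequency(participles, first_threshold):
    """Drop every participle whose frequency exceeds first_threshold.

    Assumption: frequency = occurrences / total participle count. The
    application's own formula is not available in the text.
    """
    counts = Counter(participles)
    total = len(participles)
    # Keep only participles whose relative frequency is within the threshold.
    return [w for w in participles if counts[w] / total <= first_threshold]
```

The returned list plays the role of the third participle combination, which is then passed to the word vector training step.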

Optionally, the dividing module includes:

the selecting unit is used for randomly selecting three sentence vectors as the centers of three clusters respectively by using a k-means algorithm, and marking the centers of the three clusters as C1, C2 and C3 respectively;

and the second judging unit is used for judging whether the new center of each cluster is consistent with the randomly selected center of that cluster, and if not, returning to execute the operation of the first calculating unit until the new center of each cluster is consistent with the previously calculated center, and taking the new center of the cluster as a target center.
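The iteration carried out by the dividing module can be sketched as a plain k-means loop with three clusters. This is a minimal sketch under stated assumptions: sentence vectors are lists of floats, an empty cluster keeps its previous center, and the name `kmeans3` and its `rng` and `max_iter` parameters are hypothetical additions for reproducibility and safety.

```python
import math
import random


def kmeans3(vectors, rng=None, max_iter=100):
    """Cluster sentence vectors into three clusters; return (centers, labels)."""
    rng = rng or random.Random()
    # Randomly select three sentence vectors as the initial centers C1, C2, C3.
    centers = [list(v) for v in rng.sample(vectors, 3)]
    for _ in range(max_iter):
        # Assign each sentence vector to the nearest center (Euclidean distance).
        labels = [
            min(range(3), key=lambda i: math.dist(v, centers[i])) for v in vectors
        ]
        # Recompute each center as the mean of the vectors assigned to it.
        new_centers = []
        for i in range(3):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[i])  # assumption: keep old center
        # Stop once the new centers agree with the previously calculated ones.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

The centers returned when the loop stops correspond to the target centers, and `labels[k]` is the index of the cluster to which the k-th sentence vector belongs.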

Optionally, the judging module includes:

a second generation unit configured to generate a sentence vector corresponding to the new work order data;

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented using software plus any required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.

The same and similar parts in the various embodiments in this specification may be referred to each other. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the description in the method embodiment for relevant points.

The present application has been described in detail with reference to particular embodiments and illustrative examples, but the description is not intended to be construed as limiting the application. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the presently disclosed embodiments and implementations thereof without departing from the spirit and scope of the present disclosure, and these fall within the scope of the present disclosure. The protection scope of this application is subject to the appended claims.

Claims

1. A complaint tendency analysis early warning method for work order data, characterized by comprising the following steps:

acquiring work order data, and taking the work order data with the same work order number as a unit, and performing word segmentation on each work order data belonging to the same unit to obtain a first word segmentation combination;

dividing sentence vectors of each work order data into three clusters by using a k-means algorithm, wherein the three clusters correspond to three complaint tendency categories of the work order data, and the three complaint tendency categories are respectively as follows: high risk complaint tendency, complaint tendency and no complaint tendency;

2. The method of claim 1, wherein generating a word vector for each second participle in the second word segmentation combination by using a word2vec model comprises:

wherein P(w_i) is the frequency of the second participle, f(w_i) is the frequency of occurrence of the second participle, w_i is a second participle, i = 1, 2, 3, ..., x, x is the number of second participles, and n is a second preset threshold;

if the frequency of the second participle is greater than a first preset threshold, determining the second participle with the frequency greater than the first preset threshold as a high-frequency participle, removing the high-frequency participle from the second participle combination, and taking the second participle combination after the high-frequency participle is removed as a third participle combination;

constructing a training model of the third participle combination by adopting a skip-gram model in a word2vec model;

3. The method of claim 1, wherein dividing the sentence vectors of each work order data into three clusters by using a k-means algorithm, the three clusters corresponding to three complaint tendency categories of the work order data, comprises:

Step 301, randomly selecting three sentence vectors as the centers of three clusters respectively by using a k-means algorithm, and recording the centers of the three clusters as C1, C2 and C3;

Step 302, respectively calculating the Euclidean distances between each sentence vector and the centers of the three clusters, determining the center C_i with the smallest Euclidean distance to each sentence vector, and classifying the sentence vector into the cluster corresponding to C_i, wherein i = 1, 2, 3;

Step 304, judging whether the new center of each cluster is consistent with the randomly selected center of that cluster, and if not, returning to execute the operation of step 302 until the new center of each cluster is consistent with the previously calculated center, and taking the new center of the cluster as a target center.

4. The method of claim 1, wherein determining the complaint tendency category of new work order data by using the complaint tendency classification model comprises:

generating a sentence vector corresponding to the new work order data;

5. A complaint tendency analysis early warning device for work order data, characterized by comprising:

the acquisition module is used for acquiring the work order data and, taking the work order data with the same work order number as a unit, performing word segmentation on each work order data belonging to the same unit to obtain a first word segmentation combination;

the deleting module is used for deleting stop words in the first word segmentation combination to obtain a second word segmentation combination of the work order data;

a dividing module, configured to divide the sentence vectors of each work order data into three clusters by using a k-means algorithm, where the three clusters correspond to three complaint tendency categories of the work order data, where the three complaint tendency categories are: high risk complaint tendency, complaint tendency and no complaint tendency;

the classification model generation module is used for respectively setting corresponding first output vectors for the work order data of each complaint tendency category, taking the sentence vector of the work order data as a first input vector and generating a complaint tendency classification model by utilizing Softmax logistic regression;

6. The apparatus of claim 5, wherein the word vector generation module comprises:

wherein P(w_i) is the frequency of the second participle, f(w_i) is the frequency of occurrence of the second participle, w_i is a second participle, i = 1, 2, 3, ..., x, x is the number of second participles, and n is a second preset threshold;

7. The apparatus of claim 5, wherein the partitioning module comprises:

a selecting unit, configured to randomly select three sentence vectors as the centers of three clusters respectively by using a k-means algorithm, and record the centers of the three clusters as C1, C2 and C3;

a first calculating unit, configured to calculate the Euclidean distances between each sentence vector and the centers of the three clusters, determine the center C_i with the smallest Euclidean distance to each sentence vector, and classify the sentence vector into the cluster corresponding to C_i, wherein i = 1, 2, 3;

8. The apparatus of claim 5, wherein the judging module comprises:

a first obtaining unit, configured to take a sentence vector corresponding to the new work order data as a second input vector of the complaint tendency classification model, and obtain a second output vector corresponding to the second input vector;