CN108897754B

CN108897754B - Big data-based work order type identification method and system and computing device

Info

Publication number: CN108897754B
Application number: CN201810427330.7A
Authority: CN
Inventors: 李炯城; 吴佩娥; 李玥; 关晓明; 管学锋; 陈运动
Original assignee: Guangdong Communications Services Co Ltd; China Communications Services Corp Ltd; Guangdong Planning and Designing Institute of Telecommunications Co Ltd
Current assignee: Guangdong Communications Services Co Ltd; China Communications Services Corp Ltd; Guangdong Planning and Designing Institute of Telecommunications Co Ltd
Priority date: 2018-05-07
Filing date: 2018-05-07
Publication date: 2020-12-11
Anticipated expiration: 2038-05-07
Also published as: CN108897754A

Abstract

The invention relates to a big data-based work order type identification method, system and computing device. The method comprises the following steps: acquiring a work order to be classified, and performing correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient; if the first correlation coefficient exceeds a preset threshold, removing the corresponding structured fields from the work order to be classified to obtain a target work order to be classified; extracting a first feature word of the target work order to be classified and, using a pre-constructed naive Bayes model, calculating for each work order type the probability that the work order to be classified belongs to that type given that the first feature word appears; and determining the type of the work order to be classified according to the probabilities corresponding to the different work order types. The method avoids the occupation of server storage space and the reduction in business system operating efficiency caused by the accumulation of a large number of work orders.

Description

Big data-based work order type identification method and system and computing device

Technical Field

The invention relates to the technical field of data processing, in particular to a method, a system and a computing device for identifying a work order type based on big data.

Background

As telecommunication networks become increasingly digitized, telecommunication services generate work orders on a large scale. The number of designed service work order types keeps growing, and identifying the work order type becomes increasingly difficult. At the present stage, manual identification of work order types cannot keep pace with the rapid growth of services; the resulting backlog of work orders occupies a large amount of server storage space and directly reduces the operating efficiency of the business system.

Therefore, the traditional method for identifying the type of the work order manually cannot meet the increasing demand of telecommunication service in the new period, and a new means is required to be found to improve the work efficiency and the planning accuracy of the identification of the type of the work order.

Disclosure of Invention

In view of the problem that the accumulation of work orders occupies a large amount of server storage space and directly reduces the operating efficiency of the business system, it is necessary to provide a big data-based work order type identification method, system and computing device.

A big data-based work order type identification method comprises the following steps:

acquiring a work order to be classified, and performing correlation analysis on a structured field in the work order to be classified to acquire a first correlation coefficient;

if the first correlation coefficient exceeds a preset threshold value, removing corresponding structured fields in the work order to be classified to obtain a target work order to be classified;

extracting a first feature word of the target work order to be classified and, using a pre-constructed naive Bayes model, calculating for each work order type the probability that the work order to be classified belongs to that type given that the first feature word appears;

and judging the type of the work order to be classified according to the probabilities.

In one embodiment, before the step of obtaining the work order to be classified, the method further includes the following steps:

acquiring a work order sample set, and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set;

calculating, for each work order type, the prior probability that a work order training sample in the work order training set is of that type;

extracting second feature words from the work order training samples in the work order training set, and constructing a training word matrix of a bag-of-words model according to the second feature words;

acquiring the conditional probability of each second characteristic word under different work order type conditions according to the training word matrix;

and constructing a naive Bayes model according to the prior probability and the conditional probability.

In one embodiment, the step of obtaining a sample set of work orders comprises the steps of:

obtaining a plurality of work orders of different types, and respectively carrying out correlation analysis on the structured fields in each work order to obtain a second correlation coefficient;

and if the second correlation coefficient exceeds a preset threshold, removing the structured fields corresponding to the second correlation coefficient from each work order to obtain the work order sample set.

In one embodiment, after the step of constructing the naive Bayes model according to the prior probability and the conditional probability, the method further comprises the following steps:

randomly selecting a plurality of work order samples from the work order sample set to form a work order test set;

extracting a third feature word from the work order test sample in the work order test set;

calculating, according to the third feature word and using the naive Bayes model, the probability that the work order test sample is of each work order type given that the third feature word appears, and obtaining an identification result according to the probabilities corresponding to the different work order types;

obtaining the classification accuracy of the naive Bayes model according to the identification result and the type of the work order test sample;

and if the classification accuracy is lower than a preset threshold value, adjusting the naive Bayes model.

In one embodiment, the step of adjusting the naive Bayes model comprises the steps of:

acquiring a work order test sample with a wrong identification result and extracting a fourth characteristic word;

adjusting the occurrence frequency of the corresponding feature words in the training word matrix according to the fourth feature words;

and adjusting the conditional probability in the naive Bayes model according to the adjusted training word matrix.

In one embodiment, the step of extracting the first feature word of the target work order to be classified includes the following steps:

and performing word segmentation on the target work order to be classified by using a regular expression to obtain a first characteristic word.

A big data based work order type identification system, comprising:

the correlation analysis module is used for acquiring a work order to be classified, and performing correlation analysis on a structured field in the work order to be classified to acquire a first correlation coefficient;

the target work order obtaining module is used for removing the corresponding structured fields in the work orders to be classified to obtain the target work orders to be classified if the first correlation coefficient exceeds a preset threshold value;

the probability obtaining module is used for extracting a first feature word of the target work order to be classified, and respectively calculating corresponding probabilities when the work order types to be classified are different work order types under the condition that the first feature word appears by utilizing a pre-constructed naive Bayes model according to the first feature word;

and the work order type identification module is used for judging the type of the work order to be classified according to each probability.

In one embodiment, the system further comprises a naive Bayes model construction module, wherein the naive Bayes model construction module comprises a work order training set acquisition unit, a prior probability acquisition unit, a training matrix acquisition unit, a conditional probability acquisition unit and a model construction unit;

the work order training set acquisition unit is used for acquiring a work order sample set and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set;

the prior probability obtaining unit is used for calculating the corresponding prior probability when the types of the work order training samples in the work order training set are different work order types;

the training matrix obtaining unit is used for extracting a second characteristic word from the work order training sample in the work order training set and constructing a training word matrix of the word bag model according to the second characteristic word;

the conditional probability acquiring unit is used for acquiring the conditional probability of each second feature word under different work order type conditions according to the training word matrix;

and the model construction unit is used for constructing a naive Bayes model according to the prior probability and the conditional probability.

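A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the big data-based work order type identification method described above.

A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the big data-based work order type identification method described above.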

According to the work order type identification method, the work order type identification system, the computing equipment and the storage medium based on the big data, the correlation analysis is carried out on the structured fields of the work orders, the strong correlation fields are removed, the work orders can be quickly and accurately identified by using the pre-constructed naive Bayesian model, and occupation of the storage space of a server and reduction of the operation efficiency of a service system caused by a large amount of accumulated work orders are avoided.

Drawings

FIG. 1 is a flow chart of a big data based work order type identification method in one embodiment of the present invention;

FIG. 2 is a flow diagram of a naive Bayesian model construction in one embodiment of the invention;

FIG. 3 is a flow chart of testing a naive Bayes model in one embodiment of the invention;

FIG. 4 is a block diagram of a big data based work order type identification system in accordance with an embodiment of the present invention;

FIG. 5 is a schematic diagram of a big data based work order type identification system according to another embodiment of the present invention;

FIG. 6 is a block diagram of a naive Bayes model construction module in an embodiment of the invention;

FIG. 7 is a schematic structural diagram of a naive Bayesian model test module in an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the scope of the invention.

Referring to fig. 1, fig. 1 is a flowchart of a method for identifying a work order type based on big data in an embodiment of the present invention, where the method for identifying a work order type based on big data in the embodiment includes the following steps:

Step S110: acquiring a work order to be classified, and performing correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient.

In this step, the work order to be classified may be obtained directly from the business system or by scanning a paper document. A work order to be classified usually contains both structured and unstructured data; the structured fields are designed with a certain redundancy, are numerous, and are correlated with one another to some degree.

After the work order to be classified is obtained, correlation analysis is carried out on a plurality of structured fields in the work order to be classified, and correlation coefficients among the structured fields are obtained.

Step S120: if the first correlation coefficient exceeds a preset threshold, removing the corresponding structured field from the work order to be classified to obtain the target work order to be classified.

To increase the proportion of information carried by the structured fields of the work order to be classified and to reduce the amount of computation, the correlation coefficients between the structured fields are obtained through correlation analysis. When the correlation coefficient between structured fields exceeds the preset correlation coefficient threshold, the structured fields involved are judged to be strongly correlated fields; the strongly correlated fields are removed from the work order to be classified to obtain the target work order to be classified.

Step S130: extracting a first feature word of the target work order to be classified and, using a pre-constructed naive Bayes model, calculating for each work order type the probability that the work order to be classified belongs to that type given that the first feature word appears.

Step S140: determining the type of the work order to be classified according to the probabilities.

The words extracted from the target work order to be classified are input into the pre-constructed naive Bayes model as feature parameters. The model calculates and outputs, for each work order type, the probability that the work order to be classified belongs to that type given that these words appear; the work order type with the maximum probability is taken as the type of the work order to be classified.
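The decision rule of steps S130 and S140 can be written compactly as below. This is the standard naive Bayes formulation rather than a formula quoted from the patent; the symbols w_1, …, w_n (the feature words of one work order) and \hat{C} (the selected type) are introduced here for illustration only.

```latex
P(C_i \mid w_1,\dots,w_n) \;\propto\; P(C_i)\prod_{j=1}^{n} P(w_j \mid C_i),
\qquad
\hat{C} \;=\; \arg\max_{C_i}\Bigl[\ln P(C_i) + \sum_{j=1}^{n} \ln P(w_j \mid C_i)\Bigr]
```

Working with logarithms turns the product into a sum, which avoids numerical underflow when a work order contains many words.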

According to the big data-based work order type identification method, the work order type of the work order to be classified is identified by using the naive Bayesian model, the work order identification is converted into a classification problem, the work order type can be identified quickly and accurately, and therefore an enterprise is helped to solve the batch work order identification problem, occupation of a server storage space and reduction of operation efficiency of a service system caused by a large amount of accumulated work orders are avoided, work efficiency is improved, and higher labor cost is not consumed.

In one embodiment, the step of extracting the first feature word of the target work order to be classified comprises the following step: performing word segmentation on the target work order to be classified by using a regular expression to obtain the first feature word.

In this embodiment, the character strings in the target work order to be classified are segmented by the regular expression, and the words extracted from the target work order are input into the naive Bayes model as feature parameters, providing a basis for subsequent work order type identification.

In one embodiment, the regular expression "\\ w \" is used to segment the target work order to be classified into words and obtain the first feature words, which prevents the character strings of the target work order from being over-segmented into meaningless words that would affect the work order type identification result. For example, using the regular expression "\ w _", "Dr. Li" is treated as one word rather than as the two words "Dr" and "Li".
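Regular-expression word segmentation can be sketched as follows. The exact pattern used in the embodiment is not legible in this text, so the pattern `\w+` and the function name `tokenize` are assumptions made for illustration; note that `\w+` splits "Dr. Li" into two tokens, so it does not reproduce the "Dr. Li" behaviour described above.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Splits text into lower-cased words. The pattern \w+ (runs of letters,
// digits and underscores) is an assumed stand-in for the patent's pattern.
std::vector<std::string> tokenize(const std::string& text) {
    static const std::regex wordPattern(R"(\w+)");
    std::vector<std::string> words;
    for (std::sregex_iterator it(text.begin(), text.end(), wordPattern), end;
         it != end; ++it) {
        std::string word = it->str();
        // Lower-case each word so that "Baby" and "baby" count as one feature.
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        words.push_back(word);
    }
    return words;
}
```

Punctuation is discarded because it never matches `\w`, so the resulting word list contains only candidate feature words.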

Referring to fig. 2, fig. 2 is a flow chart of constructing a naive Bayes model in an embodiment of the invention. In this embodiment, before the step of obtaining the work order to be classified, the method further includes the following steps:

Step S210: acquiring a work order sample set, and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set.

Step S220: calculating, for each work order type, the prior probability that a work order training sample in the work order training set is of that type.

Step S230: extracting second feature words from the work order training samples in the work order training set, and constructing a training word matrix of the bag-of-words model according to the second feature words.

Step S240: acquiring the conditional probability of each second feature word under each work order type according to the training word matrix.

Step S250: constructing a naive Bayes model according to the prior probability and the conditional probability.

Specifically, a work order sample set is stored in the system directory; the work order sample set contains a plurality of work order samples of each work order type. A plurality of work order samples are randomly selected from the work order sample set to form a work order training set, and the prior probabilities P(C_1), P(C_2), …, P(C_i) that a work order sample in the work order training set is of each work order type are obtained, where C_i is a work order type. The words w_i in all work order samples in the work order training set are extracted, all words are converted to lower case and duplicates are removed to obtain a word list, and the occurrences of each word w_i in the word list are counted to generate a training word matrix of the bag-of-words model. For example, suppose the work order training set contains a plurality of work order training samples, two of which take the following form after all words are converted to lower case:

Work order 1: baby eat apple? eat!

Work order 2: say good bye, baby.

The training word matrix for the bag-of-words model generated from these two work orders is as follows:

	able	apple	baby	bye	eat	good	say
Work order 1 vocabulary vector	0	1	1	0	2	0	0
Work order 2 vocabulary vector	0	0	1	1	0	1	1

After the training word matrix is obtained, the conditional probability P(w_i | C_i) of each feature word under each work order type is calculated from the training word matrix, and the naive Bayes model is constructed from P(C_1), P(C_2), …, P(C_i) and P(w_i | C_i).

Because the prior probability and conditional probability parameters are obtained from the work order training set and the naive Bayes model is constructed from these parameters, the identification results obtained when the naive Bayes model is subsequently used for work order type identification are more accurate.
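The construction of the word list and the training word matrix can be sketched as follows. The names `Document`, `buildVocabulary` and `buildTrainMatrix` are hypothetical, and the sketch assumes each work order has already been split into lower-cased words.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Document = std::vector<std::string>;  // lower-cased words of one work order

// Sorted list of the distinct words over all training samples.
std::vector<std::string> buildVocabulary(const std::vector<Document>& docs) {
    std::set<std::string> unique;
    for (const Document& doc : docs) unique.insert(doc.begin(), doc.end());
    return std::vector<std::string>(unique.begin(), unique.end());
}

// One row per work order, one column per vocabulary word; each cell holds
// the number of times that word occurs in that work order.
std::vector<std::vector<int>> buildTrainMatrix(
        const std::vector<Document>& docs,
        const std::vector<std::string>& vocabulary) {
    std::map<std::string, std::size_t> column;
    for (std::size_t i = 0; i < vocabulary.size(); ++i) column[vocabulary[i]] = i;

    std::vector<std::vector<int>> matrix(
        docs.size(), std::vector<int>(vocabulary.size(), 0));
    for (std::size_t r = 0; r < docs.size(); ++r) {
        for (const std::string& word : docs[r]) {
            auto it = column.find(word);
            if (it != column.end()) ++matrix[r][it->second];  // skip unknown words
        }
    }
    return matrix;
}
```

Applied to the two example work orders alone, the vocabulary is apple, baby, bye, eat, good, say, and the word "eat" receives a count of 2 in the row for work order 1.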

In one embodiment, the step of obtaining a work order sample set comprises the following steps: obtaining a plurality of work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; and if the second correlation coefficient exceeds a preset threshold, removing the structured fields corresponding to the second correlation coefficient from each work order to obtain the work order sample set.

In this embodiment, a plurality of work orders whose types have already been identified manually are collected. Because the machine learning algorithm processes unstructured data while the collected work orders also contain structured data, correlation analysis is performed on the structured fields of the collected work orders, and the strongly correlated fields are removed to obtain the work order sample set.
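A minimal sketch of the correlation-based field removal follows. The patent does not name the correlation measure or say which field of a correlated pair is removed, so this sketch assumes the Pearson coefficient, numerically encoded fields stored as one column per field over a batch of work orders, and a policy of keeping the first field of each strongly correlated pair. The names `pearson` and `keepWeaklyCorrelated` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient of two equally long numeric columns.
// Returns 0 when either column is constant (the coefficient is undefined).
double pearson(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size() && !a.empty());
    const double n = static_cast<double>(a.size());
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) { meanA += a[i]; meanB += b[i]; }
    meanA /= n;
    meanB /= n;
    double cov = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - meanA, db = b[i] - meanB;
        cov += da * db;
        varA += da * da;
        varB += db * db;
    }
    if (varA == 0.0 || varB == 0.0) return 0.0;
    return cov / std::sqrt(varA * varB);
}

// Returns the indices of the fields to keep. A field is dropped when its
// absolute correlation with an already kept field exceeds the threshold
// (assumed policy: the earlier field of a correlated pair survives).
std::vector<std::size_t> keepWeaklyCorrelated(
        const std::vector<std::vector<double>>& fields, double threshold) {
    std::vector<std::size_t> kept;
    for (std::size_t j = 0; j < fields.size(); ++j) {
        bool redundant = false;
        for (std::size_t k : kept) {
            if (std::fabs(pearson(fields[j], fields[k])) > threshold) {
                redundant = true;
                break;
            }
        }
        if (!redundant) kept.push_back(j);
    }
    return kept;
}
```

With three fields where the second is exactly twice the first, a threshold of 0.9 keeps the first and third fields and drops the second.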

Referring to fig. 3, fig. 3 is a flow chart of testing a naive Bayes model in an embodiment of the invention. In this embodiment, after the step of constructing the naive Bayes model according to the prior probability and the conditional probability, the method further includes the following steps:

Step S310: randomly selecting a plurality of work order samples from the work order sample set to form a work order test set;

Step S320: extracting a third feature word from the work order test sample in the work order test set;

Step S330: calculating, according to the third feature word and using the naive Bayes model, the probability that the work order test sample is of each work order type given that the third feature word appears, and obtaining an identification result according to the probabilities corresponding to the different work order types;

Step S340: obtaining the classification accuracy of the naive Bayes model according to the identification result and the type of the work order test sample;

Step S350: if the classification accuracy is lower than a preset threshold, adjusting the naive Bayes model.

During testing of the naive Bayes model, the feature words of each work order test sample in the work order test set are extracted, and the constructed naive Bayes model calculates from these feature words the probability P(C_i | w) that the work order test sample is of each work order type, from which the classification result is obtained. If the classification result is inconsistent with the actual work order type of the work order test sample, the work order type has been identified incorrectly. All work order test samples in the work order test set are tested to obtain the classification accuracy of the naive Bayes model; if the classification accuracy is lower than a preset threshold, the naive Bayes model is adjusted to ensure its identification performance on work order types.
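The accuracy check of steps S340 and S350 can be sketched as follows. The function names are hypothetical, and work order types are assumed to be encoded as integers; the threshold value itself is left to the caller because the patent does not state one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of test samples whose identified type equals the labelled type.
double classificationAccuracy(const std::vector<int>& predicted,
                              const std::vector<int>& actual) {
    assert(predicted.size() == actual.size() && !actual.empty());
    std::size_t correct = 0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (predicted[i] == actual[i]) ++correct;
    }
    return static_cast<double>(correct) / static_cast<double>(actual.size());
}

// True when the accuracy is below the preset threshold, meaning the
// naive Bayes model should be adjusted.
bool needsAdjustment(double accuracy, double threshold) {
    return accuracy < threshold;
}
```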

In one embodiment, the step of adjusting the naive Bayes model comprises the following steps: acquiring a work order test sample with a wrong identification result and extracting a fourth feature word; adjusting the occurrence frequency of the corresponding feature word in the training word matrix according to the fourth feature word; and adjusting the conditional probability in the naive Bayes model according to the adjusted training word matrix.

A work order with a wrong identification result is obtained and its feature words are extracted; the frequencies of these words in the training word matrix of the bag-of-words model are adjusted so as to adjust the weights of the words; and the conditional probability P(w_i | C_i) of each feature word under each work order type is recalculated from the adjusted training word matrix. The naive Bayes model is thereby adjusted, improving its identification performance on work order types.
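One possible reading of this adjustment is sketched below. The patent does not state how the frequencies are changed, so the sketch assumes that the word counts of a misidentified work order are added to the counts of its true type, after which the log conditional probabilities are recomputed. The struct `WordCounts` and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct WordCounts {
    std::vector<double> numerator;  // smoothed per-word counts for one work order type
    double denominator;             // smoothed total word count for that type
};

// Adds the word counts of a misidentified sample to the counts of its true
// type (assumed adjustment rule, not stated in the patent).
void addSample(WordCounts& counts, const std::vector<int>& wordVector) {
    assert(counts.numerator.size() == wordVector.size());
    for (std::size_t i = 0; i < wordVector.size(); ++i) {
        counts.numerator[i] += wordVector[i];
        counts.denominator += wordVector[i];
    }
}

// Recomputes ln P(w_i | C) for every vocabulary word from the adjusted counts.
std::vector<double> logConditional(const WordCounts& counts) {
    std::vector<double> logProb(counts.numerator.size());
    for (std::size_t i = 0; i < logProb.size(); ++i) {
        logProb[i] = std::log(counts.numerator[i] / counts.denominator);
    }
    return logProb;
}
```

Under this rule, words that appear in the misidentified work order gain weight for the correct type, which makes a similar work order more likely to be assigned to that type the next time.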

Furthermore, in actual use, when a work order type is identified and classified incorrectly, the work order is returned at the next link in the workflow, and the naive Bayes model adjusts the bag of words using the feature words of the misidentified work order. This realizes self-learning, improves the self-learning capability and adaptability of the naive Bayes model, and makes the model more practical.

In order to make the technical solution of the present invention clearer, the following takes identification of the work order type of the telecommunications work order as an example, and further explains the identification method of the work order type based on big data in the embodiment of the present invention:

(1) obtaining a set of work order samples

A paper telecom work order, or an electronic telecom work order on the business system, is obtained. The paper telecom work order is scanned to obtain its text, or the text of the telecom work order on the business system is extracted, and a telecom work order sample is generated from the extracted text.

A plurality of telecom work order samples of the low-value type and the non-low-value type are placed in two subdirectories, low value and not low value, respectively. Each subdirectory contains 25 work orders named 1.txt, 2.txt, 3.txt, …, 25.txt. The low value subdirectory holds the low-value work order samples, and the not low value subdirectory holds the non-low-value work order samples.

(2) Construction of naive Bayes model

40 work orders are randomly selected from the 50 work order samples as the work order training set. The words w_i in all work order samples in the work order training set are extracted, all words are converted to lower case and duplicates are removed to obtain a word list, and the occurrences of each word w_i in the word list are counted to generate the training word matrix TrainMat of the bag-of-words model and the corresponding work order training sample types TrainCat. A training function is called to process TrainMat and TrainCat, yielding the prior probability P(C_1) that a work order training sample in the work order training set is a low-value work order, and the conditional probabilities P(w_i | C_i) of each feature word under the different work order types, stored as p0Vec and p1Vec. The training function is as follows:

void Train(const TIntMat& TrainMat,  // training matrix
           const TIntVec& TrainCat,  // type of the corresponding work order
           double& p_c1,             // returns the proportion of work orders in the matrix that belong to C1
           TDblVec& p0Vec,           // array of P(w0|C0), P(w1|C0), …
           TDblVec& p1Vec);          // array of P(w0|C1), P(w1|C1), …

Here, the total number of work orders is the number of rows of TrainMat, and the number of vocabulary words is the number of columns of TrainMat. p_c1 is the proportion of low-value work orders among all work orders, that is, the number of entries equal to 1 in TrainCat divided by the total number of work orders.

When calculating p0Vec and p1Vec, the numerators p0Num = new[number of vocabulary words] and p1Num = new[number of vocabulary words] are both initialized to 1, and the denominators p0Denom and p1Denom are initialized to 2. The specific procedure is as follows:

for (i = 0; i < total number of work orders; i++) {
    if (TrainCat[i] == 0) {
        add row i of TrainMat to p0Num;
        add the number of words appearing in row i of TrainMat to p0Denom
        (that is, p0Denom += Sum(TrainMat[i]));
    } else {
        same as above, with every 0 changed to 1;
    }
}

p0Vec = ln(p0Num / p0Denom)

p1Vec = ln(p1Num / p1Denom)

(3) Testing naive Bayes model

The remaining 10 work orders form the work order test set TestDoc, and the classify function is used to judge whether each doc belongs to the low-value work order type.

for each doc in TestDoc {
    classify(vocabulary vector corresponding to doc, p0Vec, p1Vec, p_c1)
    record the error rate
}

(4) Identification of telecommunications work order types

The type of the telecommunication work order to be classified is identified using a classification decision function, as follows:

bool classify(const TDblVec& w, const TDblVec& p0Vec, const TDblVec& p1Vec, double p_c1) {
    p0 = sum(w * p0Vec) + ln(1 - p_c1)
    p1 = sum(w * p1Vec) + ln(p_c1)
    return (p1 > p0)
}
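The training and classification pseudocode above can be turned into a compilable sketch as follows. The type aliases `IntVec`, `IntMat` and `DblVec` are assumed stand-ins for the patent's TIntVec, TIntMat and TDblVec, whose definitions are not given; the initial numerator of 1 and denominator of 2 follow the description, and any label other than 1 is treated as type C0.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using IntVec = std::vector<int>;
using IntMat = std::vector<IntVec>;
using DblVec = std::vector<double>;

// Estimates p_c1 = P(C1) and the log conditional probabilities
// p0Vec[j] = ln P(w_j | C0) and p1Vec[j] = ln P(w_j | C1).
void train(const IntMat& trainMat, const IntVec& trainCat,
           double& p_c1, DblVec& p0Vec, DblVec& p1Vec) {
    assert(!trainMat.empty() && trainMat.size() == trainCat.size());
    const std::size_t words = trainMat[0].size();
    DblVec p0Num(words, 1.0), p1Num(words, 1.0);  // numerators start at 1
    double p0Denom = 2.0, p1Denom = 2.0;          // denominators start at 2
    std::size_t c1Count = 0;
    for (std::size_t i = 0; i < trainMat.size(); ++i) {
        const bool isC1 = (trainCat[i] == 1);
        DblVec& num = isC1 ? p1Num : p0Num;
        double& denom = isC1 ? p1Denom : p0Denom;
        if (isC1) ++c1Count;
        for (std::size_t j = 0; j < words; ++j) {
            num[j] += trainMat[i][j];
            denom += trainMat[i][j];
        }
    }
    p_c1 = static_cast<double>(c1Count) / static_cast<double>(trainMat.size());
    p0Vec.resize(words);
    p1Vec.resize(words);
    for (std::size_t j = 0; j < words; ++j) {
        p0Vec[j] = std::log(p0Num[j] / p0Denom);
        p1Vec[j] = std::log(p1Num[j] / p1Denom);
    }
}

// Returns true when the work order with word-count vector w is more likely
// to be of type C1 (low value) than of type C0.
bool classify(const IntVec& w, const DblVec& p0Vec, const DblVec& p1Vec, double p_c1) {
    assert(w.size() == p0Vec.size() && w.size() == p1Vec.size());
    double p0 = std::log(1.0 - p_c1);
    double p1 = std::log(p_c1);
    for (std::size_t j = 0; j < w.size(); ++j) {
        p0 += w[j] * p0Vec[j];
        p1 += w[j] * p1Vec[j];
    }
    return p1 > p0;
}
```

Starting the counts at 1 and 2 rather than 0 keeps a word that never occurs in one type from producing ln(0), and comparing log scores instead of raw probabilities avoids underflow for long work orders.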

In accordance with the above big data-based work order type identification method, the invention also provides a big data-based work order type identification system. Embodiments of the system are described in detail below.

Referring to fig. 4, fig. 4 is a schematic structural diagram of a big data-based work order type identification system according to an embodiment of the present invention. In this embodiment, the identification system of the work order type based on the big data includes:

the correlation analysis module 410 is configured to obtain a work order to be classified, and perform correlation analysis on a structured field in the work order to be classified to obtain a first correlation coefficient;

the target work order obtaining module 420 is configured to, if the first correlation coefficient exceeds a preset threshold, remove a corresponding structured field in the work order to be classified, and obtain a target work order to be classified;

the probability obtaining module 430 is configured to extract a first feature word of the target work order to be classified and, using a pre-constructed naive Bayes model, calculate for each work order type the probability that the work order to be classified belongs to that type given that the first feature word appears;

and the work order type identification module 440 is used for judging the type of the work order to be classified according to each probability.

According to the identification system of the work order type based on the big data, the correlation analysis is carried out on the structured field of the work order, the strong correlation field is removed, the work order can be quickly and accurately identified by using the pre-constructed naive Bayes model, and occupation of the storage space of the server and reduction of the operation efficiency of the service system caused by a large amount of accumulated work orders are avoided.

Referring to fig. 5, fig. 5 is a schematic structural diagram of a big data-based work order type identification system according to another embodiment of the present invention. In this embodiment, the big data-based work order type identification system further includes a naive Bayes model construction module 450 and a naive Bayes model testing module 460.

Referring to fig. 6, fig. 6 is a schematic structural diagram of a naive Bayes model construction module in an embodiment of the invention. The naive Bayes model construction module 450 includes a work order training set acquisition unit 451, a prior probability acquisition unit 452, a training matrix acquisition unit 453, a conditional probability acquisition unit 454, and a model construction unit 455.

The work order training set acquisition unit 451 is used for acquiring a work order sample set and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set;

a prior probability obtaining unit 452, configured to calculate a corresponding prior probability when the types of the work order training samples in the work order training set are different work order types;

a training matrix obtaining unit 453, configured to extract a second feature word from the work order training sample in the work order training set, and construct a training word matrix of the bag-of-words model according to the second feature word;

a conditional probability obtaining unit 454, configured to obtain a conditional probability of each second feature word under different work order type conditions according to the training word matrix;

and a model construction unit 455, configured to construct a naive Bayes model according to the prior probability and the conditional probability.
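The cooperation of units 451 to 455 can be sketched in Python as follows. This is only an illustrative sketch and not the implementation described in the patent: the function name `build_naive_bayes`, the `train_fraction` and `seed` parameters, and the use of add-one (Laplace) smoothing for the conditional probabilities are assumptions that the source does not specify.

```python
import random
from collections import Counter, defaultdict


def build_naive_bayes(samples, train_fraction=0.7, seed=0):
    """samples: list of (feature_words, work_order_type) pairs.

    Returns (priors, cond_prob, vocabulary, training_set).
    """
    rng = random.Random(seed)

    # Unit 451: randomly select samples to form the training set.
    k = max(1, int(len(samples) * train_fraction))
    training_set = rng.sample(samples, k)

    # Unit 452: prior probability of each work order type.
    type_counts = Counter(label for _, label in training_set)
    priors = {t: n / len(training_set) for t, n in type_counts.items()}

    # Unit 453: bag-of-words training word matrix (type -> word -> count).
    word_matrix = defaultdict(Counter)
    for words, label in training_set:
        word_matrix[label].update(words)
    vocabulary = sorted({w for counts in word_matrix.values() for w in counts})

    # Unit 454: P(word | type), with add-one smoothing (an assumption).
    cond_prob = {}
    for t in priors:
        total = sum(word_matrix[t].values()) + len(vocabulary)
        cond_prob[t] = {w: (word_matrix[t][w] + 1) / total for w in vocabulary}

    # Unit 455: the model is the pair (priors, cond_prob).
    return priors, cond_prob, vocabulary, training_set
```

Smoothing is used here only so that a feature word never seen under one work order type does not force that type's probability to zero; any other estimate derived from the training word matrix would fit the description equally well.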

In one embodiment, the work order training set obtaining unit 451 obtains a plurality of work orders of different types and performs correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; if the second correlation coefficient exceeds a preset threshold, the structured fields corresponding to the second correlation coefficient are removed from each work order to obtain the work order sample set.
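The field-removal step can be illustrated with the following sketch. The source does not state which correlation measure is used or which field of a correlated pair is dropped; the Pearson coefficient, the numeric encoding of structured fields, the default threshold of 0.9, and the rule of keeping the earlier field are all assumptions, and the names `pearson` and `drop_correlated_fields` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; treat it as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def drop_correlated_fields(work_orders, fields, threshold=0.9):
    """work_orders: list of dicts whose structured fields are numerically encoded.

    Returns (kept_fields, cleaned_work_orders).
    """
    kept = []
    for f in fields:
        column = [wo[f] for wo in work_orders]
        # Drop f if it is strongly correlated with a field already kept.
        if any(
            abs(pearson(column, [wo[k] for wo in work_orders])) > threshold
            for k in kept
        ):
            continue
        kept.append(f)
    # Keep non-structured keys (e.g. free text) and the surviving fields.
    cleaned = [
        {k: v for k, v in wo.items() if k not in fields or k in kept}
        for wo in work_orders
    ]
    return kept, cleaned
```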

Referring to fig. 7, fig. 7 is a schematic structural diagram of the naive Bayes model testing module in an embodiment of the invention. In this embodiment, the naive Bayes model testing module 460 includes a work order test set obtaining unit 461, a feature extraction unit 462, a recognition result obtaining unit 463, an accuracy obtaining unit 464, and a naive Bayes model adjusting unit 465:

a work order test set obtaining unit 461, configured to randomly select a plurality of work order samples from the work order sample set to form a work order test set;

a feature extraction unit 462, configured to extract a third feature word from the work order test sample in the work order test set;

a recognition result obtaining unit 463, configured to use the naive Bayes model to calculate, according to the third feature word, the probability that the work order test sample belongs to each of the different work order types given that the third feature word appears, and to obtain a recognition result according to the probabilities corresponding to the different work order types;

an accuracy obtaining unit 464, configured to obtain the classification accuracy of the naive Bayes model according to the recognition result and the actual type of the work order test sample;

and a naive Bayes model adjusting unit 465, configured to adjust the naive Bayes model if the classification accuracy is lower than a preset threshold.
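The test flow of units 463 to 465 can be sketched as below. The sketch assumes the model is represented as a dictionary of priors plus a dictionary of per-type conditional probabilities; the function names, the `unseen` fallback probability, and the 0.9 accuracy threshold are hypothetical. Log-probabilities are summed instead of multiplying raw probabilities, which is a common numerical precaution and not something the source prescribes.

```python
from math import log


def classify(priors, cond_prob, words, unseen=1e-6):
    """Return the work order type with the highest posterior score.

    The score is log P(type) + sum of log P(word | type); the evidence term
    is the same for every type and is therefore omitted.
    """
    scores = {}
    for t, prior in priors.items():
        score = log(prior)
        for w in words:
            # Words missing from the model get a small assumed probability.
            score += log(cond_prob[t].get(w, unseen))
        scores[t] = score
    return max(scores, key=scores.get)


def classification_accuracy(priors, cond_prob, test_set):
    """test_set: list of (feature_words, actual_type) pairs."""
    correct = sum(
        classify(priors, cond_prob, words) == actual for words, actual in test_set
    )
    return correct / len(test_set)


def needs_adjustment(accuracy, threshold=0.9):
    """Unit 465 is triggered when accuracy falls below the preset threshold."""
    return accuracy < threshold
```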

In one embodiment, the naive Bayes model adjusting unit 465 acquires the work order test samples whose recognition results are wrong and extracts a fourth feature word from them; adjusts the occurrence frequency of the corresponding feature words in the training word matrix according to the fourth feature word; and adjusts the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
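One possible reading of this adjustment is sketched below. The source does not say how the occurrence frequency is adjusted; incrementing the count of each fourth feature word under the sample's actual type by a fixed `step`, and recomputing the conditional probabilities with add-one smoothing, are assumptions made purely for illustration, as is the function name `adjust_model`.

```python
from collections import Counter


def adjust_model(word_matrix, misclassified, step=1):
    """Adjust word counts using misclassified samples, then recompute P(word | type).

    word_matrix: dict mapping work order type -> Counter of word counts
                 (modified in place).
    misclassified: list of (feature_words, actual_type) pairs.
    """
    # Raise the frequency of each feature word under the sample's actual type.
    for words, actual_type in misclassified:
        counts = word_matrix.setdefault(actual_type, Counter())
        for w in words:
            counts[w] += step

    # Recompute conditional probabilities from the adjusted matrix.
    vocabulary = {w for counts in word_matrix.values() for w in counts}
    cond_prob = {}
    for t, counts in word_matrix.items():
        total = sum(counts.values()) + len(vocabulary)
        cond_prob[t] = {w: (counts[w] + 1) / total for w in vocabulary}
    return cond_prob
```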

In one embodiment, the probability obtaining module 430 performs word segmentation on the target work order to be classified by using a regular expression to obtain a first feature word.
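A regular-expression-based segmentation of this kind might look like the following sketch. The actual pattern used by the probability obtaining module is not given in the source; the pattern below, which treats each run of CJK characters or each run of ASCII letters and digits as one feature word, and the name `extract_feature_words` are hypothetical.

```python
import re

# Hypothetical pattern: a run of CJK ideographs, or a run of ASCII letters/digits.
TOKEN_PATTERN = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+")


def extract_feature_words(text):
    """Split work order text into feature words, dropping punctuation and spaces."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]
```

For example, under this assumed pattern `extract_feature_words("Network outage, site A12")` yields `["network", "outage", "site", "a12"]`.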

The big data-based work order type identification system corresponds one-to-one to the big data-based work order type identification method, and the technical features and beneficial effects described in the embodiments of the method apply equally to the embodiments of the system, which is hereby stated.

In one embodiment, there is also provided a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of:

if the first correlation coefficient exceeds a preset threshold value, removing corresponding structured fields in the work order to be classified, and acquiring a target work order to be classified;

extracting a first feature word of the target work order to be classified and, according to the first feature word, using a pre-constructed naive Bayes model to calculate the probability that the work order to be classified belongs to each of the different work order types given that the first feature word appears;

In one embodiment, the processor executes the program to further implement the following steps:

acquiring a work order sample set, and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set; calculating the prior probability of each of the different work order types among the work order training samples in the work order training set; extracting a second feature word from the work order training samples in the work order training set, and constructing a training word matrix of the bag-of-words model according to the second feature word; acquiring the conditional probability of each second feature word under each work order type according to the training word matrix; and constructing a naive Bayes model according to the prior probability and the conditional probability.

In one embodiment, the processor executes the program to further implement the following steps: randomly selecting a plurality of work order samples from the work order sample set to form a work order test set; extracting a third feature word from the work order test samples in the work order test set; using the naive Bayes model to calculate, according to the third feature word, the probability that each work order test sample belongs to each of the different work order types given that the third feature word appears, and acquiring a recognition result according to the probabilities corresponding to the different work order types; obtaining the classification accuracy of the naive Bayes model according to the recognition result and the actual type of the work order test sample; and if the classification accuracy is lower than a preset threshold, adjusting the naive Bayes model.

In one embodiment, the processor executes the program to further implement the following steps: acquiring the work order test samples whose recognition results are wrong and extracting a fourth feature word from them; adjusting the occurrence frequency of the corresponding feature words in the training word matrix according to the fourth feature word; and adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.

When the processor of the computer device executes the program, it implements any of the big data-based work order type identification methods in the above embodiments: correlation analysis is performed on the structured fields of the work order and strongly correlated fields are removed, so that the type of the work order can be identified quickly and accurately using the pre-constructed naive Bayes model. This avoids the occupation of server storage space and the reduction in operating efficiency of the service system caused by the accumulation of a large number of work orders.

In addition, those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium; in the embodiments of the present invention, the program can be stored in the storage medium of a computer system and executed by at least one processor in the computer system to implement the processes of the embodiments of the big data-based work order type identification method described above.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

acquiring a work order sample set, and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set; calculating the prior probability of each of the different work order types among the work order training samples in the work order training set; extracting a second feature word from the work order training samples in the work order training set, and constructing a training word matrix of the bag-of-words model according to the second feature word; acquiring the conditional probability of each second feature word under each work order type according to the training word matrix; and constructing a naive Bayes model according to the prior probability and the conditional probability.

Those skilled in the art will understand that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present invention, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

1. A big data-based work order type identification method is characterized by comprising the following steps:

judging the type of the work order to be classified according to the probabilities;

the step of extracting the first feature word of the target work order to be classified comprises the following step: performing word segmentation on the target work order to be classified by using a regular expression to obtain the first feature word.

2. The big-data-based work order type identification method according to claim 1, wherein the step of obtaining the work order to be classified further comprises the following steps:

3. The big data based work order type identification method as claimed in claim 2, wherein said step of obtaining a sample set of work orders comprises the steps of:

4. The big data-based work order type identification method according to claim 3, wherein the step of constructing a naive Bayes model according to the prior probabilities and the conditional probabilities further comprises the following steps:

5. The big data-based work order type identification method according to claim 4, wherein the step of adjusting the naive Bayes model comprises the steps of:

6. A big data based work order type identification system, comprising:

the work order type identification module is used for judging the type of the work order to be classified according to each probability;

the probability obtaining module is further configured to perform word segmentation on the target work order to be classified by using a regular expression to obtain the first feature word.

7. The big data-based work order type recognition system of claim 6, further comprising a naive Bayes model construction module, wherein the naive Bayes model construction module comprises a work order training set obtaining unit, a prior probability obtaining unit, a training matrix obtaining unit, a conditional probability obtaining unit and a model construction unit;

8. The big data-based work order type identification system as claimed in claim 7, wherein said work order training set obtaining unit is further configured to obtain a plurality of work orders of different types, and perform correlation analysis on the structured fields in each of said work orders to obtain a second correlation coefficient; and if the second correlation coefficient exceeds a preset threshold value, remove the structured fields corresponding to the second correlation coefficient in each work order to obtain a work order sample set.

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the big data based work order type identification method according to any one of claims 1 to 5 when executing the computer program.

10. A computer storage medium on which a computer program is stored, the program, when executed by a processor, implementing the method for big data based work order type identification according to any of claims 1 to 5.