CN108897754B - Big data-based work order type identification method and system and computing device - Google Patents

Publication number
CN108897754B
Authority
CN
China
Prior art keywords
work order
classified
word
training
naive bayes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810427330.7A
Other languages
Chinese (zh)
Other versions
CN108897754A (en
Inventor
李炯城
吴佩娥
李玥
关晓明
管学锋
陈运动
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Communications Services Co Ltd
China Communications Services Corp Ltd
Guangdong Planning and Designing Institute of Telecommunications Co Ltd
Original Assignee
Guangdong Communications Services Co Ltd
China Communications Services Corp Ltd
Guangdong Planning and Designing Institute of Telecommunications Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Communications Services Co Ltd, China Communications Services Corp Ltd, Guangdong Planning and Designing Institute of Telecommunications Co Ltd filed Critical Guangdong Communications Services Co Ltd
Priority to CN201810427330.7A priority Critical patent/CN108897754B/en
Publication of CN108897754A publication Critical patent/CN108897754A/en
Application granted granted Critical
Publication of CN108897754B publication Critical patent/CN108897754B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Abstract

The invention relates to a big data-based work order type identification method, a big data-based work order type identification system and a big data-based work order type identification computing device, wherein the method comprises the following steps: acquiring a work order to be classified, and performing correlation analysis on a structured field in the work order to be classified to acquire a first correlation coefficient; if the first correlation coefficient exceeds a preset threshold value, removing corresponding structured fields in the work order to be classified, and acquiring a target work order to be classified; extracting a first feature word of a target work order to be classified, and respectively calculating corresponding probabilities when the work order type to be classified is different work order types under the condition that the first feature word appears by utilizing a pre-constructed naive Bayes model according to the first feature word; and judging the type of the work order to be classified according to the probability corresponding to different work order types. By adopting the method, the occupation of the storage space of the server and the reduction of the operation efficiency of the service system caused by the accumulation of a large number of work orders can be avoided.

Description

Big data-based work order type identification method and system and computing device
Technical Field
The invention relates to the technical field of data processing, in particular to a method, a system and a computing device for identifying a work order type based on big data.
Background
As telecommunication networks become increasingly digitized, telecommunication services generate work orders on a large scale. Service work orders are designed in more and more types, so identifying the work order type becomes increasingly difficult. At present, the speed of manual work order type identification cannot keep up with the rapid growth of services; the resulting accumulation of work orders occupies a large amount of server storage space and directly reduces the operating efficiency of the service system.
Therefore, the traditional manual method of identifying work order types cannot meet the growing demands of telecommunication services, and a new means is needed to improve the work efficiency and accuracy of work order type identification.
Disclosure of Invention
Based on this, it is necessary to provide a method, a system and a computing device for identifying a work order type based on big data, aiming at the problem that accumulation of work orders occupies a large amount of storage space of a server, which directly results in reduction of operation efficiency of a business system.
A big data-based work order type identification method comprises the following steps:
acquiring a work order to be classified, and performing correlation analysis on a structured field in the work order to be classified to acquire a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold value, removing corresponding structured fields in the work order to be classified to obtain a target work order to be classified;
extracting a first feature word of the target work order to be classified, and respectively calculating corresponding probabilities when the work order type to be classified is different work order types under the condition that the first feature word appears by utilizing a pre-constructed naive Bayes model according to the first feature word;
and judging the type of the work order to be classified according to the probabilities.
In one embodiment, before the step of obtaining the work order to be classified, the method further includes the following steps:
acquiring a work order sample set, and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set;
calculating corresponding prior probability when the types of the work order training samples in the work order training set are different work order types;
extracting a second characteristic word from the work order training samples in the work order training set, and constructing a training word matrix of a word bag model according to the second characteristic word;
acquiring the conditional probability of each second characteristic word under different work order type conditions according to the training word matrix;
and constructing a naive Bayes model according to the prior probability and the conditional probability.
In one embodiment, the step of obtaining a sample set of work orders comprises the steps of:
obtaining a plurality of work orders of different types, and respectively carrying out correlation analysis on the structured fields in each work order to obtain a second correlation coefficient;
and if the second correlation coefficient exceeds a preset threshold value, removing the structured fields corresponding to the second correlation coefficient in each work order, and obtaining a work order sample set.
In one embodiment, after the step of constructing the naive bayes model according to the prior probability and the conditional probability, the method further comprises the following steps:
randomly selecting a plurality of work order samples from the work order sample set to form a work order test set;
extracting a third feature word from the work order test sample in the work order test set;
respectively calculating corresponding probabilities of the work order test samples in different work order types under the condition that the third feature words appear according to the third feature words and by using the naive Bayes model, and acquiring identification results according to the corresponding probabilities of the different work order types;
obtaining the classification accuracy of the naive Bayes model according to the identification result and the type of the work order test sample;
and if the classification accuracy is lower than a preset threshold value, adjusting the naive Bayes model.
In one embodiment, the step of adjusting the naive bayes model comprises the steps of:
acquiring a work order test sample with a wrong identification result and extracting a fourth characteristic word;
adjusting the occurrence frequency of the corresponding feature words in the training word matrix according to the fourth feature words;
and adjusting the conditional probability in the naive Bayes model according to the adjusted training word matrix.
In one embodiment, the step of extracting the first feature word of the target work order to be classified includes the following steps:
and performing word segmentation on the target work order to be classified by using a regular expression to obtain a first characteristic word.
A big data based work order type identification system, comprising:
the correlation analysis module is used for acquiring a work order to be classified, and performing correlation analysis on a structured field in the work order to be classified to acquire a first correlation coefficient;
the target work order obtaining module is used for removing the corresponding structured fields in the work orders to be classified to obtain the target work orders to be classified if the first correlation coefficient exceeds a preset threshold value;
the probability obtaining module is used for extracting a first feature word of the target work order to be classified, and respectively calculating corresponding probabilities when the work order types to be classified are different work order types under the condition that the first feature word appears by utilizing a pre-constructed naive Bayes model according to the first feature word;
and the work order type identification module is used for judging the type of the work order to be classified according to each probability.
In one embodiment, the system further comprises a naive Bayes model construction module, wherein the naive Bayes model construction module comprises a work order training set acquisition unit, a prior probability acquisition unit, a training matrix acquisition unit, a conditional probability acquisition unit and a model construction unit;
the work order training set acquisition unit is used for acquiring a work order sample set and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set;
the prior probability obtaining unit is used for calculating the corresponding prior probability when the types of the work order training samples in the work order training set are different work order types;
the training matrix obtaining unit is used for extracting a second characteristic word from the work order training sample in the work order training set and constructing a training word matrix of the word bag model according to the second characteristic word;
the conditional probability acquiring unit is used for acquiring the conditional probability of each second feature word under different work order type conditions according to the training word matrix;
and the model construction unit is used for constructing a naive Bayes model according to the prior probability and the conditional probability.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a work order to be classified, and performing correlation analysis on a structured field in the work order to be classified to acquire a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold value, removing corresponding structured fields in the work order to be classified to obtain a target work order to be classified;
extracting a first feature word of the target work order to be classified, and respectively calculating corresponding probabilities when the work order type to be classified is different work order types under the condition that the first feature word appears by utilizing a pre-constructed naive Bayes model according to the first feature word;
and judging the type of the work order to be classified according to the probabilities.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a work order to be classified, and performing correlation analysis on a structured field in the work order to be classified to acquire a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold value, removing corresponding structured fields in the work order to be classified to obtain a target work order to be classified;
extracting a first feature word of the target work order to be classified, and respectively calculating corresponding probabilities when the work order type to be classified is different work order types under the condition that the first feature word appears by utilizing a pre-constructed naive Bayes model according to the first feature word;
and judging the type of the work order to be classified according to the probabilities.
According to the work order type identification method, the work order type identification system, the computing equipment and the storage medium based on the big data, the correlation analysis is carried out on the structured fields of the work orders, the strong correlation fields are removed, the work orders can be quickly and accurately identified by using the pre-constructed naive Bayesian model, and occupation of the storage space of a server and reduction of the operation efficiency of a service system caused by a large amount of accumulated work orders are avoided.
Drawings
FIG. 1 is a flow chart of a big data based work order type identification method in one embodiment of the present invention;
FIG. 2 is a flow diagram of a naive Bayesian model construction in one embodiment of the invention;
FIG. 3 is a flow chart of testing a naive Bayes model in one embodiment of the invention;
FIG. 4 is a block diagram of a big data based work order type identification system in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of a big data based work order type identification system according to another embodiment of the present invention;
FIG. 6 is a block diagram of a naive Bayes model construction module in an embodiment of the invention;
FIG. 7 is a schematic structural diagram of a naive Bayesian model test module in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a method for identifying a work order type based on big data in an embodiment of the present invention, where the method for identifying a work order type based on big data in the embodiment includes the following steps:
Step S110: acquire a work order to be classified, and perform correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient.
In this step, the work order to be classified may be obtained directly from the service system or by scanning a paper document. A work order to be classified usually contains both structured data and unstructured data. The structured fields are designed with a certain redundancy: they are numerous and are correlated with one another to some degree.
After the work order to be classified is obtained, correlation analysis is performed on its structured fields to obtain the correlation coefficients between the structured fields.
Step S120: if the first correlation coefficient exceeds a preset threshold value, remove the corresponding structured fields from the work order to be classified to obtain the target work order to be classified.
To increase the proportion of useful information carried by the structured fields of the work order to be classified and to reduce the related computation, the correlation coefficients between the structured fields are obtained through correlation analysis. When the correlation coefficient between structured fields exceeds the preset correlation coefficient threshold, the structured fields involved are judged to be strongly correlated fields, and the strongly correlated fields are removed from the work order to be classified to obtain the target work order to be classified.
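The field-removal step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not name the correlation measure, so the Pearson coefficient is assumed; the structured fields are assumed to be numeric; and of each strongly correlated pair only the later field is dropped. The field names and the 0.9 threshold are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant field has no linear relationship to report
    return cov / math.sqrt(var_x * var_y)


def drop_strongly_correlated(fields, threshold=0.9):
    """Keep a field only if it is not strongly correlated with a kept field.

    fields: dict mapping field name -> list of numeric values (assumption).
    """
    kept = []
    for name in fields:
        if all(abs(pearson(fields[name], fields[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical structured fields taken from five work orders.
fields = {
    "line_count": [1, 2, 3, 4, 5],
    "port_count": [2, 4, 6, 8, 10],  # exactly twice line_count, so redundant
    "distance_km": [5, 1, 4, 2, 3],
}
kept = drop_strongly_correlated(fields)
print(kept)
```

Keeping the first field of a correlated pair is only one possible policy; the source says the "corresponding" fields are removed without specifying which ones.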
Step S130: extract the first feature words of the target work order to be classified and, using a pre-constructed naive Bayes model, calculate from the first feature words the probability that the work order to be classified belongs to each work order type given that the first feature words appear.
Step S140: judge the type of the work order to be classified according to the probabilities.
The words extracted from the target work order to be classified are input into the pre-constructed naive Bayes model as feature parameters. The model calculates and outputs, given that these words appear, the probability that the work order to be classified belongs to each work order type; the work order type with the maximum probability is taken as the final type of the work order to be classified.
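The maximum-probability decision described above corresponds to the standard naive Bayes decision rule. The source does not write the formula out; the form below is the textbook one, assuming the feature words w_1, …, w_n are conditionally independent given the work order type C_i:

```latex
\hat{C} \;=\; \arg\max_{i}\; P(C_i \mid w_1, \dots, w_n)
        \;=\; \arg\max_{i}\; P(C_i) \prod_{j=1}^{n} P(w_j \mid C_i)
```

The evidence term P(w_1, …, w_n) is the same for every type and can therefore be dropped from the comparison.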
According to the big data-based work order type identification method, the work order type of the work order to be classified is identified by using the naive Bayesian model, the work order identification is converted into a classification problem, the work order type can be identified quickly and accurately, and therefore an enterprise is helped to solve the batch work order identification problem, occupation of a server storage space and reduction of operation efficiency of a service system caused by a large amount of accumulated work orders are avoided, work efficiency is improved, and higher labor cost is not consumed.
In one embodiment, the step of extracting the first feature word of the target work order to be classified comprises the following steps:
and performing word segmentation on the target work order to be classified by using a regular expression to obtain a first characteristic word.
In the embodiment, the character string in the target work order to be classified is subjected to character division through the regular expression, the word in the target work order to be classified is extracted and used as the characteristic parameter to be input into the naive Bayes model, and a basis is provided for subsequent work order type identification.
In one embodiment, the regular expression "\\ w \" is used to divide the target work order to be classified into words and obtain the first feature words. This prevents the character strings of the target work order from being subdivided so finely that meaningless words are formed and the work order type identification result is affected. For example, with this regular expression, "Dr. Li" is kept as one word rather than being split into the two words "Dr" and "Li".
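A regular-expression tokenizer with the behaviour described above can be sketched as follows. The pattern is a hypothetical stand-in, not the expression used in the embodiment (which is not legible in this text): it keeps an assumed short list of title abbreviations attached to the following word and otherwise extracts runs of word characters.

```python
import re

# Hypothetical pattern: an optional title abbreviation ("Dr.", "Mr.", "Ms.")
# followed by whitespace is kept together with the word that follows it.
TOKEN_PATTERN = re.compile(r"(?:(?:Dr|Mr|Ms)\.\s+)?\w+")


def extract_feature_words(text):
    """Split a work order string into lower-case feature words."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]


tokens = extract_feature_words("Dr. Li repaired the line")
print(tokens)
```

Because the pattern contains only non-capturing groups, `re.findall` returns the full matched strings, so the title and the name come back as a single token.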
Referring to fig. 2, fig. 2 is a flow chart of constructing a naive bayes model in an embodiment of the invention. In this embodiment, before the step of obtaining the work order to be classified, the method further includes the following steps:
Step S210: acquire a work order sample set, and randomly select a plurality of work order samples from the work order sample set to form a work order training set.
Step S220: calculate the prior probability that a work order training sample in the work order training set belongs to each work order type.
Step S230: extract second feature words from the work order training samples in the work order training set, and construct the training word matrix of a bag-of-words model from the second feature words.
Step S240: obtain the conditional probability of each second feature word under each work order type from the training word matrix.
Step S250: construct a naive Bayes model from the prior probabilities and the conditional probabilities.
Specifically, a work order sample set is stored in the system catalog; the sample set contains a plurality of work order samples of each work order type. A plurality of work order samples are randomly selected from the work order sample set to form a work order training set, and the prior probabilities P(C_1), P(C_2), …, P(C_i) that a work order sample in the training set belongs to each work order type are obtained, where C_i is a work order type. The words w_i in all work order samples of the training set are extracted and converted to lower case, duplicates are removed to obtain a vocabulary, and the occurrences of each vocabulary word w_i are counted to generate the training word matrix of the bag-of-words model. For example, suppose the work order training set contains a plurality of work orders, two of which read as follows after all words are converted to lower case:
Work order 1: baby eat apple? eat!
Work order 2: say good bye, baby.
The rows of the bag-of-words training word matrix generated for these two work orders are as follows:
Vocabulary word                 able  apple  baby  bye  eat  good  say
Work order 1 vocabulary vector     0      1     1    0    2     0    0
Work order 2 vocabulary vector     0      0     1    1    0     1    1
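The construction of the vocabulary and of the count vectors can be sketched as follows. This is a minimal illustration assuming a simple lower-case alphabetic tokenizer; the function names are hypothetical. Because only the two example work orders are used here, the vocabulary has no "able" column, which in the table comes from other work orders in the training set.

```python
import re


def tokenize(text):
    """Lower-case the text and return its alphabetic words (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_word_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in document i."""
    token_lists = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    index = {word: j for j, word in enumerate(vocab)}
    matrix = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for tok in tokens:
            row[index[tok]] += 1
        matrix.append(row)
    return vocab, matrix


vocab, matrix = build_word_matrix(["baby eat apple? eat!", "say good bye, baby."])
print(vocab)
for row in matrix:
    print(row)
```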
After the training word matrix is obtained, the conditional probability P(w_i|C_i) of each feature word under each work order type is calculated from the training word matrix, and the naive Bayes model is constructed from P(C_1), P(C_2), …, P(C_i) and P(w_i|C_i).
The prior probability and the conditional probability parameters are obtained through the work order training set, a naive Bayes model is constructed according to the parameters, and the obtained recognition result is more accurate when the work order type recognition is carried out by using the naive Bayes model in the follow-up process.
In one embodiment, the step of obtaining a sample set of work orders comprises the steps of:
obtaining a plurality of work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; and if the second correlation coefficient exceeds a preset threshold value, removing the structured fields corresponding to the second correlation coefficient from each work order to obtain the work order sample set.
In this embodiment, a plurality of work orders whose types have already been identified manually are collected. Because the machine learning algorithm processes unstructured data while the collected work orders also contain structured data, correlation analysis is performed on the structured fields of the collected work orders, and the strongly correlated fields are removed to obtain the work order sample set.
Referring to fig. 3, fig. 3 is a flow chart of testing a naive bayes model in an embodiment of the invention. In this embodiment, after the step of constructing the naive bayes model according to the prior probability and the conditional probability, the method further includes the following steps:
Step S310: randomly select a plurality of work order samples from the work order sample set to form a work order test set;
Step S320: extract third feature words from the work order test samples in the work order test set;
Step S330: using the naive Bayes model and the third feature words, calculate the probability that each work order test sample belongs to each work order type given that the third feature words appear, and obtain the identification result from these probabilities;
Step S340: obtain the classification accuracy of the naive Bayes model from the identification results and the types of the work order test samples;
Step S350: if the classification accuracy is lower than a preset threshold value, adjust the naive Bayes model.
In the naive Bayes model test process, the feature words of each work order test sample in the work order test set are extracted, and the constructed naive Bayes model is used to calculate, from these feature words, the probability P(C_i|w) that the work order test sample belongs to each work order type, from which the classification result is obtained. If the classification result is inconsistent with the work order type of the work order test sample, the work order type has been identified wrongly. All work order test samples in the work order test set are tested to obtain the classification accuracy of the naive Bayes model; if the classification accuracy is lower than a preset threshold, the naive Bayes model is adjusted to ensure its identification performance on work order types.
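The accuracy check can be sketched as follows. The labels, the predictions and the 0.9 threshold are hypothetical values chosen only for illustration; the source does not state the threshold.

```python
def classification_accuracy(predicted, actual):
    """Fraction of test samples whose predicted type equals the true type."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def needs_adjustment(predicted, actual, threshold=0.9):
    """True when the accuracy falls below the preset threshold (assumed value)."""
    return classification_accuracy(predicted, actual) < threshold


# Hypothetical results for a ten-sample test set (1 = low value, 0 = not low value).
predicted = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
actual = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
accuracy = classification_accuracy(predicted, actual)
print(accuracy, needs_adjustment(predicted, actual))
```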
In one embodiment, the step of adjusting the naive bayes model comprises the steps of:
acquiring a work order test sample with a wrong identification result and extracting fourth feature words; adjusting the occurrence frequency of the corresponding feature words in the training word matrix according to the fourth feature words; and adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
A work order with a wrong identification result is obtained and its feature words are extracted. The frequencies of these words in the training word matrix of the bag-of-words model are adjusted, which adjusts the weights of the words, and the conditional probability P(C_i|w) of each feature word under each work order type is recomputed from the adjusted training word matrix. The naive Bayes model is thereby adjusted, and its identification performance on work order types is improved.
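One possible reading of this adjustment is sketched below. The source does not say how the frequencies are changed; here the count vector of the misclassified work order is simply appended to the training word matrix under its true type, and the smoothed conditional log-probabilities are recomputed. All names and numbers are hypothetical.

```python
import math


def conditional_log_probs(train_mat, train_cat, label):
    """Smoothed ln P(w_j | C_label) for every vocabulary column j."""
    n_words = len(train_mat[0])
    num = [1] * n_words  # numerators start at 1 (smoothing)
    denom = 2            # denominator starts at 2 (smoothing)
    for row, cat in zip(train_mat, train_cat):
        if cat == label:
            num = [a + b for a, b in zip(num, row)]
            denom += sum(row)
    return [math.log(x / denom) for x in num]


def adjust_with_misclassified(train_mat, train_cat, word_vector, true_label):
    """Return a new matrix and label list that include the misclassified sample."""
    return train_mat + [list(word_vector)], train_cat + [true_label]


train_mat = [[1, 1, 0], [0, 1, 1]]
train_cat = [0, 1]
before = conditional_log_probs(train_mat, train_cat, 1)

# A type-1 work order containing word 0 twice was misclassified; add its counts.
train_mat, train_cat = adjust_with_misclassified(train_mat, train_cat, [2, 0, 0], 1)
after = conditional_log_probs(train_mat, train_cat, 1)
print(before[0], after[0])
```

After the adjustment, word 0 carries more weight for type 1, so a similar work order is more likely to be assigned to that type the next time.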
Furthermore, in the actual use process, when the work order type is identified and classified wrongly, the work order is returned in the next link, the naive Bayes model adjusts the word bag by utilizing the characteristic words in the work order with the wrong identification result, self-learning is realized, the self-learning capability and the self-adaptability of the naive Bayes model are improved, and the naive Bayes model has stronger practicability.
In order to make the technical solution of the present invention clearer, the following takes identification of the work order type of the telecommunications work order as an example, and further explains the identification method of the work order type based on big data in the embodiment of the present invention:
(1) obtaining a set of work order samples
Paper telecom work orders, or electronic telecom work orders on a service system, are obtained. The text of a paper work order is obtained by scanning it, and the text of an electronic work order is extracted from the service system; a telecom work order sample is generated from each work order whose text has been extracted.
A plurality of telecom work order samples of the low-value type and of the non-low-value type are placed in two subdirectories, "low value" and "not low value", respectively. Each subdirectory contains 25 work orders named 1.txt, 2.txt, 3.txt, …, 25.txt. The "low value" subdirectory holds the low-value work order samples, and the "not low value" subdirectory holds the non-low-value work order samples.
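The sample layout above, together with the random selection of 40 training and 10 test work orders used in the steps that follow, can be sketched as below. The directory names, the label encoding (1 for low value, 0 otherwise) and the fixed seed are assumptions made for illustration; no files are read.

```python
import random


def split_samples(samples, n_train, seed=None):
    """Randomly split samples into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = samples[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical (path, label) pairs for the two subdirectories of 25 files each.
samples = [(f"low_value/{i}.txt", 1) for i in range(1, 26)]
samples += [(f"not_low_value/{i}.txt", 0) for i in range(1, 26)]

train_set, test_set = split_samples(samples, n_train=40, seed=0)
print(len(train_set), len(test_set))
```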
(2) Construction of naive Bayes model
Forty work orders are randomly selected from the 50 work order samples as the work order training set. The words w_i in all work order samples of the training set are extracted and converted to lower case, duplicates are removed to obtain a vocabulary, and the occurrences of each vocabulary word w_i are counted to generate the training word matrix TrainMat of the bag-of-words model and the corresponding work order training sample types TrainCat. A training function is called to process TrainMat and TrainCat, yielding the prior probability P(C_1) that a work order training sample in the training set is a low-value work order, and the conditional probabilities P(w_i|C_i) of each feature word under the different work order types, stored as p0Vec and p1Vec. The training function is as follows:
void Train(const TIntmat& TrainMat,  // training matrix
           const TIntVec& TrainCat,  // type of the corresponding work order
           double& pc1,              // returns the proportion of work orders in the matrix belonging to C1
           TDblVec& p0Vec,           // array of P(w0|C0), P(w1|C0), …
           TDblVec& p1Vec)           // array of P(w0|C1), P(w1|C1), …
Here the total number of work orders is the number of rows of TrainMat, and the vocabulary size is the number of columns of TrainMat. pc1 is the ratio of low-value work orders to the total number of work orders, that is, the number of entries equal to 1 in TrainCat divided by the total number of work orders.
When p0Vec and p1Vec are calculated, the numerator arrays p0Num = new[vocabulary size] and p1Num = new[vocabulary size] are initialized with every element equal to 1, and the denominators p0Denom and p1Denom are initialized to 2. The specific procedure is as follows:
for (i = 0; i < total number of work orders; i++) {
    if (TrainCat[i] == 0) {
        add row i of TrainMat to p0Num;
        add the number of words appearing in row i of TrainMat to p0Denom
        (that is, p0Denom += Sum(TrainMat[i]));
    } else {
        same as above, with every 0 replaced by 1;
    }
}
p0Vec = ln(p0Num / p0Denom)
p1Vec = ln(p1Num / p1Denom)
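The training procedure above can be sketched in runnable form as follows. This is a minimal illustration that follows the pseudocode (numerators initialized to 1, denominators to 2, natural logarithms of the ratios); the snake_case names and the two-row example matrix are assumptions.

```python
import math


def train(train_mat, train_cat):
    """Return (pc1, p0_vec, p1_vec) for a two-class naive Bayes model.

    train_mat: rows of word counts, one row per work order.
    train_cat: 1 for a low-value work order, 0 otherwise.
    """
    n_docs = len(train_mat)
    n_words = len(train_mat[0])
    pc1 = sum(train_cat) / n_docs  # share of work orders labelled 1

    p0_num, p1_num = [1] * n_words, [1] * n_words  # numerators start at 1
    p0_denom, p1_denom = 2, 2                      # denominators start at 2
    for row, cat in zip(train_mat, train_cat):
        if cat == 0:
            p0_num = [a + b for a, b in zip(p0_num, row)]
            p0_denom += sum(row)
        else:
            p1_num = [a + b for a, b in zip(p1_num, row)]
            p1_denom += sum(row)

    # Logarithms avoid numeric underflow when many probabilities are multiplied.
    p0_vec = [math.log(x / p0_denom) for x in p0_num]
    p1_vec = [math.log(x / p1_denom) for x in p1_num]
    return pc1, p0_vec, p1_vec


# Hypothetical two-row training matrix; the first work order is low value.
train_mat = [[1, 1, 0, 2, 0, 0], [0, 1, 1, 0, 1, 1]]
train_cat = [1, 0]
pc1, p0_vec, p1_vec = train(train_mat, train_cat)
print(pc1)
```

Starting the counts at 1 keeps a word that never appears in one class from producing a zero probability, which would otherwise make the logarithm undefined.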
(2) Testing naive Bayes model
And (4) utilizing the remaining 10 work orders to make a work order test doc, and judging whether the doc belongs to a low-value work order by using a Classify function.
for each doc in TestDoc{
Classify (vocabulary vector, p, corresponding to doc)c1、p0Vec and p1Vec)
Recording error rate
}
(3) Identification of telecommunications work order types
The type of the telecommunication work order to be classified is identified with a classification decision function, which is as follows:
bool classify(const TDblVec& w, const TDblVec& p0Vec, const TDblVec& p1Vec, double pc1) {
  p0 = sum(w * p0Vec) + ln(1 - pc1)
  p1 = sum(w * p1Vec) + ln(pc1)
  return (p1 > p0)
}
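A runnable C++ version of the decision function might look like the sketch below. It assumes that w * p0Vec denotes an element-wise product, that p0Vec and p1Vec already hold natural logarithms of the conditional probabilities, and that pc1 lies strictly between 0 and 1 so both logarithms of the priors are finite; TDblVec is an assumed alias.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using TDblVec = std::vector<double>;  // assumed stand-in for the type named in the text

// Returns true when the work order is judged to be of class C1 (low-value).
// The two scores are log-posteriors up to a shared constant.
bool Classify(const TDblVec& w, const TDblVec& p0Vec,
              const TDblVec& p1Vec, double pc1) {
    assert(w.size() == p0Vec.size() && w.size() == p1Vec.size());
    assert(pc1 > 0.0 && pc1 < 1.0);
    double p0 = std::log(1.0 - pc1);  // ln P(C0)
    double p1 = std::log(pc1);        // ln P(C1)
    for (std::size_t j = 0; j < w.size(); ++j) {
        p0 += w[j] * p0Vec[j];        // adds count_j * ln P(w_j | C0)
        p1 += w[j] * p1Vec[j];        // adds count_j * ln P(w_j | C1)
    }
    return p1 > p0;
}
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow when a work order contains many words, and it leaves the comparison unchanged because the logarithm is monotonic.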
Based on the big data-based work order type identification method described above, the invention also provides a big data-based work order type identification system, embodiments of which are described in detail below.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a big data-based work order type identification system according to an embodiment of the present invention. In this embodiment, the big data-based work order type identification system includes:
the correlation analysis module 410, configured to obtain a work order to be classified and perform correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
the target work order obtaining module 420, configured to remove the corresponding structured field from the work order to be classified if the first correlation coefficient exceeds a preset threshold, thereby obtaining a target work order to be classified;
the probability obtaining module 430, configured to extract a first feature word of the target work order to be classified and, according to the first feature word, use a pre-constructed naive Bayes model to calculate, for each work order type, the probability that the work order to be classified is of that type given that the first feature word occurs;
and the work order type identification module 440, configured to determine the type of the work order to be classified according to these probabilities.
In the big data-based work order type identification system, correlation analysis is performed on the structured fields of the work order and strongly correlated fields are removed, so that the type of the work order can be identified quickly and accurately with the pre-constructed naive Bayes model. This avoids the occupation of server storage space and the reduced operating efficiency of the service system caused by large numbers of accumulated work orders.
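The text does not say which correlation coefficient is used or how the field to remove is chosen. As one possible reading, the following C++ sketch assumes numeric field columns, the Pearson coefficient, and a rule that drops a field when its absolute correlation with an already kept field exceeds the threshold; Pearson, SelectFields and Column are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Column = std::vector<double>;  // values of one structured field across work orders

// Pearson correlation coefficient of two equally long columns.
double Pearson(const Column& x, const Column& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const double n = static_cast<double>(x.size());
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { meanX += x[i]; meanY += y[i]; }
    meanX /= n;
    meanY /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - meanX, dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;  // constant column: treated as uncorrelated
    return sxy / std::sqrt(sxx * syy);
}

// Returns the indices of the fields to keep. A field is dropped when its
// absolute correlation with any previously kept field exceeds threshold.
std::vector<std::size_t> SelectFields(const std::vector<Column>& fields,
                                      double threshold) {
    std::vector<std::size_t> kept;
    for (std::size_t j = 0; j < fields.size(); ++j) {
        bool redundant = false;
        for (std::size_t k : kept) {
            if (std::fabs(Pearson(fields[k], fields[j])) > threshold) {
                redundant = true;
                break;
            }
        }
        if (!redundant) kept.push_back(j);
    }
    return kept;
}
```

Removing strongly correlated fields also suits the classifier used later, since naive Bayes assumes its features are conditionally independent.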
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a big data-based work order type identification system according to another embodiment of the present invention. In this embodiment, the big data-based work order type identification system further includes a naive Bayes model construction module 450 and a naive Bayes model testing module 460.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a naive Bayes model construction module in an embodiment of the invention. The naive Bayes model construction module 450 includes a work order training set obtaining unit 451, a prior probability obtaining unit 452, a training matrix obtaining unit 453, a conditional probability obtaining unit 454, and a model construction unit 455:
the work order training set obtaining unit 451, configured to obtain a work order sample set and randomly select a plurality of work order samples from the work order sample set to form a work order training set;
the prior probability obtaining unit 452, configured to calculate, for each work order type, the prior probability that a work order training sample in the work order training set is of that type;
the training matrix obtaining unit 453, configured to extract a second feature word from the work order training samples in the work order training set and construct a training word matrix of the bag-of-words model according to the second feature word;
the conditional probability obtaining unit 454, configured to obtain the conditional probability of each second feature word under each work order type according to the training word matrix;
and the model construction unit 455, configured to construct a naive Bayes model according to the prior probability and the conditional probability.
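The construction of the training word matrix can be sketched in C++ as follows, assuming each work order has already been split into words. The sketch lower-cases words, removes duplicates to form the word list, and counts occurrences per work order; BuildVocab, DocToVec and Doc are hypothetical names, and the lower-casing only affects ASCII letters.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

using TIntVec = std::vector<int>;
using Doc = std::vector<std::string>;  // the words of one work order

// Lower-cases ASCII letters; other bytes are left unchanged.
std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Word list: every word lower-cased, de-duplicated and sorted.
std::vector<std::string> BuildVocab(const std::vector<Doc>& docs) {
    std::vector<std::string> vocab;
    for (const Doc& d : docs)
        for (const std::string& w : d) vocab.push_back(ToLower(w));
    std::sort(vocab.begin(), vocab.end());
    vocab.erase(std::unique(vocab.begin(), vocab.end()), vocab.end());
    return vocab;
}

// One row of the training word matrix: occurrence count of each vocabulary word.
// Words that are not in the vocabulary are ignored.
TIntVec DocToVec(const std::vector<std::string>& vocab, const Doc& doc) {
    TIntVec row(vocab.size(), 0);
    for (const std::string& w : doc) {
        const std::string key = ToLower(w);
        auto it = std::lower_bound(vocab.begin(), vocab.end(), key);
        if (it != vocab.end() && *it == key)
            ++row[static_cast<std::size_t>(it - vocab.begin())];
    }
    return row;
}
```

Calling DocToVec on every training work order and stacking the rows gives a matrix of the shape described for TrainMat, with one column per vocabulary word.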
In one embodiment, the work order training set obtaining unit 451 obtains a plurality of work orders of different types and performs correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; if the second correlation coefficient exceeds a preset threshold, the structured field corresponding to the second correlation coefficient is removed from each work order to obtain the work order sample set.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of a naive Bayes model testing module in an embodiment of the invention. In this embodiment, the naive Bayes model testing module 460 includes a work order test set obtaining unit 461, a feature extraction unit 462, a recognition result obtaining unit 463, an accuracy obtaining unit 464, and a naive Bayes model adjusting unit 465:
the work order test set obtaining unit 461, configured to randomly select a plurality of work order samples from the work order sample set to form a work order test set;
the feature extraction unit 462, configured to extract a third feature word from the work order test samples in the work order test set;
the recognition result obtaining unit 463, configured to use the naive Bayes model to calculate, according to the third feature word and for each work order type, the probability that a work order test sample is of that type given that the third feature word occurs, and to obtain a recognition result from these probabilities;
the accuracy obtaining unit 464, configured to obtain the classification accuracy of the naive Bayes model according to the recognition result and the type of the work order test sample;
and the naive Bayes model adjusting unit 465, configured to adjust the naive Bayes model if the classification accuracy is lower than a preset threshold.
In one embodiment, the naive Bayes model adjusting unit 465 acquires a work order test sample with a wrong recognition result and extracts a fourth feature word; adjusts the occurrence frequency of the corresponding feature words in the training word matrix according to the fourth feature word; and adjusts the conditional probability in the naive Bayes model according to the adjusted training word matrix.
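The text does not specify how the occurrence frequencies are adjusted. One plausible interpretation, sketched below in C++, adds the word counts of a misclassified test sample to the statistics of its true class and then recomputes the logarithmic conditional probabilities with the same initial values of 1 and 2 used in training. ClassCounts, AddSample and LogCondProb are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using TIntVec = std::vector<int>;
using TDblVec = std::vector<double>;

// Per-class word statistics derived from the training word matrix.
struct ClassCounts {
    TDblVec wordCounts;  // column sums of the TrainMat rows that belong to this class
    double totalWords;   // sum of all entries in those rows
};

// Adds the word counts of one (misclassified) sample to its true class.
void AddSample(ClassCounts& counts, const TIntVec& sample) {
    assert(sample.size() == counts.wordCounts.size());
    for (std::size_t j = 0; j < sample.size(); ++j) {
        counts.wordCounts[j] += sample[j];
        counts.totalWords += sample[j];
    }
}

// Recomputes ln P(w_j | class) with numerators started at 1 and the denominator at 2.
TDblVec LogCondProb(const ClassCounts& counts) {
    TDblVec logProb(counts.wordCounts.size());
    for (std::size_t j = 0; j < logProb.size(); ++j)
        logProb[j] = std::log((counts.wordCounts[j] + 1.0) / (counts.totalWords + 2.0));
    return logProb;
}
```

Under this reading only the conditional probabilities change; the prior pc1 is left as computed from the original training set.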
In one embodiment, the probability obtaining module 430 performs word segmentation on the target work order to be classified by using a regular expression to obtain a first feature word.
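Regular-expression word segmentation can be illustrated with the C++ standard library. The pattern below, which matches runs of ASCII letters and digits, is purely an assumption for demonstration; the actual expression used for the work orders is not given in the text, and Tokenize is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits text into tokens that match the (assumed) word pattern.
std::vector<std::string> Tokenize(const std::string& text) {
    static const std::regex wordPattern("[A-Za-z0-9]+");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(text.begin(), text.end(), wordPattern), end;
         it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Punctuation and whitespace act as separators here, so the resulting tokens can be fed directly into a bag-of-words count.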
The big data-based work order type identification system corresponds one-to-one with the big data-based work order type identification method; the technical features and beneficial effects described in the method embodiments all apply to the system embodiments, as is hereby stated.
In one embodiment, there is also provided a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of:
acquiring a work order to be classified, and performing correlation analysis on a structured field in the work order to be classified to acquire a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold value, removing corresponding structured fields in the work order to be classified, and acquiring a target work order to be classified;
extracting a first feature word of a target work order to be classified, and respectively calculating corresponding probabilities when the work order type to be classified is different work order types under the condition that the first feature word appears by utilizing a pre-constructed naive Bayes model according to the first feature word;
and judging the type of the work order to be classified according to the probabilities.
In one embodiment, the processor executes the program to further implement the following steps:
acquiring a work order sample set, and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set; calculating the prior probability corresponding to different work order types of the work order training samples in the work order training set; extracting a second feature word from the work order training samples in the work order training set, and constructing a training word matrix of the bag-of-words model according to the second feature word; acquiring the conditional probability of each second feature word under different work order type conditions according to the training word matrix; and constructing a naive Bayes model according to the prior probability and the conditional probability.
In one embodiment, the processor executes the program to further implement the following steps:
obtaining a plurality of work orders of different types, and respectively carrying out correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; and if the second correlation coefficient exceeds a preset threshold value, removing the structured field corresponding to the second correlation coefficient in each work order, and obtaining a work order sample set.
In one embodiment, the processor executes the program to further implement the following steps:
randomly selecting a plurality of work order samples from the work order sample set to form a work order test set; extracting a third feature word from the work order test sample in the work order test set; respectively calculating corresponding probabilities of different work order types of work order test samples under the condition that the third feature words appear according to the third feature words and by using a naive Bayes model, and acquiring recognition results according to the corresponding probabilities of the different work order types; obtaining the classification accuracy of the naive Bayes model according to the identification result and the type of the work order test sample; and if the classification accuracy is lower than a preset threshold value, adjusting the naive Bayes model.
In one embodiment, the processor executes the program to further implement the following steps: acquiring a work order test sample with a wrong identification result and extracting a fourth feature word; adjusting the occurrence frequency of the corresponding feature words in the training word matrix according to the fourth feature word; and adjusting the conditional probability in the naive Bayes model according to the adjusted training word matrix.
In one embodiment, the processor executes the program to further implement the following steps:
performing word segmentation on the target work order to be classified by using a regular expression to obtain a first feature word.
When the processor of the computer device executes the program, it implements any of the big data-based work order type identification methods in the above embodiments: correlation analysis is performed on the structured fields of the work order and strongly correlated fields are removed, so that the type of the work order can be identified quickly and accurately with the pre-constructed naive Bayes model. This avoids the occupation of server storage space and the reduced operating efficiency of the service system caused by large numbers of accumulated work orders.
In addition, those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a non-volatile computer-readable storage medium; in the embodiments of the present invention, the program can be stored in the storage medium of a computer system and executed by at least one processor of that computer system to implement the processes of the embodiments of the big data-based work order type identification method described above.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a work order to be classified, and performing correlation analysis on a structured field in the work order to be classified to acquire a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold value, removing corresponding structured fields in the work order to be classified, and acquiring a target work order to be classified;
extracting a first feature word of a target work order to be classified, and respectively calculating corresponding probabilities when the work order type to be classified is different work order types under the condition that the first feature word appears by utilizing a pre-constructed naive Bayes model according to the first feature word;
and judging the type of the work order to be classified according to the probabilities.
In one embodiment, the computer program, when executed by the processor, further implements the following steps:
acquiring a work order sample set, and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set; calculating the prior probability corresponding to different work order types of the work order training samples in the work order training set; extracting a second feature word from the work order training samples in the work order training set, and constructing a training word matrix of the bag-of-words model according to the second feature word; acquiring the conditional probability of each second feature word under different work order type conditions according to the training word matrix; and constructing a naive Bayes model according to the prior probability and the conditional probability.
In one embodiment, the computer program, when executed by the processor, further implements the following steps:
obtaining a plurality of work orders of different types, and respectively carrying out correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; and if the second correlation coefficient exceeds a preset threshold value, removing the structured field corresponding to the second correlation coefficient in each work order, and obtaining a work order sample set.
In one embodiment, the computer program, when executed by the processor, further implements the following steps:
randomly selecting a plurality of work order samples from the work order sample set to form a work order test set; extracting a third feature word from the work order test sample in the work order test set; respectively calculating corresponding probabilities of different work order types of work order test samples under the condition that the third feature words appear according to the third feature words and by using a naive Bayes model, and acquiring recognition results according to the corresponding probabilities of the different work order types; obtaining the classification accuracy of the naive Bayes model according to the identification result and the type of the work order test sample; and if the classification accuracy is lower than a preset threshold value, adjusting the naive Bayes model.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: acquiring a work order test sample with a wrong identification result and extracting a fourth feature word; adjusting the occurrence frequency of the corresponding feature words in the training word matrix according to the fourth feature word; and adjusting the conditional probability in the naive Bayes model according to the adjusted training word matrix.
In one embodiment, the computer program, when executed by the processor, further implements the following steps:
performing word segmentation on the target work order to be classified by using a regular expression to obtain a first feature word.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing related hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A big data-based work order type identification method is characterized by comprising the following steps:
acquiring a work order to be classified, and performing correlation analysis on a structured field in the work order to be classified to acquire a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold value, removing corresponding structured fields in the work order to be classified to obtain a target work order to be classified;
extracting a first feature word of the target work order to be classified, and respectively calculating corresponding probabilities when the work order type to be classified is different work order types under the condition that the first feature word appears by utilizing a pre-constructed naive Bayes model according to the first feature word;
judging the type of the work order to be classified according to the probabilities;
the step of extracting the first feature word of the target work order to be classified comprises: performing word segmentation on the target work order to be classified by using a regular expression to obtain the first feature word.
2. The big-data-based work order type identification method according to claim 1, wherein the step of obtaining the work order to be classified further comprises the following steps:
acquiring a work order sample set, and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set;
calculating corresponding prior probability when the types of the work order training samples in the work order training set are different work order types;
extracting a second feature word from the work order training samples in the work order training set, and constructing a training word matrix of a bag-of-words model according to the second feature word;
acquiring the conditional probability of each second feature word under different work order type conditions according to the training word matrix;
and constructing a naive Bayes model according to the prior probability and the conditional probability.
3. The big data based work order type identification method as claimed in claim 2, wherein said step of obtaining a sample set of work orders comprises the steps of:
obtaining a plurality of work orders of different types, and respectively carrying out correlation analysis on the structured fields in each work order to obtain a second correlation coefficient;
and if the second correlation coefficient exceeds a preset threshold value, removing the structured fields corresponding to the second correlation coefficient in each work order, and obtaining a work order sample set.
4. The big data-based work order type identification method according to claim 3, wherein the step of constructing a naive Bayes model according to the prior probabilities and the conditional probabilities further comprises the following steps:
randomly selecting a plurality of work order samples from the work order sample set to form a work order test set;
extracting a third feature word from the work order test sample in the work order test set;
respectively calculating corresponding probabilities of the work order test samples in different work order types under the condition that the third feature words appear according to the third feature words and by using the naive Bayes model, and acquiring identification results according to the corresponding probabilities of the different work order types;
obtaining the classification accuracy of the naive Bayes model according to the identification result and the type of the work order test sample;
and if the classification accuracy is lower than a preset threshold value, adjusting the naive Bayes model.
5. The big data-based work order type identification method according to claim 4, wherein the step of adjusting the naive Bayes model comprises the steps of:
acquiring a work order test sample with a wrong identification result and extracting a fourth feature word;
adjusting the occurrence frequency of the corresponding feature words in the training word matrix according to the fourth feature word;
and adjusting the conditional probability in the naive Bayes model according to the adjusted training word matrix.
6. A big data based work order type identification system, comprising:
the correlation analysis module is used for acquiring a work order to be classified, and performing correlation analysis on a structured field in the work order to be classified to acquire a first correlation coefficient;
the target work order obtaining module is used for removing the corresponding structured fields in the work orders to be classified to obtain the target work orders to be classified if the first correlation coefficient exceeds a preset threshold value;
the probability obtaining module is used for extracting a first feature word of the target work order to be classified, and respectively calculating corresponding probabilities when the work order types to be classified are different work order types under the condition that the first feature word appears by utilizing a pre-constructed naive Bayes model according to the first feature word;
the work order type identification module is used for judging the type of the work order to be classified according to each probability;
the probability obtaining module is further configured to: perform word segmentation on the target work order to be classified by using a regular expression to obtain the first feature word.
7. The big data-based work order type recognition system of claim 6, further comprising a naive Bayes model construction module, wherein the naive Bayes model construction module comprises a work order training set obtaining unit, a prior probability obtaining unit, a training matrix obtaining unit, a conditional probability obtaining unit and a model construction unit;
the work order training set acquisition unit is used for acquiring a work order sample set and randomly selecting a plurality of work order samples from the work order sample set to form a work order training set;
the prior probability obtaining unit is used for calculating the corresponding prior probability when the types of the work order training samples in the work order training set are different work order types;
the training matrix obtaining unit is used for extracting a second feature word from the work order training sample in the work order training set and constructing a training word matrix of the bag-of-words model according to the second feature word;
the conditional probability acquiring unit is used for acquiring the conditional probability of each second feature word under different work order type conditions according to the training word matrix;
and the model construction unit is used for constructing a naive Bayes model according to the prior probability and the conditional probability.
8. The big-data-based work order type identification system as claimed in claim 7, wherein said work order training set obtaining unit is further configured to obtain a plurality of work orders of different types, and perform correlation analysis on the structured fields in each of said work orders to obtain a second correlation coefficient; and if the second correlation coefficient exceeds a preset threshold value, removing the structured fields corresponding to the second correlation coefficient in each work order, and obtaining a work order sample set.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the big data based work order type identification method according to any one of claims 1 to 5 when executing the computer program.
10. A computer storage medium on which a computer program is stored, the program, when executed by a processor, implementing the method for big data based work order type identification according to any of claims 1 to 5.
CN201810427330.7A 2018-05-07 2018-05-07 Big data-based work order type identification method and system and computing device Active CN108897754B (en)

Publications (2)

Publication Number    Publication Date
CN108897754A (en)     2018-11-27
CN108897754B (en)     2020-12-11

Family

ID=64342619

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant