CN108897754A - Big-data-based work order type recognition method, system and computing device - Google Patents


Info

Publication number
CN108897754A
Authority
CN
China
Prior art keywords
work order
sorted
type
feature word
word
Prior art date
Legal status
Granted
Application number
CN201810427330.7A
Other languages
Chinese (zh)
Other versions
CN108897754B (en)
Inventor
李炯城
吴佩娥
李玥
关晓明
管学锋
陈运动
Current Assignee
Guangdong Communications Services Co Ltd
China Communications Services Corp Ltd
Guangdong Planning and Designing Institute of Telecommunications Co Ltd
Original Assignee
Guangdong Communications Services Co Ltd
China Communications Services Corp Ltd
Guangdong Planning and Designing Institute of Telecommunications Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Communications Services Co Ltd, China Communications Services Corp Ltd and Guangdong Planning and Designing Institute of Telecommunications Co Ltd
Priority to CN201810427330.7A
Publication of CN108897754A
Application granted
Publication of CN108897754B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a big-data-based method, system and computing device for recognizing work order types. The method includes: obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to obtain a first correlation coefficient; if the first correlation coefficient exceeds a preset threshold, removing the corresponding structured fields from the work order to obtain a target work order to be classified; extracting first feature words from the target work order and, using a pre-built naive Bayes model, calculating from the first feature words the probability that the work order belongs to each work order type, given that the first feature words occur; and determining the type of the work order from the probabilities of the different types. The method prevents a large backlog of work orders from occupying server storage and lowering the operating efficiency of the business system.

Description

Technical field
The present invention relates to the field of data processing, and more particularly to a big-data-based work order type recognition method, system and computing device.
Background art
As telecommunication networks become increasingly digitized, telecom services generate work orders at scale, and the number of work order types designed for these services keeps growing, making work order types ever harder to recognize. At present, manual recognition of work order types cannot keep up with the rapid growth of the business, and the resulting backlog of work orders occupies a large amount of server storage, which directly lowers the operating efficiency of the business system.
The traditional approach of recognizing work order types manually therefore cannot meet the growing demands of telecom services, and new means are needed to improve the efficiency and accuracy of work order type recognition.
Summary of the invention
Accordingly, to address the problem that a backlog of work orders occupies a large amount of server storage and directly lowers the operating efficiency of the business system, a big-data-based work order type recognition method, system and computing device are provided.
A big-data-based work order type recognition method includes the following steps:
Obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to obtain a first correlation coefficient;
If the first correlation coefficient exceeds a preset threshold, removing the corresponding structured fields from the work order to be classified to obtain a target work order to be classified;
Extracting first feature words from the target work order to be classified and, according to the first feature words, using a pre-built naive Bayes model to calculate, given that the first feature words occur, the probability that the work order belongs to each of the different work order types;
Determining the type of the work order to be classified according to the probabilities.
In one embodiment, the method further includes the following steps before the step of obtaining the work order to be classified:
Obtaining a work order sample set, and randomly selecting several work order samples from it to form a work order training set;
Calculating the prior probability of each work order type, i.e., the probability that a training sample in the work order training set is of that type;
Extracting second feature words from the training samples in the work order training set, and building the training word matrix of a bag-of-words model from the second feature words;
Obtaining, from the training word matrix, the conditional probability of each second feature word under each work order type;
Building the naive Bayes model from the prior probabilities and the conditional probabilities.
In one embodiment, the step of obtaining the work order sample set includes the following steps:
Obtaining multiple work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient;
If the second correlation coefficient exceeds a preset threshold, removing the structured fields corresponding to the second correlation coefficient from each work order to obtain the work order sample set.
In one embodiment, the method further includes the following steps after the step of building the naive Bayes model from the prior probabilities and the conditional probabilities:
Randomly selecting several work order samples from the work order sample set to form a work order test set;
Extracting third feature words from the test samples in the work order test set;
According to the third feature words and using the naive Bayes model, calculating the probability that a test sample belongs to each work order type given that the third feature words occur, and obtaining a recognition result from these probabilities;
Obtaining the classification accuracy of the naive Bayes model by comparing the recognition results with the actual types of the test samples;
If the classification accuracy is below a preset threshold, adjusting the naive Bayes model.
In one embodiment, the step of adjusting the naive Bayes model includes the following steps:
Obtaining the test samples whose recognition results are wrong, and extracting fourth feature words from them;
Adjusting, according to the fourth feature words, the occurrence counts of the corresponding feature words in the training word matrix;
Adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
In one embodiment, the step of extracting the first feature words of the target work order to be classified includes the following step:
Segmenting the target work order to be classified into words using a regular expression to obtain the first feature words.
A big-data-based work order type recognition system includes:
A correlation analysis module, configured to obtain a work order to be classified and perform correlation analysis on the structured fields in the work order to obtain a first correlation coefficient;
A target work order obtaining module, configured to remove the corresponding structured fields from the work order to be classified if the first correlation coefficient exceeds a preset threshold, to obtain a target work order to be classified;
A probability obtaining module, configured to extract the first feature words of the target work order and, according to the first feature words, use a pre-built naive Bayes model to calculate, given that the first feature words occur, the probability that the work order belongs to each work order type;
A work order type recognition module, configured to determine the type of the work order to be classified according to the probabilities.
In one embodiment, the system further includes a naive Bayes model building module, which includes a work order training set obtaining unit, a prior probability obtaining unit, a training matrix obtaining unit, a conditional probability obtaining unit and a model building unit;
The work order training set obtaining unit is configured to obtain a work order sample set and randomly select several work order samples from it to form a work order training set;
The prior probability obtaining unit is configured to calculate the prior probability of each work order type for the training samples in the work order training set;
The training matrix obtaining unit is configured to extract second feature words from the training samples in the work order training set and build the training word matrix of a bag-of-words model from the second feature words;
The conditional probability obtaining unit is configured to obtain, from the training word matrix, the conditional probability of each second feature word under each work order type;
The model building unit is configured to build the naive Bayes model from the prior probabilities and the conditional probabilities.
A computer device includes a memory and a processor; the memory stores a computer program, and the processor implements the following steps when executing the computer program:
Obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to obtain a first correlation coefficient;
If the first correlation coefficient exceeds a preset threshold, removing the corresponding structured fields from the work order to be classified to obtain a target work order to be classified;
Extracting first feature words from the target work order to be classified and, according to the first feature words, using a pre-built naive Bayes model to calculate, given that the first feature words occur, the probability that the work order belongs to each of the different work order types;
Determining the type of the work order to be classified according to the probabilities.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps:
Obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to obtain a first correlation coefficient;
If the first correlation coefficient exceeds a preset threshold, removing the corresponding structured fields from the work order to be classified to obtain a target work order to be classified;
Extracting first feature words from the target work order to be classified and, according to the first feature words, using a pre-built naive Bayes model to calculate, given that the first feature words occur, the probability that the work order belongs to each of the different work order types;
Determining the type of the work order to be classified according to the probabilities.
By performing correlation analysis on the structured fields of work orders and removing strongly correlated fields, the above big-data-based work order type recognition method, system, computing device and storage medium allow the type of a work order to be recognized quickly and accurately with a pre-built naive Bayes model, which prevents a backlog of work orders from occupying server storage and lowering the operating efficiency of the business system.
Brief description of the drawings
Fig. 1 is a flowchart of the big-data-based work order type recognition method in an embodiment of the invention;
Fig. 2 is a flowchart of building the naive Bayes model in an embodiment of the invention;
Fig. 3 is a flowchart of testing the naive Bayes model in an embodiment of the invention;
Fig. 4 is a schematic structural diagram of the big-data-based work order type recognition system in an embodiment of the invention;
Fig. 5 is a schematic structural diagram of the big-data-based work order type recognition system in another embodiment of the invention;
Fig. 6 is a schematic structural diagram of the naive Bayes model building module in an embodiment of the invention;
Fig. 7 is a schematic structural diagram of the naive Bayes model testing module in an embodiment of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and are not intended to limit its scope.
Referring to Fig. 1, which is a flowchart of the big-data-based work order type recognition method in an embodiment of the invention, the method in this embodiment includes the following steps:
Step S110: obtain a work order to be classified, and perform correlation analysis on the structured fields in the work order to obtain a first correlation coefficient.
In this step, the work order to be classified may be obtained directly from the business system or by scanning paper documents. A work order to be classified usually contains both structured and unstructured data; its structured fields are designed with some redundancy, are numerous, and are correlated with one another to some degree.
After the work order to be classified is obtained, correlation analysis is performed on its structured fields to obtain the correlation coefficients between them.
Step S120: if the first correlation coefficient exceeds a preset threshold, remove the corresponding structured fields from the work order to be classified to obtain a target work order to be classified.
To raise the share of information carried by the structured content of the work order and to reduce the amount of computation, the correlation coefficients between structured fields are obtained by correlation analysis. When the correlation coefficient between structured fields exceeds a preset correlation coefficient threshold, the structured fields associated with that coefficient are judged to be strongly correlated fields, and they are removed from the work order to be classified to obtain the target work order to be classified.
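The correlation filtering of steps S110 and S120 can be sketched as follows. The text does not name the correlation coefficient, say how fields are encoded, or say whether one or both fields of a strongly correlated pair are dropped. This sketch therefore assumes numerically encoded fields observed over a batch of work orders (a coefficient needs several observations), uses the absolute Pearson coefficient, and keeps the earlier field of each strongly correlated pair while dropping the later one. `Pearson` and `FieldsToKeep` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between two structured fields observed over the same work orders.
double Pearson(const std::vector<double>& x, const std::vector<double>& y) {
  assert(!x.empty() && x.size() == y.size());
  const double n = static_cast<double>(x.size());
  double mx = 0.0, my = 0.0;
  for (std::size_t k = 0; k < x.size(); k++) { mx += x[k]; my += y[k]; }
  mx /= n;
  my /= n;
  double sxy = 0.0, sxx = 0.0, syy = 0.0;
  for (std::size_t k = 0; k < x.size(); k++) {
    sxy += (x[k] - mx) * (y[k] - my);
    sxx += (x[k] - mx) * (x[k] - mx);
    syy += (y[k] - my) * (y[k] - my);
  }
  if (sxx == 0.0 || syy == 0.0) return 0.0;  // a constant field carries no correlation signal
  return sxy / std::sqrt(sxx * syy);
}

// columns[f] holds the encoded values of structured field f across a batch of work orders.
// Returns the indices of the fields to keep: whenever |r| exceeds the threshold,
// the later field of the pair is dropped and the earlier one is kept (assumed policy).
std::vector<std::size_t> FieldsToKeep(const std::vector<std::vector<double>>& columns,
                                      double threshold) {
  std::vector<bool> removed(columns.size(), false);
  std::vector<std::size_t> keep;
  for (std::size_t f = 0; f < columns.size(); f++) {
    if (removed[f]) continue;
    keep.push_back(f);
    for (std::size_t g = f + 1; g < columns.size(); g++)
      if (!removed[g] && std::fabs(Pearson(columns[f], columns[g])) > threshold)
        removed[g] = true;
  }
  return keep;
}
```

For example, a field that is always twice another field has r = 1 and is dropped, while a weakly correlated field (r = -0.4) survives a threshold of 0.9.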
Step S130: extract the first feature words of the target work order to be classified and, according to the first feature words, use the pre-built naive Bayes model to calculate, given that the first feature words occur, the probability that the work order belongs to each work order type.
Step S140: determine the type of the work order to be classified according to the probabilities.
The words extracted from the target work order are input as feature parameters into the pre-built naive Bayes model, which calculates and outputs, given that these words occur, the probability that the work order belongs to each work order type; the work order type with the highest probability is taken as the final type of the work order.
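Steps S130 and S140 amount to an argmax over work order types. A minimal sketch, assuming the model is stored as log priors ln P(C_k) and log conditional probabilities ln P(w_j|C_k) (working in logarithms avoids underflow when many small probabilities are multiplied) and that the work order has already been turned into a word-count vector over the model's vocabulary. `MostProbableType` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// wordCounts[j]: occurrences of vocabulary word j in the work order to be classified.
// logPrior[k] = ln P(C_k); logCond[k][j] = ln P(w_j | C_k), taken from a trained model.
int MostProbableType(const std::vector<int>& wordCounts,
                     const std::vector<double>& logPrior,
                     const std::vector<std::vector<double>>& logCond) {
  assert(!logPrior.empty() && logPrior.size() == logCond.size());
  int best = 0;
  double bestScore = -std::numeric_limits<double>::infinity();
  for (std::size_t k = 0; k < logPrior.size(); k++) {
    // Score = ln P(C_k) + sum_j n_j * ln P(w_j | C_k)
    double score = logPrior[k];
    for (std::size_t j = 0; j < wordCounts.size(); j++)
      score += wordCounts[j] * logCond[k][j];
    if (score > bestScore) {
      bestScore = score;
      best = static_cast<int>(k);
    }
  }
  return best;
}
```

Under the naive independence assumption, this score differs from ln P(C_k | w) only by a term that is the same for every type, so the type with the highest score is also the type with the highest posterior probability.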
The above big-data-based work order type recognition method uses a naive Bayes model to recognize the type of the work order to be classified, turning work order recognition into a classification problem. Work order types can thus be recognized quickly and accurately, which helps enterprises process work orders in batches, prevents a backlog of work orders from occupying server storage and lowering the operating efficiency of the business system, and improves working efficiency without incurring high labor costs.
In one embodiment, the step of extracting the first feature words of the target work order to be classified includes the following step:
The target work order to be classified is segmented into words using a regular expression to obtain the first feature words.
In this embodiment, the character strings in the target work order are split by the regular expression, and the extracted words are input into the naive Bayes model as feature parameters, providing the basis for the subsequent work order type recognition.
In one embodiment, the target work order is segmented into words using the regular expression "w*" to obtain the first feature words. This prevents the character strings of the work order from being over-segmented into meaningless words that would distort the recognition result. For example, with the expression "w*", "Dr.li" is kept as one word instead of being split into the two words "Dr" and "Li".
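The text gives the pattern only as "w*", so the sketch below does not claim to reproduce it. It assumes a hypothetical rule under which a token is a run of word characters that may contain single dots between word characters, which keeps "Dr.li" as one token as in the example above while still splitting "apple?eat!" into "apple" and "eat". It uses `std::regex` (default ECMAScript grammar) and lowercases every token, as the training procedure does. `SplitWords` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Hypothetical tokenization rule: \w+ runs, optionally joined by single dots
// that sit between word characters (so "Dr.li" stays one token).
std::vector<std::string> SplitWords(const std::string& text) {
  static const std::regex kToken(R"(\w+(?:\.\w+)*)");
  std::vector<std::string> words;
  for (auto it = std::sregex_iterator(text.begin(), text.end(), kToken);
       it != std::sregex_iterator(); ++it) {
    std::string w = it->str();
    // Lowercase the token, as required before building the word list.
    std::transform(w.begin(), w.end(), w.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    words.push_back(w);
  }
  return words;
}
```

For instance, "Dr.li said: Say good bye, baby." yields dr.li, said, say, good, bye and baby; the trailing period is not attached to "baby" because no word character follows it.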
Referring to Fig. 2, which is a flowchart of building the naive Bayes model in an embodiment of the invention, in this embodiment the method further includes the following steps before the step of obtaining the work order to be classified:
Step S210: obtain a work order sample set, and randomly select several work order samples from it to form a work order training set.
Step S220: calculate the prior probability of each work order type for the training samples in the work order training set.
Step S230: extract second feature words from the training samples in the work order training set, and build the training word matrix of a bag-of-words model from the second feature words.
Step S240: obtain, from the training word matrix, the conditional probability of each second feature word under each work order type.
Step S250: build the naive Bayes model from the prior probabilities and the conditional probabilities.
Specifically, a work order sample set is stored in the system directory and contains multiple work order samples of each work order type. Multiple samples are randomly selected from it to form the work order training set, and the prior probabilities P(C_1), P(C_2), …, P(C_i) that a training sample is of each work order type are obtained, where C_i denotes a work order type. The words w_j of all work order samples in the training set are extracted, converted to lowercase and de-duplicated to obtain a word list; the number of times each word w_j of the word list occurs in each sample is counted, and the training word matrix of the bag-of-words model is generated. For example, suppose the training set contains multiple work orders, two of which read:
Work order 1: baby eat apple?eat!
Work order 2: Say good bye, baby.
The rows of the training word matrix generated by the bag-of-words model for these two work orders are as follows:
                              able  apple  baby  bye  eat  good  say
Work order 1 word vector         0      1     1    0    2     0    0
Work order 2 word vector         0      0     1    1    0     1    1
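Building the training word matrix can be sketched as below, assuming each work order has already been tokenized and lowercased. The vocabulary is sorted alphabetically, which matches the column order of the table above. With only these two work orders the vocabulary has six words, so the "able" column in the table presumably comes from other work orders in the training set. `BagOfWords` and `BuildTrainingMatrix` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

struct BagOfWords {
  std::vector<std::string> vocabulary;   // de-duplicated words, sorted alphabetically
  std::vector<std::vector<int>> matrix;  // matrix[i][j] = occurrences of vocabulary[j] in work order i
};

// docs[i] holds the lowercase tokens of training work order i.
BagOfWords BuildTrainingMatrix(const std::vector<std::vector<std::string>>& docs) {
  std::set<std::string> unique;  // removes duplicates and sorts
  for (const auto& doc : docs) unique.insert(doc.begin(), doc.end());
  BagOfWords bow;
  bow.vocabulary.assign(unique.begin(), unique.end());
  std::map<std::string, std::size_t> column;  // word -> column index
  for (std::size_t j = 0; j < bow.vocabulary.size(); j++) column[bow.vocabulary[j]] = j;
  for (const auto& doc : docs) {
    std::vector<int> row(bow.vocabulary.size(), 0);
    for (const auto& word : doc) row[column.at(word)]++;  // count every occurrence, not just presence
    bow.matrix.push_back(row);
  }
  assert(bow.matrix.size() == docs.size());
  return bow;
}
```

Counting every occurrence (so that "eat" scores 2 in work order 1) is what distinguishes a bag-of-words matrix from a set-of-words matrix that only records presence.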
After the training word matrix is obtained, the conditional probability P(w_j|C_i) of each feature word w_j under each work order type C_i is calculated from it, and the naive Bayes model is built from P(C_1), P(C_2), …, P(C_i) and P(w_j|C_i).
Because the prior and conditional probabilities are estimated from the work order training set and the naive Bayes model is built from these parameters, later work order type recognition with the model yields more accurate results.
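Steps S220 to S250 can be sketched as a multinomial naive Bayes trainer for any number of work order types. The text does not specify a smoothing scheme at this point, so this sketch swaps in standard add-one (Laplace) smoothing, which keeps a word never seen with a type from getting probability zero; the telecom embodiment below instead initializes numerators to 1 and denominators to 2. The sketch stores logarithms and assumes every type occurs at least once in the training set (otherwise its log prior is minus infinity). `NaiveBayesModel` and `TrainNaiveBayes` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct NaiveBayesModel {
  std::vector<double> logPrior;              // ln P(C_k)
  std::vector<std::vector<double>> logCond;  // logCond[k][j] = ln P(w_j | C_k)
};

// matrix: training word matrix (rows = work orders, columns = vocabulary words).
// labels: type index in [0, numTypes) of each row.
NaiveBayesModel TrainNaiveBayes(const std::vector<std::vector<int>>& matrix,
                                const std::vector<int>& labels, int numTypes) {
  assert(!matrix.empty() && matrix.size() == labels.size());
  const std::size_t vocab = matrix[0].size();
  std::vector<double> docCount(numTypes, 0.0), wordTotal(numTypes, 0.0);
  std::vector<std::vector<double>> wordCount(numTypes, std::vector<double>(vocab, 0.0));
  for (std::size_t i = 0; i < matrix.size(); i++) {
    const int k = labels[i];
    docCount[k] += 1.0;
    for (std::size_t j = 0; j < vocab; j++) {
      wordCount[k][j] += matrix[i][j];  // occurrences of word j in type k
      wordTotal[k] += matrix[i][j];     // all word occurrences in type k
    }
  }
  NaiveBayesModel model;
  for (int k = 0; k < numTypes; k++) {
    model.logPrior.push_back(std::log(docCount[k] / matrix.size()));
    std::vector<double> row(vocab);
    for (std::size_t j = 0; j < vocab; j++)  // add-one smoothing over the vocabulary
      row[j] = std::log((wordCount[k][j] + 1.0) / (wordTotal[k] + vocab));
    model.logCond.push_back(row);
  }
  return model;
}
```

With the two work orders above as types 0 and 1, each prior is 0.5 and P(eat | type 0) = (2 + 1) / (4 + 6) = 0.3.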
In one embodiment, the step of obtaining the work order sample set includes the following steps:
Obtaining multiple work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; if the second correlation coefficient exceeds a preset threshold, removing the structured fields corresponding to the second correlation coefficient from each work order to obtain the work order sample set.
In this embodiment, multiple work orders whose types have already been identified manually are collected. The machine learning algorithm processes unstructured data, whereas the collected work orders also contain structured data; correlation analysis is therefore performed on their structured fields, and the strongly correlated fields are removed to obtain the work order sample set.
Referring to Fig. 3, which is a flowchart of testing the naive Bayes model in an embodiment of the invention, in this embodiment the method further includes the following steps after the step of building the naive Bayes model from the prior and conditional probabilities:
Step S310: randomly select several work order samples from the work order sample set to form a work order test set;
Step S320: extract third feature words from the test samples in the work order test set;
Step S330: according to the third feature words and using the naive Bayes model, calculate the probability that a test sample belongs to each work order type given that the third feature words occur, and obtain a recognition result from these probabilities;
Step S340: obtain the classification accuracy of the naive Bayes model by comparing the recognition results with the actual types of the test samples;
Step S350: if the classification accuracy is below a preset threshold, adjust the naive Bayes model.
In the above testing process, the feature words of each test sample in the work order test set are extracted, and the trained naive Bayes model computes from them the probability P(C_i|w) that the test sample is of each work order type, where w is the word vector of the test sample. The work order type with the highest probability is the final classification result; if it differs from the actual type of the test sample, the recognition is wrong. All test samples in the test set are classified in this way to obtain the classification accuracy of the model, and if the accuracy is below a preset threshold the model is adjusted to maintain its recognition performance on work order types.
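Steps S310 to S350 reduce to counting misclassified test samples and comparing the accuracy with the preset threshold. A minimal sketch; the classifier is passed in as a function so that any trained model can be plugged in, and the indices of the misclassified samples are kept because the adjustment step needs them. `AccuracyReport` and `EvaluateOnTestSet` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct AccuracyReport {
  double accuracy;
  std::vector<std::size_t> misclassified;  // indices of wrongly recognized test samples
  bool needsAdjustment;                    // accuracy below the preset threshold
};

// predict: maps a test sample's word-count vector to a work order type index.
AccuracyReport EvaluateOnTestSet(const std::vector<std::vector<int>>& testMatrix,
                                 const std::vector<int>& trueTypes,
                                 const std::function<int(const std::vector<int>&)>& predict,
                                 double accuracyThreshold) {
  assert(!testMatrix.empty() && testMatrix.size() == trueTypes.size());
  AccuracyReport report{0.0, {}, false};
  for (std::size_t i = 0; i < testMatrix.size(); i++)
    if (predict(testMatrix[i]) != trueTypes[i]) report.misclassified.push_back(i);
  report.accuracy =
      1.0 - static_cast<double>(report.misclassified.size()) / testMatrix.size();
  report.needsAdjustment = report.accuracy < accuracyThreshold;
  return report;
}
```

For example, one wrong answer out of four test samples gives an accuracy of 0.75, which triggers an adjustment under a threshold of 0.8.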
In one embodiment, the step of adjusting the naive Bayes model includes the following steps:
Obtaining the test samples whose recognition results are wrong and extracting fourth feature words from them; adjusting the occurrence counts of the corresponding feature words in the training word matrix according to the fourth feature words; and adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
The wrongly recognized work orders are obtained and their feature words are extracted. Adjusting the frequencies of these words in the training word matrix of the bag-of-words model adjusts their weights accordingly, and the conditional probability P(w_j|C_i) of each feature word under each work order type is recomputed from the adjusted matrix. The naive Bayes model is thereby adjusted, which improves its recognition performance on work order types.
Further, in actual use, when a work order type is recognized wrongly, the error is fed back to the next stage, where the naive Bayes model adjusts the bag-of-words model using the feature words of the wrongly recognized work order. This realizes self-learning, improving the self-learning capability and adaptability of the naive Bayes model and making it more practical.
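The text does not say by how much the word frequencies are changed. A minimal sketch under one hypothetical rule: the word counts of each misclassified test sample, scaled by a weight, are added to the per-type word counts of its true type, and the conditional probabilities are then recomputed (here with add-one smoothing). `AdjustCounts`, `RecomputeLogCond` and the `weight` parameter are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// typeWordCounts[k][j]: total occurrences of word j in training work orders of type k.
// Hypothetical rule: add the misclassified sample's counts, scaled by weight, to its true type.
void AdjustCounts(std::vector<std::vector<double>>& typeWordCounts,
                  const std::vector<int>& sampleCounts, int trueType, double weight) {
  assert(trueType >= 0 && trueType < static_cast<int>(typeWordCounts.size()));
  for (std::size_t j = 0; j < sampleCounts.size(); j++)
    typeWordCounts[trueType][j] += weight * sampleCounts[j];
}

// Recomputes ln P(w_j | C_k) from the adjusted counts with add-one smoothing.
std::vector<std::vector<double>> RecomputeLogCond(
    const std::vector<std::vector<double>>& typeWordCounts) {
  std::vector<std::vector<double>> logCond;
  for (const auto& counts : typeWordCounts) {
    double total = 0.0;
    for (double c : counts) total += c;
    std::vector<double> row;
    for (double c : counts) row.push_back(std::log((c + 1.0) / (total + counts.size())));
    logCond.push_back(row);
  }
  return logCond;
}
```

A larger weight moves the model more aggressively toward the corrected samples; a weight of 1 simply treats each misclassified sample as one more training work order of its true type.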
To make the technical solution of the invention clearer, the big-data-based work order type recognition method of the embodiments is further described below, taking the type recognition of telecom work orders as an example:
(1) Obtaining the work order sample set
Paper telecom work orders, or electronic telecom work orders in the business system, are obtained. Paper work orders are scanned to obtain their text, or the telecom work orders in the business system are transcribed by text entry, and the transcribed telecom work orders form the telecom work order samples.
Telecom work order samples of the low-value type and of the non-low-value type are placed in two subdirectories, low value and not low value, respectively. Each subdirectory holds 25 work orders, named 1.txt, 2.txt, 3.txt, ..., 25.txt; low value holds the low-value work order samples and not low value holds the non-low-value work order samples.
(2) Building the naive Bayes model
From these 50 work order samples, 40 are randomly selected as the work order training set. The words w_j of all work order samples in the training set are extracted, converted to lowercase and de-duplicated to obtain a word list; the number of occurrences of each word w_j of the word list is counted, and the training word matrix TrainMat of the bag-of-words model is generated together with the corresponding training sample types TrainCat. The training function is then called on TrainMat and TrainCat to obtain the prior probability P(C_1) that a training work order is a low-value work order and the conditional probabilities P(w_j|C_i) of each feature word under each work order type, namely p0Vec and p1Vec. The training function is declared as follows:
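#include <cmath>
#include <cstddef>
#include <vector>
using TIntVec = std::vector<int>;
using TIntmat = std::vector<TIntVec>;  // one row per work order, one column per vocabulary word
using TDblVec = std::vector<double>;
void Train(const TIntmat& TrainMat,    // training word matrix
           const TIntVec& TrainCat,    // class of each work order: 1 = low value (C1), 0 = not low value (C0)
           double& pc1,                // returns the fraction of work orders that belong to C1
           TDblVec& p0Vec,             // returns ln P(w0|C0), ln P(w1|C0), ...
           TDblVec& p1Vec);            // returns ln P(w0|C1), ln P(w1|C1), ...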
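Here, the total number of work orders equals the number of rows of TrainMat and the number of vocabulary words equals the number of columns of TrainMat; pc1 is the proportion of low-value work orders among all work orders, i.e., the number of entries of TrainCat equal to 1 divided by the total number of work orders.
When calculating p0Vec and p1Vec, the numerator arrays p0Num and p1Num (one entry per vocabulary word) are initialized to 1, and the denominators p0Denom and p1Denom are initialized to 2. The function body is as follows:
void Train(const TIntmat& TrainMat, const TIntVec& TrainCat, double& pc1, TDblVec& p0Vec, TDblVec& p1Vec) {
  std::size_t docs = TrainMat.size(), words = docs ? TrainMat[0].size() : 0, c1 = 0;
  TDblVec p0Num(words, 1.0), p1Num(words, 1.0);  // numerators initialized to 1
  double p0Denom = 2.0, p1Denom = 2.0;           // denominators initialized to 2
  for (std::size_t i = 0; i < docs; i++) {
    bool isC0 = (TrainCat[i] == 0);
    TDblVec& num = isC0 ? p0Num : p1Num;         // accumulators of this work order's class
    double& denom = isC0 ? p0Denom : p1Denom;
    if (!isC0) c1++;
    for (std::size_t j = 0; j < words; j++) {
      num[j] += TrainMat[i][j];                  // add row i of TrainMat to the numerator
      denom += TrainMat[i][j];                   // add the word count of row i to the denominator
    }
  }
  pc1 = docs ? static_cast<double>(c1) / docs : 0.0;
  p0Vec.assign(words, 0.0);
  p1Vec.assign(words, 0.0);
  for (std::size_t j = 0; j < words; j++) {
    p0Vec[j] = std::log(p0Num[j] / p0Denom);     // p0Vec = ln(p0Num / p0Denom)
    p1Vec[j] = std::log(p1Num[j] / p1Denom);     // p1Vec = ln(p1Num / p1Denom)
  }
}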
(3) Testing the naive Bayes model
The remaining 10 work orders form the work order test set TestDoc, and the classify function determines whether each doc is a low-value work order:
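int errors = 0;  // TestDoc: word vectors of the 10 test work orders; TestCat (hypothetical name): their true classes
for (std::size_t k = 0; k < TestDoc.size(); k++) {
  bool judgedLowValue = classify(TestDoc[k], p0Vec, p1Vec, pc1);
  if (judgedLowValue != (TestCat[k] == 1)) errors++;  // record a misjudged work order
}
double errorRate = static_cast<double>(errors) / TestDoc.size();  // error rate on the test set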
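The random 40/10 split of the 50 samples used for training and testing can be sketched with the standard library. The seed parameter and the mapping of indices to files (for example, 0 to 24 for the low value directory and 25 to 49 for the not low value directory) are assumptions; `Split` and `RandomSplit` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

struct Split {
  std::vector<std::size_t> train;  // indices of the training work orders
  std::vector<std::size_t> test;   // indices of the test work orders
};

// Shuffles the sample indices 0..total-1 and takes the first testSize of them as the test set.
Split RandomSplit(std::size_t total, std::size_t testSize, unsigned seed) {
  assert(testSize <= total);
  std::vector<std::size_t> idx(total);
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::mt19937 rng(seed);  // a fixed seed makes the split reproducible
  std::shuffle(idx.begin(), idx.end(), rng);
  const auto cut = idx.begin() + static_cast<std::ptrdiff_t>(testSize);
  Split s;
  s.test.assign(idx.begin(), cut);
  s.train.assign(cut, idx.end());
  return s;
}
```

Because the split is drawn from a shuffle, the two sets are disjoint and together cover all 50 samples; repeating the experiment with several seeds gives a less noisy error-rate estimate than a single 10-sample test set.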
(4) Recognizing telecom work order types
A telecom work order to be classified is recognized with the classification decision function, which is as follows:
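bool classify(const TDblVec& w, const TDblVec& p0Vec, const TDblVec& p1Vec, double pc1) {
  double p0 = std::log(1.0 - pc1), p1 = std::log(pc1);  // log priors of C0 and C1
  for (std::size_t j = 0; j < w.size(); j++) {
    p0 += w[j] * p0Vec[j];  // p0 = sum(w * p0Vec) + ln(1 - pc1)
    p1 += w[j] * p1Vec[j];  // p1 = sum(w * p1Vec) + ln(pc1)
  }
  return p1 > p0;           // true: judged to be a low-value work order (C1)
}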
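For reference, the training and decision functions above compute the following quantities, where M is TrainMat, c_i is TrainCat[i], N is the number of training work orders, and n_j is entry j of the word vector w passed to classify (renamed here to keep it apart from the word w_j). This restates the code in formula form and adds nothing to it:

```latex
p_{c1} = \frac{\lvert \{\, i : c_i = 1 \,\} \rvert}{N}, \qquad
P(w_j \mid C_k) = \frac{1 + \sum_{i:\,c_i = k} M_{ij}}{2 + \sum_{i:\,c_i = k} \sum_{l} M_{il}},
\quad k \in \{0, 1\}

P_0 = \sum_j n_j \ln P(w_j \mid C_0) + \ln\left(1 - p_{c1}\right), \qquad
P_1 = \sum_j n_j \ln P(w_j \mid C_1) + \ln p_{c1}

\text{classify returns true (low value, } C_1\text{)} \iff P_1 > P_0
```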
Corresponding to the above big-data-based work order type recognition method, the present invention also provides a big-data-based work order type recognition system. Embodiments of this system are described in detail below.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of the big-data-based work order type recognition system in one embodiment of the invention. In this embodiment, the system includes:
a correlation analysis module 410, configured to obtain a work order to be classified and to perform correlation analysis on the structured fields in that work order to obtain a first correlation coefficient;
a target work order obtaining module 420, configured to remove the corresponding structured field from the work order to be classified if the first correlation coefficient exceeds a preset threshold, thereby obtaining a target work order to be classified;
a probability obtaining module 430, configured to extract first feature words from the target work order to be classified. Based on the first feature words and a pre-built naive Bayes model, it separately calculates the probability that the work order to be classified belongs to each of the different work order types, given that the first feature words occur; and
a work order type identification module 440, configured to determine the type of the work order to be classified according to the respective probabilities.
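The text here does not state which quantities the first correlation coefficient is computed between. The C++ sketch below shows one possible reading of "removing strongly correlated fields", under several assumptions:

- each structured field has been numerically encoded as a column over a batch of work orders;
- the Pearson coefficient is the measure of correlation;
- a field is dropped when its absolute correlation with an already kept field exceeds the threshold.

The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation of two equally long columns.
// Returns 0 when either column is constant (correlation undefined).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;
    return sxy / std::sqrt(sxx * syy);
}

// fields[k] holds the encoded values of structured field k across a batch of
// work orders. Returns the indices of the fields that are kept.
std::vector<std::size_t> keepWeaklyCorrelatedFields(
        const std::vector<std::vector<double>>& fields, double threshold) {
    std::vector<std::size_t> kept;
    for (std::size_t k = 0; k < fields.size(); ++k) {
        bool redundant = false;
        for (std::size_t j : kept) {
            if (std::fabs(pearson(fields[j], fields[k])) > threshold) {
                redundant = true;  // strongly correlated with a kept field
                break;
            }
        }
        if (!redundant) kept.push_back(k);
    }
    return kept;
}
```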
The above system performs correlation analysis on the structured fields of work orders and removes strongly correlated fields. This lets the pre-built naive Bayes model identify work order types quickly and accurately. It also prevents large backlogs of work orders from occupying server storage and degrading the operating efficiency of the business system.
Referring to Fig. 5, Fig. 5 is a schematic structural diagram of the big-data-based work order type recognition system in another embodiment of the invention. In this embodiment, the system further includes a naive Bayes model building module 450 and a naive Bayes model testing module 460.
Referring to Fig. 6, Fig. 6 is a schematic structural diagram of the naive Bayes model building module in one embodiment of the invention. The naive Bayes model building module 450 includes a work order training set obtaining unit 451, a prior probability obtaining unit 452, a training matrix obtaining unit 453, a conditional probability obtaining unit 454, and a model construction unit 455.
The work order training set obtaining unit 451 is configured to obtain a work order sample set and to randomly select a number of work order samples from it to form a work order training set;
the prior probability obtaining unit 452 is configured to calculate the prior probability that a work order training sample in the training set belongs to each work order type;
the training matrix obtaining unit 453 is configured to extract second feature words from the work order training samples in the training set and to build a bag-of-words training word matrix from the second feature words;
the conditional probability obtaining unit 454 is configured to obtain, from the training word matrix, the conditional probability of each second feature word under each work order type; and
the model construction unit 455 is configured to build the naive Bayes model from the prior probabilities and the conditional probabilities.
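The random split and the bag-of-words training word matrix handled by units 451 and 453 (and by unit 461 for the test set) could be produced along the lines of the sketch below. It assumes the work orders have already been segmented into tokens. The helper names, the sorted vocabulary, the English example tokens and the explicit random seed are illustrative assumptions, not details from the patent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;  // one segmented work order

// Sorted, de-duplicated vocabulary over all training work orders.
std::vector<std::string> buildVocabulary(const std::vector<Tokens>& docs) {
    std::vector<std::string> vocab;
    for (const Tokens& d : docs) vocab.insert(vocab.end(), d.begin(), d.end());
    std::sort(vocab.begin(), vocab.end());
    vocab.erase(std::unique(vocab.begin(), vocab.end()), vocab.end());
    return vocab;
}

// Bag-of-words row: entry j counts the occurrences of vocab[j] in doc.
// vocab must be sorted; words outside the vocabulary are ignored.
std::vector<double> bagOfWords(const std::vector<std::string>& vocab, const Tokens& doc) {
    std::vector<double> row(vocab.size(), 0.0);
    for (const std::string& w : doc) {
        auto it = std::lower_bound(vocab.begin(), vocab.end(), w);
        if (it != vocab.end() && *it == w)
            row[static_cast<std::size_t>(it - vocab.begin())] += 1.0;
    }
    return row;
}

// Randomly splits sample indices 0..n-1 into a test set of testSize indices
// and a training set holding the rest.
void randomSplit(std::size_t n, std::size_t testSize, unsigned seed,
                 std::vector<std::size_t>& train, std::vector<std::size_t>& test) {
    assert(testSize <= n);
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    const auto cut = idx.begin() + static_cast<std::ptrdiff_t>(testSize);
    test.assign(idx.begin(), cut);
    train.assign(cut, idx.end());
}
```

Each row returned by `bagOfWords` would become one row of TrainMat, with the matching type label stored in TrainCat.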
In one embodiment, the work order training set obtaining unit 451 obtains multiple work orders of different types and performs correlation analysis on the structured fields in each work order to obtain a second correlation coefficient. If the second correlation coefficient exceeds a preset threshold, the unit removes the structured field corresponding to it from each work order, thereby obtaining the work order sample set.
Referring to Fig. 7, Fig. 7 is a schematic structural diagram of the naive Bayes model testing module in one embodiment of the invention. In this embodiment, the naive Bayes model testing module 460 includes a work order test set obtaining unit 461, a feature extraction unit 462, a recognition result obtaining unit 463, an accuracy obtaining unit 464, and a naive Bayes model adjustment unit 465.
The work order test set obtaining unit 461 is configured to randomly select a number of work order samples from the work order sample set to form a work order test set;
the feature extraction unit 462 is configured to extract third feature words from the work order test samples in the test set;
the recognition result obtaining unit 463 is configured to use the third feature words and the naive Bayes model to separately calculate the probability that a work order test sample belongs to each work order type, given that the third feature words occur. It then obtains a recognition result from the probabilities for the different work order types;
the accuracy obtaining unit 464 is configured to obtain the classification accuracy of the naive Bayes model from the recognition results and the actual types of the work order test samples; and
the naive Bayes model adjustment unit 465 is configured to adjust the naive Bayes model if the classification accuracy is below a preset threshold.
In one embodiment, the naive Bayes model adjustment unit 465 performs three steps:
(1) it obtains the work order test samples with wrong recognition results and extracts fourth feature words from them;
(2) it adjusts, according to the fourth feature words, the occurrence counts of the corresponding feature words in the training word matrix;
(3) it adjusts the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
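The patent does not specify exactly how the occurrence counts are adjusted. The C++ sketch below shows one simple interpretation. It keeps the smoothed per-type word counts from training, adds the feature-word counts of each misclassified test work order to the counts of its true type, and then recomputes the log conditional probabilities. `ClassCounts`, `makeClassCounts`, `addMisclassified` and `logConditional` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Smoothed word counts for one work order type (hypothetical structure).
struct ClassCounts {
    std::vector<double> wordCounts;  // occurrences of each vocabulary word
    double totalWords = 0.0;         // denominator of the conditional probability
};

// Fresh counts with add-one smoothing: 1 per word and a denominator of 2,
// matching the initialisation in the training pseudocode.
ClassCounts makeClassCounts(std::size_t vocabSize) {
    ClassCounts c;
    c.wordCounts.assign(vocabSize, 1.0);
    c.totalWords = 2.0;
    return c;
}

// Assumed adjustment rule: add the bag-of-words row (fourth feature words) of
// a misclassified test work order to the counts of its true type.
void addMisclassified(ClassCounts& trueClass, const std::vector<double>& row) {
    assert(row.size() == trueClass.wordCounts.size());
    for (std::size_t j = 0; j < row.size(); ++j) {
        trueClass.wordCounts[j] += row[j];
        trueClass.totalWords += row[j];
    }
}

// Recomputes ln P(word | type) from the adjusted counts.
std::vector<double> logConditional(const ClassCounts& c) {
    std::vector<double> out(c.wordCounts.size());
    for (std::size_t j = 0; j < out.size(); ++j)
        out[j] = std::log(c.wordCounts[j] / c.totalWords);
    return out;
}
```

In practice the counts would be those accumulated during training rather than fresh ones from `makeClassCounts`. The resulting vectors would replace p0Vec or p1Vec in the classifier.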
Probability obtains module 430 and treats the progress of class object work order using regular expression in one of the embodiments, The segmentation of words obtains fisrt feature word.
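The regular expression used for segmentation is not given in the patent. A minimal C++ sketch with `std::regex` follows. It splits ASCII text on runs of non-alphanumeric characters, lower-cases each token and drops very short tokens; all three choices are assumptions. Chinese work order text would need a dedicated word segmenter instead. `segmentWords` is a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Splits text on runs of characters that are not ASCII letters or digits,
// lower-cases each token and drops tokens shorter than minLen.
std::vector<std::string> segmentWords(const std::string& text, std::size_t minLen = 2) {
    static const std::regex separator("[^A-Za-z0-9]+");
    std::vector<std::string> words;
    // Submatch index -1 yields the pieces *between* separator matches.
    std::sregex_token_iterator it(text.begin(), text.end(), separator, -1), end;
    for (; it != end; ++it) {
        std::string w = it->str();
        if (w.empty() || w.size() < minLen) continue;
        std::transform(w.begin(), w.end(), w.begin(),
                       [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
        words.push_back(w);
    }
    return words;
}
```

For example, `segmentWords("Broadband fault: ONU offline")` yields the tokens `broadband`, `fault`, `onu` and `offline`.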
The big-data-based work order type recognition system of the invention corresponds one-to-one with the big-data-based work order type recognition method of the invention. The technical features and advantages described in the method embodiments above therefore apply equally to the system embodiments.
In one embodiment, a computer device is also provided. It includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the following steps when executing the program:
obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold, removing the corresponding structured field from the work order to be classified to obtain a target work order to be classified;
extracting first feature words from the target work order to be classified, and, based on the first feature words and a pre-built naive Bayes model, separately calculating the probability that the work order to be classified belongs to each of the different work order types, given that the first feature words occur;
determining the type of the work order to be classified according to the respective probabilities.
In one embodiment, the processor further implements the following steps when executing the program:
obtaining a work order sample set, and randomly selecting a number of work order samples from it to form a work order training set; calculating the prior probability that a work order training sample in the training set belongs to each work order type; extracting second feature words from the work order training samples in the training set, and building a bag-of-words training word matrix from the second feature words; obtaining, from the training word matrix, the conditional probability of each second feature word under each work order type; and building the naive Bayes model from the prior probabilities and the conditional probabilities.
In one embodiment, the processor further implements the following steps when executing the program:
obtaining multiple work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; if the second correlation coefficient exceeds a preset threshold, removing the structured field corresponding to the second correlation coefficient from each work order to obtain the work order sample set.
In one embodiment, the processor further implements the following steps when executing the program:
randomly selecting a number of work order samples from the work order sample set to form a work order test set; extracting third feature words from the work order test samples in the test set; using the third feature words and the naive Bayes model to separately calculate the probability that a work order test sample belongs to each work order type, given that the third feature words occur, and obtaining a recognition result from the probabilities for the different work order types; obtaining the classification accuracy of the naive Bayes model from the recognition results and the actual types of the work order test samples; and, if the classification accuracy is below a preset threshold, adjusting the naive Bayes model.
In one embodiment, the processor further implements the following steps when executing the program: obtaining the work order test samples with wrong recognition results and extracting fourth feature words from them; adjusting, according to the fourth feature words, the occurrence counts of the corresponding feature words in the training word matrix; and adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
In one embodiment, the processor further implements the following step when executing the program:
segmenting the target work order to be classified into words using regular expressions to obtain the first feature words.
When its processor executes the program, this computer device implements any of the big-data-based work order type recognition methods in the above embodiments. Correlation analysis is performed on the structured fields of work orders and strongly correlated fields are removed. This lets the pre-built naive Bayes model identify work order types quickly and accurately, and prevents large backlogs of work orders from occupying server storage and degrading the operating efficiency of the business system.
In addition, a person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium. In the embodiments of the invention, the program can be stored in the storage medium of a computer system and executed by at least one processor in that computer system. Doing so implements processes including those of the embodiments of the big-data-based work order type recognition methods described above.
In one embodiment, a computer-readable storage medium is provided. It stores a computer program that, when executed by a processor, implements the following steps:
obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold, removing the corresponding structured field from the work order to be classified to obtain a target work order to be classified;
extracting first feature words from the target work order to be classified, and, based on the first feature words and a pre-built naive Bayes model, separately calculating the probability that the work order to be classified belongs to each of the different work order types, given that the first feature words occur;
determining the type of the work order to be classified according to the respective probabilities.
In one embodiment, the following steps are further implemented when the processor executes the program:
obtaining a work order sample set, and randomly selecting a number of work order samples from it to form a work order training set; calculating the prior probability that a work order training sample in the training set belongs to each work order type; extracting second feature words from the work order training samples in the training set, and building a bag-of-words training word matrix from the second feature words; obtaining, from the training word matrix, the conditional probability of each second feature word under each work order type; and building the naive Bayes model from the prior probabilities and the conditional probabilities.
In one embodiment, the following steps are further implemented when the processor executes the program:
obtaining multiple work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; if the second correlation coefficient exceeds a preset threshold, removing the structured field corresponding to the second correlation coefficient from each work order to obtain the work order sample set.
In one embodiment, the following steps are further implemented when the processor executes the program:
randomly selecting a number of work order samples from the work order sample set to form a work order test set; extracting third feature words from the work order test samples in the test set; using the third feature words and the naive Bayes model to separately calculate the probability that a work order test sample belongs to each work order type, given that the third feature words occur, and obtaining a recognition result from the probabilities for the different work order types; obtaining the classification accuracy of the naive Bayes model from the recognition results and the actual types of the work order test samples; and, if the classification accuracy is below a preset threshold, adjusting the naive Bayes model.
In one embodiment, the following steps are further implemented when the processor executes the program: obtaining the work order test samples with wrong recognition results and extracting fourth feature words from them; adjusting, according to the fourth feature words, the occurrence counts of the corresponding feature words in the training word matrix; and adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
In one embodiment, the following step is further implemented when the processor executes the program:
segmenting the target work order to be classified into words using regular expressions to obtain the first feature words.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory.
Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above can be combined arbitrarily. For brevity, not every possible combination of these features is described. However, any combination of these technical features that contains no contradiction should be considered within the scope of this specification.
The embodiments described above express only several implementations of the invention. Although they are described in specific detail, they should not be construed as limiting the scope of the patent. A person of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (10)

1. A big-data-based work order type recognition method, characterized by comprising the following steps:
obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold, removing the corresponding structured field from the work order to be classified to obtain a target work order to be classified;
extracting first feature words from the target work order to be classified, and, based on the first feature words and a pre-built naive Bayes model, separately calculating the probability that the work order to be classified belongs to each of the different work order types, given that the first feature words occur;
determining the type of the work order to be classified according to the respective probabilities.
2. The big-data-based work order type recognition method according to claim 1, characterized by further comprising, before the step of obtaining the work order to be classified, the following steps:
obtaining a work order sample set, and randomly selecting a number of work order samples from the work order sample set to form a work order training set;
calculating the prior probability that a work order training sample in the work order training set belongs to each work order type;
extracting second feature words from the work order training samples in the work order training set, and building a bag-of-words training word matrix from the second feature words;
obtaining, from the training word matrix, the conditional probability of each second feature word under each work order type;
building a naive Bayes model from the prior probabilities and the conditional probabilities.
3. The big-data-based work order type recognition method according to claim 2, characterized in that the step of obtaining the work order sample set comprises the following steps:
obtaining multiple work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient;
if the second correlation coefficient exceeds a preset threshold, removing the structured field corresponding to the second correlation coefficient from each work order to obtain the work order sample set.
4. The big-data-based work order type recognition method according to claim 3, characterized by further comprising, after the step of building the naive Bayes model from the prior probabilities and the conditional probabilities, the following steps:
randomly selecting a number of work order samples from the work order sample set to form a work order test set;
extracting third feature words from the work order test samples in the work order test set;
using the third feature words and the naive Bayes model to separately calculate the probability that a work order test sample belongs to each work order type, given that the third feature words occur, and obtaining a recognition result from the probabilities for the different work order types;
obtaining the classification accuracy of the naive Bayes model from the recognition results and the types of the work order test samples;
if the classification accuracy is below a preset threshold, adjusting the naive Bayes model.
5. The big-data-based work order type recognition method according to claim 4, characterized in that the step of adjusting the naive Bayes model comprises the following steps:
obtaining the work order test samples with wrong recognition results, and extracting fourth feature words from them;
adjusting, according to the fourth feature words, the occurrence counts of the corresponding feature words in the training word matrix;
adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
6. The big-data-based work order type recognition method according to claim 1, characterized in that the step of extracting the first feature words of the target work order to be classified comprises the following step:
segmenting the target work order to be classified into words using regular expressions to obtain the first feature words.
7. A big-data-based work order type recognition system, characterized by comprising:
a correlation analysis module, configured to obtain a work order to be classified and to perform correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
a target work order obtaining module, configured to remove the corresponding structured field from the work order to be classified if the first correlation coefficient exceeds a preset threshold, to obtain a target work order to be classified;
a probability obtaining module, configured to extract first feature words from the target work order to be classified, and, based on the first feature words and a pre-built naive Bayes model, to separately calculate the probability that the work order to be classified belongs to each of the different work order types, given that the first feature words occur;
a work order type identification module, configured to determine the type of the work order to be classified according to the respective probabilities.
8. The big-data-based work order type recognition system according to claim 7, characterized by further comprising a naive Bayes model building module. The naive Bayes model building module comprises a work order training set obtaining unit, a prior probability obtaining unit, a training matrix obtaining unit, a conditional probability obtaining unit, and a model construction unit;
the work order training set obtaining unit is configured to obtain a work order sample set and to randomly select a number of work order samples from the work order sample set to form a work order training set;
the prior probability obtaining unit is configured to calculate the prior probability that a work order training sample in the work order training set belongs to each work order type;
the training matrix obtaining unit is configured to extract second feature words from the work order training samples in the work order training set and to build a bag-of-words training word matrix from the second feature words;
the conditional probability obtaining unit is configured to obtain, from the training word matrix, the conditional probability of each second feature word under each work order type;
the model construction unit is configured to build a naive Bayes model from the prior probabilities and the conditional probabilities.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the big-data-based work order type recognition method according to any one of claims 1 to 6.
10. A computer storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the big-data-based work order type recognition method according to any one of claims 1 to 6.
CN201810427330.7A 2018-05-07 2018-05-07 Big data-based work order type identification method and system and computing device Active CN108897754B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810427330.7A CN108897754B (en) 2018-05-07 2018-05-07 Big data-based work order type identification method and system and computing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810427330.7A CN108897754B (en) 2018-05-07 2018-05-07 Big data-based work order type identification method and system and computing device

Publications (2)

Publication Number Publication Date
CN108897754A true CN108897754A (en) 2018-11-27
CN108897754B CN108897754B (en) 2020-12-11

Family

ID=64342619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810427330.7A Active CN108897754B (en) 2018-05-07 2018-05-07 Big data-based work order type identification method and system and computing device

Country Status (1)

Country Link
CN (1) CN108897754B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant