CN108897754A - Method, system, and computing device for identifying work order types based on big data
- Publication number
- CN108897754A (application number CN201810427330.7A)
- Authority
- CN
- China
- Prior art keywords
- work order
- sorted
- type
- feature word
- word
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a big-data-based method, system, and computing device for identifying work order types. The method includes: obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to obtain a first correlation coefficient; if the first correlation coefficient exceeds a preset threshold, removing the corresponding structured fields from the work order to obtain a target work order to be classified; extracting first feature words from the target work order and, using a pre-built naive Bayes model, calculating from the first feature words the probability that the work order belongs to each work order type given that the first feature words occur; and determining the type of the work order from the probabilities for the different work order types. The method avoids the situation in which a large backlog of work orders occupies server storage and reduces the operating efficiency of the business system.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a big-data-based method, system, and computing device for identifying work order types.
Background
As telecommunication networks become increasingly digitized, telecommunication services generate work orders at scale, and the number of work order types designed for these services keeps growing. Identifying the type of a work order is therefore increasingly difficult, and at present manual identification cannot keep pace with the rapid growth of the business. Accumulated work orders occupy a large amount of server storage space, which directly reduces the operating efficiency of the business system.
Therefore, the traditional approach of identifying work order types manually cannot meet the demands of telecommunication service growth, and new means are needed to improve the efficiency of work order type identification and the accuracy of planning.
Summary of the invention
In view of this, to address the problem that accumulated work orders occupy a large amount of server storage space and directly reduce the operating efficiency of the business system, it is necessary to provide a big-data-based method, system, and computing device for identifying work order types.
A big-data-based method for identifying work order types includes the following steps:
obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold, removing the corresponding structured fields from the work order to be classified to obtain a target work order to be classified;
extracting first feature words from the target work order to be classified and, using a pre-built naive Bayes model, calculating from the first feature words the probability that the work order to be classified belongs to each work order type given that the first feature words occur;
determining the type of the work order to be classified according to the probabilities.
In one embodiment, the method further includes the following steps before the step of obtaining the work order to be classified:
obtaining a work order sample set, and randomly selecting several work order samples from the work order sample set to form a work order training set;
calculating the prior probability that a work order training sample in the work order training set belongs to each work order type;
extracting second feature words from the work order training samples in the work order training set, and building the training word matrix of a bag-of-words model from the second feature words;
obtaining, from the training word matrix, the conditional probability of each second feature word under each work order type;
building the naive Bayes model from the prior probabilities and the conditional probabilities.
In one embodiment, the step of obtaining the work order sample set includes the following steps:
obtaining multiple work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient;
if the second correlation coefficient exceeds a preset threshold, removing from each work order the structured fields corresponding to the second correlation coefficient to obtain the work order sample set.
In one embodiment, the method further includes the following steps after the step of building the naive Bayes model from the prior probabilities and the conditional probabilities:
randomly selecting several work order samples from the work order sample set to form a work order test set;
extracting third feature words from the work order test samples in the work order test set;
using the naive Bayes model to calculate, from the third feature words, the probability that a work order test sample belongs to each work order type given that the third feature words occur, and obtaining a recognition result from the probabilities for the different work order types;
obtaining the classification accuracy of the naive Bayes model from the recognition results and the types of the work order test samples;
if the classification accuracy is lower than a preset threshold, adjusting the naive Bayes model.
In one embodiment, the step of adjusting the naive Bayes model includes the following steps:
obtaining the work order test samples whose recognition result is wrong and extracting fourth feature words from them;
adjusting, according to the fourth feature words, the occurrence counts of the corresponding feature words in the training word matrix;
adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
In one embodiment, the step of extracting the first feature words from the target work order to be classified includes the following step:
segmenting the target work order to be classified into words using a regular expression to obtain the first feature words.
A big-data-based system for identifying work order types includes:
a correlation analysis module, configured to obtain a work order to be classified and perform correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
a target work order obtaining module, configured to remove the corresponding structured fields from the work order to be classified if the first correlation coefficient exceeds a preset threshold, to obtain a target work order to be classified;
a probability obtaining module, configured to extract first feature words from the target work order to be classified and, using a pre-built naive Bayes model, calculate from the first feature words the probability that the work order to be classified belongs to each work order type given that the first feature words occur;
a work order type identification module, configured to determine the type of the work order to be classified according to the probabilities.
In one embodiment, the system further includes a naive Bayes model building module, which includes a work order training set obtaining unit, a prior probability obtaining unit, a training matrix obtaining unit, a conditional probability obtaining unit, and a model building unit;
the work order training set obtaining unit is configured to obtain a work order sample set and randomly select several work order samples from it to form a work order training set;
the prior probability obtaining unit is configured to calculate the prior probability that a work order training sample in the work order training set belongs to each work order type;
the training matrix obtaining unit is configured to extract second feature words from the work order training samples in the work order training set and build the training word matrix of a bag-of-words model from the second feature words;
the conditional probability obtaining unit is configured to obtain, from the training word matrix, the conditional probability of each second feature word under each work order type;
the model building unit is configured to build the naive Bayes model from the prior probabilities and the conditional probabilities.
A computer device includes a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:
obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold, removing the corresponding structured fields from the work order to be classified to obtain a target work order to be classified;
extracting first feature words from the target work order to be classified and, using a pre-built naive Bayes model, calculating from the first feature words the probability that the work order to be classified belongs to each work order type given that the first feature words occur;
determining the type of the work order to be classified according to the probabilities.
A computer-readable storage medium stores a computer program that, when executed by a processor, implements the following steps:
obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold, removing the corresponding structured fields from the work order to be classified to obtain a target work order to be classified;
extracting first feature words from the target work order to be classified and, using a pre-built naive Bayes model, calculating from the first feature words the probability that the work order to be classified belongs to each work order type given that the first feature words occur;
determining the type of the work order to be classified according to the probabilities.
The above method, system, computing device, and storage medium perform correlation analysis on the structured fields of a work order and remove the strongly correlated fields, so that the type of the work order can be identified quickly and accurately with the pre-built naive Bayes model. This avoids the situation in which a large backlog of work orders occupies server storage and reduces the operating efficiency of the business system.
Brief description of the drawings
Fig. 1 is a flowchart of the big-data-based work order type identification method in one embodiment of the present invention;
Fig. 2 is a flowchart of building the naive Bayes model in one embodiment of the present invention;
Fig. 3 is a flowchart of testing the naive Bayes model in one embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the big-data-based work order type identification system in one embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the big-data-based work order type identification system in another embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the naive Bayes model building module in one embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the naive Bayes model test module in one embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit its scope.
Referring to Fig. 1, Fig. 1 is a flowchart of the big-data-based work order type identification method in one embodiment of the present invention. In this embodiment, the method includes the following steps:
Step S110: Obtain a work order to be classified, and perform correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient.
In this step, the work order to be classified can be obtained directly from the business system or by scanning a paper document. A work order to be classified usually contains both structured and unstructured data. The structured fields are designed with a certain amount of redundancy: there are many fields, and some of them are correlated with one another.
After the work order to be classified is obtained, correlation analysis is performed on its structured fields to obtain the correlation coefficients between the structured fields.
Step S120: If the first correlation coefficient exceeds a preset threshold, remove the corresponding structured fields from the work order to be classified to obtain a target work order to be classified.
To increase the proportion of information carried by the structured fields of the work order to be classified and to reduce the associated computation, the correlation coefficients between structured fields are obtained through correlation analysis. When the correlation coefficient between structured fields exceeds the preset correlation coefficient threshold, the structured fields associated with that coefficient are determined to be strongly correlated fields. The strongly correlated fields are removed from the work order to be classified to obtain the target work order to be classified.
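Steps S110 and S120 can be sketched as follows. This is a minimal illustration, assuming that the structured fields are numeric columns, that the correlation coefficient is the Pearson coefficient, and that the later field of each strongly correlated pair is the one removed; the source fixes none of these choices. The names `pearson` and `drop_correlated_fields`, the field names, and the threshold 0.9 are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant field has no linear relationship to report
    return cov / math.sqrt(var_x * var_y)

def drop_correlated_fields(columns, threshold=0.9):
    """Return the field names kept after removing strongly correlated ones.

    columns: dict mapping field name -> list of numeric values.
    Assumption: a field is removed when |r| against any already kept
    field exceeds the threshold, so the later field of a pair is dropped.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

fields = {
    "duration_min": [10, 20, 30, 40],
    "duration_sec": [600, 1200, 1800, 2400],  # redundant with duration_min
    "priority": [3, 1, 2, 1],
}
print(drop_correlated_fields(fields))  # ['duration_min', 'priority']
```

Which field of a correlated pair to keep is a design choice; a real system might instead keep the field with fewer missing values.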
Step S130: Extract first feature words from the target work order to be classified and, using the pre-built naive Bayes model, calculate from the first feature words the probability that the work order to be classified belongs to each work order type given that the first feature words occur.
Step S140: Determine the type of the work order to be classified according to the probabilities.
The words extracted from the target work order to be classified are input into the pre-built naive Bayes model as feature parameters. The model calculates, given that these words occur, the probability that the work order to be classified belongs to each work order type. The work order type with the largest probability is taken as the final type of the work order to be classified.
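The decision rule of steps S130 and S140 can be written compactly. The expression below is the standard form of the naive Bayes rule, not a formula quoted from the source; the symbol $\hat{C}$ for the predicted type and $n$ for the number of feature words are introduced here for illustration.

```latex
P(C_i \mid w_1, \dots, w_n) \;\propto\; P(C_i) \prod_{j=1}^{n} P(w_j \mid C_i),
\qquad
\hat{C} = \arg\max_{C_i} \; P(C_i) \prod_{j=1}^{n} P(w_j \mid C_i)
```

In practice the product is usually evaluated as a sum of logarithms to avoid numerical underflow.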
By using a naive Bayes model to identify the type of the work order to be classified, the above method turns work order identification into a classification problem and can identify work order types quickly and accurately. This helps enterprises identify work orders in batches, avoids the situation in which a large backlog of work orders occupies server storage and reduces the operating efficiency of the business system, improves working efficiency, and does not require high labor costs.
In one embodiment, the step of extracting the first feature words from the target work order to be classified includes the following step:
segmenting the target work order to be classified into words using a regular expression to obtain the first feature words.
In this embodiment, the character strings in the target work order to be classified are split by a regular expression, and the words extracted from the target work order are input into the naive Bayes model as feature parameters, providing the basis for the subsequent work order type identification.
In one embodiment, the regular expression "\w*" is used to split the target work order to be classified into words and obtain the first feature words. This prevents the character strings of the target work order from being split too finely into meaningless words that would affect the work order type recognition result. For example, with this regular expression "Dr.li" is kept as one word rather than being split into the two words "Dr" and "li".
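The word segmentation step can be sketched with Python's `re` module. One caveat: in Python, `\w` does not match a period, so a plain `\w+` would split "Dr.li" into two tokens. The pattern below is therefore an assumption chosen to reproduce the behaviour the example describes, not the expression used in the source; `TOKEN_PATTERN` and `extract_feature_words` are hypothetical names.

```python
import re

# Assumed pattern: runs of word characters, optionally joined by single
# internal periods, so that "Dr.li" is kept as one token.
TOKEN_PATTERN = re.compile(r"\w+(?:\.\w+)*")

def extract_feature_words(text):
    """Split work order text into feature words."""
    return TOKEN_PATTERN.findall(text)

print(extract_feature_words("Dr.li reports a line fault."))
# ['Dr.li', 'reports', 'a', 'line', 'fault']
```

A trailing period is not attached to the last word because the pattern only accepts a period that is followed by further word characters.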
Referring to Fig. 2, Fig. 2 is a flowchart of building the naive Bayes model in one embodiment of the present invention. In this embodiment, the method further includes the following steps before the step of obtaining the work order to be classified:
Step S210: Obtain a work order sample set, and randomly select several work order samples from it to form a work order training set.
Step S220: Calculate the prior probability that a work order training sample in the work order training set belongs to each work order type.
Step S230: Extract second feature words from the work order training samples in the work order training set, and build the training word matrix of a bag-of-words model from the second feature words.
Step S240: Obtain, from the training word matrix, the conditional probability of each second feature word under each work order type.
Step S250: Build the naive Bayes model from the prior probabilities and the conditional probabilities.
Specifically, a work order sample set is stored in a system directory and contains multiple work order samples of each work order type. Multiple work order samples are randomly selected from the sample set to form the work order training set, and the prior probabilities P(C_1), P(C_2), ..., P(C_i) that a work order sample in the training set belongs to each work order type are obtained, where C_i is a work order type. The words w_i in all work order samples of the training set are extracted, converted to lowercase, and deduplicated to obtain a word list. The number of occurrences of each word w_i in the word list is counted, and the training word matrix of the bag-of-words model is generated. For example, given a work order training set with multiple samples, the words of two of the training samples, after conversion to lowercase, are as follows:
Work order 1: baby eat apple? eat!
Work order 2: say good bye, baby.
The training word matrix generated by the bag-of-words model from these two work orders is as follows:
| | able | apple | baby | bye | eat | good | say |
| Work order 1 vocabulary vector | 0 | 1 | 1 | 0 | 2 | 0 | 0 |
| Work order 2 vocabulary vector | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
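The construction of the training word matrix can be sketched as follows. This is a minimal illustration using only the two example work orders, so the vocabulary has no "able" column; that column in the table presumably comes from other samples in the training set. The function names are hypothetical, and sorting the vocabulary alphabetically is an assumption made to match the column order of the table.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def build_training_matrix(documents):
    """Return (vocabulary, matrix); matrix[d][j] counts word j in document d."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({word for words in tokenized for word in words})
    index = {word: j for j, word in enumerate(vocabulary)}
    matrix = []
    for words in tokenized:
        row = [0] * len(vocabulary)
        for word in words:
            row[index[word]] += 1
        matrix.append(row)
    return vocabulary, matrix

vocabulary, train_mat = build_training_matrix(
    ["baby eat apple? eat!", "say good bye, baby."]
)
print(vocabulary)  # ['apple', 'baby', 'bye', 'eat', 'good', 'say']
print(train_mat)   # [[1, 1, 0, 2, 0, 0], [0, 1, 1, 0, 1, 1]]
```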
After the training word matrix is obtained, the conditional probability P(w_i | C_i) of each feature word under each work order type is calculated from the training word matrix, and the naive Bayes model is built from P(C_1), P(C_2), ..., P(C_i) and P(w_i | C_i).
Because the prior probability and conditional probability parameters are obtained from the work order training set and the naive Bayes model is built from these parameters, the recognition results obtained when the model is later used to identify work order types are more accurate.
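As a small illustration of step S220, the prior probabilities can be estimated as class frequencies in the training set. This is a sketch under that assumption; the function name and the type labels are hypothetical.

```python
from collections import Counter

def prior_probabilities(train_types):
    """Estimate P(C_i) as the share of training samples with type C_i."""
    counts = Counter(train_types)
    total = len(train_types)
    return {work_type: n / total for work_type, n in counts.items()}

priors = prior_probabilities(["low_value", "low_value", "other", "low_value"])
print(priors)  # {'low_value': 0.75, 'other': 0.25}
```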
In one embodiment, the step of obtaining the work order sample set includes the following steps:
obtaining multiple work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; if the second correlation coefficient exceeds a preset threshold, removing from each work order the structured fields corresponding to the second correlation coefficient to obtain the work order sample set.
In this embodiment, multiple work orders whose types have already been identified manually are obtained. Machine learning algorithms process unstructured data, whereas the collected work orders include structured data. Correlation analysis is therefore performed on the structured fields of the collected work orders, and the strongly correlated fields are removed to obtain the work order sample set.
Referring to Fig. 3, Fig. 3 is a flowchart of testing the naive Bayes model in one embodiment of the present invention. In this embodiment, the method further includes the following steps after the step of building the naive Bayes model from the prior probabilities and the conditional probabilities:
Step S310: Randomly select several work order samples from the work order sample set to form a work order test set;
Step S320: Extract third feature words from the work order test samples in the work order test set;
Step S330: Use the naive Bayes model to calculate, from the third feature words, the probability that a work order test sample belongs to each work order type given that the third feature words occur, and obtain a recognition result from the probabilities for the different work order types;
Step S340: Obtain the classification accuracy of the naive Bayes model from the recognition results and the types of the work order test samples;
Step S350: If the classification accuracy is lower than a preset threshold, adjust the naive Bayes model.
In the above test process, the feature words of each work order test sample in the work order test set are extracted, and the built naive Bayes model uses them to calculate the probability P(C_i | w) that the test sample belongs to each work order type, where w is the word vector of the work order test sample. The work order type with the largest probability is the final recognition result. If the recognition result is inconsistent with the work order type of the test sample, the work order type has been misidentified. All work order test samples in the test set are tested to obtain the classification accuracy of the naive Bayes model. If the classification accuracy is lower than the preset threshold, the naive Bayes model is adjusted to guarantee its recognition performance on work order types.
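The accuracy check of steps S340 and S350 can be sketched as follows. This is a minimal illustration; the function names and the threshold value 0.9 are assumptions, since the source does not state a value.

```python
def classification_accuracy(predicted_types, actual_types):
    """Fraction of test samples whose predicted type matches the labelled type."""
    if len(predicted_types) != len(actual_types):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == a for p, a in zip(predicted_types, actual_types))
    return correct / len(actual_types)

def needs_adjustment(accuracy, threshold=0.9):
    """True when the accuracy falls below the preset threshold (assumed 0.9)."""
    return accuracy < threshold

accuracy = classification_accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
print(accuracy, needs_adjustment(accuracy))  # 0.8 True
```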
In one embodiment, the step of adjusting the naive Bayes model includes the following steps:
obtaining the work order test samples whose recognition result is wrong and extracting fourth feature words from them; adjusting, according to the fourth feature words, the occurrence counts of the corresponding feature words in the training word matrix; adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
The misidentified work orders are obtained and their feature words are extracted. The frequencies of these words in the training word matrix of the bag-of-words model are adjusted so that their weights change accordingly, and the conditional probability P(w_i | C_i) of each feature word under each work order type is recalculated from the adjusted training word matrix. The naive Bayes model is thereby adjusted, and its recognition performance on work order types improves.
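One way to realise this adjustment is sketched below. The source does not specify how the counts are changed, so this illustration assumes that the word counts of a misidentified sample are added to the counts of its true type before the log conditional probabilities are recomputed. All names are hypothetical.

```python
import math

def adjust_counts(class_word_counts, class_totals, word_vector, true_type):
    """Add a misidentified sample's word counts to the counts of its true type."""
    counts = class_word_counts[true_type]
    for j, n in enumerate(word_vector):
        counts[j] += n
    class_totals[true_type] += sum(word_vector)

def log_conditional_probabilities(class_word_counts, class_totals):
    """Recompute ln P(w_j | C_i) for every type from the adjusted counts."""
    return {
        work_type: [math.log(n / class_totals[work_type]) for n in counts]
        for work_type, counts in class_word_counts.items()
    }

# Assumed starting state: every count is 1 and every total is 2.
word_counts = {0: [1, 1, 1], 1: [1, 1, 1]}
totals = {0: 2, 1: 2}
adjust_counts(word_counts, totals, [0, 2, 1], true_type=1)
log_probs = log_conditional_probabilities(word_counts, totals)
print(word_counts[1], totals[1])  # [1, 3, 2] 5
```

Only the counts of the true type change here; an alternative would also reduce the counts of the wrongly predicted type.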
Further, in actual use, when a work order type is misidentified, the classification error is fed back at the next stage, and the naive Bayes model adjusts the bag-of-words model using the feature words of the misidentified work order. This realizes self-learning and improves the self-learning ability and adaptability of the naive Bayes model, making it more practical.
To make the technical solution of the present invention clearer, the big-data-based work order type identification method of the embodiments is further described below, taking the identification of telecom work order types as an example:
(1) Obtain the work order sample set
Paper telecom work orders, or electronic telecom work orders in the business system, are obtained. The paper work orders are scanned to obtain their text, or the text is extracted from the telecom work orders in the business system. Telecom work order samples are generated from the work orders whose text has been extracted.
Multiple telecom work order samples of the low-value type and the non-low-value type are placed in two subdirectories, low value and not low value, respectively. Each subdirectory contains 25 work orders, named 1.txt, 2.txt, 3.txt, ..., 25.txt. The low value subdirectory holds the low-value work order samples, and the not low value subdirectory holds the non-low-value work order samples.
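Reading such a directory layout can be sketched as follows. This is a minimal illustration, assuming UTF-8 text files and the subdirectory names given above; the function name `load_work_orders` and the label encoding (1 for low value, 0 otherwise) are hypothetical.

```python
from pathlib import Path

def load_work_orders(root):
    """Read every .txt file under root/<type directory>/ and label it.

    Returns a list of (text, label) pairs, where label is 1 for files in
    the "low value" directory and 0 for files in any other directory.
    """
    samples = []
    for type_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        label = 1 if type_dir.name == "low value" else 0
        for path in sorted(type_dir.glob("*.txt")):
            samples.append((path.read_text(encoding="utf-8"), label))
    return samples
```

Calling `load_work_orders("samples")` on a directory with the two subdirectories described above would return 50 labelled texts.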
(2) Build the naive Bayes model
From these 50 work order samples, 40 are randomly selected as the work order training set. The words w_i in all work order samples of the training set are extracted, converted to lowercase, and deduplicated to obtain a word list. The number of occurrences of each word w_i in the word list is counted, and the training word matrix TrainMat of the bag-of-words model and the corresponding work order training sample types TrainCat are generated. A training function is called on TrainMat and TrainCat to obtain the prior probability P(C_1) that a work order training sample is a low-value work order, together with the conditional probabilities P(w_i | C_i) of each feature word under each work order type, namely p0Vec and p1Vec. The training function is as follows:
void Train(const TIntmat& TrainMat,   // training matrix
           const TIntVec& TrainCat,   // type of each corresponding work order
           double& pc1,               // returns the proportion of work orders in the matrix that belong to C1
           TDblVec& p0Vec,            // array of P(w0|C0), P(w1|C0), ...
           TDblVec& p1Vec)            // array of P(w0|C1), P(w1|C1), ...
Here, the total number of work orders is the number of rows of TrainMat, and the vocabulary size is the number of columns of TrainMat. pc1 is the proportion of low-value work orders among all work orders, that is, the number of entries of TrainCat whose value is 1 divided by the total number of work orders.
When p0Vec and p1Vec are calculated, every element of p0Numerator = new [vocabulary size] and p1Numerator = new [vocabulary size] is initialized to 1, and p0Denominator = p1Denominator = 2. The procedure is as follows:
for (i = 0; i < total number of work orders; i++) {
    if (TrainCat[i] == 0) {
        add the i-th row of TrainMat to p0Numerator
        add the number of words occurring in the i-th row of TrainMat to p0Denominator
        (that is, p0Denominator += Sum(TrainMat[i]))
    } else {
        same as above, with every 0 changed to 1
    }
}
p0Vec = ln(p0Numerator / p0Denominator)
p1Vec = ln(p1Numerator / p1Denominator)
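The training procedure above can be sketched in runnable form. This is a minimal Python rendering of the described steps for the two-type case, not the source's implementation; the function name `train` and the tiny example matrix are hypothetical.

```python
import math

def train(train_mat, train_cat):
    """Binary naive Bayes training following the procedure described above.

    train_mat: list of word-count rows, one per work order.
    train_cat: list of 0/1 labels (1 = low-value work order).
    Returns (pc1, p0_vec, p1_vec); the vectors hold log probabilities.
    """
    num_orders = len(train_mat)
    num_words = len(train_mat[0])
    pc1 = sum(train_cat) / num_orders

    # Numerators start at 1 and denominators at 2, so no probability is zero.
    numerators = {0: [1.0] * num_words, 1: [1.0] * num_words}
    denominators = {0: 2.0, 1: 2.0}
    for row, category in zip(train_mat, train_cat):
        for j, count in enumerate(row):
            numerators[category][j] += count
        denominators[category] += sum(row)

    p0_vec = [math.log(n / denominators[0]) for n in numerators[0]]
    p1_vec = [math.log(n / denominators[1]) for n in numerators[1]]
    return pc1, p0_vec, p1_vec

pc1, p0_vec, p1_vec = train([[1, 1, 0], [0, 1, 2], [2, 0, 0]], [1, 0, 1])
```

Working in logarithms lets the classifier add terms instead of multiplying many small probabilities.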
(3) Test the naive Bayes model
The remaining 10 work orders are used as the work order test set TestDoc, and the classify function determines whether each doc is a low-value work order.
for each doc in TestDoc {
    classify(vocabulary vector of doc, p0Vec, p1Vec, pc1)
    record the classification error rate
}
(4) Identify the telecom work order type
The type of a telecom work order to be classified is identified with a classification decision function, which is as follows:
bool classify(const TDblVec& w, const TDblVec& p0Vec, const TDblVec& p1Vec, double pc1) {
    p0 = sum(w * p0Vec) + ln(1 - pc1)
    p1 = sum(w * p1Vec) + ln(pc1)
    return (p1 > p0)
}
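The decision function can be sketched in runnable form as follows. This is a minimal Python rendering of the two-type decision rule; the log-probability vectors in the example are hypothetical values for a three-word vocabulary, not results from the source.

```python
import math

def classify(w, p0_vec, p1_vec, pc1):
    """Return True when the work order is more likely of type C1 (low value)."""
    p0 = sum(wj * pj for wj, pj in zip(w, p0_vec)) + math.log(1.0 - pc1)
    p1 = sum(wj * pj for wj, pj in zip(w, p1_vec)) + math.log(pc1)
    return p1 > p0

# Hypothetical log-probability vectors for a three-word vocabulary.
p0_vec = [math.log(0.2), math.log(0.4), math.log(0.6)]
p1_vec = [math.log(4 / 6), math.log(2 / 6), math.log(1 / 6)]
print(classify([2, 0, 0], p0_vec, p1_vec, pc1=0.5))  # True
print(classify([0, 0, 2], p0_vec, p1_vec, pc1=0.5))  # False
```

Each score is the log of the prior plus the log conditional probabilities weighted by word counts, so comparing the two scores is equivalent to comparing the two posterior probabilities.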
In accordance with the above big-data-based work order type identification method, the present invention also provides a big-data-based work order type identification system. Embodiments of the system are described in detail below.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of the big-data-based work order type identification system in one embodiment of the present invention. In this embodiment, the system includes:
a correlation analysis module 410, configured to obtain a work order to be classified and perform correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
a target work order obtaining module 420, configured to remove the corresponding structured fields from the work order to be classified if the first correlation coefficient exceeds a preset threshold, to obtain a target work order to be classified;
a probability obtaining module 430, configured to extract first feature words from the target work order to be classified and, using a pre-built naive Bayes model, calculate from the first feature words the probability that the work order to be classified belongs to each work order type given that the first feature words occur;
a work order type identification module 440, configured to determine the type of the work order to be classified according to the probabilities.
The above system performs correlation analysis on the structured fields of a work order and removes the strongly correlated fields, so that the type of the work order can be identified quickly and accurately with the pre-built naive Bayes model. This avoids the situation in which a large backlog of work orders occupies server storage and reduces the operating efficiency of the business system.
Referring to Fig. 5, Fig. 5 is a schematic structural diagram of the big-data-based work order type identification system in another embodiment of the present invention. In this embodiment, the system further includes a naive Bayes model building module 450 and a naive Bayes model testing module 460.
Referring to Fig. 6, Fig. 6 is a schematic structural diagram of the naive Bayes model building module in one embodiment of the present invention. The naive Bayes model building module 450 includes a work order training set obtaining unit 451, a prior probability obtaining unit 452, a training matrix obtaining unit 453, a conditional probability obtaining unit 454 and a model building unit 455.
The work order training set obtaining unit 451 is configured to obtain a work order sample set and to randomly select several work order samples from the work order sample set to form a work order training set;
The prior probability obtaining unit 452 is configured to calculate the prior probability that a work order training sample in the work order training set belongs to each of the different work order types;
The training matrix obtaining unit 453 is configured to extract second feature words from the work order training samples in the work order training set and to build a bag-of-words training word matrix from the second feature words;
The conditional probability obtaining unit 454 is configured to obtain, from the training word matrix, the conditional probability of each second feature word under each of the different work order types;
The model building unit 455 is configured to build the naive Bayes model from the prior probabilities and the conditional probabilities.
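The work of units 452 to 455 can be sketched as below. This is a sketch under stated assumptions rather than the method of the source. The training word matrix is represented as per-type word counts, and additive (Laplace) smoothing with constant alpha is added so that unseen words do not yield zero probabilities. The source does not mention smoothing, and the function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples, alpha=1.0):
    """Build priors and conditional probabilities from labelled samples.

    samples is a list of (feature_words, work_order_type) pairs.
    Returns (priors, cond_prob, vocabulary), where cond_prob[t][w] = P(w | t).
    alpha is an additive smoothing constant (an assumption of this sketch).
    """
    # Prior probability of each type: share of training samples of that type.
    type_counts = Counter(t for _, t in samples)
    priors = {t: n / len(samples) for t, n in type_counts.items()}

    # Bag-of-words training word matrix: one row of word counts per type.
    word_counts = defaultdict(Counter)
    for words, t in samples:
        word_counts[t].update(words)
    vocabulary = sorted({w for words, _ in samples for w in words})

    # Conditional probability of each word given each type.
    cond_prob = {}
    for t in type_counts:
        total = sum(word_counts[t].values()) + alpha * len(vocabulary)
        cond_prob[t] = {w: (word_counts[t][w] + alpha) / total for w in vocabulary}
    return priors, cond_prob, vocabulary

# Hypothetical training set of three work orders.
training_set = [
    (["network", "outage"], "fault"),
    (["network", "slow"], "fault"),
    (["invoice", "refund"], "billing"),
]
priors, cond_prob, vocabulary = train_naive_bayes(training_set)
```

With this toy data, two of the three samples are of type "fault", so its prior is 2/3, and each row of conditional probabilities sums to 1 over the vocabulary.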
In one embodiment, the work order training set obtaining unit 451 obtains multiple work orders of different types and performs correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; if the second correlation coefficient exceeds a preset threshold, it removes the structured field corresponding to the second correlation coefficient from each work order to obtain the work order sample set.
Referring to Fig. 7, Fig. 7 is a schematic structural diagram of the naive Bayes model testing module in one embodiment of the present invention. In this embodiment, the naive Bayes model testing module 460 includes a work order test set obtaining unit 461, a feature extraction unit 462, a recognition result obtaining unit 463, an accuracy obtaining unit 464 and a naive Bayes model adjustment unit 465;
The work order test set obtaining unit 461 is configured to randomly select several work order samples from the work order sample set to form a work order test set;
The feature extraction unit 462 is configured to extract third feature words from the work order test samples in the work order test set;
The recognition result obtaining unit 463 is configured to use the naive Bayes model to calculate, according to the third feature words and under the condition that the third feature words occur, the probability that a work order test sample belongs to each of the different work order types, and to obtain a recognition result from the probabilities corresponding to the different work order types;
The accuracy obtaining unit 464 is configured to obtain the classification accuracy of the naive Bayes model from the recognition results and the types of the work order test samples;
The naive Bayes model adjustment unit 465 is configured to adjust the naive Bayes model if the classification accuracy is lower than a preset threshold.
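The testing flow of units 463 to 465 can be sketched as follows. The model parameters below are hand-written, hypothetical values used only to make the example self-contained. Summing log-probabilities instead of multiplying probabilities, skipping words unknown to the model, and the 0.8 accuracy threshold are assumptions of this sketch; the source only speaks of a "preset threshold".

```python
from math import log

def classify(words, priors, cond_prob):
    """Return the type with the highest log-posterior score.

    Words that the model has no conditional probability for are skipped.
    """
    scores = {}
    for t, prior in priors.items():
        score = log(prior)
        for w in words:
            if w in cond_prob[t]:
                score += log(cond_prob[t][w])
        scores[t] = score
    return max(scores, key=scores.get)

def accuracy(test_set, priors, cond_prob):
    """Fraction of test samples whose predicted type equals the labelled type."""
    correct = sum(
        classify(words, priors, cond_prob) == label for words, label in test_set
    )
    return correct / len(test_set)

# Hypothetical model parameters, for illustration only.
priors = {"fault": 0.5, "billing": 0.5}
cond_prob = {
    "fault": {"network": 0.6, "refund": 0.1, "invoice": 0.3},
    "billing": {"network": 0.1, "refund": 0.5, "invoice": 0.4},
}
test_set = [
    (["network"], "fault"),
    (["refund", "invoice"], "billing"),
    (["invoice"], "fault"),  # this toy model predicts "billing" here
]
acc = accuracy(test_set, priors, cond_prob)
ACCURACY_THRESHOLD = 0.8  # assumed value
needs_adjustment = acc < ACCURACY_THRESHOLD
```

Two of the three test samples are classified correctly, so the accuracy falls below the assumed threshold and the model would be passed on for adjustment.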
In one embodiment, the naive Bayes model adjustment unit 465 obtains the work order test samples whose recognition results are wrong and extracts fourth feature words from them; adjusts, according to the fourth feature words, the occurrence counts of the corresponding feature words in the training word matrix; and adjusts the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
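One possible reading of this adjustment step is sketched below. The source does not say how the occurrence counts are adjusted. This sketch assumes the feature words of each misclassified sample are added to the count row of its true type, and that the conditional probabilities are then recomputed with additive smoothing. All names and data are hypothetical.

```python
from collections import Counter

def adjust_counts(word_counts, misclassified):
    """Add the feature words of each misclassified sample to its true type's row."""
    for words, true_type in misclassified:
        word_counts[true_type].update(words)
    return word_counts

def conditional_probabilities(word_counts, alpha=1.0):
    """Recompute P(word | type) from the count matrix (additive smoothing assumed)."""
    vocabulary = sorted({w for counts in word_counts.values() for w in counts})
    cond_prob = {}
    for t, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocabulary)
        cond_prob[t] = {w: (counts[w] + alpha) / total for w in vocabulary}
    return cond_prob

# Hypothetical training word matrix: one Counter of word counts per type.
word_counts = {
    "fault": Counter({"network": 3, "outage": 1}),
    "billing": Counter({"invoice": 2, "refund": 2}),
}
before = conditional_probabilities(word_counts)

# A "fault" work order that was wrongly recognised as another type.
adjust_counts(word_counts, [(["invoice", "error"], "fault")])
after = conditional_probabilities(word_counts)
```

After the adjustment, the word "invoice" has a higher conditional probability under "fault" than before, so a similar work order is less likely to be misclassified again.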
In one embodiment, the probability obtaining module 430 uses a regular expression to segment the target work order to be classified into words, obtaining the first feature words.
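Regular-expression word segmentation can be sketched as below. The pattern is a hypothetical choice for English-like text, since the source does not give the expression it uses; a real deployment on Chinese work orders would need a different pattern or a dedicated segmenter.

```python
import re

# Hypothetical pattern: runs of lowercase letters or digits count as words.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+")

def extract_feature_words(text):
    """Lowercase the text and return every run of letters or digits as a word."""
    return TOKEN_PATTERN.findall(text.lower())

words = extract_feature_words("Network outage: router-7 offline since 09:30!")
```

Punctuation and whitespace act as separators, so "router-7" yields the two words "router" and "7".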
The big-data-based work order type identification system of the present invention corresponds one-to-one to the big-data-based work order type identification method of the present invention. The technical features described in the method embodiments above, together with their beneficial effects, apply equally to the system embodiments, which is hereby stated.
In one embodiment, a computer device is also provided. The computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the program:
Obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
If the first correlation coefficient exceeds a preset threshold, removing the corresponding structured field from the work order to be classified to obtain a target work order to be classified;
Extracting first feature words from the target work order to be classified and, according to the first feature words, using a pre-built naive Bayes model to calculate, under the condition that the first feature words occur, the probability that the work order to be classified belongs to each of the different work order types;
Determining the type of the work order to be classified according to the probabilities.
In one embodiment, the processor also implements the following steps when executing the program:
Obtaining a work order sample set, and randomly selecting several work order samples from the work order sample set to form a work order training set; calculating the prior probability that a work order training sample in the work order training set belongs to each of the different work order types; extracting second feature words from the work order training samples in the work order training set, and building a bag-of-words training word matrix from the second feature words; obtaining, from the training word matrix, the conditional probability of each second feature word under each of the different work order types; and building the naive Bayes model from the prior probabilities and the conditional probabilities.
In one embodiment, the processor also implements the following steps when executing the program:
Obtaining multiple work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; if the second correlation coefficient exceeds a preset threshold, removing the structured field corresponding to the second correlation coefficient from each work order to obtain the work order sample set.
In one embodiment, the processor also implements the following steps when executing the program:
Randomly selecting several work order samples from the work order sample set to form a work order test set; extracting third feature words from the work order test samples in the work order test set; using the naive Bayes model to calculate, according to the third feature words and under the condition that the third feature words occur, the probability that a work order test sample belongs to each of the different work order types, and obtaining a recognition result from the probabilities corresponding to the different work order types; obtaining the classification accuracy of the naive Bayes model from the recognition results and the types of the work order test samples; and, if the classification accuracy is lower than a preset threshold, adjusting the naive Bayes model.
In one embodiment, the processor also implements the following steps when executing the program: obtaining the work order test samples whose recognition results are wrong and extracting fourth feature words from them; adjusting, according to the fourth feature words, the occurrence counts of the corresponding feature words in the training word matrix; and adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
In one embodiment, the processor also implements the following steps when executing the program:
Using a regular expression to segment the target work order to be classified into words, obtaining the first feature words.
When the processor of the computer device executes the program, it implements the big-data-based work order type identification method of any of the above embodiments. By performing correlation analysis on the structured fields of a work order and removing strongly correlated fields, the work order can be classified quickly and accurately with the pre-built naive Bayes model, which avoids the server storage consumption and the loss of business system efficiency that a large backlog of work orders would cause.
In addition, those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium; in the embodiments of the present invention, the program can be stored in a storage medium of a computer system and executed by at least one processor of that computer system, so as to implement the processes of the above embodiments of the big-data-based work order type identification method.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. The computer program implements the following steps when executed by a processor:
Obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
If the first correlation coefficient exceeds a preset threshold, removing the corresponding structured field from the work order to be classified to obtain a target work order to be classified;
Extracting first feature words from the target work order to be classified and, according to the first feature words, using a pre-built naive Bayes model to calculate, under the condition that the first feature words occur, the probability that the work order to be classified belongs to each of the different work order types;
Determining the type of the work order to be classified according to the probabilities.
In one embodiment, the processor also implements the following steps when executing the program:
Obtaining a work order sample set, and randomly selecting several work order samples from the work order sample set to form a work order training set; calculating the prior probability that a work order training sample in the work order training set belongs to each of the different work order types; extracting second feature words from the work order training samples in the work order training set, and building a bag-of-words training word matrix from the second feature words; obtaining, from the training word matrix, the conditional probability of each second feature word under each of the different work order types; and building the naive Bayes model from the prior probabilities and the conditional probabilities.
In one embodiment, the processor also implements the following steps when executing the program:
Obtaining multiple work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient; if the second correlation coefficient exceeds a preset threshold, removing the structured field corresponding to the second correlation coefficient from each work order to obtain the work order sample set.
In one embodiment, the processor also implements the following steps when executing the program:
Randomly selecting several work order samples from the work order sample set to form a work order test set; extracting third feature words from the work order test samples in the work order test set; using the naive Bayes model to calculate, according to the third feature words and under the condition that the third feature words occur, the probability that a work order test sample belongs to each of the different work order types, and obtaining a recognition result from the probabilities corresponding to the different work order types; obtaining the classification accuracy of the naive Bayes model from the recognition results and the types of the work order test samples; and, if the classification accuracy is lower than a preset threshold, adjusting the naive Bayes model.
In one embodiment, the processor also implements the following steps when executing the program: obtaining the work order test samples whose recognition results are wrong and extracting fourth feature words from them; adjusting, according to the fourth feature words, the occurrence counts of the corresponding feature words in the training word matrix; and adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
In one embodiment, the processor also implements the following steps when executing the program:
Using a regular expression to segment the target work order to be classified into words, obtaining the first feature words.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above can be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, as long as a combination of these technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The embodiments described above express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.
Claims (10)
1. A big-data-based work order type identification method, characterized by comprising the following steps:
obtaining a work order to be classified, and performing correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
if the first correlation coefficient exceeds a preset threshold, removing the corresponding structured field from the work order to be classified to obtain a target work order to be classified;
extracting first feature words from the target work order to be classified and, according to the first feature words, using a pre-built naive Bayes model to calculate, under the condition that the first feature words occur, the probability that the work order to be classified belongs to each of the different work order types;
determining the type of the work order to be classified according to the probabilities.
2. The big-data-based work order type identification method according to claim 1, characterized in that, before the step of obtaining a work order to be classified, the method further comprises the following steps:
obtaining a work order sample set, and randomly selecting several work order samples from the work order sample set to form a work order training set;
calculating the prior probability that a work order training sample in the work order training set belongs to each of the different work order types;
extracting second feature words from the work order training samples in the work order training set, and building a bag-of-words training word matrix from the second feature words;
obtaining, from the training word matrix, the conditional probability of each second feature word under each of the different work order types;
building the naive Bayes model from the prior probabilities and the conditional probabilities.
3. The big-data-based work order type identification method according to claim 2, characterized in that the step of obtaining a work order sample set comprises the following steps:
obtaining multiple work orders of different types, and performing correlation analysis on the structured fields in each work order to obtain a second correlation coefficient;
if the second correlation coefficient exceeds a preset threshold, removing the structured field corresponding to the second correlation coefficient from each work order to obtain the work order sample set.
4. The big-data-based work order type identification method according to claim 3, characterized in that, after the step of building the naive Bayes model from the prior probabilities and the conditional probabilities, the method further comprises the following steps:
randomly selecting several work order samples from the work order sample set to form a work order test set;
extracting third feature words from the work order test samples in the work order test set;
using the naive Bayes model to calculate, according to the third feature words and under the condition that the third feature words occur, the probability that a work order test sample belongs to each of the different work order types, and obtaining a recognition result from the probabilities corresponding to the different work order types;
obtaining the classification accuracy of the naive Bayes model from the recognition results and the types of the work order test samples;
if the classification accuracy is lower than a preset threshold, adjusting the naive Bayes model.
5. The big-data-based work order type identification method according to claim 4, characterized in that the step of adjusting the naive Bayes model comprises the following steps:
obtaining the work order test samples whose recognition results are wrong and extracting fourth feature words from them;
adjusting, according to the fourth feature words, the occurrence counts of the corresponding feature words in the training word matrix;
adjusting the conditional probabilities in the naive Bayes model according to the adjusted training word matrix.
6. The big-data-based work order type identification method according to claim 1, characterized in that the step of extracting first feature words from the target work order to be classified comprises the following step:
using a regular expression to segment the target work order to be classified into words, obtaining the first feature words.
7. A big-data-based work order type identification system, characterized by comprising:
a correlation analysis module, configured to obtain a work order to be classified and to perform correlation analysis on the structured fields in the work order to be classified to obtain a first correlation coefficient;
a target work order obtaining module, configured to, if the first correlation coefficient exceeds a preset threshold, remove the corresponding structured field from the work order to be classified to obtain a target work order to be classified;
a probability obtaining module, configured to extract first feature words from the target work order to be classified and, according to the first feature words, use a pre-built naive Bayes model to calculate, under the condition that the first feature words occur, the probability that the work order to be classified belongs to each of the different work order types;
a work order type identification module, configured to determine the type of the work order to be classified according to the probabilities.
8. The big-data-based work order type identification system according to claim 7, characterized by further comprising a naive Bayes model building module, the naive Bayes model building module comprising a work order training set obtaining unit, a prior probability obtaining unit, a training matrix obtaining unit, a conditional probability obtaining unit and a model building unit;
the work order training set obtaining unit is configured to obtain a work order sample set and to randomly select several work order samples from the work order sample set to form a work order training set;
the prior probability obtaining unit is configured to calculate the prior probability that a work order training sample in the work order training set belongs to each of the different work order types;
the training matrix obtaining unit is configured to extract second feature words from the work order training samples in the work order training set and to build a bag-of-words training word matrix from the second feature words;
the conditional probability obtaining unit is configured to obtain, from the training word matrix, the conditional probability of each second feature word under each of the different work order types;
the model building unit is configured to build the naive Bayes model from the prior probabilities and the conditional probabilities.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the big-data-based work order type identification method according to any one of claims 1 to 6.
10. A computer storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the big-data-based work order type identification method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810427330.7A CN108897754B (en) | 2018-05-07 | 2018-05-07 | Big data-based work order type identification method and system and computing device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810427330.7A CN108897754B (en) | 2018-05-07 | 2018-05-07 | Big data-based work order type identification method and system and computing device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108897754A true CN108897754A (en) | 2018-11-27 |
CN108897754B CN108897754B (en) | 2020-12-11 |
Family
ID=64342619
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810427330.7A Active CN108897754B (en) | 2018-05-07 | 2018-05-07 | Big data-based work order type identification method and system and computing device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108897754B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101814083A (en) * | 2010-01-08 | 2010-08-25 | 上海复歌信息科技有限公司 | Automatic webpage classification method and system |
CN104021302A (en) * | 2014-06-18 | 2014-09-03 | 北京邮电大学 | Auxiliary registration method based on Bayes text classification model |
CN106445994A (en) * | 2016-07-13 | 2017-02-22 | 广州精点计算机科技有限公司 | Mixed algorithm-based web page classification method and apparatus |
CN106844632A (en) * | 2017-01-20 | 2017-06-13 | 清华大学 | Based on the product review sensibility classification method and device that improve SVMs |
Non-Patent Citations (1)
Title |
---|
凤丽洲: "《文本分类关键技术及应用研究》", 《中国博士学位论文全文数据库 信息科技辑》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111325422A (en) * | 2018-12-14 | 2020-06-23 | 中国移动通信集团河南有限公司 | Work order distribution method and system |
CN111325422B (en) * | 2018-12-14 | 2023-10-27 | 中国移动通信集团河南有限公司 | Work order dispatching method and system |
CN110147980A (en) * | 2019-04-03 | 2019-08-20 | 口碑(上海)信息技术有限公司 | Worksheet method and device |
CN110417748A (en) * | 2019-07-08 | 2019-11-05 | 新华三信息安全技术有限公司 | A kind of attack detection method and device |
CN111382068A (en) * | 2020-02-29 | 2020-07-07 | 中国平安人寿保险股份有限公司 | Hierarchical testing method and device for mass data |
CN111382068B (en) * | 2020-02-29 | 2024-04-09 | 中国平安人寿保险股份有限公司 | Hierarchical testing method and device for large-batch data |
CN111797942A (en) * | 2020-07-23 | 2020-10-20 | 深圳壹账通智能科技有限公司 | User information classification method and device, computer equipment and storage medium |
CN113177151A (en) * | 2021-05-28 | 2021-07-27 | 中山世达模型制造有限公司 | Potential customer screening method |
Also Published As
Publication number | Publication date |
---|---|
CN108897754B (en) | 2020-12-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108897754A (en) | Recognition methods, system and the calculating equipment of work order type based on big data | |
CN108073568B (en) | Keyword extraction method and device | |
CN104866558B (en) | A kind of social networks account mapping model training method and mapping method and system | |
CN111914099B (en) | Intelligent question-answering method, system, device and medium of traffic optimization strategy | |
CN108519971B (en) | Cross-language news topic similarity comparison method based on parallel corpus | |
CN106339495A (en) | Topic detection method and system based on hierarchical incremental clustering | |
CN102270212A (en) | User interest feature extraction method based on hidden semi-Markov model | |
CN108171243B (en) | Medical image information identification method and system based on deep neural network | |
CN110990529B (en) | Industry detail dividing method and system for enterprises | |
JP2020512651A (en) | Search method, device, and non-transitory computer-readable storage medium | |
CN109710725A (en) | A kind of Chinese table column label restoration methods and system based on text classification | |
CN109255029A (en) | A method of automatic Bug report distribution is enhanced using weighted optimization training set | |
CN106528527A (en) | Identification method and identification system for out of vocabularies | |
CN114612251A (en) | Risk assessment method, device, equipment and storage medium | |
CN116304035A (en) | Multi-notice multi-crime name relation extraction method and device in complex case | |
CN111767390A (en) | Skill word evaluation method and device, electronic equipment and computer readable medium | |
CN109582743B (en) | Data mining system for terrorist attack event | |
CN111930944B (en) | File label classification method and device | |
US20230206676A1 (en) | Systems and Methods for Generating Document Numerical Representations | |
US20210117448A1 (en) | Iterative sampling based dataset clustering | |
CN110941703A (en) | Integrated resume information extraction method based on machine learning and fuzzy rules | |
CN112506930B (en) | Data insight system based on machine learning technology | |
CN113221792B (en) | Chapter detection model construction method, cataloguing method and related equipment | |
CN113343012B (en) | News matching method, device, equipment and storage medium | |
CN109902129A (en) | Insurance agent's classifying method and relevant device based on big data analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |