CN110363223A - Industrial flow data processing method, detection method, system, device and medium - Google Patents
- Publication number: CN110363223A
- Application number: CN201910534886.0A
- Authority: CN (China)
- Legal status: Pending (as listed by Google Patents; an assumption, not a legal conclusion)
Classifications
- G: PHYSICS
  - G06: COMPUTING; CALCULATING OR COUNTING
    - G06F: ELECTRIC DIGITAL DATA PROCESSING
      - G06F18/00: Pattern recognition
        - G06F18/10: Pre-processing; Data cleansing
        - G06F18/20: Analysing
          - G06F18/24: Classification techniques
            - G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
              - G06F18/2413: ... based on distances to training or reference patterns
                - G06F18/24147: Distances to closest patterns, e.g. nearest neighbour classification
              - G06F18/2415: ... based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
                - G06F18/24155: Bayesian classification
            - G06F18/243: Classification techniques relating to the number of classes
              - G06F18/2433: Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
              - G06F18/24323: Tree-organised classifiers
          - G06F18/25: Fusion techniques
            - G06F18/254: Fusion techniques of classification results, e.g. of results related to same input data
            - G06F18/259: Fusion by voting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an industrial flow data processing method, detection method, system, device and medium. The processing method comprises: classifying industrial flow data with multiple trained, mutually different classification algorithms to obtain multiple classification results; voting on the multiple classification results with a trained voting algorithm so that each classification result receives a corresponding number of votes, and then outputting the classification result with the highest number of votes; and, according to the output classification result, outputting a judgment of whether the industrial flow data are abnormal data. The invention is used for anomaly detection on industrial flow data and can achieve traffic detection and classification with high accuracy and a low false-alarm rate, thereby preventing intrusion by abnormal traffic and providing a security guarantee for the safety and privacy of industrial control systems. The invention is widely applicable in the field of computer technology.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to an industrial flow data processing method, system, device and medium.
Background art
Industrial control systems are widely used in industrial production. During operation, an industrial control system generates industrial flow data, or acquires them through devices such as sensors. To ensure safe and stable production, the industrial flow data must be inspected to determine whether they are normal data or abnormal data. Existing techniques for detecting abnormal flow data fall mainly into three classes: port-based methods, methods based on traffic feature statistics, and methods based on the raw payload.
Port-based methods identify known applications from the port numbers in the packet headers of the industrial flow data. They are simple and easy to implement, but many applications use dynamic ports, or even hide themselves behind the well-known ports of other applications, which makes it difficult for port-based methods to identify industrial flow data. Port-based methods therefore cannot provide reliable results and are seldom used in the prior art.
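The weakness of port-based identification can be illustrated with a minimal Python sketch. The port table is an illustrative assumption (the commonly registered default ports of a few industrial protocols), not something this patent specifies, and `classify_by_port` is a hypothetical helper.

```python
from typing import Dict

# Illustrative table of registered default ports for some industrial protocols.
WELL_KNOWN_PORTS: Dict[int, str] = {
    502: "Modbus/TCP",
    102: "S7comm (ISO-TSAP)",
    20000: "DNP3",
    44818: "EtherNet/IP",
}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Port-based identification: look both header ports up in a static table."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"


print(classify_by_port(49152, 502))   # Modbus/TCP
print(classify_by_port(49152, 5020))  # unknown: same protocol on a dynamic port
```

Conversely, any program that reuses port 502 would be labeled Modbus/TCP, which is the disguise problem described above.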
Methods based on traffic feature statistics use supervised and unsupervised machine learning algorithms to classify network traffic into predefined classes of known applications. However, these methods require experts to draw on extensive experience to compile statistics on the traffic information, which consumes considerable manpower.
Methods based on the raw payload use techniques such as deep learning to learn the internal features of raw industrial flow data. However, when learning the internal features of the data, deep-learning methods suffer from loss of data information and incomplete feature extraction.
Summary of the invention
To solve the above technical problems, the present invention aims to provide an industrial flow data processing method, system, device and medium.
In one aspect, an embodiment of the present invention includes an industrial flow data processing method, comprising the following steps:
classifying industrial flow data with multiple trained, mutually different classification algorithms to obtain multiple classification results;
voting on the multiple classification results with a trained voting algorithm, so that each classification result receives a corresponding number of votes, and then outputting the classification result with the highest number of votes;
according to the output classification result, outputting a judgment of whether the industrial flow data are abnormal data.
Further, the industrial flow data processing method further comprises a training step, which specifically includes:
obtaining data labels corresponding to the industrial flow data, each data label indicating the class of the corresponding industrial flow data;
building a training set from the industrial flow data and the data labels, wherein the industrial flow data in the training set serve as input data for each classification algorithm and the data labels in the training set serve as the expected output of the voting algorithm;
training each classification algorithm and the voting algorithm with the training set.
Further, the industrial flow data processing method further comprises the following steps:
executing the training step multiple times, and adjusting the voting weights of the voting algorithm after each execution of the training step, the voting weights causing the votes obtained by each classification result to carry a corresponding weight when the voting algorithm is executed;
recording the voting weights for which the error between the output of the voting algorithm and the corresponding data labels is smallest;
configuring the voting algorithm with the recorded voting weights.
Further, the multiple mutually different classification algorithms are the K-nearest-neighbor algorithm, the naive Bayes algorithm and the decision tree algorithm.
Further, the industrial flow data processing method further comprises the following steps:
detecting the length of the industrial flow data;
if the length of the industrial flow data is greater than a preset length, truncating the industrial flow data to the preset length; if the length is less than the preset length, padding the industrial flow data with data bits so that its length increases to the preset length;
normalizing the truncated or padded industrial flow data.
Further, the formula used in the normalization is
$$\mathrm{Rescaled}(a_i) = \mathit{min} + \frac{(a_i - A_{\min})(\mathit{max} - \mathit{min})}{A_{\max} - A_{\min}},$$
where $a_i$ is a feature value of the industrial flow data, $\mathrm{Rescaled}(a_i)$ is the result of normalizing $a_i$, $A_{\min}$ is the minimum of the feature values of the industrial flow data, $A_{\max}$ is the maximum of the feature values of the industrial flow data, and max and min are preset values.
In another aspect, an embodiment of the present invention further includes an industrial flow data detection method, comprising the following steps:
obtaining industrial flow data;
executing the method according to any one of claims 1 to 6 to process the obtained industrial flow data;
if the industrial flow data are detected to be abnormal data, issuing an alarm.
In another aspect, an embodiment of the present invention further includes an industrial flow data processing system, comprising:
a categorization module, configured to classify industrial flow data with multiple trained, mutually different classification algorithms to obtain multiple classification results;
a voting module, configured to vote on the multiple classification results with a trained voting algorithm, so that each classification result receives a corresponding number of votes, and to output the classification result with the highest number of votes;
a judgment module, configured to output, according to the output classification result, a judgment of whether the industrial flow data are abnormal data.
In another aspect, an embodiment of the present invention further includes an industrial flow data processing device, comprising a memory and a processor, wherein the memory is configured to store at least one program and the processor is configured to load the at least one program to execute the method of the present invention.
In another aspect, an embodiment of the present invention further includes a medium with a storage function, storing processor-executable instructions which, when executed by a processor, are used to execute the method of the present invention.
The beneficial effects of the present invention are as follows: by using trained classification algorithms and a trained voting algorithm, the present invention can recognize the feature distribution of industrial flow data and can also detect new anomalies that have not occurred before. This avoids the drawback that conventional statistical features cannot detect anomalies effectively, and spares experts the manual work of analyzing traffic characteristics and extracting features. Applied to anomaly detection on industrial flow data, the present invention can achieve traffic detection and classification with high accuracy and a low false-alarm rate, thereby preventing intrusion by abnormal traffic and providing a security guarantee for the safety and privacy of industrial control systems.
Brief description of the drawings
Fig. 1 is a flowchart of the industrial flow data processing method in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the industrial flow data processing method in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the training process of the classification algorithms and the voting algorithm in an embodiment of the present invention;
Fig. 4 is a structural block diagram of the industrial flow data processing system in an embodiment of the present invention.
Detailed description of the embodiments
This embodiment includes an industrial flow data processing method which, referring to Fig. 1, comprises the following steps:
S1. classifying industrial flow data with multiple trained, mutually different classification algorithms to obtain multiple classification results;
S2. voting on the multiple classification results with a trained voting algorithm, so that each classification result receives a corresponding number of votes, and then outputting the classification result with the highest number of votes;
S3. according to the output classification result, outputting a judgment of whether the industrial flow data are abnormal data.
In step S1, multiple mutually different classification algorithms are provided; they have been trained in advance and therefore have the corresponding classification capability. The acquired industrial flow data are input into each of these classification algorithms, and the multiple classification results they output are received. A classification result is the class to which a classification algorithm assigns the industrial flow data; different classification algorithms may assign the same industrial flow data to the same class or to different classes.
In this embodiment, the term industrial flow data may refer to an individual item, i.e. one data packet or one piece of data, or to a collection, i.e. a set of multiple data packets or pieces of data. The industrial flow data processed in step S1 may be a single data packet.
In step S2, the multiple classification results are voted on with the trained voting algorithm. When voting ends, each classification result for the industrial flow data has received its number of votes. The voting algorithm in this embodiment is a simple voting algorithm: it finds the classification result with the highest number of votes and outputs it. If several classification results receive the same number of votes, the one ranked first in a preset order may be selected, or one may be selected at random, and output.
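The simple vote with both tie-breaking options can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation; `simple_vote` and the class names ("normal", "dos", "scan") are hypothetical.

```python
import random
from collections import Counter
from typing import Optional, Sequence


def simple_vote(results: Sequence[str],
                preset_order: Optional[Sequence[str]] = None,
                rng: Optional[random.Random] = None) -> str:
    """Return the classification result with the most votes.

    Each classifier's result counts as one vote. Ties are broken by the preset
    class order when one is given, otherwise by a random choice.
    """
    counts = Counter(results)
    top = max(counts.values())
    tied = [label for label, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    if preset_order is not None:
        rank = {label: i for i, label in enumerate(preset_order)}
        # Classes missing from the preset order rank after all listed ones.
        return min(tied, key=lambda label: rank.get(label, len(rank)))
    return (rng or random.Random()).choice(tied)


print(simple_vote(["normal", "dos", "normal"]))  # normal
print(simple_vote(["normal", "dos", "scan"], preset_order=["dos", "scan", "normal"]))  # dos
```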
In step S1, the multiple classification algorithms may assign the industrial flow data to several different classes. Through the voting in step S2, the classification result with the most votes is determined to be the class of the industrial flow data.
In step S3, based on the class determined in step S2, it is further judged whether the industrial flow data are abnormal data or normal data, and the judgment is output, which finally determines whether the industrial flow data are normal. One way to implement steps S1-S3 is to establish a mapping between the classification results obtainable in step S1 and the two categories "abnormal data" and "normal data"; after step S2 has determined the class of the industrial flow data, the mapping determines whether the data are normal or abnormal, thereby detecting anomalies in the industrial flow data. Alternatively, each classification algorithm can be configured to classify the industrial flow data directly as "normal data" or "abnormal data"; the voting algorithm then votes and outputs the classification result with the highest number of votes, so that the industrial flow data are determined directly to be "normal data" or "abnormal data".
The principle of steps S1-S3 is shown in Fig. 2. The industrial flow data processed by the classification algorithms and the voting algorithm in this application stage are referred to as the test set.
By using trained classification algorithms and a trained voting algorithm, steps S1-S3 can recognize the feature distribution of industrial flow data and can also detect new anomalies that have not occurred before. This avoids the drawback that conventional statistical features cannot detect anomalies effectively, and spares experts the manual work of analyzing traffic characteristics and extracting features. Applied to anomaly detection on industrial flow data, steps S1-S3 can achieve traffic detection and classification with high accuracy and a low false-alarm rate, thereby preventing intrusion by abnormal traffic and providing a security guarantee for the safety and privacy of industrial control systems.
Preferably, step S1 uses three classification algorithms: the K-nearest-neighbor algorithm (K-Nearest Neighbor, KNN), the naive Bayes algorithm (Naive Bayes, NB) and the decision tree algorithm (Decision Tree, DT). When step S1 is executed, each of the three algorithms receives the same industrial flow data and outputs its own classification result.
The principle of the K-nearest-neighbor algorithm is: find the samples nearest to the sample to be classified, look at the classes they belong to, and select the class with the largest share as the class of the sample to be classified. K is the number of samples the algorithm needs to find. In the present invention, we select the 5 samples nearest to the sample to be classified (i.e. K = 5) and weight the votes of these 5 samples proportionally. Euclidean distance is used as the distance measure:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},$$
where $x$ and $y$ are two $n$-dimensional feature vectors. After the training set is input into the KNN module, the KNN module finds the 5 samples with the smallest Euclidean distance to each flow, judges its class, and compares the result with the training-set label. Through this learning process the KNN module captures the feature distribution of the training set, so that in the test stage it can accurately judge the class of unknown traffic.
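A minimal Python sketch of a K = 5 nearest-neighbor classifier with Euclidean distance follows. The text does not specify its proportional vote weighting precisely, so this sketch assumes each neighbor votes with equal weight; the `KNN` class and the toy two-feature data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


class KNN:
    """Minimal K-nearest-neighbour classifier using Euclidean distance.

    "Training" stores the labelled vectors; prediction is a majority vote of
    the k nearest stored vectors, each neighbour counting equally.
    """

    def __init__(self, k: int = 5) -> None:
        self.k = k
        self.samples: List[Tuple[Sequence[float], str]] = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "KNN":
        self.samples = list(zip(X, y))
        return self

    def predict_one(self, x: Sequence[float]) -> str:
        nearest = sorted(self.samples, key=lambda s: euclidean(x, s[0]))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.9, 1.0], [1.0, 0.9], [1.0, 1.0]]
y = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
knn = KNN(k=5).fit(X, y)
print(knn.predict_one([0.05, 0.05]))  # normal (3 of the 5 neighbours)
print(knn.predict_one([0.95, 0.95]))  # abnormal
```

Sorting all stored samples per query is O(n log n); a real deployment on large traffic captures would more likely use a spatial index, but the decision rule is the same.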
The principle of the decision tree algorithm is: the DT algorithm recursively selects the optimal split point (i.e. feature) and splits the training data set according to that feature so that each sub-data set receives the best classification. This process corresponds to a partition of the feature space and to the construction of the decision tree. The splitting is repeated on the sub-data sets until all training data subsets are essentially correctly classified, or no suitable feature remains. When selecting the optimal split point, we measure the impurity or uncertainty of the data by computing the information entropy, and use it to determine the optimal binary split value of the classification variable:
$$\mathrm{Info}(D) = -\sum_{i=1}^{c} p_i \log_2 p_i,$$
where $D$ denotes the training data set, $c$ the number of data classes, and $p_i$ the proportion of samples of class $i$ among all samples. After a feature $A$ is chosen as a node, the data are split with that node as the root; the information entropy of the split data set is no greater than before splitting, and is computed as
$$\mathrm{Info}_A(D) = \sum_{j=1}^{k} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j),$$
where $D$ is divided into $k$ parts $D_1, \dots, D_k$. The information gain, i.e. the difference in information entropy, measures the influence of a feature on the classification result: $\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$. After the training set is input into the DT module, the DT module computes the information entropy for each feature, selects the first split point according to the information entropy, then excludes the selected feature and recursively processes the remaining features until the data cannot be split further.
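The split criterion can be computed directly from the formulas for $\mathrm{Info}(D)$, $\mathrm{Info}_A(D)$ and $\mathrm{Gain}(A)$. This minimal Python sketch assumes discrete feature values (ID3-style); continuous normalized features would first need candidate thresholds, and the recursive tree building is not shown. The function names and toy features `f1`, `f2` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def info(labels: Sequence[Hashable]) -> float:
    """Info(D) = -sum_i p_i * log2(p_i), over the classes present in D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_after_split(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Info_A(D) = sum_j |D_j| / |D| * Info(D_j), one part D_j per value of feature A."""
    n = len(labels)
    total = 0.0
    for value in set(feature):
        part = [lab for f, lab in zip(feature, labels) if f == value]
        total += len(part) / n * info(part)
    return total


def gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(labels) - info_after_split(feature, labels)


labels = ["normal", "normal", "abnormal", "abnormal"]
features = {"f1": [0, 0, 1, 1], "f2": [0, 1, 0, 1]}
best = max(features, key=lambda name: gain(features[name], labels))
print(best, gain(features["f1"], labels), gain(features["f2"], labels))  # f1 1.0 0.0
```

Feature `f1` separates the two classes perfectly and so has the largest gain; it would become the first split point, after which the same computation is repeated on the remaining features.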
The principle of the naive Bayes algorithm is: naive Bayes classification classifies targets based on Bayes' theorem. It uses the prior probability of an event, computes the posterior probability with Bayes' formula, compares the posterior probabilities, and selects the class with the largest probability as the class of the event. Bayes' formula is
$$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A \mid B_j)},$$
where $P(B_i)$ is the probability of event $B_i$, $P(A \mid B_i)$ is the conditional probability of event $A$ given that $B_i$ occurs, and $P(B_i \mid A)$ is the conditional probability of event $B_i$ given that $A$ occurs. After the training set is input into the naive Bayes algorithm, the module performs maximum likelihood estimation and judges the data classes in the training set.
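The text does not say which likelihood model is used. Because the normalized features are continuous, this minimal Python sketch assumes a Gaussian likelihood per class and feature, with means and variances fitted by maximum likelihood; the `GaussianNB` class here is hypothetical, not a library API.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNB:
    """Naive Bayes with one Gaussian per (class, feature), fitted by maximum likelihood.

    Prediction picks argmax_i log P(B_i) + sum_k log p(x_k | B_i). The denominator
    of Bayes' formula is the same for every class, so it is omitted.
    """

    def __init__(self, var_floor: float = 1e-9) -> None:
        self.var_floor = var_floor  # keeps constant features from giving zero variance
        self.params: Dict[str, Tuple[float, List[float], List[float]]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNB":
        groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        for label, rows in groups.items():
            m = len(rows)
            cols = list(zip(*rows))
            means = [sum(c) / m for c in cols]
            # MLE variance divides by m (not m - 1).
            variances = [max(sum((v - mu) ** 2 for v in c) / m, self.var_floor)
                         for c, mu in zip(cols, means)]
            self.params[label] = (math.log(m / len(y)), means, variances)
        return self

    def predict_one(self, x: Sequence[float]) -> str:
        def log_posterior(label: str) -> float:
            log_prior, means, variances = self.params[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
                for v, mu, var in zip(x, means, variances))
        return max(self.params, key=log_posterior)


X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.9, 1.0], [1.0, 0.9], [1.0, 1.0]]
y = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
nb = GaussianNB().fit(X, y)
print(nb.predict_one([0.05, 0.05]), nb.predict_one([0.95, 0.95]))  # normal abnormal
```

Working with log-probabilities avoids the numerical underflow that multiplying hundreds of per-byte likelihoods would otherwise cause.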
As a further preferred embodiment, the industrial flow data processing method further comprises a training step, which specifically includes:
SA1. obtaining data labels corresponding to the industrial flow data, each data label indicating the class of the corresponding industrial flow data;
SA2. building a training set from the industrial flow data and the data labels, wherein the industrial flow data in the training set serve as input data for each classification algorithm and the data labels in the training set serve as the expected output of the voting algorithm;
SA3. training each classification algorithm and the voting algorithm with the training set.
In this embodiment, steps SA1-SA3 are used to train the classification algorithms and the voting algorithm; preferably, steps SA1-SA3 are executed before steps S1-S3. The industrial data flow used in step SA1 and the industrial data flow processed in steps S1-S3 can come from the same source. For example, the acquired industrial data flow can be split by number of data packets in a 4:6 ratio: the "4" portion is used in steps SA1-SA3 to train the classification algorithms and the voting algorithm, and the "6" portion is processed by the classification algorithms and the voting algorithm in steps S1-S3.
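The 4:6 split by packet count can be sketched in Python as below. This is a minimal sketch under stated assumptions: the split is sequential (capture order is kept), since the text does not say whether packets are shuffled first, and `split_by_ratio` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_ratio(packets: Sequence[T], train_parts: int = 4,
                   test_parts: int = 6) -> Tuple[List[T], List[T]]:
    """Split captured packets by count into a training part and a processing part.

    The split is sequential (capture order is kept); shuffling first would be an
    alternative the text does not specify.
    """
    cut = len(packets) * train_parts // (train_parts + test_parts)
    return list(packets[:cut]), list(packets[cut:])


train, test = split_by_ratio([f"pkt{i}" for i in range(10)])
print(len(train), len(test))  # 4 6
```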
The principle of step SA1-SA3 is as shown in Figure 3.
In step SA1, each item of industrial flow data can be classified manually, i.e. assigned to its class, and that class is represented by a corresponding data label.
In step SA2, a training set is built from the industrial flow data and data labels obtained in step SA1. In step SA3, the classification algorithms and the voting algorithm are trained with the training set from step SA2: the industrial flow data in the training set serve as input data to the classification algorithms; after the classification algorithms output their classification results, the voting algorithm votes on them and outputs the classification result with the highest number of votes; and the data labels in the training set serve as the expected output of the voting algorithm. In other words, the parameters of the classification algorithms and the voting algorithm are adjusted so that the highest-voted classification result output by the voting algorithm converges to the corresponding data label. Training the classification algorithms and the voting algorithm may require several rounds of the training step, i.e. executing the combination of steps SA1-SA3 multiple times.
As a further preferred embodiment, the industrial flow data processing method further comprises the following steps:
SB1. executing the training step multiple times, and adjusting the voting weights of the voting algorithm after each execution of the training step, the voting weights causing the votes obtained by each classification result to carry a corresponding weight when the voting algorithm is executed;
SB2. recording the voting weights for which the error between the output of the voting algorithm and the corresponding data labels is smallest;
SB3. configuring the voting algorithm with the recorded voting weights.
In this embodiment, the voting algorithm may assign the same voting weight to all classes. The voting weight is the coefficient by which the original number of votes is multiplied when the votes obtained by each classification result are counted. The voting algorithm may also assign pairwise different voting weights to the classes; these weights can be set according to the properties of the classification algorithms used, or adjusted dynamically by executing steps SB1-SB3.
Steps SB1-SB3 can be executed before steps S1-S3. Step SB1 executes multiple rounds of the training step, i.e. multiple combinations of steps SA1-SA3. In each round, i.e. each time steps SA1-SA3 are executed, the voting algorithm assigns different voting weights to the multiple classification results produced by the classification algorithms. After one execution of steps SA1-SA3, the voting weights of the voting algorithm are adjusted, i.e. the voting weights corresponding to the multiple classification results are modified, and steps SA1-SA3 are then executed again. After each execution of steps SA1-SA3, the error between the actual output and the expected output of the voting algorithm is recorded, in order to assess the effect of the voting weights.
In step SB2, the errors obtained from the multiple executions of steps SA1-SA3 are analyzed to find the execution with the smallest error, and the voting weights used in that execution are set in the voting algorithm. When steps S1-S3 are later executed to process industrial flow data generated in actual production, the configured voting algorithm votes with these weights, so as to obtain the best results.
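The weight search of steps SB1-SB3 can be sketched in Python under several assumptions not stated in the text: one weight per classification algorithm, a fixed grid of candidate weights, the error measured as the fraction of misclassified labelled samples, and classifier outputs computed once (the patent retrains in each round via SA1-SA3, which this sketch omits for brevity). `weighted_vote` and `search_weights` are hypothetical helpers.

```python
import itertools
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple


def weighted_vote(results: Sequence[str], weights: Sequence[float]) -> str:
    """Each classifier's result receives votes equal to that classifier's weight.

    The class with the highest total wins; on a tie, the class that first
    appeared in classifier order wins.
    """
    totals: Dict[str, float] = defaultdict(float)
    for label, w in zip(results, weights):
        totals[label] += w
    return max(totals, key=totals.__getitem__)


def search_weights(outputs: Sequence[Sequence[str]], labels: Sequence[str],
                   grid: Sequence[float] = (0.5, 1.0, 1.5, 2.0)
                   ) -> Tuple[Optional[Tuple[float, ...]], float]:
    """SB1-SB3 sketch: try each weight combination, record the voting error
    against the data labels, and keep the combination with the smallest error."""
    best_weights, best_error = None, float("inf")
    for weights in itertools.product(grid, repeat=len(outputs[0])):
        wrong = sum(weighted_vote(r, weights) != y for r, y in zip(outputs, labels))
        error = wrong / len(labels)
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error


# Outputs of (KNN, NB, DT) on three labelled samples; DT happens to be the reliable one.
outputs = [("dos", "dos", "dos"), ("scan", "scan", "dos"), ("scan", "scan", "dos")]
labels = ["dos", "dos", "dos"]
print(search_weights(outputs, labels))  # ((0.5, 0.5, 1.5), 0.0)
```

The exhaustive grid grows as (grid size) to the power of the number of classifiers, which is cheap for three classifiers but would call for a smarter search with many more.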
As a further preferred embodiment, the industrial flow data processing method further comprises the following steps:
SC1. detecting the length of the industrial flow data;
SC2. if the length of the industrial flow data is greater than a preset length, truncating the industrial flow data to the preset length; if the length is less than the preset length, padding the industrial flow data with data bits so that its length increases to the preset length;
SC3. normalizing the truncated or padded industrial flow data.
Steps SC1-SC3 preprocess the industrial flow data, which at this point take the form of data packets. Steps SC1 and SC2 align the industrial flow data, i.e. make their lengths uniform, so that when the industrial flow data are used to train the classification algorithms and the voting algorithm, these algorithms can better learn the features they contain. Specifically, a preset length MIS is set, for example MIS = 500, and the length of the industrial flow data is compared with MIS: if the length equals MIS, no operation is performed; if it is greater than MIS, the trailing data beyond MIS are cut off so that the length becomes MIS; if it is less than MIS, zeros are appended after the last position until the length reaches MIS.
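The alignment of steps SC1-SC2 can be sketched in Python as follows, assuming a packet is held as raw `bytes` and its length is counted in bytes; `align` is a hypothetical helper and MIS = 500 is the example value from the text.

```python
def align(packet: bytes, mis: int = 500) -> bytes:
    """SC1-SC2: make every packet exactly `mis` bytes long.

    Longer packets lose their trailing bytes; shorter ones get zero bytes
    appended; packets of length `mis` are returned unchanged.
    """
    if len(packet) > mis:
        return packet[:mis]
    return packet + b"\x00" * (mis - len(packet))


print(len(align(b"\x01" * 620)), align(b"\x01\x02", mis=4))  # 500 b'\x01\x02\x00\x00'
```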
In step SC3, the industrial flow data are normalized with the following formula:
$$\mathrm{Rescaled}(a_i) = \mathit{min} + \frac{(a_i - A_{\min})(\mathit{max} - \mathit{min})}{A_{\max} - A_{\min}},$$
where $a_i$ is a feature value of the industrial flow data, i.e. the specific numerical value of that feature; $\mathrm{Rescaled}(a_i)$ is the result of normalizing $a_i$; $A_{\min}$ is the minimum of the feature values of the industrial flow data, i.e. the smallest value the feature can take; $A_{\max}$ is the maximum of the feature values of the industrial flow data, i.e. the largest value the feature can take; and max and min are preset values, for example max = 1 and min = 0.
Through this normalization (with max = 1 and min = 0), the feature values of the industrial flow data are scaled to [0, 1]. This prevents large differences between the feature values of different industrial flow data from making the objective functions of the classification algorithms and the voting algorithm "flat" during execution, and therefore reduces the required training time.
This embodiment further includes an industrial flow data detection method, comprising the following steps:
S100. obtaining industrial flow data;
S101. executing steps S1-S3 to process the obtained industrial flow data;
S102. if the industrial flow data are detected to be abnormal data, issuing an alarm.
Steps S100-S102 apply steps S1-S3 to inspect a data stream formed by a large amount of industrial flow data. In step S100, industrial flow data are obtained from the data stream by means such as packet capture; steps S1-S3 are then executed to process each item of industrial flow data and judge whether it is normal data or abnormal data. In step S102, if the industrial flow data are detected to be abnormal data, operations such as issuing an alarm or pausing data transmission are performed to maintain the safety and stability of production.
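The detection loop of steps S100-S102 can be sketched in Python as below. Packet capture itself is not shown; `detect` is a hypothetical helper, `is_abnormal` stands in for steps S1-S3, and `alarm` stands in for the alarm prompt or a pause of data transmission. The toy rule in the usage example is illustrative only.

```python
from typing import Callable, Iterable, List


def detect(packets: Iterable[bytes],
           is_abnormal: Callable[[bytes], bool],
           alarm: Callable[[int, bytes], None]) -> List[int]:
    """S100-S102 sketch: judge each captured packet and raise an alarm on abnormal ones.

    `is_abnormal` stands in for steps S1-S3 and `alarm` for the alarm prompt (or a
    pause of data transmission). Returns the indices of the flagged packets.
    """
    flagged: List[int] = []
    for index, packet in enumerate(packets):
        if is_abnormal(packet):
            alarm(index, packet)
            flagged.append(index)
    return flagged


captured = [b"read coils", b"\xff\xff\xff", b"read coils"]
print(detect(captured,
             is_abnormal=lambda p: p.startswith(b"\xff"),   # toy rule, not S1-S3
             alarm=lambda i, p: print(f"ALARM: packet {i} judged abnormal")))
```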
This embodiment further includes an industrial flow data processing system which, referring to Fig. 4, comprises:
a categorization module, configured to classify industrial flow data with multiple trained, mutually different classification algorithms to obtain multiple classification results;
a voting module, configured to vote on the multiple classification results with a trained voting algorithm, so that each classification result receives a corresponding number of votes, and to output the classification result with the highest number of votes;
a judgment module, configured to output, according to the output classification result, a judgment of whether the industrial flow data are abnormal data.
The categorization module, voting module and judgment module can be hardware modules or software modules with the corresponding functions in a computer system.
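As software modules, the three modules might be composed as in this minimal Python sketch. All class names are hypothetical; the voting module here is unweighted, and the judgment module assumes a mapping in which only the class "normal" counts as normal data. The lambda classifiers stand in for trained KNN, NB and DT models.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence

Classifier = Callable[[Sequence[float]], str]


class CategorizationModule:
    """Applies several different trained classifiers to the same input."""

    def __init__(self, classifiers: Iterable[Classifier]) -> None:
        self.classifiers = list(classifiers)

    def classify(self, x: Sequence[float]) -> List[str]:
        return [clf(x) for clf in self.classifiers]


class VoteModule:
    """Unweighted vote: outputs the classification result with the most votes."""

    def vote(self, results: Sequence[str]) -> str:
        return Counter(results).most_common(1)[0][0]


class JudgmentModule:
    """Maps the voted class onto the two categories normal / abnormal."""

    def __init__(self, normal_classes: Iterable[str] = ("normal",)) -> None:
        self.normal_classes = set(normal_classes)

    def judge(self, voted_class: str) -> str:
        return "normal data" if voted_class in self.normal_classes else "abnormal data"


class ProcessingSystem:
    """Chains the three modules in the order categorize -> vote -> judge."""

    def __init__(self, classifiers: Iterable[Classifier]) -> None:
        self.categorizer = CategorizationModule(classifiers)
        self.voter = VoteModule()
        self.judgment = JudgmentModule()

    def process(self, x: Sequence[float]) -> str:
        return self.judgment.judge(self.voter.vote(self.categorizer.classify(x)))


# Stand-ins for trained KNN, NB and DT classifiers (hypothetical rules).
system = ProcessingSystem([
    lambda x: "normal" if x[0] < 0.5 else "dos",
    lambda x: "normal" if x[1] < 0.5 else "dos",
    lambda x: "dos",
])
print(system.process([0.1, 0.2]), system.process([0.9, 0.2]))  # normal data abnormal data
```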
This embodiment further includes an industrial flow data processing device, comprising a memory and a processor, wherein the memory is configured to store at least one program and the processor is configured to load the at least one program to execute the industrial flow data processing method of the present invention.
This embodiment further includes a medium with a storage function, storing processor-executable instructions which, when executed by a processor, are used to execute the industrial flow data processing method of the present invention.
The industrial flow data processing system, device and medium of this embodiment can execute the industrial flow data processing method of the invention. They can perform any combination of the implementation steps of the method embodiments and have the corresponding functions and beneficial effects of the method.
It should be noted that, unless otherwise specified, when a feature is described as being "fixed" or "connected" to another feature, it may be fixed or connected to the other feature either directly or indirectly. In addition, terms such as upper, lower, left and right used in this disclosure describe only the relative positions of the components of the disclosure in the drawings. The singular forms "a", "said" and "the" used in this disclosure are also intended to include the plural forms, unless the context clearly indicates otherwise. Moreover, unless otherwise defined, all technical and scientific terms used in this embodiment have the meanings commonly understood by those skilled in the art. The terms used in this description serve only to describe specific embodiments and are not intended to limit the invention. The term "and/or" used in this embodiment includes any combination of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various elements, these elements should not be limited by these terms. The terms are used only to distinguish elements of the same type from one another. For example, without departing from the scope of this disclosure, a first element could be termed a second element and, similarly, a second element could be termed a first element. Any and all examples or exemplary language ("for example", "such as") provided in this embodiment are intended only to better illustrate the embodiments of the invention. Unless otherwise claimed, they do not limit the scope of the invention.
It should be appreciated that the embodiments of the invention may be implemented by computer hardware, by a combination of hardware and software, or by computer instructions stored in a non-transitory computer-readable memory. The method may be implemented in a computer program using standard programming techniques. This includes a non-transitory computer-readable storage medium configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and drawings described in the specific embodiments. Each program may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. If desired, however, the program may be implemented in assembly or machine language. In any case, the language may be compiled or interpreted. Furthermore, the program may run on an application-specific integrated circuit programmed for this purpose.
In addition, the operations of the processes described in this embodiment may be performed in any suitable order, unless otherwise indicated herein or clearly contradicted by context. The processes described in this embodiment (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions. They may be implemented as code executing collectively on one or more processors (for example, executable instructions, one or more computer programs, or one or more applications), by hardware, or by a combination thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented on any suitable type of computing platform to which it is operably coupled. Such platforms include, but are not limited to, personal computers, minicomputers, mainframes, workstations, networked or distributed computing environments, separate or integrated computer platforms, and platforms in communication with charged-particle tools or other imaging devices. Aspects of the invention may be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into the computing platform, such as a hard disk, an optically read and/or written storage medium, RAM or ROM. The code is then readable by a programmable computer and, when the storage medium or device is read by the computer, can be used to configure and operate the computer to perform the processes described herein. Furthermore, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. When such media contain instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor, the invention described in this embodiment includes these and other types of non-transitory computer-readable storage media. When the computer is programmed according to the methods and techniques of the invention, the invention also includes the computer itself.
A computer program can be applied to input data to perform the functions described in this embodiment, thereby transforming the input data into output data that is stored in non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represent physical and tangible objects, including particular visual depictions of physical and tangible objects generated on a display.
The above are only preferred embodiments of the invention, and the invention is not limited to these embodiments. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall fall within the scope of protection of the invention, as long as it achieves the technical effects of the invention by the same means. The technical solutions and/or embodiments within the scope of protection of the invention may have various modifications and variations.
Claims (10)
1. An industrial flow data processing method, characterized by comprising the following steps:
classifying industrial flow data using a plurality of trained, mutually different classification algorithms to obtain a plurality of classification results;
voting on the plurality of classification results using a trained voting algorithm, so that each classification result receives a corresponding number of votes, and then outputting the classification result with the highest number of votes;
outputting, according to the output classification result, a judgment result as to whether the industrial flow data are abnormal data.
2. The industrial flow data processing method according to claim 1, characterized by further comprising a training step, the training step specifically comprising:
obtaining data labels corresponding to the industrial flow data, each data label indicating the category of the corresponding industrial flow data;
establishing a training set from the industrial flow data and the data labels, wherein the industrial flow data in the training set serve as the input data of each classification algorithm and the data labels in the training set serve as the expected output of the voting algorithm;
training each classification algorithm and the voting algorithm using the training set.
3. The industrial flow data processing method according to claim 2, characterized by further comprising the following steps:
executing the training step multiple times, and adjusting the voting weights of the voting algorithm after each execution of the training step, the voting weights causing the votes obtained by each classification result to carry a corresponding weight when the voting algorithm is executed;
recording the voting weights at which the error between the output of the voting algorithm and the corresponding data labels is smallest;
configuring the voting algorithm according to the recorded voting weights.
4. The industrial flow data processing method according to any one of claims 1-3, characterized in that the plurality of mutually different classification algorithms are the k-nearest neighbor algorithm, the naive Bayes algorithm and the decision tree algorithm.
5. The industrial flow data processing method according to any one of claims 1-3, characterized by further comprising the following steps:
detecting the length of the industrial flow data;
if the length of the industrial flow data is greater than a preset length, truncating the industrial flow data to the preset length; if the length of the industrial flow data is less than the preset length, padding the industrial flow data with data bits so that its length increases to the preset length;
normalizing the truncated or padded industrial flow data.
6. The industrial flow data processing method according to claim 5, characterized in that the normalization formula is
$$\mathrm{Rescaled}(a_i) = \min + \frac{(a_i - A_{\min})(\max - \min)}{A_{\max} - A_{\min}},$$
where $a_i$ is a feature value of the industrial flow data, $\mathrm{Rescaled}(a_i)$ is the result of normalizing $a_i$, $A_{\min}$ is the minimum feature value of the industrial flow data, $A_{\max}$ is the maximum feature value of the industrial flow data, and $\max$ and $\min$ are preset values.
7. An industrial flow data detection method, characterized by comprising the following steps:
obtaining industrial flow data;
executing the method according to any one of claims 1-6 to process the obtained industrial flow data;
if the industrial flow data are detected to be abnormal data, raising an alarm prompt.
8. An industrial flow data processing system, characterized by comprising:
a classification module, configured to classify industrial flow data using a plurality of trained, mutually different classification algorithms to obtain a plurality of classification results;
a voting module, configured to vote on the plurality of classification results using a trained voting algorithm, so that each classification result receives a corresponding number of votes, and to output the classification result with the highest number of votes;
a judgment module, configured to output, according to the output classification result, a judgment result as to whether the industrial flow data are abnormal data.
9. An industrial flow data processing device, characterized by comprising a memory and a processor, the memory being configured to store at least one program, and the processor being configured to load the at least one program to perform the method according to any one of claims 1-7.
10. A medium having a storage function, in which processor-executable instructions are stored, characterized in that the processor-executable instructions, when executed by a processor, perform the method according to any one of claims 1-7.
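The preprocessing of claims 5 and 6 (fixed-length truncation or padding, followed by normalization) can be sketched in Python as follows. The rescaling uses the min-max form Rescaled(a_i) = min + (a_i - A_min)(max - min)/(A_max - A_min), with min and max as the preset target range. Several details are assumptions not fixed by the claims:

- records are treated as raw byte sequences;
- padding uses zero bytes;
- A_min and A_max are taken over the features of the single record being normalized;
- a constant record, where A_max = A_min and the formula is undefined, is mapped to the lower bound.

```python
from typing import List, Sequence


def fit_to_length(record: Sequence[int], preset_length: int, pad_value: int = 0) -> List[int]:
    """Truncate a record longer than preset_length, or pad a shorter one with
    pad_value (assumed to be 0), so every record has exactly preset_length items."""
    if len(record) > preset_length:
        return list(record[:preset_length])
    return list(record) + [pad_value] * (preset_length - len(record))


def rescale(features: Sequence[float], new_min: float = 0.0, new_max: float = 1.0) -> List[float]:
    """Min-max rescaling of one record's features into [new_min, new_max]."""
    a_min, a_max = min(features), max(features)
    if a_max == a_min:
        return [new_min] * len(features)  # assumed handling of a constant record
    return [new_min + (a - a_min) * (new_max - new_min) / (a_max - a_min) for a in features]


if __name__ == "__main__":
    raw = b"\x00\x32\x64\xc8\xff"                # 5-byte payload: 0, 50, 100, 200, 255
    fixed = fit_to_length(raw, preset_length=3)  # truncated to [0, 50, 100]
    print(rescale(fixed))                        # [0.0, 0.5, 1.0]
    print(fit_to_length(b"\x01\x02", 4))         # [1, 2, 0, 0]
```

If A_min and A_max are instead meant per feature position across the whole training set, the same formula applies column-wise. In that case the extremes recorded during training should be reused when normalizing new traffic.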
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910534886.0A CN110363223A (en) | 2019-06-20 | 2019-06-20 | Industrial flow data processing method, detection method, system, device and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110363223A true CN110363223A (en) | 2019-10-22 |
Family ID: 68216411
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910534886.0A Pending CN110363223A (en) | 2019-06-20 | 2019-06-20 | Industrial flow data processing method, detection method, system, device and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110363223A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111400155A (en) * | 2020-03-13 | 2020-07-10 | 深圳前海微众银行股份有限公司 | Data detection method and device |
CN114615002A (en) * | 2020-12-03 | 2022-06-10 | 中国移动通信集团设计院有限公司 | Operator key infrastructure controlled identification method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104391860A (en) * | 2014-10-22 | 2015-03-04 | 安一恒通(北京)科技有限公司 | Content type detection method and device |
CN104468276A (en) * | 2014-12-18 | 2015-03-25 | 东南大学 | Network traffic identification method based on random sampling multiple classifiers |
CN104951809A (en) * | 2015-07-14 | 2015-09-30 | 西安电子科技大学 | Unbalanced data classification method based on unbalanced classification indexes and integrated learning |
CN107294993A (en) * | 2017-07-05 | 2017-10-24 | 重庆邮电大学 | A kind of WEB abnormal flow monitoring methods based on integrated study |
CN109325638A (en) * | 2018-11-09 | 2019-02-12 | 电子科技大学 | A kind of SDN method for predicting based on RBF neural |
Non-Patent Citations (5)
Title |
---|
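何跃 (He Yue) et al.: "Research on combined micro-text sentiment orientation classification based on sentiment knowledge and machine learning algorithms", Journal of Intelligence (《情报杂志》) * |
吴嘉乐 (Wu Jiale): "Application of heterogeneous ensemble learners in iris flower classification", China Plant Engineering (《中国设备工程》) * |
朱佳佳 (Zhu Jiajia), 陈佳 (Chen Jia): "Abnormal traffic detection method based on entropy and SVM multi-classifiers", Computer Technology and Development (《计算机技术与发展》) * |
汪为汉 (Wang Weihan): "Research on IPv6 network traffic classification and identification technology", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) * |
高嵩 (Gao Song) et al.: "Research on a pedestrian detection method based on fast cascade classifiers", Computer Engineering and Science (《计算机工程与科学》) * |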
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111400155A (en) * | 2020-03-13 | 2020-07-10 | 深圳前海微众银行股份有限公司 | Data detection method and device |
CN111400155B (en) * | 2020-03-13 | 2021-08-31 | 深圳前海微众银行股份有限公司 | Data detection method and device |
CN114615002A (en) * | 2020-12-03 | 2022-06-10 | 中国移动通信集团设计院有限公司 | Operator key infrastructure controlled identification method and system |
CN114615002B (en) * | 2020-12-03 | 2024-02-27 | 中国移动通信集团设计院有限公司 | Controlled identification method and system for key infrastructure of operator |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9984334B2 (en) | Method for anomaly detection in time series data based on spectral partitioning | |
Wahono et al. | A comparison framework of classification models for software defect prediction | |
CN100489870C (en) | Method and multidimensional system for statistical process control | |
CN105354198B (en) | A kind of data processing method and device | |
CN102265227B (en) | Method and apparatus for creating state estimation models in machine condition monitoring | |
Wahono et al. | Neural network parameter optimization based on genetic algorithm for software defect prediction | |
CN111079283B (en) | Method for processing information saturation imbalance data | |
US11416717B2 (en) | Classification model building apparatus and classification model building method thereof | |
CN110363223A (en) | Industrial flow data processing method, detection method, system, device and medium | |
US20050144537A1 (en) | Method to use a receiver operator characteristics curve for model comparison in machine condition monitoring | |
CN104915679A (en) | Large-scale high-dimensional data classification method based on random forest weighted distance | |
CN116930042B (en) | Building waterproof material performance detection equipment and method | |
CN113988616A (en) | Enterprise risk assessment system and method based on industry data | |
Jantzen | Dynamical kinds and their discovery | |
CN115186776B (en) | Method, device and storage medium for classifying ruby producing areas | |
US20230156043A1 (en) | System and method of supporting decision-making for security management | |
US7672813B2 (en) | Mixed statistical and numerical model for sensor array detection and classification | |
Buschmann et al. | Data-driven decision support for process quality improvements | |
US6782376B2 (en) | Reasoning method based on similarity of cases | |
CN107067034B (en) | Method and system for rapidly identifying infrared spectrum data classification | |
Caspary et al. | Statistical quality control of geodata | |
Balega et al. | IoT Anomaly Detection Using a Multitude of Machine Learning Algorithms | |
CN107239256A (en) | The randomness detecting method of lottery industry random sequence based on overall merit | |
CN113283512A (en) | Data anomaly detection method, device, equipment and storage medium | |
EP3686812A1 (en) | System and method for context-based training of a machine learning model |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191022 |