CN110363223A - Industrial flow data processing method, detection method, system, device and medium - Google Patents

Industrial flow data processing method, detection method, system, device and medium

Info

Publication number
CN110363223A
Authority
CN
China
Prior art keywords
flow data
industrial flow
data
classification results
industrial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910534886.0A
Other languages
Chinese (zh)
Inventor
高英
宋彬杰
靳亚洽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201910534886.0A
Publication of CN110363223A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an industrial flow data processing method, detection method, system, device and medium. The processing method comprises: classifying industrial flow data using multiple trained, mutually different classification algorithms to obtain multiple classification results; voting on the multiple classification results using a trained voting algorithm so that each classification result receives a corresponding vote count, and outputting the classification result with the highest vote count; and, according to the output classification result, outputting a judgment of whether the industrial flow data is abnormal data. The invention is used for anomaly detection of industrial flow data and can achieve traffic detection and classification with high accuracy and a low false-alarm rate, thereby providing a safeguard for preventing abnormal-traffic intrusion and protecting the security and privacy of industrial control systems. The invention is widely applicable in the field of computer technology.

Description

Industrial flow data processing method, detection method, system, device and medium
Technical field
The present invention relates to the field of computer technology, and in particular to an industrial flow data processing method, system, device and medium.
Background art
Industrial control systems are widely used in industrial production. During operation, an industrial control system generates industrial flow data (network traffic data) or acquires it through devices such as sensors. To ensure safe and stable production, the industrial flow data must be inspected to determine whether it is normal data or abnormal data. Existing techniques for detecting abnormal flow data fall into three main categories: port-based methods, methods based on traffic feature statistics, and methods based on the raw payload.
A port-based method identifies known applications by checking the port number in the packet header of the industrial flow data. Such methods are simple to apply, but many applications use dynamic ports or even hide behind the well-known ports of other applications, which makes it difficult for port-based methods to identify industrial flow data. Port-based methods therefore cannot provide reliable results and are rarely used in the prior art.
Methods based on traffic feature statistics use supervised and unsupervised machine learning algorithms to classify network traffic into predefined categories of known applications. However, they require experts to draw on extensive experience to compile statistics on the traffic information, which consumes considerable manpower.
Methods based on the raw payload use techniques such as deep learning to learn the intrinsic features of raw industrial flow data. However, when learning the intrinsic features of the data, deep-learning-based methods suffer from loss of data information and incomplete feature extraction.
Summary of the invention
To solve the above technical problems, the object of the present invention is to provide an industrial flow data processing method, system, device and medium.
In one aspect, an embodiment of the present invention includes an industrial flow data processing method comprising the following steps:
classifying industrial flow data using multiple trained, mutually different classification algorithms to obtain multiple classification results;
voting on the multiple classification results using a trained voting algorithm so that each classification result receives a corresponding vote count, and then outputting the classification result with the highest vote count;
according to the output classification result, outputting a judgment of whether the industrial flow data is abnormal data.
Further, the industrial flow data processing method also includes a training step, which specifically comprises:
obtaining data labels corresponding to the industrial flow data, each data label indicating the category of the corresponding industrial flow data;
establishing a training set from the industrial flow data and the data labels, the industrial flow data in the training set serving as the input data of each classification algorithm and the data labels in the training set serving as the expected output of the voting algorithm;
training each classification algorithm and the voting algorithm using the training set.
Further, the industrial flow data processing method also comprises the following steps:
executing the training step multiple times, and adjusting the voting weights of the voting algorithm after each execution of the training step, the voting weights being used so that, when the voting algorithm is executed, the vote count obtained by each classification result carries a corresponding weight;
recording the voting weights for which the error between the output of the voting algorithm and the corresponding data labels is smallest;
configuring the voting algorithm according to the recorded voting weights.
Further, the multiple mutually different classification algorithms are the K-nearest-neighbor algorithm, the naive Bayes algorithm and the decision tree algorithm.
Further, the industrial flow data processing method also comprises the following steps:
detecting the length of the industrial flow data;
if the length of the industrial flow data is greater than a preset length, truncating the industrial flow data to the preset length, and if the length of the industrial flow data is less than the preset length, padding the industrial flow data so that its length increases to the preset length;
normalizing the truncated or padded industrial flow data.
Further, the formula used in the normalization is
$$\mathrm{Rescaled}(a_i) = \frac{a_i - A_{\min}}{A_{\max} - A_{\min}} \times (\max - \min) + \min$$
where $a_i$ is a feature value of the industrial flow data, $\mathrm{Rescaled}(a_i)$ is the result of normalizing $a_i$, $A_{\min}$ is the minimum of the feature values of the industrial flow data, $A_{\max}$ is the maximum of the feature values of the industrial flow data, and $\max$ and $\min$ are preset values.
In another aspect, an embodiment of the present invention also includes an industrial flow data detection method comprising the following steps:
acquiring industrial flow data;
executing the method according to any one of claims 1 to 6 to process the acquired industrial flow data;
if the industrial flow data is detected to be abnormal data, issuing an alarm prompt.
In another aspect, an embodiment of the present invention also includes an industrial flow data processing system comprising:
a classification module for classifying industrial flow data using multiple trained, mutually different classification algorithms to obtain multiple classification results;
a voting module for voting on the multiple classification results using a trained voting algorithm so that each classification result receives a corresponding vote count, and for outputting the classification result with the highest vote count;
a judgment module for outputting, according to the output classification result, a judgment of whether the industrial flow data is abnormal data.
In another aspect, an embodiment of the present invention also includes an industrial flow data processing device comprising a memory and a processor, the memory being used to store at least one program and the processor being used to load the at least one program to execute the method of the present invention.
In another aspect, an embodiment of the present invention also includes a medium having a storage function, in which processor-executable instructions are stored; the processor-executable instructions, when executed by a processor, are used to execute the method of the present invention.
The beneficial effects of the present invention are as follows. By using trained classification algorithms and a trained voting algorithm, the invention can recognize the feature distribution of industrial flow data and can also detect novel anomalies that have not appeared before. It avoids the drawback that conventional statistical features cannot detect anomalies effectively, and it reduces the work experts must do to manually analyze traffic features and extract features. The invention is used for anomaly detection of industrial flow data and can achieve traffic detection and classification with high accuracy and a low false-alarm rate, thereby providing a safeguard for preventing abnormal-traffic intrusion and protecting the security and privacy of industrial control systems.
Brief description of the drawings
Fig. 1 is a flowchart of the industrial flow data processing method in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the industrial flow data processing method in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the training process for the classification algorithms and the voting algorithm in an embodiment of the present invention;
Fig. 4 is a structural block diagram of the industrial flow data processing system in an embodiment of the present invention.
Detailed description of the embodiments
This embodiment includes an industrial flow data processing method which, referring to Fig. 1, comprises the following steps:
S1. classifying industrial flow data using multiple trained, mutually different classification algorithms to obtain multiple classification results;
S2. voting on the multiple classification results using a trained voting algorithm so that each classification result receives a corresponding vote count, and then outputting the classification result with the highest vote count;
S3. according to the output classification result, outputting a judgment of whether the industrial flow data is abnormal data.
In step S1, multiple mutually different classification algorithms are provided. These classification algorithms have been trained in advance and therefore have the corresponding classification capability. The acquired industrial flow data is input into each of these classification algorithms, and the multiple classification results they output are collected. A classification result is the category into which a classification algorithm places the industrial flow data; different classification algorithms may place the same industrial flow data in the same category or in different categories.
In this embodiment, the industrial flow data may refer to a single item, i.e. one data packet or one segment of data, or it may be a collective concept, i.e. a set of multiple data packets or multiple segments of data. The industrial flow data handled in step S1 may be one data packet.
In step S2, the trained voting algorithm is used to vote on the multiple classification results. After voting ends, each classification result for the industrial flow data has obtained a corresponding vote count. The voting algorithm in this embodiment is a simple voting algorithm: it finds the classification result with the highest vote count and outputs it. If several classification results obtain the same vote count, they may be sorted in a preset order and the first one selected, or one classification result may be selected at random and output.
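The simple voting rule of step S2, including the tie handling described above, can be sketched as follows. This is a minimal illustration rather than the patented implementation; the function name `simple_vote` and the parameters `preset_order` and `rng` are hypothetical names introduced here.

```python
import random
from collections import Counter

def simple_vote(predictions, preset_order=None, rng=None):
    """Return the classification result with the most votes.

    predictions: one label per classification algorithm.
    preset_order: optional list of labels; on a tie, the tied label that
        appears earliest in this list wins (every tied label must be listed).
    rng: optional random.Random used for the random tie-break.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    tied = [label for label in counts if counts[label] == top]
    if len(tied) == 1:
        return tied[0]
    if preset_order is not None:
        # Tie-break 1: sort by the preset order and take the first.
        return min(tied, key=preset_order.index)
    # Tie-break 2: pick one of the tied results at random.
    return (rng or random).choice(tied)
```

With three classifiers and two categories a tie cannot occur, so the tie-break only matters when the classifiers can output more than two categories.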
In step S1, the multiple classification algorithms may classify the industrial flow data into several different categories. Through the voting in step S2, the classification result with the most votes is determined to be the category of the industrial flow data.
In step S3, based on the category determined in step S2, it is further judged whether the industrial flow data is abnormal data or normal data, and the judgment is output, finally determining whether the industrial flow data is normal. One specific way of executing steps S1-S3 is as follows: a mapping is established between the classification results obtained in step S1 and the two categories "abnormal data" and "normal data"; after step S2 determines the classification result to which the industrial flow data belongs, the mapping is used to determine whether the industrial flow data is normal data or abnormal data, thereby detecting anomalies in the industrial flow data. Alternatively, each classification algorithm may be configured to classify the industrial flow data directly as "normal data" or "abnormal data"; the voting algorithm then votes and outputs the classification result with the highest vote count, which directly determines whether the industrial flow data is "normal data" or "abnormal data".
The principle of steps S1-S3 is shown in Fig. 2. When the classification algorithms and the voting algorithm are applied to process industrial flow data, the processed industrial flow data is referred to as the test set.
By using trained classification algorithms and a trained voting algorithm, steps S1-S3 can recognize the feature distribution of industrial flow data and can also detect novel anomalies that have not appeared before. This avoids the drawback that conventional statistical features cannot detect anomalies effectively, and it reduces the work experts must do to manually analyze traffic features and extract features. Steps S1-S3 are used for anomaly detection of industrial flow data and can achieve traffic detection and classification with high accuracy and a low false-alarm rate, thereby providing a safeguard for preventing abnormal-traffic intrusion and protecting the security and privacy of industrial control systems.
Preferably, three classification algorithms may be used in step S1: the K-nearest-neighbor algorithm (K-Nearest Neighbor, KNN), the naive Bayes algorithm (Naive Bayes, NB) and the decision tree algorithm (Decision Tree, DT). When step S1 is executed, these three classification algorithms each receive the same industrial flow data and output their respective classification results.
The principle of the K-nearest-neighbor algorithm is as follows: find the several samples nearest to the sample to be classified, look at the classes those samples belong to, and select the class with the largest share as the class of the sample to be classified. K is the number of samples to find. In the present invention, the 5 samples nearest to the sample to be classified are selected (i.e. K = 5), the voting weights of the 5 samples are set by proportion, and the Euclidean distance is used for the distance calculation:
$$d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}$$
where $x$ and $y$ are two samples with $n$ feature values each. After the training set is input into the KNN module, the module finds the 5 samples with the smallest Euclidean distance to each flow sample, determines the class, and compares it with the training-set label. Through this learning process, the KNN module captures the feature distribution of the training set, so that in the test stage it can accurately determine the class of unknown traffic.
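A K-nearest-neighbor classifier with K = 5 and Euclidean distance can be sketched in a few lines. The sketch below is illustrative only: the function names are hypothetical, and it assumes each of the 5 neighbors casts one equal vote, which may differ from the weighting the embodiment intends.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train_x, train_y, sample, k=5):
    """Classify `sample` by majority class among its k nearest training samples.

    Assumption: every neighbor has the same voting weight.
    """
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: euclidean(pair[0], sample))[:k]
    counts = Counter(label for _, label in nearest)
    return counts.most_common(1)[0][0]

# Toy data (made up for illustration): two well-separated clusters.
train_x = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
           [10, 10], [10, 11], [11, 10]]
train_y = ["normal"] * 5 + ["abnormal"] * 3
print(knn_predict(train_x, train_y, [0.2, 0.2]))
```

Sorting all training samples costs O(N log N) per query; for large packet captures an index structure would normally replace the full sort.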
The principle of the decision tree algorithm is as follows. The DT algorithm recursively selects the optimal split point (i.e. feature) and splits the training data set according to that feature so that each resulting subset is classified as well as possible. This process corresponds both to the partitioning of the feature space and to the construction of the decision tree. The splitting is repeated on the subsets until all training data subsets are classified essentially correctly, or until no suitable feature remains. When selecting the optimal split point, the impurity or uncertainty of the data is measured by computing the information entropy, which is also used to determine the optimal binary split of a categorical variable. The entropy is computed as
$$\mathrm{Info}(D) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
where $D$ denotes the training data set, $c$ denotes the number of data categories, and $p_i$ denotes the proportion of samples of category $i$ among all samples. After a feature $A$ is chosen as a node, the data is classified with that node as the root; the information entropy of the data set after classification is smaller than before classification and is computed as
$$\mathrm{Info}_A(D) = \sum_{j=1}^{k} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)$$
where $k$ indicates that the sample set $D$ is divided into $k$ parts $D_1, \dots, D_k$. The information gain, i.e. the difference in information entropy, measures the influence of a feature on the classification result and is computed as $\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$. After the training set is input into the DT module, the module computes the information entropy for each feature, selects the first split point according to the entropy values, then excludes the selected feature and applies the same operation recursively to the remaining features until the data can no longer be divided.
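The entropy and information-gain quantities used to choose a split can be computed directly. The sketch below assumes categorical feature values and uses hypothetical function names; it shows only the split criterion, not a full decision tree.

```python
import math
from collections import Counter

def info(labels):
    """Information entropy Info(D) = -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_after_split(feature_values, labels):
    """Info_A(D): size-weighted entropy of the subsets produced by feature A."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / n * info(g) for g in groups.values())

def gain(feature_values, labels):
    """Information gain Gain(A) = Info(D) - Info_A(D)."""
    return info(labels) - info_after_split(feature_values, labels)

# Toy example (made-up data): feature f1 separates the classes, f2 does not.
labels = ["normal", "normal", "abnormal", "abnormal"]
f1 = ["low", "low", "high", "high"]
f2 = ["x", "y", "x", "y"]
print(gain(f1, labels), gain(f2, labels))
```

A tree builder would evaluate `gain` for every remaining feature, split on the one with the largest gain, and recurse on each subset.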
The principle of the naive Bayes algorithm is as follows. The naive Bayes classification algorithm performs classification based on Bayes' theorem: it uses the prior probability of an event to compute the posterior probability through Bayes' formula, then compares the posterior probabilities and selects the class with the highest probability as the class of the event. Bayes' formula is
$$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j} P(B_j)\,P(A \mid B_j)}$$
where $P(B_i)$ is the probability of event $B_i$, $P(A \mid B_i)$ is the conditional probability of event $A$ given that event $B_i$ has occurred, and $P(B_i \mid A)$ is the conditional probability of event $B_i$ given that event $A$ has occurred. After the training set is input into the naive Bayes module, the module computes maximum likelihood estimates and determines the categories of the data in the training set.
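The posterior computation and the "pick the most probable class" rule can be illustrated with a small worked example. The probabilities below are invented for illustration, and the function names are hypothetical; a real naive Bayes classifier would estimate the priors and likelihoods from the training set.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes' formula.

    priors: {class: P(B_i)}
    likelihoods: {class: P(A | B_i)} for the observed evidence A
    returns: {class: P(B_i | A)}
    """
    evidence = sum(priors[c] * likelihoods[c] for c in priors)
    return {c: priors[c] * likelihoods[c] / evidence for c in priors}

def classify(priors, likelihoods):
    """Select the class with the largest posterior probability."""
    post = posteriors(priors, likelihoods)
    return max(post, key=post.get)

# Made-up numbers: anomalies are rare, but the observed feature pattern
# is much more likely under the abnormal class.
priors = {"normal": 0.9, "abnormal": 0.1}
likelihoods = {"normal": 0.05, "abnormal": 0.8}
print(posteriors(priors, likelihoods), classify(priors, likelihoods))
```

Here the joint terms are 0.045 and 0.08, so the posterior for "abnormal" is 0.08 / 0.125 = 0.64 even though its prior is only 0.1.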
As a further preferred embodiment, the industrial flow data processing method also includes a training step, which specifically comprises:
SA1. obtaining data labels corresponding to the industrial flow data, each data label indicating the category of the corresponding industrial flow data;
SA2. establishing a training set from the industrial flow data and the data labels, the industrial flow data in the training set serving as the input data of each classification algorithm and the data labels in the training set serving as the expected output of the voting algorithm;
SA3. training each classification algorithm and the voting algorithm using the training set.
In this embodiment, steps SA1-SA3 are used to train the classification algorithms and the voting algorithm; preferably, steps SA1-SA3 are executed before steps S1-S3. The industrial flow data used in step SA1 and the industrial flow data processed in steps S1-S3 may come from the same source. For example, the acquired industrial flow data may be divided by number of data packets in a 4:6 ratio: the "4" portion is used in steps SA1-SA3 to train the classification algorithms and the voting algorithm, and the "6" portion is processed by the classification algorithms and the voting algorithm in steps S1-S3.
The principle of steps SA1-SA3 is shown in Fig. 3.
In step SA1, each item of industrial flow data may be classified manually, i.e. assigned to its corresponding category, and the category is indicated by a corresponding data label.
In step SA2, the training set is established from the industrial flow data and the data labels obtained in step SA1. In step SA3, when the training set obtained in step SA2 is used to train the classification algorithms and the voting algorithm, the industrial flow data in the training set serves as the input data of the classification algorithms. After the classification algorithms output their classification results, the voting algorithm votes on them and outputs the classification result with the highest vote count. The data labels in the training set serve as the expected output of the voting algorithm; that is, the parameters of the classification algorithms and the voting algorithm are adjusted so that the classification result with the highest vote count output by the voting algorithm converges to the corresponding data label. Training the classification algorithms and the voting algorithm may require multiple rounds of the training step, i.e. the combination of steps SA1-SA3 is executed multiple times.
As a further preferred embodiment, the industrial flow data processing method also comprises the following steps:
SB1. executing the training step multiple times, and adjusting the voting weights of the voting algorithm after each execution of the training step, the voting weights being used so that, when the voting algorithm is executed, the vote count obtained by each classification result carries a corresponding weight;
SB2. recording the voting weights for which the error between the output of the voting algorithm and the corresponding data labels is smallest;
SB3. configuring the voting algorithm according to the recorded voting weights.
In this embodiment, the voting algorithm may assign the same voting weight to all categories. A voting weight is a coefficient by which the original vote count is multiplied when the vote count obtained by each classification result is tallied. The voting algorithm may also assign voting weights that are not all identical. The voting weights may be set according to the properties of the classification algorithms used, or adjusted dynamically by executing steps SB1-SB3.
Steps SB1-SB3 may be executed before steps S1-S3. In step SB1, multiple rounds of the training step are executed, i.e. the combination of steps SA1-SA3 is executed multiple times. In each round, the voting algorithm assigns a different set of voting weights to the multiple classification results produced by the classification algorithms. After each execution of steps SA1-SA3, the voting weights of the voting algorithm are adjusted, i.e. the voting weights corresponding to the classification results produced by the classification algorithms are modified, and steps SA1-SA3 are then executed again. After each execution of steps SA1-SA3, the error between the actual output and the expected output of the voting algorithm is recorded in order to examine the effect of the voting weights.
In step SB2, the errors obtained from the multiple executions of steps SA1-SA3 are analyzed to find the execution with the smallest error, and the voting weights used in that execution are applied to the voting algorithm. When steps S1-S3 are later executed to process industrial flow data generated in actual production, the configured voting algorithm votes with these voting weights to obtain the best effect.
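The weight selection in steps SB1-SB3 can be sketched as a search over candidate weight sets, keeping the one with the smallest error against the data labels. The source does not say how the weights are adjusted between rounds, so the exhaustive grid below is an assumption, as are the function names and the choice of one weight per classification algorithm; the classifier outputs are treated as already computed.

```python
from collections import defaultdict
from itertools import product

def weighted_vote(predictions, weights):
    """Each classifier's vote counts `weight` times; return the top label."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

def error_rate(all_predictions, labels, weights):
    """Fraction of samples where the weighted vote differs from the label."""
    wrong = sum(weighted_vote(p, weights) != y
                for p, y in zip(all_predictions, labels))
    return wrong / len(labels)

def search_weights(all_predictions, labels, candidates=(1.0, 2.0, 3.0)):
    """SB1-SB2: try each weight combination, record the one with least error."""
    n_classifiers = len(all_predictions[0])
    best_weights, best_error = None, float("inf")
    for weights in product(candidates, repeat=n_classifiers):
        err = error_rate(all_predictions, labels, weights)
        if err < best_error:
            best_weights, best_error = weights, err
    return best_weights, best_error

# Made-up outputs of three classifiers (e.g. KNN, NB, DT) on four samples.
labels = ["n", "a", "n", "a"]
preds = [("n", "a", "a"), ("a", "n", "n"), ("n", "n", "n"), ("a", "a", "a")]
best_weights, best_error = search_weights(preds, labels)
print(best_weights, best_error)
```

In this toy data the first classifier is always right while the other two sometimes err together, so equal weights give an error of 0.5 and any weighting that lets the first classifier outvote the others gives 0. Step SB3 then corresponds to fixing `best_weights` for use in steps S1-S3.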
As a further preferred embodiment, the industrial flow data processing method also comprises the following steps:
SC1. detecting the length of the industrial flow data;
SC2. if the length of the industrial flow data is greater than a preset length, truncating the industrial flow data to the preset length, and if the length of the industrial flow data is less than the preset length, padding the industrial flow data so that its length increases to the preset length;
SC3. normalizing the truncated or padded industrial flow data.
Steps SC1-SC3 are a preprocessing procedure for the industrial flow data, which at this point takes the form of data packets. Steps SC1 and SC2 align the industrial flow data, i.e. make the length of all industrial flow data uniform, so that when the industrial flow data is used to train the classification algorithms and the voting algorithm, they can better learn the features it contains. Specifically, a preset length MIS is set, for example MIS = 500, and the length of the industrial flow data is compared with MIS. If the length equals MIS, nothing is done. If the length is greater than MIS, the trailing part of the industrial flow data is cut off so that its length is truncated to MIS. If the length is less than MIS, zeros are appended after the last position of the industrial flow data so that its length is padded to MIS.
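The alignment in steps SC1-SC2 amounts to truncate-or-pad. A minimal sketch, assuming that a packet is a `bytes` object and that length is counted in bytes (the source does not state the unit); the name `align_length` is hypothetical.

```python
def align_length(packet: bytes, mis: int = 500) -> bytes:
    """Return `packet` truncated or zero-padded to exactly `mis` bytes."""
    if len(packet) > mis:
        # Longer than MIS: cut off the trailing part.
        return packet[:mis]
    # Shorter than or equal to MIS: append zeros after the last byte
    # (appends nothing when the length already equals MIS).
    return packet + b"\x00" * (mis - len(packet))
```

For example, `align_length(b"\x01\x02", 4)` yields `b"\x01\x02\x00\x00"`, and a 600-byte packet comes back as its first 500 bytes.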
In step SC3, the industrial flow data is normalized by the following formula:
$$\mathrm{Rescaled}(a_i) = \frac{a_i - A_{\min}}{A_{\max} - A_{\min}} \times (\max - \min) + \min$$
where $a_i$ is a feature value of the industrial flow data, i.e. the specific numerical value of the feature; $\mathrm{Rescaled}(a_i)$ is the result of normalizing $a_i$; $A_{\min}$ is the minimum of the feature values of the industrial flow data, i.e. the smallest value a feature value can take; $A_{\max}$ is the maximum of the feature values of the industrial flow data, i.e. the largest value a feature value can take; and $\max$ and $\min$ are preset values, for example $\max = 1$ and $\min = 0$.
Through this normalization, the feature values of the industrial flow data can be scaled to [0, 1]. This avoids the situation where large differences between the feature values of different industrial flow data cause the objective function to become "flat" during execution of the classification algorithms and the voting algorithm, and it therefore reduces the required training time.
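Min-max rescaling of the form (a - A_min) / (A_max - A_min) × (max - min) + min, which matches the symbol definitions given for step SC3, can be sketched as below. The function name and the handling of the degenerate case A_max = A_min are assumptions added for illustration.

```python
def rescale(values, new_min=0.0, new_max=1.0, a_min=None, a_max=None):
    """Min-max rescale `values` into [new_min, new_max].

    a_min / a_max: smallest / largest value a feature can take; if omitted
    they are taken from `values` themselves.
    """
    a_min = min(values) if a_min is None else a_min
    a_max = max(values) if a_max is None else a_max
    if a_max == a_min:
        # Assumption: a constant feature maps to the lower bound.
        return [new_min for _ in values]
    span = a_max - a_min
    return [(a - a_min) / span * (new_max - new_min) + new_min for a in values]

# Byte-valued features (0..255) scaled to [0, 1].
print(rescale([0, 128, 255], a_min=0, a_max=255))
```

Passing `a_min=0` and `a_max=255` for byte-valued packet data keeps the scaling identical across packets, whereas deriving the bounds from each packet would scale every packet differently.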
This embodiment also includes an industrial flow data detection method comprising the following steps:
S100. acquiring industrial flow data;
S101. executing steps S1-S3 to process the acquired industrial flow data;
S102. if the industrial flow data is detected to be abnormal data, issuing an alarm prompt.
Steps S100-S102 apply steps S1-S3 to the detection of a data stream formed by a large amount of industrial flow data. In step S100, industrial flow data is obtained from the data stream, for example by capturing data packets. Steps S1-S3 are then executed on each single item of industrial flow data to judge whether it is normal data or abnormal data. In step S102, if the industrial flow data is detected to be abnormal data, operations such as issuing an alarm prompt or pausing data transmission are carried out to maintain safe and stable production.
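The detection loop of steps S100-S102 wrapped around steps S1-S3 can be sketched as follows. The classifiers are passed in as plain callables, and all names (`process_packet`, `detect_stream`, `abnormal_classes`, `alarm`) are hypothetical; packet capture itself is outside the sketch, and ties in the vote are left to `Counter.most_common`.

```python
from collections import Counter

def process_packet(packet, classifiers, abnormal_classes):
    """Steps S1-S3 for one packet; returns True if judged abnormal."""
    results = [clf(packet) for clf in classifiers]       # S1: classify
    winner = Counter(results).most_common(1)[0][0]       # S2: simple vote
    return winner in abnormal_classes                    # S3: map to judgment

def detect_stream(packets, classifiers, abnormal_classes, alarm):
    """Steps S100-S102: process each packet and alarm on abnormal ones."""
    flagged = []
    for packet in packets:                               # S100: acquire
        if process_packet(packet, classifiers, abnormal_classes):  # S101
            alarm(packet)                                # S102: alarm prompt
            flagged.append(packet)
    return flagged

# Stand-in classifiers for illustration only; real ones would be trained.
classifiers = [
    lambda p: "attack" if len(p) > 3 else "benign",
    lambda p: "attack" if p.startswith(b"\xff") else "benign",
    lambda p: "benign",
]
alarms = []
flagged = detect_stream([b"ab", b"\xffabcd", b"abcde"],
                        classifiers, {"attack"}, alarms.append)
print(flagged)
```

The `abnormal_classes` set plays the role of the mapping from classification results to "abnormal data" described for step S3; pausing data transmission could be done inside the `alarm` callback.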
The present embodiment further includes an industrial flow data processing system which, referring to Fig. 4, comprises:
a classification module, configured to classify industrial flow data using a plurality of trained, mutually different classification algorithms, so as to obtain a plurality of classification results;
a voting module, configured to vote on the plurality of classification results using a trained voting algorithm, so that each classification result receives a corresponding number of votes, and to output the classification result with the highest number of votes;
a judgment module, configured to output, according to the output classification result, a judgment result as to whether the industrial flow data are abnormal data.
The classification module, voting module and judgment module may be hardware modules or software modules with the corresponding functions in a computer system.
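One way to picture the three modules as software modules is sketched below in Python. This is a sketch under stated assumptions rather than the implementation of the patent: the function names `classify`, `vote` and `judge`, the label strings, and the weight values are hypothetical, and the three lambda rules merely stand in for trained classifiers. The per-classifier weights correspond to the voting weights referred to in the claims.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Sequence

Classifier = Callable[[Sequence[float]], Hashable]


def classify(classifiers: Sequence[Classifier], x: Sequence[float]) -> list:
    """Classification module: one classification result per classifier."""
    return [clf(x) for clf in classifiers]


def vote(results: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Voting module: weighted tally, return the label with the most votes.

    On a tie, the label that was tallied first wins (an arbitrary choice).
    """
    tally: Dict[Hashable, float] = defaultdict(float)
    for label, weight in zip(results, weights):
        tally[label] += weight
    return max(tally, key=tally.get)


def judge(label: Hashable, abnormal_labels: frozenset) -> bool:
    """Judgment module: True if the winning label denotes abnormal data."""
    return label in abnormal_labels


# Toy stand-ins for trained classifiers (hypothetical rules only).
classifiers = [
    lambda x: "normal",
    lambda x: "abnormal" if max(x) > 0.9 else "normal",
    lambda x: "abnormal" if sum(x) > 1.5 else "normal",
]
weights = [0.5, 0.3, 0.4]  # assumed voting weights, one per classifier

sample = [0.95, 0.8]
results = classify(classifiers, sample)
is_abnormal = judge(vote(results, weights), frozenset({"abnormal"}))
```

In this toy run two of the three classifiers report "abnormal" and their combined weight exceeds that of the dissenting classifier, so the sample is judged abnormal.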
The present embodiment further includes an industrial flow data processing device comprising a memory and a processor, wherein the memory is configured to store at least one program and the processor is configured to load the at least one program to execute the industrial flow data processing method of the present invention.
The present embodiment further includes a medium having a storage function, in which processor-executable instructions are stored; when executed by a processor, the processor-executable instructions are used to perform the industrial flow data processing method of the present invention.
The industrial flow data processing system, device and medium of the present embodiment can execute the industrial flow data processing method of the present invention, can execute any combination of the implementation steps of the method embodiments, and have the corresponding functions and beneficial effects of the method.
It should be noted that, unless otherwise specified, when a feature is said to be "fixed" or "connected" to another feature, it may be fixed or connected to the other feature directly or indirectly. In addition, descriptions such as up, down, left and right used in the present disclosure refer only to the mutual positional relationship of the components of the present disclosure in the drawings. The singular forms "a", "said" and "the" used in the present disclosure are also intended to include the plural forms, unless the context clearly indicates otherwise. In addition, unless otherwise defined, all technical and scientific terms used in the present embodiment have the same meaning as commonly understood by those skilled in the art. The terms used in this specification are for describing specific embodiments only and are not intended to limit the present invention. The term "and/or" used in the present embodiment includes any combination of one or more of the associated listed items.
It should be understood that, although the terms first, second, third, etc. may be used in the present disclosure to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish elements of the same type from one another. For example, without departing from the scope of the present disclosure, a first element could also be termed a second element and, similarly, a second element could also be termed a first element. The use of any and all examples or exemplary language ("for example", "such as") provided in the present embodiment is intended merely to better illustrate the embodiments of the present invention and, unless otherwise required, does not limit the scope of the present invention.
It should be appreciated that the embodiments of the present invention may be implemented by computer hardware, by a combination of hardware and software, or by computer instructions stored in a non-transitory computer-readable memory. The method may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and drawings described in the specific embodiments. Each program may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, if desired, the program may be implemented in assembly or machine language. In any case, the language may be a compiled or interpreted language. In addition, the program may run on an application-specific integrated circuit programmed for this purpose.
In addition, the operations of the processes described in the present embodiment may be performed in any suitable order, unless otherwise indicated in the present embodiment or otherwise clearly contradicted by context. The processes described in the present embodiment (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (for example, executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or by a combination thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform that is operably connected to suitable equipment, including but not limited to a personal computer, a minicomputer, a mainframe, a workstation, a networked or distributed computing environment, a separate or integrated computer platform, or a platform in communication with a charged particle tool or other imaging device. Aspects of the present invention may be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into the computing platform, such as a hard disk, an optical read and/or write storage medium, RAM, ROM and the like, so that it is readable by a programmable computer; when the storage medium or device is read by the computer, it can be used to configure and operate the computer to perform the processes described herein. In addition, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. When such media contain instructions or programs that, in conjunction with a microprocessor or other data processor, implement the steps described above, the invention described in the present embodiment includes these and other different types of non-transitory computer-readable storage media. When programmed according to the methods and techniques of the present invention, the present invention also includes the computer itself.
A computer program can be applied to input data to perform the functions described in the present embodiment, thereby transforming the input data to generate output data that are stored in non-volatile memory. The output information may also be applied to one or more output devices such as a display. In a preferred embodiment of the present invention, the transformed data represent physical and tangible objects, including a particular visual depiction of the physical and tangible objects produced on the display.
The above are only preferred embodiments of the present invention, and the present invention is not limited to the above embodiments. As long as the technical effect of the present invention is achieved by the same means, any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention. Within the scope of protection of the present invention, its technical solutions and/or embodiments may have various modifications and variations.

Claims (10)

1. An industrial flow data processing method, characterized by comprising the following steps:
classifying industrial flow data using a plurality of trained, mutually different classification algorithms, so as to obtain a plurality of classification results;
voting on the plurality of classification results using a trained voting algorithm, so that each classification result receives a corresponding number of votes, and then outputting the classification result with the highest number of votes;
outputting, according to the output classification result, a judgment result as to whether the industrial flow data are abnormal data.
2. The industrial flow data processing method according to claim 1, characterized by further comprising a training step, the training step specifically comprising:
obtaining data labels corresponding to the industrial flow data, each data label indicating the category of the corresponding industrial flow data;
establishing a training set using the industrial flow data and the data labels, wherein the industrial flow data in the training set are used as the input data of each classification algorithm and the data labels in the training set are used as the desired output of the voting algorithm;
training each classification algorithm and the voting algorithm using the training set.
3. The industrial flow data processing method according to claim 2, characterized by further comprising the following steps:
executing the training step multiple times, and adjusting the voting weights of the voting algorithm after each execution of the training step, the voting weights being used so that, when the voting algorithm is executed, the votes obtained by each classification result carry a corresponding weight;
recording the corresponding voting weights when the error between the output of the voting algorithm and the corresponding data labels is smallest;
setting the voting algorithm according to the recorded voting weights.
4. The industrial flow data processing method according to any one of claims 1-3, characterized in that the plurality of mutually different classification algorithms are a K-nearest-neighbor algorithm, a naive Bayes algorithm and a decision tree algorithm.
5. The industrial flow data processing method according to any one of claims 1-3, characterized by further comprising the following steps:
detecting the length of the industrial flow data;
if the length of the industrial flow data is greater than a preset length, truncating the industrial flow data to the preset length, and if the length of the industrial flow data is less than the preset length, padding the industrial flow data with data bits so that the length of the industrial flow data increases to the preset length;
normalizing the truncated or padded industrial flow data.
6. The industrial flow data processing method according to claim 5, characterized in that the formula used for the normalization is $\mathrm{Rescaled}(a_i) = \frac{a_i - A_{\min}}{A_{\max} - A_{\min}} \times (\mathit{max} - \mathit{min}) + \mathit{min}$, where $a_i$ is the feature value of the industrial flow data, $\mathrm{Rescaled}(a_i)$ is the result of normalizing $a_i$, $A_{\min}$ is the minimum of the feature value of the industrial flow data, $A_{\max}$ is the maximum of the feature value of the industrial flow data, and max and min are each preset values.
7. An industrial flow data detection method, characterized by comprising the following steps:
obtaining industrial flow data;
executing the method according to any one of claims 1-6 to process the obtained industrial flow data;
issuing an alarm prompt if the industrial flow data are detected to be abnormal data.
8. An industrial flow data processing system, characterized by comprising:
a classification module, configured to classify industrial flow data using a plurality of trained, mutually different classification algorithms, so as to obtain a plurality of classification results;
a voting module, configured to vote on the plurality of classification results using a trained voting algorithm, so that each classification result receives a corresponding number of votes, and to output the classification result with the highest number of votes;
a judgment module, configured to output, according to the output classification result, a judgment result as to whether the industrial flow data are abnormal data.
9. An industrial flow data processing device, characterized by comprising a memory and a processor, wherein the memory is configured to store at least one program and the processor is configured to load the at least one program to execute the method according to any one of claims 1-7.
10. A medium having a storage function, in which processor-executable instructions are stored, characterized in that the processor-executable instructions, when executed by a processor, are used to perform the method according to any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910534886.0A CN110363223A (en) 2019-06-20 2019-06-20 Industrial flow data processing method, detection method, system, device and medium

Publications (1)

Publication Number Publication Date
CN110363223A (en) 2019-10-22

Family

ID=68216411

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201910534886.0A Pending CN110363223A (en) 2019-06-20 2019-06-20 Industrial flow data processing method, detection method, system, device and medium

Country Status (1)

Country Link
CN (1) CN110363223A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400155A (en) * 2020-03-13 2020-07-10 深圳前海微众银行股份有限公司 Data detection method and device
CN114615002A (en) * 2020-12-03 2022-06-10 中国移动通信集团设计院有限公司 Operator key infrastructure controlled identification method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391860A (en) * 2014-10-22 2015-03-04 安一恒通(北京)科技有限公司 Content type detection method and device
CN104468276A (en) * 2014-12-18 2015-03-25 东南大学 Network traffic identification method based on random sampling multiple classifiers
CN104951809A (en) * 2015-07-14 2015-09-30 西安电子科技大学 Unbalanced data classification method based on unbalanced classification indexes and integrated learning
CN107294993A (en) * 2017-07-05 2017-10-24 重庆邮电大学 A kind of WEB abnormal flow monitoring methods based on integrated study
CN109325638A (en) * 2018-11-09 2019-02-12 电子科技大学 A kind of SDN method for predicting based on RBF neural

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
何跃 等: "基于情感知识和机器学习算法的组合微文情感倾向分类研究", 《情报杂志》 *
吴嘉乐: "异质集成学习器在鸢尾花卉分类中的应用", 《中国设备工程》 *
朱佳佳,陈佳: "基于熵和SVM多分类器的异常流量检测方法", 《计算机技术与发展》 *
汪为汉: "IPv6网络流量分类识别技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
高嵩 等: "基于快速级联分类器的行人检测方法研究", 《计算机工程与科学》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400155A (en) * 2020-03-13 2020-07-10 深圳前海微众银行股份有限公司 Data detection method and device
CN111400155B (en) * 2020-03-13 2021-08-31 深圳前海微众银行股份有限公司 Data detection method and device
CN114615002A (en) * 2020-12-03 2022-06-10 中国移动通信集团设计院有限公司 Operator key infrastructure controlled identification method and system
CN114615002B (en) * 2020-12-03 2024-02-27 中国移动通信集团设计院有限公司 Controlled identification method and system for key infrastructure of operator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191022