CN110287231A - Anomaly detection method and detection device based on water environment sensor network monitoring big data - Google Patents


Info

Publication number: CN110287231A
Authority: CN (China)
Prior art keywords: subspace, template, data, network data, belonging
Legal status: Pending (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Application number: CN201910516179.9A
Other languages: Chinese (zh)
Inventors: 李宏周, 莫德清, 任恩伯
Current assignee: Guilin University of Electronic Technology
Original assignee: Guilin University of Electronic Technology
Application filed by Guilin University of Electronic Technology
Priority to CN201910516179.9A
Publication of CN110287231A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/12Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3247Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving digital signatures

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Physiology (AREA)
  • Molecular Biology (AREA)
  • Fuzzy Systems (AREA)
  • Medical Informatics (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

The embodiments of the invention disclose an anomaly detection method and detection device based on water environment sensor network monitoring big data. The method handles the high-dimensional data problem in sensor-network monitoring of water environments and searches subspaces effectively in order to detect subspace anomalies; by using a dynamic subspace set, it also adapts to the dynamic characteristics of the data, which speeds up detection and reduces the error rate of the detection results. The method comprises: constructing a subspace template; obtaining network data from water environment sensors; updating, according to the network data, the current projected cell summary value corresponding to the subspace template; when the current projected cell summary value falls within a preset reference projected cell summary value, determining that the subspace to which the network data belongs is an abnormal subspace; and sending the outliers of the subspace to which the network data belongs to a client, which uses them for data processing.

Description

Anomaly detection method and detection device based on water environment sensor network monitoring big data
Technical field
The present invention relates to the field of environmental monitoring technology, and in particular to an anomaly detection method and detection device based on water environment sensor network monitoring big data.
Background art
Sensor networks have developed rapidly in recent years, and the volume of data produced by sensor networks used for environmental monitoring keeps growing. With the application of machine learning and artificial intelligence in environmental monitoring, early warning of abnormal environmental conditions is gradually shifting from simple threshold alarms to anomaly detection on high-volume data streams.
Existing anomaly detection approaches for water environment sensor network monitoring big data have the following limitations. Conventional outlier/singular-point detection methods can only detect anomalies in relatively low-dimensional, static data sets (data without frequency-domain variation). As the number of water environment sensor nodes keeps increasing, anomaly detection for high-dimensional water environment data and water environment data streams is becoming increasingly important. Methods that detect outliers in subspaces of a high-dimensional space must update their outlier estimates in real time while handling multidimensional data, so they cannot cope with fast data streams. In addition, techniques for detecting anomalies in data streams depend on the full data space and cannot find subspace anomalies. Recent studies that detect outliers with graph-based and visualization methods likewise cannot handle high-dimensional data streams.
Summary of the invention
The embodiments of the present invention provide an anomaly detection method and detection device based on water environment sensor network monitoring big data, which can handle the high-dimensional data problem in sensor-network monitoring of water environments and search subspaces effectively to detect subspace anomalies; in addition, by using a dynamic subspace set, they can adapt to the dynamic characteristics of the data, which speeds up detection and reduces the error rate of the detection results.
In view of this, a first aspect of the present invention provides an anomaly detection method based on water environment sensor network monitoring big data, which may include:
Constructing a subspace template, the subspace template including a fixed subspace template, an unsupervised subspace template and a supervised subspace template;
Obtaining network data from water environment sensors;
Updating, according to the network data, the current projected cell summary value corresponding to the subspace template;
When the current projected cell summary value falls within a preset reference projected cell summary value, determining that the subspace to which the network data belongs is an abnormal subspace;
Sending the outliers of the subspace to which the network data belongs to a client, the client performing data processing on these outliers.
Optionally, in some embodiments of the present invention, constructing the subspace template includes:
Constructing the subspace template from unlabeled training data and prior knowledge obtained in advance;
Wherein the fixed subspace template includes subspaces subject to a pre-specified parameter constraint;
The unsupervised subspace template is computed from the unlabeled training data using a multi-objective genetic algorithm;
The supervised subspace template is obtained by a detection method applied in advance.
Optionally, in some embodiments of the present invention, the subspace corresponding to the preset reference projected cell summary value is an outlier subspace, and the method further includes:
Searching, by a multi-objective genetic algorithm, for other subspaces in which the outlier exists.
Optionally, in some embodiments of the present invention, the first generation of subspaces of the subspace template is generated randomly, and the N-th generation is generated from the subspaces of the (N-1)-th generation using crossover and mutation search operators; the subspaces of each generation are located on different trade-off surfaces in the objective-function space, and N is an integer greater than 1.
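The generation scheme described above resembles a standard multi-objective genetic algorithm with non-dominated (Pareto) sorting. The sketch below is a minimal illustration under stated assumptions, not the patent's implementation: subspaces are encoded as hypothetical 0/1 bitmasks over d ≥ 2 dimensions, crossover is one-point, mutation flips bits, and `objective` is a caller-supplied function whose values are minimised (the patent does not specify its objectives). All function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Subspace = Tuple[int, ...]  # bitmask over d dimensions: 1 = dimension included


def random_subspace(d: int, rng: random.Random) -> Subspace:
    """Generation 1: a random non-empty bitmask over d dimensions."""
    while True:
        s = tuple(rng.randint(0, 1) for _ in range(d))
        if any(s):
            return s


def crossover(a: Subspace, b: Subspace, rng: random.Random) -> Subspace:
    """One-point crossover of two parent bitmasks (requires d >= 2)."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(s: Subspace, rate: float, rng: random.Random) -> Subspace:
    """Flip each bit independently with probability `rate`."""
    return tuple(1 - bit if rng.random() < rate else bit for bit in s)


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def pareto_fronts(scores: List[Sequence[float]]) -> List[List[int]]:
    """Sort indices into successive non-dominated fronts (trade-off surfaces)."""
    remaining = list(range(len(scores)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


def next_generation(pop: List[Subspace],
                    objective: Callable[[Subspace], Sequence[float]],
                    rng: random.Random, rate: float = 0.1) -> List[Subspace]:
    """Generation N from generation N-1: breed children, keep the best fronts."""
    children = []
    for _ in range(len(pop)):
        a, b = rng.sample(pop, 2)
        child = mutate(crossover(a, b, rng), rate, rng)
        if any(child):  # discard the empty subspace
            children.append(child)
    merged = list(dict.fromkeys(pop + children))  # de-duplicate, keep order
    fronts = pareto_fronts([objective(s) for s in merged])
    survivors: List[Subspace] = []
    for front in fronts:
        for i in front:
            if len(survivors) < len(pop):
                survivors.append(merged[i])
    return survivors
```

Ranking by fronts rather than by a single weighted score keeps several trade-offs alive in each generation, which matches the "different trade-off surfaces" wording above.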
Optionally, in some embodiments of the present invention, the method further includes:
Feeding back new subspaces to the client by incorporating relevance feedback into the unsupervised subspace template.
Optionally, in some embodiments of the present invention, relevance feedback is incorporated as follows: each subspace in the subspace template is given an initial weight of 1; the weight of a subspace in the template increases when it is a correct outlying subspace and decreases when it is a false-alarm outlying subspace.
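The weighting rule above can be sketched in a few lines. The initial weight of 1 comes from the text; the multiplicative factors `up` and `down` and the pruning threshold `min_weight` are hypothetical choices, since the source does not say by how much a weight changes.

```python
from typing import Dict, Iterable, Tuple

Subspace = Tuple[int, ...]


def init_weights(subspaces: Iterable[Subspace]) -> Dict[Subspace, float]:
    """Every subspace in the template starts with weight 1."""
    return {tuple(sorted(s)): 1.0 for s in subspaces}


def apply_feedback(weights: Dict[Subspace, float], subspace: Subspace,
                   correct: bool, up: float = 1.2, down: float = 0.8) -> float:
    """Raise the weight of a correct outlying subspace, lower it for a false alarm.

    `up` and `down` are hypothetical multiplicative factors.
    """
    key = tuple(sorted(subspace))
    weights[key] = weights.get(key, 1.0) * (up if correct else down)
    return weights[key]


def prune(weights: Dict[Subspace, float], min_weight: float = 0.5) -> Dict[Subspace, float]:
    """Hypothetical: drop subspaces whose weight has decayed below `min_weight`."""
    return {s: w for s, w in weights.items() if w >= min_weight}
```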
A second aspect of the present invention provides a detection device, including:
A construction module, configured to construct a subspace template, the subspace template including a fixed subspace template, an unsupervised subspace template and a supervised subspace template;
A transceiver module, configured to obtain network data from water environment sensors, and to send the outliers of the subspace to which the network data belongs to a client, the client performing data processing on these outliers;
A processing module, configured to update, according to the network data, the current projected cell summary value corresponding to the subspace template, and, when the current projected cell summary value falls within a preset reference projected cell summary value, to determine that the subspace to which the network data belongs is an abnormal subspace.
Optionally, in some embodiments of the present invention,
The construction module is specifically configured to construct the subspace template from unlabeled training data and prior knowledge obtained in advance;
Wherein the fixed subspace template includes subspaces subject to a pre-specified parameter constraint;
The unsupervised subspace template is computed from the unlabeled training data using a multi-objective genetic algorithm;
The supervised subspace template is obtained by a detection method applied in advance.
Optionally, in some embodiments of the present invention, the subspace corresponding to the preset reference projected cell summary value is an outlier subspace;
The processing module is further configured to search, by a multi-objective genetic algorithm, for other subspaces in which the outlier exists.
Optionally, in some embodiments of the present invention, the first generation of subspaces of the subspace template is generated randomly, and the N-th generation is generated from the subspaces of the (N-1)-th generation using crossover and mutation search operators; the subspaces of each generation are located on different trade-off surfaces in the objective-function space, and N is an integer greater than 1.
Optionally, in some embodiments of the present invention,
The processing module is further configured to feed back new subspaces to the client by incorporating relevance feedback into the unsupervised subspace template.
Optionally, in some embodiments of the present invention, relevance feedback is incorporated as follows: each subspace in the subspace template is given an initial weight of 1; the weight of a subspace in the template increases when it is a correct outlying subspace and decreases when it is a false-alarm outlying subspace.
A third aspect of the present invention provides a detection device, which may include:
A transceiver, a processor and a memory, wherein the transceiver, the processor and the memory are connected by a bus;
The memory is configured to store operation instructions;
The transceiver is configured to obtain network data from water environment sensors, and to send the outliers of the subspace to which the network data belongs to a client, the client performing data processing on these outliers;
The processor is configured to call the operation instructions to execute the steps of the big-data anomaly detection method according to the first aspect or any optional implementation of the first aspect.
A fourth aspect of the present invention provides a readable storage medium storing a computer program which, when executed by a processor, implements the steps of the big-data anomaly detection method according to the first aspect or any optional implementation of the first aspect.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, a subspace template is constructed, including a fixed subspace template, an unsupervised subspace template and a supervised subspace template; network data of water environment sensors are obtained; the current projected cell summary value corresponding to the subspace template is updated according to the network data; when the current projected cell summary value falls within a preset reference projected cell summary value, the subspace to which the network data belongs is determined to be an abnormal subspace; and the outliers of that subspace are sent to a client, which uses them for data processing. This makes it possible to handle the high-dimensional data problem in sensor-network monitoring of water environments and to search subspaces effectively so as to detect subspace anomalies; by using a dynamic subspace set, the method also adapts to the dynamic characteristics of the data, which speeds up detection and reduces the error rate of the detection results.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments and the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and other drawings can be derived from them.
Fig. 1 is a functional block diagram of the anomaly detection method based on water environment sensor network monitoring big data in an embodiment of the present invention;
Fig. 2 is a schematic diagram of an embodiment of the anomaly detection method based on water environment sensor network monitoring big data in an embodiment of the present invention;
Fig. 3A is a schematic diagram of generating a single training data set to obtain the SS in an embodiment of the present invention;
Fig. 3B is a schematic diagram of generating multiple training data sets to obtain the SS in an embodiment of the present invention;
Fig. 3C is a schematic diagram of the outlying subspaces and the outlying subspace front template in an embodiment of the present invention;
Fig. 3D is a schematic diagram of the algorithm for finding the outlying subspace front in an embodiment of the present invention;
Fig. 4 is a schematic diagram of an embodiment of the detection device in an embodiment of the present invention;
Fig. 5 is a schematic diagram of another embodiment of the detection device in an embodiment of the present invention.
Detailed description of embodiments
The embodiments of the present invention provide an anomaly detection method and detection device based on water environment sensor network monitoring big data, which can handle the high-dimensional data problem in sensor-network monitoring of water environments and search subspaces effectively to detect subspace anomalies; in addition, by using a dynamic subspace set, they can adapt to the dynamic characteristics of the data, which speeds up detection and reduces the error rate of the detection results.
To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained on the basis of the embodiments of the present invention shall fall within the protection scope of the present invention.
In an embodiment of the present invention, Fig. 1 is a functional block diagram of the anomaly detection method based on water environment sensor network monitoring big data. The method detects outliers in water environment sensor network data using multiple criteria and searches for the subspaces in which outliers exist using a multi-objective genetic algorithm. Detection proceeds in two stages: a learning stage and a detection stage. The learning stage can be further divided into two learning modes: offline learning and online learning. In offline learning, a Sparse Subspace Template (SST) is constructed from unlabeled training data and prior knowledge; the subspace template is a set of subspaces that have higher overall data sparsity than other subspaces. The subspace template is built mainly in unsupervised mode, so unlabeled data are necessary; however, labeled data can be used to further improve it. Once the subspace template has been constructed, the detection stage finds anomalies in the subspaces of the continuously arriving data. Each incoming data point is first used to update the summary data of every subspace in the template. If the summary value falls below a pre-specified threshold, the data point is marked as anomalous, and the detected outlier is archived in a so-called outlier repository. Finally, after detection, all or specified outliers are fed back to the user.
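The learn-then-detect flow above can be illustrated with a small in-memory sketch. It is a hedged approximation, not the patented algorithm: the grid granularity `phi`, the density threshold, the optional `fading` factor, and the use of plain relative cell density as the projected cell summary (PCS) are all assumptions, because the text does not define the fields of the PCS. The class name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Subspace = Tuple[int, ...]
Cell = Tuple[int, ...]


class StreamSubspaceDetector:
    """Hypothetical two-stage sketch: learn cell summaries, then flag sparse cells.

    Assumes features scaled to [0, 1], each dimension split into `phi` equal
    intervals, and a per-cell summary equal to the cell's relative density.
    """

    def __init__(self, template: Sequence[Subspace], phi: int = 4,
                 threshold: float = 0.05, fading: float = 1.0):
        self.template = [tuple(sorted(s)) for s in template]
        self.phi = phi
        self.threshold = threshold
        self.fading = fading  # 1.0 = no fading; < 1.0 down-weights old data
        self.counts: Dict[Subspace, Dict[Cell, float]] = {
            s: defaultdict(float) for s in self.template}
        self.total = 0.0
        self.outlier_repository: List[Tuple[Sequence[float], List[Subspace]]] = []

    def _cell(self, point: Sequence[float], s: Subspace) -> Cell:
        # Project the point onto subspace s and locate its grid cell.
        return tuple(min(max(int(point[d] * self.phi), 0), self.phi - 1) for d in s)

    def _update(self, point: Sequence[float]) -> None:
        # Fade old evidence, then add the point to its cell in every subspace.
        if self.fading != 1.0:
            self.total *= self.fading
            for cells in self.counts.values():
                for c in cells:
                    cells[c] *= self.fading
        self.total += 1.0
        for s in self.template:
            self.counts[s][self._cell(point, s)] += 1.0

    def learn(self, normal_points) -> None:
        """Learning stage: build the summaries from normal training samples only."""
        for p in normal_points:
            self._update(p)

    def detect(self, point: Sequence[float]) -> List[Subspace]:
        """Detection stage: update summaries first, then test every subspace."""
        self._update(point)
        outlying = [s for s in self.template
                    if self.counts[s][self._cell(point, s)] / self.total < self.threshold]
        if outlying:
            self.outlier_repository.append((point, outlying))
        return outlying
```

For example, after `learn([(0.1, 0.1)] * 100)` on a template `[(0,), (1,), (0, 1)]`, the point `(0.9, 0.9)` lands in otherwise empty cells and is reported as outlying in all three subspaces, while `(0.1, 0.12)` is not flagged.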
Compared with the prior art, the adaptive stream projected anomaly detection method proposed by the present invention can handle the high-dimensional data problem in sensor-network monitoring of water environments and search subspaces effectively to detect subspace anomalies; by using a dynamic subspace set, it also adapts to the dynamic characteristics of the data, which speeds up detection and reduces the error rate of the detection results.
The technical solution of the present invention is further described below by way of example. As shown in Fig. 2, an embodiment of the anomaly detection method based on water environment sensor network monitoring big data in the embodiments of the present invention includes a learning stage and a detection stage, and specifically comprises the following steps:
201. Construct the subspace template.
In this embodiment of the present invention, the subspace template is computed first; this is the first generation. In offline learning, the sparse subspace template is constructed from unlabeled training data and prior knowledge. In many applications a number of labeled abnormal samples (referred to as abnormal data) are known, and the subspace template includes SS (the supervised subspace template), FS (the fixed subspace template) and US (the unsupervised subspace template). SS is then generated within the subspace template by supervised learning. Because the abnormal samples are labeled with different classes, the detection device can apply the multi-objective genetic algorithm to the abnormal samples belonging to the same class to compute a class-specific SS. The subspace template may finally contain the SS of four classes. That is:
SST = FS ∪ US ∪ SS(OD)
In the above formula, ∪ denotes set union, OD is the set of labeled abnormal samples in the data set, and OD_i is the set of abnormal samples belonging to the i-th attack class.
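Read as set algebra, the template is the union of its three groups, with the supervised part taken per labeled class. A minimal sketch, assuming subspaces are tuples of dimension indices and that SS(OD) is the union of the per-class sets SS(OD_i) (an interpretation of the formula, not stated explicitly in the source); the function names are hypothetical:

```python
from typing import Dict, Iterable, Set, Tuple

Subspace = Tuple[int, ...]


def normalise(subspaces: Iterable[Subspace]) -> Set[Subspace]:
    # (2, 0) and (0, 2) describe the same subspace, so store sorted tuples.
    return {tuple(sorted(s)) for s in subspaces}


def build_sst(fs: Iterable[Subspace], us: Iterable[Subspace],
              ss_by_class: Dict[str, Iterable[Subspace]]) -> Set[Subspace]:
    """SST = FS ∪ US ∪ SS(OD), taking SS(OD) as the union of SS(OD_i) over classes i."""
    sst = normalise(fs) | normalise(us)
    for class_subspaces in ss_by_class.values():
        sst |= normalise(class_subspaces)
    return sst
```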
Constructing the subspace template includes: constructing the subspace template from unlabeled training data and prior knowledge obtained in advance; wherein the fixed subspace template includes subspaces subject to a pre-specified parameter constraint; the unsupervised subspace template is computed from the unlabeled training data using a multi-objective genetic algorithm; and the supervised subspace template is obtained by a detection method applied in advance.
Illustratively, Fig. 3A is a schematic diagram of generating a single training data set to obtain the SS in an embodiment of the present invention, and Fig. 3B is a schematic diagram of generating multiple training data sets to obtain the SS in an embodiment of the present invention.
Further, the detection device calculates the projected cell summary value corresponding to each subset of the subspace template. After obtaining the subspace template, the detection device computes a Projected Cell Summary (PCS) value for each subset; these values are used to detect anomalies in the data set. Note that because the training set contains normal samples, the PCS can be constructed from the normal samples only, rather than from all the training data. This keeps the PCS values free of anomalies so that they better reflect the normal behavior of the data.
In the present invention, because the number of abnormal samples far exceeds the number of normal samples, when evaluating outliers the detection device only retrieves the range into which the PCS value of the cell falls, without updating the cell. Updating the PCS values with the abnormal samples would shift them, which would bias the subsequent assessment of how accurately the algorithm identifies anomalies. Once the outlying subspaces of all abnormal samples have been found, a signature subspace lookup table is built; it records the outlying subspaces of the abnormal data and is used for anomaly classification. Illustratively, Fig. 3C is a schematic diagram of the outlying subspaces and the outlying subspace front template in an embodiment of the present invention.
When the available training data are insufficient for training the algorithm, new training data need to be generated on the basis of the original learning method. Because the number of outlying subspaces found by the algorithm for a single anomaly may be very large, and many of them are redundant and can be deleted from the result, the detection device uses the Outlying Subspace Front (OSF) algorithm to remove the redundancy, keeping the results concise and informative. Furthermore, the algorithm can identify potential anomalies but cannot classify them, so a classification function needs to be added to it. Illustratively, Fig. 3D is a schematic diagram of the algorithm for finding the outlying subspace front in an embodiment of the present invention.
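One common way to prune redundant outlying subspaces, and the reading assumed here, is to keep only the minimal ones: if a point is already outlying in subspace s, a superset of s adds little information. The text does not spell out its OSF algorithm, so the function below (a hypothetical name) illustrates only that minimality idea.

```python
from typing import Iterable, List, Tuple

Subspace = Tuple[int, ...]


def outlying_subspace_front(outlying: Iterable[Subspace]) -> List[Subspace]:
    """Keep the outlying subspaces that have no outlying proper subset."""
    subs = {frozenset(s) for s in outlying}
    # t < s on frozensets means "t is a proper subset of s".
    front = [s for s in subs if not any(t < s for t in subs)]
    return sorted(tuple(sorted(s)) for s in front)
```

For instance, from the outlying subspaces (0), (0, 1), (1, 2), (0, 1, 2) and (2, 3) this keeps (0), (1, 2) and (2, 3).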
To perform anomaly classification, the detection device can generate signature subspaces for each anomaly class; these subspaces can be used to identify anomalies of a particular class. To generate the signature subspaces of a specific class, the detection device collects the OSF subspaces of the anomalies belonging to that class and uses them as the signature subspaces of the class. For example, the signature subspaces of class c can be defined as:
Within each class, different signature subspaces carry different weights, which affect the correctness of classification, and a similarity measure is applied in the classification process.
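Because the defining formula for class c is not reproduced in this text, the following sketch makes its assumptions explicit: a class's signature subspaces are the OSF subspaces observed among its labeled anomalies, each weighted by the fraction of those anomalies in which it occurs, and the similarity between a new anomaly's OSF and a class is the share of that class's total signature weight that the OSF matches. Both the weighting and the similarity measure, and the function names, are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

Subspace = Tuple[int, ...]


def build_signatures(labelled: Iterable[Tuple[str, Iterable[Subspace]]]
                     ) -> Dict[str, Dict[Subspace, float]]:
    """labelled: (class label, OSF of one labeled anomaly) pairs."""
    hits: Dict[str, Counter] = defaultdict(Counter)
    sizes: Counter = Counter()
    for cls, osf in labelled:
        sizes[cls] += 1
        hits[cls].update({tuple(sorted(s)) for s in osf})
    # Weight = fraction of the class's anomalies whose OSF contains the subspace.
    return {cls: {s: n / sizes[cls] for s, n in c.items()} for cls, c in hits.items()}


def classify(osf: Iterable[Subspace], signatures: Dict[str, Dict[Subspace, float]]
             ) -> Tuple[Optional[str], float]:
    """Return the class whose weighted signature best matches the OSF, and its score."""
    observed = {tuple(sorted(s)) for s in osf}
    best, best_score = None, 0.0
    for cls, sig in signatures.items():
        total = sum(sig.values())
        if total > 0:
            score = sum(w for s, w in sig.items() if s in observed) / total
            if score > best_score:
                best, best_score = cls, score
    return best, best_score
```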
202. Obtain the network data of the water environment sensors.
The sensor network data may be sent by the client or by other devices; this is not specifically limited.
203. Update, according to the network data, the current projected cell summary value corresponding to the subspace template.
When each data point arrives, the PCS of every subspace in the subspace template to which the point belongs is updated first, so as to capture the information in the newly arrived data.
204. When the current projected cell summary value falls within the preset reference projected cell summary value, determine that the subspace to which the network data belongs is an abnormal subspace.
The subspace corresponding to the preset reference projected cell summary value is an outlier subspace; the method further includes: searching, by a multi-objective genetic algorithm, for other subspaces in which the outlier exists.
It should be noted that the first generation of subspaces of the subspace template is generated randomly, and the N-th generation is generated from the subspaces of the (N-1)-th generation using crossover and mutation search operators; the subspaces of each generation are located on different trade-off surfaces in the objective-function space, and N is an integer greater than 1.
Here a hash function is used to map each data point quickly to its cells in the subspaces it belongs to. Then, if the PCS of a cell falls within the predefined threshold in one or more subspaces of the subspace template, these subspaces are the outlying subspaces of the data point. Each anomaly is recorded together with its outlying subspaces and the PCS values of the cells it occupies in those outlying subspaces.
205. Send the outliers of the subspace to which the network data belongs to the client, which uses them for data processing.
The method further includes: feeding back new subspaces to the client by incorporating relevance feedback into the unsupervised subspace template. Relevance feedback is incorporated as follows: each subspace in the subspace template is given an initial weight of 1; the weight of a subspace in the template increases when it is a correct outlying subspace and decreases when it is a false-alarm outlying subspace.
In the embodiments of the present invention, a subspace template is constructed, including a fixed subspace template, an unsupervised subspace template and a supervised subspace template; network data of water environment sensors are obtained; the current projected cell summary value corresponding to the subspace template is updated according to the network data; when the current projected cell summary value falls within a preset reference projected cell summary value, the subspace to which the network data belongs is determined to be an abnormal subspace; and the outliers of that subspace are sent to a client, which uses them for data processing. This makes it possible to handle the high-dimensional data problem in sensor-network monitoring of water environments and to search subspaces effectively so as to detect subspace anomalies; by using a dynamic subspace set, the method also adapts to the dynamic characteristics of the data, which speeds up detection and reduces the error rate of the detection results.
The two stages of the algorithm are described in further detail below:
(1) Learning stage
Because the number of subspaces grows exponentially with the data dimensionality, evaluating every data point in every subspace becomes prohibitively expensive. Therefore, to make subspace anomaly detection tractable, the present invention examines each data point selectively, only in the subspaces contained in the subspace template.
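The growth described above can be made concrete with a short count (standard combinatorics, added here for illustration). A data set with d attributes has one candidate subspace for every non-empty subset of its dimensions, while capping the subspace dimensionality at m keeps only the first m binomial terms:

```latex
\[
  N_{\text{all}}(d) = \sum_{k=1}^{d} \binom{d}{k} = 2^{d} - 1,
  \qquad
  N_{\le m}(d) = \sum_{k=1}^{m} \binom{d}{k}.
\]
\[
  d = 20:\quad N_{\text{all}} = 2^{20} - 1 = 1\,048\,575,
  \qquad
  N_{\le 3} = 20 + 190 + 1140 = 1350.
\]
```

Even a modest number of sensor attributes therefore rules out exhaustive subspace evaluation on a stream, whereas a bounded, selected template stays in the low thousands.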
The subspace template consists of several groups of subspaces, each generated on a fundamentally different principle. The groups complement one another, which makes it possible to capture subspace anomalies that would otherwise stay hidden. Specifically, the subspace template includes three groups of subspaces: the fixed subspace template subspaces (FS), the unsupervised subspace template subspaces (US) and the supervised subspace template subspaces (SS). Because constructing FS requires no learning, the main task of the offline learning stage is to generate US and SS. The offline learning process that derives the subspace template from a batch of training data is designed mainly around unsupervised anomaly detection with FS and US, and SS can additionally be learned from labeled anomaly examples. The bulk of anomaly detection is performed through FS. Note that US and SS do not cover every subspace in which anomalies may occur: they may be biased by the training data or by the limited number of anomaly examples available for training, so US and SS serve only to supplement FS and increase the probability of detecting anomalies.
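(1) The fixed subspace template (FS) contains all subspaces of the full lattice whose dimensionality does not exceed a maximum size, where the maximum size (MaxDimension) is a user-specified parameter. In other words, FS contains all 1-dimensional, 2-dimensional, 3-dimensional, ... subspaces up to MaxDimension. FS satisfies:
$FS = \bigcup_i s_i,\quad |s_i| \le \text{MaxDimension}$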
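The size of FS follows directly from this definition: with φ attributes, FS holds $\sum_{k=1}^{\text{MaxDimension}} \binom{\varphi}{k}$ subspaces, which is why MaxDimension is normally kept small. The Python sketch below enumerates FS under the assumption that a subspace is represented as a sorted tuple of attribute indices 0 to φ-1; the function names are hypothetical and are not part of the described method.

```python
from itertools import combinations
from math import comb


def fixed_subspaces(num_dims: int, max_dimension: int) -> list[tuple[int, ...]]:
    """Enumerate every subspace of the full lattice (as a sorted tuple of
    attribute indices) whose dimensionality is at most max_dimension."""
    subspaces: list[tuple[int, ...]] = []
    for size in range(1, max_dimension + 1):
        subspaces.extend(combinations(range(num_dims), size))
    return subspaces


def fixed_template_size(num_dims: int, max_dimension: int) -> int:
    """|FS| = C(phi, 1) + C(phi, 2) + ... + C(phi, MaxDimension)."""
    return sum(comb(num_dims, k) for k in range(1, max_dimension + 1))


fs = fixed_subspaces(num_dims=4, max_dimension=2)
print(len(fs), fixed_template_size(4, 2))  # 10 10
print(fs[:5])  # [(0,), (1,), (2,), (3,), (0, 1)]
```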
(2) The unsupervised subspace template (US) is constructed by an unsupervised offline learning process. This process takes unlabelled training data as input and automatically finds the subspaces in which the training data contain a larger number of anomalies. It assumes that a set of historical data is available at the start for unsupervised learning. The training data are usually much smaller than the original data stream, so they can fit entirely in main memory, which minimizes I/O overhead. A multi-objective genetic algorithm searches the subspace lattice to find the subspaces in which the training data as a whole are most outlying, and these subspaces are used to construct US. To support the computation of the multi-objective genetic algorithm, all training data are scanned and each data point is assigned to exactly one cell of the hypercube. During this assignment, the statistics of every occupied cell of the hypercube are maintained. Once all training data have been mapped to their cells, the multi-objective genetic algorithm can find the subspaces in which the training data contain a larger number of anomalies, and these subspaces are added to US.
Once the initial US has been obtained, further useful subspaces can be found by identifying the most outlying points in the training data. The selected training points are likely to be anomalies, and the subspaces found for them can be used to detect more similar anomalies in the data set. In unsupervised mode, the overall outlying degree of the training data is measured using a clustering method.
A key problem in the clustering method is how to measure accurately the distance between two points in the training space. The distance metric should reflect the overall outlying similarity of the data, especially for the points that are likely to be detected as anomalies. To cluster the training data for this purpose, the present invention adopts a new distance metric called the outlying distance (OD). It is defined as the distance between two points p_1 and p_2 in the sparsest subspaces of the whole training data obtained by the multi-objective genetic algorithm. That is:
In the above formula, the index i runs over the sparsest subspaces returned by the multi-objective genetic algorithm, and S_i is the i-th subspace in that returned set.
The present invention uses lead clustering, also referred to as fixed-width clustering, to group the entire training data into clusters. Lead clustering is efficient because it clusters the data incrementally. Each point p of the data set is assigned to the cluster c′ such that $OD(p, c') < d_c$ and $OD(p, c') \le OD(p, c_i)$ for every existing cluster $c_i$; the centroid of c′, which already contains m points, is then updated to $c' = (m\,c' + p)/(m+1)$. If $OD(p, c_i) \ge d_c$ for all existing clusters $c_i$, a new cluster is formed and p becomes its centroid. These steps are repeated until all data in the data set have been clustered.
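A minimal Python sketch of one lead clustering pass is given below. It is illustrative only: the Euclidean distance `math.dist` stands in for OD, whose exact formula is not reproduced in this text, and the function name `lead_clustering` and its return layout are hypothetical. A point joins its nearest cluster only if that distance is below d_c, which is equivalent to the two conditions stated above.

```python
import math
from typing import Callable, Sequence

Point = tuple[float, ...]


def lead_clustering(
    points: Sequence[Point],
    d_c: float,
    distance: Callable[[Point, Point], float] = math.dist,
) -> tuple[list[int], list[int], list[Point]]:
    """Single-pass fixed-width (lead) clustering.

    Returns (labels, sizes, centroids): labels[k] is the cluster index of
    points[k] and sizes[i] is the number of points in cluster i.
    """
    centroids: list[Point] = []
    sizes: list[int] = []
    labels: list[int] = []
    for p in points:
        p = tuple(p)
        # Find the nearest existing centroid.
        best, best_dist = -1, math.inf
        for i, c in enumerate(centroids):
            d = distance(p, c)
            if d < best_dist:
                best, best_dist = i, d
        if best >= 0 and best_dist < d_c:
            # Join the nearest cluster and move its centroid incrementally:
            # c' <- (m * c' + p) / (m + 1), where m is the old cluster size.
            m = sizes[best]
            centroids[best] = tuple((m * c + x) / (m + 1) for c, x in zip(centroids[best], p))
            sizes[best] = m + 1
            labels.append(best)
        else:
            # No centroid within d_c: p starts a new cluster as its centroid.
            centroids.append(p)
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels, sizes, centroids


pts = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.2, 10.0), (0.2, 0.1)]
labels, sizes, _ = lead_clustering(pts, d_c=1.0)
print(labels, sizes)  # [0, 0, 1, 1, 0] [3, 2]
```

Passing a callable that computes OD over the MOGA-selected subspaces as `distance` would bring the sketch closer to the described method.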
Owing to its asymptotic properties, the clustering method scales linearly in both the number of instances and the dimensionality of the training data. However, its result is sensitive to the order in which the training data are clustered. To address this, the present invention runs lead clustering multiple times with different data orders, which reduces its sensitivity to data order. The rationale is that even if an anomaly is assigned to different clusters in different runs, the chance that it is assigned to a small cluster is relatively high. The average size of the clusters to which a point is assigned, rather than the result of any single data order, is therefore a useful indicator of its outlying degree. The outlying degree of a training point p is called its outlying factor (OF), defined as
$OF(p) = \frac{1}{n}\sum_{i=1}^{n} \mathit{cluster\_size}_i(p)$
where $\mathit{cluster\_size}_i(p)$ is the size of the cluster containing p in the i-th clustering run and n is the number of clustering runs. The sparsest subspaces of the most outlying training points, found by the multi-objective genetic algorithm, are also added to US in the subspace template. Some experimentation is needed to set d_c, the most important parameter of lead clustering; alternatively, the number of clusters obtained, denoted K, can be specified, and K is clearly easier to specify than d_c. Several different, reasonable values of K can be used: the data are clustered under each value, and the outlying degree of each training point is quantified.
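The order-insensitive outlying factor can be sketched as repeated lead clustering passes over shuffled copies of the data, averaging each point's cluster size. This is an assumption-laden illustration: Euclidean distance again replaces OD, the number of runs and the seed are arbitrary, and the helper names are hypothetical.

```python
import math
import random
from typing import Sequence

Point = tuple[float, ...]


def _cluster_sizes_one_run(points: Sequence[Point], order: list[int], d_c: float) -> list[int]:
    """One lead clustering pass over points in the given order; returns, for
    every point index, the size of the cluster it ended up in."""
    centroids: list[Point] = []
    members: list[list[int]] = []
    for idx in order:
        p = tuple(points[idx])
        dists = [math.dist(p, c) for c in centroids]
        best = min(range(len(dists)), key=dists.__getitem__, default=None)
        if best is not None and dists[best] < d_c:
            m = len(members[best])
            centroids[best] = tuple((m * c + x) / (m + 1) for c, x in zip(centroids[best], p))
            members[best].append(idx)
        else:
            centroids.append(p)
            members.append([idx])
    size_of = [0] * len(points)
    for group in members:
        for idx in group:
            size_of[idx] = len(group)
    return size_of


def outlying_factor(points: Sequence[Point], d_c: float, n_runs: int = 5, seed: int = 0) -> list[float]:
    """OF(p) = (1/n) * sum_i cluster_size_i(p) over n runs with shuffled order.
    A small OF means p tends to land in small clusters, i.e. it is outlying."""
    rng = random.Random(seed)
    totals = [0] * len(points)
    for _ in range(n_runs):
        order = list(range(len(points)))
        rng.shuffle(order)  # a different data order in every run
        for idx, size in enumerate(_cluster_sizes_one_run(points, order, d_c)):
            totals[idx] += size
    return [t / n_runs for t in totals]


data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (50.0, 50.0)]
print(outlying_factor(data, d_c=1.0))  # [4.0, 4.0, 4.0, 4.0, 1.0]
```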
(3) The supervised subspace template (SS): in some applications, a small number of anomaly examples may be provided by domain experts or obtained from earlier detection methods. These anomaly examples can be regarded as domain knowledge that effectively improves the subspace template, so that anomalies are detected better. The multi-objective genetic algorithm is applied to each of these anomaly examples to find its sparsest subspaces, and these subspaces form the supervised subspace template (SS).
Based on the above three groups of subspaces, the subspace template of the entire training data set D_T can be expressed as follows:
$SST(D_T) = FS \cup US \cup SS$
where each constituent group can be expressed as:
$FS = \bigcup_i s_i,\quad |s_i| \le \text{MaxDimension}$
$US = TSS(D_T) \cup \bigcup_j TSS(p_j)$
$SS = \bigcup_t TSS(o_t)$
Here s_i denotes the i-th subspace of the full lattice with dimensionality at most MaxDimension, TSS(D_T) denotes the sparsest subspaces of the entire training data D_T, TSS(p_j) denotes the sparsest subspaces of the j-th training point among those with the highest outlying factor, and TSS(o_t) denotes the sparsest subspaces of the t-th available anomaly example.
In the offline learning stage, the present invention uses a multi-objective genetic algorithm (MOGA) to search for sparse subspaces. The MOGA evaluates each subspace with three objectives, the relative density (RD), the inverse relative standard deviation (IRSD), and the inverse k-relative distance (IkRD), and minimizes these objectives in order to construct the sparse subspace template (SST). The MOGA searches over several generations, each generation being a population of individuals (i.e., subspaces). The first-generation subspaces are usually generated at random, and the subspaces of each subsequent generation are generated from those of the previous generation using crossover and mutation search operators. In a multi-objective minimization problem, the subspaces of each generation lie on different trade-off surfaces in the objective-function space. Subspaces on surfaces closer to the origin are better than those farther away, while subspaces on the same surface are equally good: none is better than another. The surface containing the best subspaces is called the Pareto front, and the goal of the MOGA is to evolve non-optimal subspaces into progressively more optimal subspaces located on the Pareto front. For finding optimal subspaces, this gives the MOGA an advantage over approaches that decompose the multi-objective optimization problem into separate single-objective problems, and this advantage is significant for the scheme as a whole. The MOGA provides a general framework that is well suited to multi-objective search problems. In addition, elitism is incorporated into the MOGA: a certain number of the best solutions are always copied directly from one generation to the next, which improves the convergence of the algorithm.
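The generational search can be sketched as follows, assuming each subspace is encoded as a 0/1 mask over the φ attributes and that a caller-supplied `evaluate` function returns the objective vector (for example RD, IRSD, IkRD) to be minimised. Pareto dominance and elitism follow the description above; binary tournament selection, one-point crossover, bit-flip mutation and the mutation rate are assumed choices not specified in the text, and all names are hypothetical.

```python
import random
from typing import Callable, Sequence

Subspace = tuple[int, ...]  # 0/1 mask over the phi attributes
Scores = tuple[float, ...]  # objective vector, e.g. (RD, IRSD, IkRD), all minimised


def dominates(a: Scores, b: Scores) -> bool:
    """a Pareto-dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(pop: Sequence[Subspace], scores: Sequence[Scores]) -> list[Subspace]:
    """Members of pop that no other member dominates (the current Pareto front)."""
    return [s for s, f in zip(pop, scores) if not any(dominates(g, f) for g in scores)]


def _tournament(pop: Sequence[Subspace], scores: Sequence[Scores], rng: random.Random) -> Subspace:
    # Binary tournament: prefer the candidate that dominates the other.
    i, j = rng.sample(range(len(pop)), 2)
    return pop[j] if dominates(scores[j], scores[i]) else pop[i]


def _crossover(a: Subspace, b: Subspace, rng: random.Random) -> Subspace:
    cut = rng.randrange(1, len(a))  # one-point crossover
    return a[:cut] + b[cut:]


def _mutate(s: Subspace, rate: float, rng: random.Random) -> Subspace:
    return tuple(1 - bit if rng.random() < rate else bit for bit in s)


def next_generation(
    pop: Sequence[Subspace],
    evaluate: Callable[[Subspace], Scores],
    n_elite: int,
    rng: random.Random,
    mutation_rate: float = 0.1,
) -> list[Subspace]:
    """Produce generation N from generation N-1 with elitism, crossover and mutation."""
    scores = [evaluate(s) for s in pop]
    children = pareto_front(pop, scores)[:n_elite]  # elitism: copy front members unchanged
    while len(children) < len(pop):
        a = _tournament(pop, scores, rng)
        b = _tournament(pop, scores, rng)
        child = _mutate(_crossover(a, b, rng), mutation_rate, rng)
        if any(child):  # discard the empty subspace
            children.append(child)
    return children


pop = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
toy = dict(zip(pop, [(1, 2), (2, 1), (2, 2), (0.5, 3)]))  # made-up objective values
print(pareto_front(pop, list(toy.values())))  # [(1, 0, 0), (0, 1, 0), (1, 1, 0)]
new = next_generation(pop, toy.__getitem__, n_elite=2, rng=random.Random(0))
```

Copying the front members unchanged is what the elitism step refers to; the remaining slots are filled by offspring of crossover and mutation.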
(2) Detection stage
The detection stage continuously examines newly arriving network data. When each data point arrives, the projected cell summary (PCS) of the cell it falls into in each subspace of the subspace template is updated first, to capture the information carried by the new arrival. A hash function is used here to map the data point quickly to its cells in the relevant subspaces. Then, if the PCS of the cell falls within a predefined threshold in one or more template subspaces, those subspaces are the outlying subspaces of the data point, and the point is regarded as an anomaly. Each anomaly is reported together with its outlying subspaces, i.e., the subspaces in which the PCS of its cell falls within the threshold. Finally, all detected anomalies, or a specified number of them, are returned to the client.
Because the detection process must handle a large volume of network data under tight time constraints, it has to execute quickly. A desirable property of the PCS is that it can be updated incrementally and computed very quickly. The anomaly estimation for each data point is therefore also highly efficient: it only involves mapping the data point to the appropriate cell and updating and checking that cell's PCS.
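The per-point work of the detection stage can be sketched as below. It is a simplification under stated assumptions: attributes are normalised to [0, 1] and cut into equal-width intervals, Python's hashed dictionary keys play the role of the hash function that maps a point to its cells, and the cell summary keeps only a count from which a relative density is derived (IRSD and IkRD are omitted). The class name, the cell count and the threshold are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Sequence

Subspace = tuple[int, ...]


class SubspaceDetector:
    """Streaming check of each arriving point against a subspace template.

    Assumptions: attributes are normalised to [0, 1]; every attribute is cut
    into cells_per_dim equal-width intervals; the only cell summary kept is a
    count, from which a relative density is derived.
    """

    def __init__(self, template: Iterable[Subspace], cells_per_dim: int, density_threshold: float):
        self.template = [tuple(s) for s in template]
        self.cells_per_dim = cells_per_dim
        self.threshold = density_threshold
        # Hashed dictionary keys map (subspace, cell coordinates) -> point count.
        self.counts: dict[tuple, int] = defaultdict(int)
        self.n_seen = 0

    def _interval(self, value: float) -> int:
        idx = int(value * self.cells_per_dim)
        return min(max(idx, 0), self.cells_per_dim - 1)  # clamp edge values

    def process(self, x: Sequence[float]) -> list[Subspace]:
        """Update the summaries for x and return its outlying subspaces."""
        self.n_seen += 1
        outlying = []
        for s in self.template:
            cell = (s, tuple(self._interval(x[d]) for d in s))
            self.counts[cell] += 1  # incremental update of the cell summary
            # Count expected in this cell if the points seen so far were uniform.
            expected = self.n_seen / self.cells_per_dim ** len(s)
            if self.counts[cell] / expected < self.threshold:
                outlying.append(s)
        return outlying


det = SubspaceDetector([(0,), (1,), (0, 1)], cells_per_dim=4, density_threshold=0.2)
for _ in range(100):
    det.process((0.1, 0.1))  # dense region: never outlying
print(det.process((0.9, 0.9)))  # [(0,), (1,), (0, 1)]
```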
A notable feature of the algorithm is that it detects anomalies in high-dimensional data by using subspaces. By incorporating relevance feedback, the subspace template is no longer a static set of subspaces; instead, it can be updated dynamically to cope with concept drift in network data (concept drift is an important characteristic of big data and typically involves abrupt changes in the characteristics of the data). This mechanism was originally introduced to improve detection accuracy through self-feedback. At the same time, it helps reduce the size of the subspace template, which allows higher efficiency. The method is suitable for scenarios with manual intervention and feedback. In practice, the self-feedback can operate online or offline and can affect all three subsets of the subspace template.
The basic idea of this relevance feedback is to assign each subspace in the subspace template an initial weight of 1. The weight of a subspace is increased when it proves to be a correct outlying subspace and decreased when it is an outlying subspace that produces a false alarm. Each time a certain amount of network data has been processed, subspaces whose weights fall below a weight threshold are removed from the subspace template, while newly generated subspaces are given weights above the threshold. In this way, a dynamic subspace template is obtained that supports accurate detection and a low error rate. The method requires the following:
(1) The subspace weights are normalized, so that they can be compared with the weight threshold.
(2) After each update, the weights of all subspaces in the subspace template are reset to 1, to reflect the current data.
(3) The amount of data that determines how often the subspace template is updated should be greater than 30, so that the detected concept drift is statistically meaningful.
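The relevance feedback loop with requirements (1) to (3) can be sketched as a small weight table. The step size, the floor at 0, normalisation to a mean weight of 1, and the rule that removes subspaces whose normalised weight falls below the threshold are assumptions made for illustration; the class and method names are hypothetical.

```python
from typing import Iterable, Optional

Subspace = tuple[int, ...]


class WeightedTemplate:
    """Relevance-feedback weights for the subspaces of a subspace template.

    Assumed details (not fixed by the text): additive step size, weights
    floored at 0, normalisation to mean 1, and removal of subspaces whose
    normalised weight is below `threshold`.
    """

    def __init__(self, subspaces: Iterable[Subspace], step: float = 0.1,
                 threshold: float = 0.5, update_every: int = 50):
        if update_every <= 30:  # requirement (3): update interval above 30 points
            raise ValueError("update interval should exceed 30 data points")
        self.weights = {tuple(s): 1.0 for s in subspaces}  # every weight starts at 1
        self.step, self.threshold, self.update_every = step, threshold, update_every
        self.processed = 0

    def feedback(self, subspace: Subspace, correct: bool) -> None:
        """Raise the weight of a correct outlying subspace, lower it on a false alarm."""
        if subspace in self.weights:
            delta = self.step if correct else -self.step
            self.weights[subspace] = max(0.0, self.weights[subspace] + delta)

    def tick(self, new_subspaces: Iterable[Subspace] = ()) -> Optional[list[Subspace]]:
        """Call once per processed data point; every `update_every` points the
        template is updated and the list of removed subspaces is returned."""
        self.processed += 1
        if self.processed % self.update_every:
            return None
        n, total = len(self.weights), sum(self.weights.values())
        # (1) Normalise so the weights can be compared with a fixed threshold.
        norm = {s: (w * n / total if total > 0 else 0.0) for s, w in self.weights.items()}
        removed = [s for s, w in norm.items() if w < self.threshold]
        kept = [s for s in self.weights if s not in removed]
        # (2) Reset every surviving and newly added subspace to weight 1.
        self.weights = {tuple(s): 1.0 for s in [*kept, *new_subspaces]}
        return removed


tmpl = WeightedTemplate([(0,), (1,), (0, 1)], update_every=40)
tmpl.feedback((0, 1), correct=False)  # (0, 1) raised a false alarm
```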
It should be pointed out that concept drift is not the key factor that causes the subspace template to change. Even if the overall characteristics of the data in a data set change abruptly, this may have little influence on anomaly detection. In this sense, the method of the present invention is advantageous for anomaly detection under concept drift, because the subspace template is updated only when concept drift actually affects anomaly detection. This is a significant difference between the method of the present invention and other concept-drift detection methods based on measurement and density computation.
Finally, the computational complexity of the algorithm is assessed through theoretical analysis. Specifically, complexity is analyzed for both the training stage and the detection stage. For simplicity, the inherently dynamic nature of the method is set aside and only its static version is analyzed.
Some symbols are first defined to facilitate the complexity analysis. N, N_t, and N_o denote the number of instances in the data stream, in the training (historical) data set, and in the set of outlier examples, respectively. φ is the dimensionality, and k is the parameter used to compute the inverse k-relative distance. G_n and G_s denote the number of generations and the number of subspaces evaluated per generation in the MOGA, respectively.
The complexity analysis shows that: (1) the complexity of the detection stage is linear in the size of the data stream, which allows high-volume traffic to be handled efficiently; (2) the training stage is more expensive than the detection stage; however, because training is performed offline, it does not affect the efficiency of online detection.
As shown in FIG. 4, an embodiment of the detection device in the embodiments of the present invention may include:
Construction module 401, configured to construct a subspace template, the subspace template comprising a fixed subspace template, an unsupervised subspace template, and a supervised subspace template;
Transceiver module 402, configured to obtain network data of a water environment sensor, and to send the abnormal value of the subspace to which the network data belongs to a client, where the client uses the abnormal value for data processing;
Processing module 403, configured to update, according to the network data, the current planning cell summary value corresponding to the subspace template, and, when the current planning cell summary value falls within the preset reference planning cell summary value, to determine that the subspace to which the network data belongs is an abnormal subspace.
Optionally, in some embodiments of the invention,
Construction module 401 is specifically configured to construct the subspace template using unlabelled training data and prior knowledge obtained in advance;
wherein the fixed subspace template comprises subspaces constrained by a pre-specified parameter;
the unsupervised subspace template is computed from the unlabelled training data using a multi-objective genetic algorithm;
the supervised subspace template is obtained by a prior detection method.
Optionally, in some embodiments of the present invention, the subspace corresponding to the preset reference planning cell summary value is an outlier subspace;
Processing module 403 is further configured to search, using a multi-objective genetic algorithm, for other subspaces in which the outlier exists.
Optionally, in some embodiments of the present invention, the first-generation subspaces of the subspace template are randomly generated, the N-th-generation subspaces of the subspace template are generated from the (N-1)-th-generation subspaces using crossover and mutation search operators, the subspaces of each generation are located on different trade-off surfaces in the objective-function space, and N is an integer greater than 1.
Optionally, in some embodiments of the invention,
Processing module 403 is further configured to feed back new subspaces to the client by incorporating relevance feedback into the unsupervised subspace template.
Optionally, in some embodiments of the present invention, incorporating relevance feedback comprises: setting the weight of each subspace in the subspace template to 1; increasing the weight of a subspace in the subspace template when it is a correct outlying subspace; and decreasing the weight of a subspace in the subspace template when it is an outlying subspace that produces erroneous information.
As shown in FIG. 5, another embodiment of the detection device in the embodiments of the present invention may include:
Transceiver 501, processor 502, and memory 503, wherein transceiver 501, processor 502, and memory 503 are connected by a bus;
Memory 503 is configured to store operation instructions;
Transceiver 501 is configured to obtain network data of a water environment sensor, and to send the abnormal value of the subspace to which the network data belongs to a client, where the client uses the abnormal value for data processing;
Processor 502 is configured to call the operation instructions to: construct a subspace template comprising a fixed subspace template, an unsupervised subspace template, and a supervised subspace template; update, according to the network data, the current planning cell summary value corresponding to the subspace template; and, when the current planning cell summary value falls within the preset reference planning cell summary value, determine that the subspace to which the network data belongs is an abnormal subspace.
Optionally, in some embodiments of the invention,
Processor 502 is specifically configured to construct the subspace template using unlabelled training data and prior knowledge obtained in advance;
wherein the fixed subspace template comprises subspaces constrained by a pre-specified parameter;
the unsupervised subspace template is computed from the unlabelled training data using a multi-objective genetic algorithm;
the supervised subspace template is obtained by a prior detection method.
Optionally, in some embodiments of the present invention, the subspace corresponding to the preset reference planning cell summary value is an outlier subspace;
Processor 502 is further configured to search, using a multi-objective genetic algorithm, for other subspaces in which the outlier exists.
Optionally, in some embodiments of the present invention, the first-generation subspaces of the subspace template are randomly generated, the N-th-generation subspaces of the subspace template are generated from the (N-1)-th-generation subspaces using crossover and mutation search operators, the subspaces of each generation are located on different trade-off surfaces in the objective-function space, and N is an integer greater than 1.
Optionally, in some embodiments of the invention,
Processor 502 is further configured to feed back new subspaces to the client by incorporating relevance feedback into the unsupervised subspace template.
Optionally, in some embodiments of the present invention, incorporating relevance feedback comprises: setting the weight of each subspace in the subspace template to 1; increasing the weight of a subspace in the subspace template when it is a correct outlying subspace; and decreasing the weight of a subspace in the subspace template when it is an outlying subspace that produces erroneous information.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An anomaly detection method based on water environment sensor network monitoring big data, characterized by comprising:
constructing a subspace template, the subspace template comprising a fixed subspace template, an unsupervised subspace template, and a supervised subspace template;
obtaining network data of a water environment sensor;
updating, according to the network data, a current planning cell summary value corresponding to the subspace template;
when the current planning cell summary value falls within a preset reference planning cell summary value, determining that the subspace to which the network data belongs is an abnormal subspace; and
sending the abnormal value of the subspace to which the network data belongs to a client, the abnormal value of the subspace to which the network data belongs being used by the client for data processing.
2. The method according to claim 1, characterized in that constructing the subspace template comprises:
constructing the subspace template using unlabelled training data and prior knowledge obtained in advance;
wherein the fixed subspace template comprises subspaces constrained by a pre-specified parameter;
the unsupervised subspace template is computed from the unlabelled training data using a multi-objective genetic algorithm; and
the supervised subspace template is obtained by a prior detection method.
3. The method according to claim 1 or 2, characterized in that the subspace corresponding to the preset reference planning cell summary value is an outlier subspace, and the method further comprises:
searching, by a multi-objective genetic algorithm, for other subspaces in which the outlier exists.
4. The method according to claim 3, characterized in that the first-generation subspaces of the subspace template are randomly generated, the N-th-generation subspaces of the subspace template are generated from the (N-1)-th-generation subspaces using crossover and mutation search operators, the subspaces of each generation are located on different trade-off surfaces in the objective-function space, and N is an integer greater than 1.
5. The method according to claim 2, characterized in that the method further comprises:
feeding back new subspaces to the client by incorporating relevance feedback into the unsupervised subspace template.
6. The method according to claim 5, characterized in that incorporating relevance feedback comprises: setting the weight of each subspace in the subspace template to 1; increasing the weight of a subspace in the subspace template when it is a correct outlying subspace; and decreasing the weight of a subspace in the subspace template when it is an outlying subspace that produces erroneous information.
7. A detection device, characterized by comprising:
a construction module, configured to construct a subspace template, the subspace template comprising a fixed subspace template, an unsupervised subspace template, and a supervised subspace template;
a transceiver module, configured to obtain network data of a water environment sensor, and to send the abnormal value of the subspace to which the network data belongs to a client, the abnormal value of the subspace to which the network data belongs being used by the client for data processing; and
a processing module, configured to update, according to the network data, a current planning cell summary value corresponding to the subspace template, and, when the current planning cell summary value falls within a preset reference planning cell summary value, to determine that the subspace to which the network data belongs is an abnormal subspace.
8. The detection device according to claim 7, characterized in that:
the construction module is specifically configured to construct the subspace template using unlabelled training data and prior knowledge obtained in advance;
wherein the fixed subspace template comprises subspaces constrained by a pre-specified parameter;
the unsupervised subspace template is computed from the unlabelled training data using a multi-objective genetic algorithm; and
the supervised subspace template is obtained by a prior detection method.
9. A detection device, characterized by comprising:
a transceiver, a processor, and a memory, wherein the transceiver, the processor, and the memory are connected by a bus;
the memory is configured to store operation instructions;
the transceiver is configured to obtain network data of a water environment sensor, and to send the abnormal value of the subspace to which the network data belongs to a client, the abnormal value of the subspace to which the network data belongs being used by the client for data processing; and
the processor is configured to call the operation instructions to perform the steps of the anomaly detection method based on water environment sensor network monitoring big data according to any one of claims 1 to 6.
10. A readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the anomaly detection method based on water environment sensor network monitoring big data according to any one of claims 1 to 6 are implemented.
CN201910516179.9A 2019-06-14 2019-06-14 Abnormal method and detection device based on water environment sensor network monitoring big data Pending CN110287231A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910516179.9A CN110287231A (en) 2019-06-14 2019-06-14 Abnormal method and detection device based on water environment sensor network monitoring big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910516179.9A CN110287231A (en) 2019-06-14 2019-06-14 Abnormal method and detection device based on water environment sensor network monitoring big data

Publications (1)

Publication Number Publication Date
CN110287231A true CN110287231A (en) 2019-09-27

Family

ID=68004441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910516179.9A Pending CN110287231A (en) 2019-06-14 2019-06-14 Abnormal method and detection device based on water environment sensor network monitoring big data

Country Status (1)

Country Link
CN (1) CN110287231A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899507A (en) * 2015-06-08 2015-09-09 桂林电子科技大学 Detecting method for abnormal intrusion of large high-dimensional data of network
WO2018111116A2 (en) * 2016-12-13 2018-06-21 Idletechs As Method for handling multidimensional data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JI ZHANG, HONGZHOU LI, QIGANG GAO, HAI WANG, YONGLONG: "Detecting anomalies from big network traffic data using", INFORMATION SCIENCES *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190927
