CN103580960B - Online pipe network anomaly detection system based on machine learning

Info

Publication number: CN103580960B
Application number: CN201310581956.0A
Authority: CN
Inventors: 陈尊裕; 张得志; 李丹; 胡斯洋; 龙圣; 郑思明; 吴珏其; 周振邦; 李维海; 王红旗
Original assignee: Foshan Luosixun Environmental Protection Technology Co ltd
Current assignee: Foshan Science And Technology Co Ltd
Priority date: 2013-11-19
Filing date: 2013-11-19
Publication date: 2017-01-11
Anticipated expiration: 2033-11-19
Also published as: CN103580960A

Abstract

The invention discloses an online pipe network anomaly detection system based on machine learning. The system comprises a data collection unit, a data distribution unit and a plurality of anomaly detection units. The data collection unit collects real-time data from the online pipe network, merges the data by location area and groups it into separate data packets. The data distribution unit receives the data packets, extracts data elements from them, formats the packets and divides them into a plurality of data subsets. The anomaly detection units receive the data subsets in one-to-one correspondence and predict anomalies in the data subsets using a semi-supervised machine learning framework. The anomaly detection units process data in parallel and exchange data with one another through MPI. The system meets the server availability requirements of online, machine-learning-based anomaly detection units while avoiding the introduction of extra hardware that would sit idle on standby.

Description

An online pipe network anomaly detection system based on machine learning

Technical field

The present invention relates to facility pipe network monitoring technology, and in particular to an online pipe network anomaly detection system based on machine learning.

Background technology

Advances in sensor technology allow sensors to measure parameters across an environmental area with high spatial and temporal resolution. The time series collected by the sensors are fed continuously into storage, forming a data stream. Taking waterworks operation as an example, the sensor data can include hydraulic parameters and water quality indicators. These data can be used for purposes such as anomaly detection, which identifies abnormal data by comparison with historical patterns or model predictions. An anomaly may be a pipeline leak or a contamination incident. Pipelines span a large geographic area, and water usage patterns are highly complex because of changes in weather, season, holidays and community population structure, so manual methods are not adequate for this work. Machine learning based on historical data is therefore the only feasible approach to online anomaly detection. Machine learning techniques can be roughly divided into three classes: (a) purely data-analytic; (b) rule-based; (c) physical-model-based. The classification depends on which parameters are used to track and predict current and future trends in the sensor data and the associations among the data groups. An anomaly detection system first sets a baseline from historical data of the normal system or sensor system. Thereafter, any activity that deviates from this baseline is regarded as an anomaly.

In addition, because real abnormal data must be distinguished from non-abnormal data (false alarms), a computing system based on replication and redundancy strategies is also needed to support continuous online data acquisition and to run the data analysis algorithms.

Summary of the invention

In view of the above shortcomings, the object of the present invention is to provide an online pipe network anomaly detection system based on machine learning which, through multi-server host hardware virtualization and a publish-subscribe data distribution strategy, meets the server availability requirements of online machine-learning-based anomaly detection units while avoiding the introduction of surplus hardware that sits idle on standby.

To achieve the above object, the technical solution adopted by the present invention is:

An online pipe network anomaly detection system based on machine learning, comprising:

a data acquisition unit, for collecting real-time data of the online pipe network, merging the real-time data by location area and grouping it into separate data packets;

a data distribution unit, for receiving the data packets, extracting data elements from them, and dividing the data packets into a plurality of data subsets after formatting;

a plurality of anomaly detection units, for receiving the corresponding data subsets in one-to-one correspondence and performing anomaly prediction on the data subsets based on a semi-supervised learning framework, the anomaly detection units processing data in parallel and transferring data to one another through MPI.

The anomaly detection units are installed on virtual machines, with one anomaly detection unit per virtual machine.

The system further comprises a plurality of server hosts, which are interconnected in a local area network in a fully connected topology. Each server host is equipped with a multi-core processor, whose threads are partitioned into multiple virtual machines on the same server host: the first thread is designated as virtual machine dom0, and the other threads are allocated to virtual machines domU. Virtual machine dom0 is used to access the hardware of the server host and to interact with the virtual machines domU; each virtual machine domU hosts an anomaly detection unit. Each virtual machine domU on a running server host has a corresponding backup on other running server hosts.

Each anomaly detection unit comprises:

a prediction module, for building a prediction model from a multiple regression equation so as to provide statistical prediction data for the expected variable state of the data subset under the assumption that no anomaly is present, and for exchanging the statistical prediction data with the other anomaly detection units;

an analysis module, for receiving the statistical prediction data and using it to estimate the regression parameters for the next data subset obtained from the data distribution unit, so as to calculate the predicted value of the next data subset, the next data subset having the same time step as the historical data and a consistent pipe network background;

a judgment module, for judging whether the next data subset is anomalous according to its predicted value and actual value;

a decision module, for receiving the anomaly judgment made by the judgment module and updating the prediction model accordingly.

The prediction module builds the prediction model by the following steps:

Step 11: model how a parameter at a given point in the online pipe network changes over time:

$$X_i(t+1) = F_i\big(X(t), X(t-1), X(t-2), \ldots, X(t-n)\big) \tag{1}$$

where F_i is the prediction model of the i-th anomaly detection unit, i is a positive integer no greater than the total number of anomaly detection units, X_i is the input data of the i-th anomaly detection unit, X(t), X(t-1), X(t-2), ..., X(t-n) are the historical data, and X_i(t+1) is the next data subset;

Step 12: build the prediction model from a multiple regression equation:

$$X_i(t+1) = A_{i0} X_i(t) + A_{i1} X_i(t-1) + \cdots + A_{in} X_i(t-n) + C_i \tag{2}$$

where A_i0, A_i1, ..., A_in are the regression parameters of the prediction model F_i, C_i is the random error parameter of the i-th anomaly detection unit, and the prediction module exchanges statistical prediction data by passing C_i over MPI;

Step 13: solve for the random error parameter C_i:

$$C_i = \sum_{j \neq i}^{n} A_{ij0} X_j(t) + \sum_{j \neq i}^{n} A_{ij1} X_j(t-1) + \cdots + \sum_{j \neq i}^{n} A_{ijn} X_j(t-n) \tag{3}$$

In equation (3), A_ij0, A_ij1, ..., A_ijn are obtained by automatic fitting from a standard numerical package.
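Equations (2) and (3) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the nested-list layout of `own`, `cross` and `histories`, and the choice to pass the other units' histories as plain arguments (instead of exchanging them over MPI) are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(a: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix a by vector x."""
    return [sum(a_rc * x_c for a_rc, x_c in zip(row, x)) for row in a]


def add(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def coupling_term(i: int,
                  cross: List[Optional[List[Matrix]]],
                  histories: List[List[Vector]]) -> Vector:
    """Equation (3): C_i = sum over units j != i and lags k of A_ijk * X_j(t-k).

    histories[j][k] holds X_j(t-k); cross[j][k] holds A_ijk (cross[i] is unused).
    """
    c = [0.0] * len(histories[i][0])
    for j, history_j in enumerate(histories):
        if j == i:
            continue  # the unit's own history enters through equation (2)
        for k, x_j in enumerate(history_j):
            c = add(c, mat_vec(cross[j][k], x_j))
    return c


def predict_next(i: int,
                 own: List[Matrix],
                 cross: List[Optional[List[Matrix]]],
                 histories: List[List[Vector]]) -> Vector:
    """Equation (2): X_i(t+1) = sum over lags k of A_ik * X_i(t-k), plus C_i."""
    prediction = coupling_term(i, cross, histories)
    for k, x_i in enumerate(histories[i]):
        prediction = add(prediction, mat_vec(own[k], x_i))
    return prediction


# Two units with one parameter each and two lags (k = 0, 1); values are made up.
histories = [[[1.0], [2.0]], [[3.0], [4.0]]]
own = [[[0.5]], [[0.25]]]                 # A_00, A_01 for unit 0
cross = [None, [[[0.1]], [[0.2]]]]        # A_010, A_011 couple unit 1 into unit 0
print(predict_next(0, own, cross, histories))
```

With these made-up numbers the coupling term is about 1.1 and the prediction about 2.1. In a deployed system each unit would hold only its own history and receive the remaining terms from its peers.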

The judgment module judges whether the next data subset is anomalous by the following steps:

Step 31: compute the difference between the predicted value and the measured value of X_i(t+1);

Step 32: collect historical data to build the database X_i(t), X_i(t-1), ..., X_i(t-P), where P, a positive integer, is an empirical parameter for the time dependence of the i-th anomaly detection unit;

Step 33: construct the historical sample {X_i(t), X_i(t-1), ..., X_i(t-P)} and calculate the standard deviation range of this sample;

Step 34: compare the difference with the standard deviation range:

If the difference is smaller than the standard deviation range, the judgment module returns a negative signal to the decision module. If the judgment modules of all anomaly detection units return negative signals, the decision module stores all X_i(t+1) in the database and instructs the corresponding judgment modules to synchronize the latest sample data and regression parameters with the database, ready for predicting X_i(t+2);

When predicting X_i(t+2), if the difference is larger than the standard deviation range of the updated sample {X_i(t+1), X_i(t), X_i(t-1), ..., X_i(t-P+1)}, the corresponding judgment module returns a positive signal to the decision module, and the decision module marks X_i(t+2) as an anomalous event in the database. The decision module instructs that judgment module to update its regression parameters from the database but to keep using the old sample {X_i(t+1), X_i(t), X_i(t-1), ..., X_i(t-P+1)} to define the standard deviation for judging whether X_i(t+3) is anomalous.
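The judgment and decision steps above can be sketched in Python for a single scalar parameter. This is only an illustration under stated assumptions: the names `is_anomalous` and `decision_step` are hypothetical, the "standard deviation range" is taken to be `width` sample standard deviations (the source does not give the multiplier), and the regression parameter update is left out.

```python
import statistics
from typing import List


def is_anomalous(predicted: float, measured: float,
                 sample: List[float], width: float = 1.0) -> bool:
    """Positive signal if |measured - predicted| exceeds `width` standard deviations."""
    return abs(measured - predicted) > width * statistics.stdev(sample)


def decision_step(window: List[float], predicted: float, measured: float,
                  anomalies: List[float], width: float = 1.0) -> List[float]:
    """Return the sample window to use for the next judgment.

    `window` is ordered oldest first: [X_i(t-P), ..., X_i(t)].
    """
    if is_anomalous(predicted, measured, window, width):
        anomalies.append(measured)   # mark the new value as an anomalous event
        return window                # keep the old sample for the next judgment
    return window[1:] + [measured]   # normal: store the value and slide the window


anomalies: List[float] = []
window = [10.0, 10.2, 9.8, 10.1, 9.9]
window = decision_step(window, predicted=10.0, measured=10.1, anomalies=anomalies)
window = decision_step(window, predicted=10.0, measured=12.0, anomalies=anomalies)
print(window, anomalies)
```

The second call flags 12.0 and leaves the window unchanged, which mirrors the rule that an anomalous value must not widen the standard deviation used for the next judgment.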

The system further comprises a network-attached storage unit, for storing mirror copies of the historical data of all virtual machines and of the online pipe network. Every anomaly detection unit can access the data in the network-attached storage unit, and virtual machine dom0 handles the communication between its corresponding virtual machines domU and the network-attached storage unit.

The backup method is as follows: the checkpoint information of each virtual machine domU on a running server host is distributed for backup according to the load of the other running server hosts, so as to achieve the best-balanced mode of operation. After backup, a lookup table is generated automatically; it defines the migration node holding the backup of each originally failed virtual machine domU, so that live migration can be performed when a virtual machine domU or a server host fails.

Each virtual machine dom0 is provided with a backup manager, which compiles the health status of the virtual machines domU corresponding to that dom0 into an inventory. Through the backup manager, virtual machine dom0 is configured to handle the backup of a failed virtual machine according to the lookup table.
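One way to picture the load-based backup placement and the resulting lookup table is the Python sketch below. The greedy "least-loaded other host" rule, the function name `build_lookup_table` and the use of VM counts as the load measure are assumptions; the source only states that backups are distributed according to the load of the other running hosts.

```python
from typing import Dict


def build_lookup_table(placement: Dict[str, str],
                       load: Dict[str, int]) -> Dict[str, str]:
    """Map each domU to the host that holds its backup.

    placement: domU name -> host it currently runs on.
    load: host name -> current load (here, a count of VMs and backups).
    """
    load = dict(load)  # work on a copy so the caller's figures are untouched
    table: Dict[str, str] = {}
    for vm in sorted(placement):
        primary = placement[vm]
        candidates = [host for host in load if host != primary]
        # Pick the least-loaded other host; ties are broken by host name.
        target = min(candidates, key=lambda host: (load[host], host))
        table[vm] = target
        load[target] += 1  # the backup adds load to its target host
    return table


placement = {"vm1": "A", "vm2": "A", "vm3": "B"}
load = {"A": 2, "B": 1, "C": 0}
print(build_lookup_table(placement, load))
```

When a domU or its host fails, the backup manager would look the failed machine up in this table to find its migration node.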

The data distribution unit distributes the data subsets by the following steps:

the data distribution unit receives a data packet and filters out noise, and even erroneous information, introduced into the packet by measurement, transmission or collection;

the data elements in the packet are extracted and the packet is converted to a unified format;

the packet is divided into the corresponding number of data subsets, keeping the data balanced across the subsets;

the data subsets are encrypted and distributed to the anomaly detection units through the publish-subscribe architecture.
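The distribution steps above can be sketched as a small in-process pipeline in Python. Everything here is illustrative: the record fields, the round-robin notion of "balanced", the `Broker` class and the topic names are assumptions, and the encryption step is omitted because the source does not name a cipher.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, object]


def clean_and_unify(raw: List[dict]) -> List[Record]:
    """Drop unusable readings and convert the rest to one record format."""
    kept: List[Record] = []
    for rec in raw:
        try:
            value = float(rec["value"])
            unified = {"sensor": str(rec["sensor"]), "t": int(rec["t"]), "value": value}
        except (KeyError, TypeError, ValueError):
            continue  # missing field or non-numeric value
        if math.isfinite(value):
            kept.append(unified)
    return kept


def split_balanced(records: List[Record], m: int) -> List[List[Record]]:
    """Round-robin split into m subsets whose sizes differ by at most one."""
    return [records[k::m] for k in range(m)]


class Broker:
    """Minimal in-memory publish-subscribe broker (a stand-in for a real one)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, subset: List[Record]) -> None:
        for handler in self._handlers[topic]:
            handler(subset)


def distribute(raw: List[dict], topics: List[str], broker: Broker) -> None:
    """Filter, unify, split, then publish one subset per topic."""
    subsets = split_balanced(clean_and_unify(raw), len(topics))
    for topic, subset in zip(topics, subsets):
        broker.publish(topic, subset)
```

A detection unit would call `broker.subscribe(topic, handler)` with its own topic before `distribute` runs; a topic here could be any identifier that names the unit.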

Compared with the prior art, the present invention has the following advantages:

1. Anomaly detection is performed on the online pipe network by machine learning, providing a statistical distribution prediction of the expected variable state of the distributed network under the assumption that no anomaly is present. This improves the anomaly recognition rate for the online pipe network and saves a great deal of manpower.

2. The anomaly detection units process data in parallel, which reduces CPU resource contention and meets the server availability requirement while avoiding the introduction of surplus hardware that sits idle on standby.

3. There is no need to rebuild the data transfer application interface to accommodate the data transfer and communication control protocols inside and outside the different server hosts. The computing latency between dom0 and domU is far smaller than the time scale on which sensor data in the water network change.

4. Each server host no longer needs its own disk. Virtual machine images are stored on the NAS, where any physical machine can access them. Any virtual machine can therefore run on any physical machine without keeping a backup on a local disk.

5. Virtual machine checkpoints are captured and copied to another server host to accomplish live migration. If one or more data processing modules fail, each failed module is replaced by an active copy provided by the multi-server host virtualization platform.

6. On failover, the migrated virtual machine on a different server host resumes running from the latest checkpoint as if no fault had occurred. All runtime state of the operating system, including active TCP connections, is preserved. Running processes continue as usual, and all files, network state and disks remain intact.

Brief description of the drawings

Fig. 1 shows the network architecture of the high-availability facility pipe network anomaly detection system of the present invention;

Fig. 2 shows the parallel online anomaly detection architecture;

Fig. 3 shows the conceptual architecture of multi-server host virtualization;

Fig. 4 shows how the threads of a multi-core processor are partitioned into different virtual machines on the same server host;

Fig. 5 describes the method of managing high-availability servers that execute the online anomaly detection algorithm in parallel.

Detailed description of the invention

The present invention is described in further detail below with reference to the drawings and specific embodiments.

Embodiment

This embodiment takes anomaly detection in a water supply network as an example. The method is similar for other online pipe networks, such as electric power, telecommunications, network, communication, heating and gas networks, and is not repeated here.

Fig. 1 shows the network architecture of high-availability facility pipe network anomaly detection. In each group, the sensors can be hydraulic or water quality sensors; the data from sensors in nearby geographic locations are combined into data packets by the data acquisition unit and sent on. The data distribution unit receives the sensor measurements, converts them into a format that meets the subscribers' later processing requirements, and publishes them. The server hosts at the operations center are interconnected in a local area network (LAN) in a fully connected topology; a mesh topology is preferred for migrating virtual machines between different server hosts. The network-attached storage unit is connected to all physical server hosts through the LAN.

The various sensors and instruments in the water supply network monitoring system collect data continuously. The data can comprise hydraulic data (such as flow velocity, flow rate, water pressure and water level) and water quality data (including free chlorine, turbidity, pH, electrical conductivity, oxidation-reduction potential and total organic nitrogen). Pipe leaks and contamination incidents in the public water network can be detected by analyzing these indicators. The indicators vary widely within the water distribution system because of routine operation of facilities such as water tanks, pumps, gates and water sources, seasonal changes and water shutoffs, fluctuations in water demand, and so on. An incident detection system is therefore needed to distinguish routine variation in the sensor data from abnormal conditions.

The data acquisition unit comprises a SCADA (supervisory control and data acquisition) system and RTUs (Remote Terminal Units). SCADA is a typical system for collecting real-time sensor data from a distribution network. In the SCADA system of the present invention (Fig. 1), regional RTUs merge and group the local sensor data. The RTUs digitize the data and add time tags according to sensor type and acquisition time. The digitized sensor data are then sent to a data collection server, which can be done over a closed industrial network such as Modbus, LonWorks or BACnet.
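The regional grouping performed by the RTUs can be pictured with the short Python sketch below. The `Reading` fields and the `build_packets` function are hypothetical; the source does not specify a record layout or how packets are ordered internally.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Reading(NamedTuple):
    region: str        # location area served by one regional RTU
    sensor_type: str   # e.g. "pressure" or "turbidity"
    timestamp: int     # time tag added when the reading is digitized
    value: float


def build_packets(readings: List[Reading]) -> Dict[str, List[Reading]]:
    """Group readings into one packet per region, ordered by time then sensor type."""
    packets: Dict[str, List[Reading]] = defaultdict(list)
    for reading in readings:
        packets[reading.region].append(reading)
    for region in packets:
        packets[region].sort(key=lambda r: (r.timestamp, r.sensor_type))
    return dict(packets)


readings = [
    Reading("north", "turbidity", 2, 0.8),
    Reading("south", "pressure", 1, 3.1),
    Reading("north", "pressure", 1, 2.9),
]
print(build_packets(readings))
```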

The data distribution unit is based on a publish-subscribe architecture. It extracts data elements from the distribution network sensor data and converts them to a unified format. Noise, and even errors, introduced by measurement, transmission or collection are filtered out beforehand. The data distribution unit organizes and formats the received data for further processing. After encryption, the data are transmitted over open TCP/IP Ethernet to each terminal that receives water quality data in a different format; in the present invention, the converted data are sent to the anomaly detection units at the operations center. The publish-subscribe transport protocol includes a decomposition rule for the data set X = (X₁, X₂, ..., X_m): X₁ is sent to virtual machine #1, X₂ to virtual machine #2, ..., and X_m to virtual machine #m.

Specifically, all distribution network sensor data are divided into different data packets by location area. The sensor data in each packet can be hydraulic or water quality data. The data distribution unit names each packet after the IP address of its anomaly detection unit. Fig. 2 shows the parallel online anomaly detection architecture. All anomaly detection units run in the multi-server host virtual machine system, and the virtual machine monitor replicates any failed anomaly detection unit, so an anomaly detection unit can recover from a single virtual machine failure.

The anomaly detection units at the operations center process data in parallel. The system executes multiple anomaly detection algorithms simultaneously, each handling a subset of the sensor data; these subsets enter the operations center as independent data packets. The anomaly detection units exchange data with one another through the Message Passing Interface (MPI). The anomaly detection program can be written in C or Fortran and can run on Linux. When the program is written in C, MPI is a set of C functions; when it is written in Fortran, MPI is a set of subroutines (written in Fortran) for exchanging data between processes.

One algorithm is described in detail below:

The data fed into the anomaly detection units cover the operating state of the whole pipe network. These data are obtained from sensor measurements. The database is updated in real time with the latest state of the pipe network.

A data model simulates how a parameter at a given point in the pipe network changes over time, as follows:

$$X(t+1) = F\big(X(t), X(t-1), X(t-2), \ldots, X(t-n)\big),$$

where X(t) is the vector of parameters measured by the sensors. F is the prediction model: it reads the historical data X(t), X(t-1), X(t-2), ... from the database and infers the value of X at the next time point t+1 from the historical observations. Under normal circumstances, the multiple regression equation

$$X(t+1) = A_0 X(t) + A_1 X(t-1) + \cdots + A_n X(t-n)$$

is sufficient to build the prediction model F and determine the mean value of X(t+1), where A₀ to A_n are coefficient matrices.

X is a very large vector; for a large network it contains hundreds of parameters. To simplify the computation, the calculation of F can be split into several sub-processes using parallel programming in the MPI framework.

That is,

$$X = (X_1, X_2, \ldots, X_i, \ldots, X_m)$$

The length of each subvector is

And

$$X_i(t+1) = F_i\big(X(t), X(t-1), X(t-2), \ldots, X(t-n)\big),$$

$$X_i(t+1) = A_{i0} X_i(t) + A_{i1} X_i(t-1) + \cdots + A_{in} X_i(t-n) + C_i, \qquad i = 1, \ldots, m$$

$$C_i = \sum_{j \neq i}^{n} A_{ij0} X_j(t) + \sum_{j \neq i}^{n} A_{ij1} X_j(t-1) + \cdots + \sum_{j \neq i}^{n} A_{ijn} X_j(t-n)$$

Each F_i can thus be computed independently on its own virtual machine. Under the publish-subscribe data distribution strategy, X_i is the input data of module F_i, and the modules exchange data through the random error parameter C_i over the Message Passing Interface (MPI).
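The per-unit form can be read as a block partition of the full regression. The derivation below is a sketch under two assumptions that the source does not state explicitly: each coefficient matrix A_k is partitioned into blocks A_ijk conformal with X = (X_1, ..., X_m), and the sum over j runs over the m units (the source writes n as its upper limit). Under those assumptions the diagonal blocks A_iik play the role of the parameters A_ik.

$$
\begin{aligned}
X(t+1) &= \sum_{k=0}^{n} A_k\, X(t-k), \qquad A_k = \big[A_{ijk}\big]_{i,j=1}^{m} \\
X_i(t+1) &= \sum_{k=0}^{n} \sum_{j=1}^{m} A_{ijk}\, X_j(t-k) \\
&= \sum_{k=0}^{n} A_{iik}\, X_i(t-k) \;+\; \sum_{k=0}^{n} \sum_{j \neq i} A_{ijk}\, X_j(t-k) \\
&= \sum_{k=0}^{n} A_{ik}\, X_i(t-k) \;+\; C_i
\end{aligned}
$$

Read this way, C_i collects everything unit i needs from the other units, which is why it is the quantity exchanged between modules.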

The parameters in the regression equation can be obtained by automatic fitting with a standard numerical package (such as the CRAN R statistical computing package).

The prediction module collects historical data to build the database X(t), X(t-1), ..., X(t-P), where P is an empirical parameter defining the time dependence of X.

The regression parameters of each module are estimated from the historical data and used to compute F_i. The prediction model uses the regression parameters to estimate the mean value of X_i(t+1).
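As a rough illustration of estimating regression parameters from history, the Python sketch below fits the simplest possible case by ordinary least squares: one scalar parameter, one lag and a constant term. This is a deliberate simplification and an assumption; the source uses multiple lags and matrix coefficients and obtains them from a statistical package. The names `fit_lag1` and `predict_mean` are hypothetical.

```python
from typing import List, Tuple


def fit_lag1(series: List[float]) -> Tuple[float, float]:
    """Least-squares fit of x(t+1) = a * x(t) + c; returns (a, c)."""
    x = series[:-1]   # values at time t
    y = series[1:]    # values at time t+1
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("series is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def predict_mean(series: List[float], a: float, c: float) -> float:
    """Estimate the mean of the next value from the latest observation."""
    return a * series[-1] + c


history = [0.0, 1.0, 1.5, 1.75, 1.875]   # generated by x(t+1) = 0.5 * x(t) + 1
a, c = fit_lag1(history)
print(a, c, predict_mean(history, a, c))
```

Because the example history follows the model exactly, the fit recovers a close to 0.5 and c close to 1.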

The judgment module then calculates the difference between the predicted and measured values of X_i(t+1).

If the difference is smaller than the standard deviation range of the sample {X_i(t), X_i(t-1), ..., X_i(t-P)}, the prediction model F_i returns a negative signal to the decision unit. If all modules return negative signals, the decision unit has X(t+1) stored in the database and instructs every prediction model to synchronize the latest sample data and regression parameters with the database, ready for predicting X(t+2).

If the difference is larger than the standard deviation range of the sample {X_i(t+1), X_i(t), X_i(t-1), ..., X_i(t-P+1)}, the prediction model F_i returns a positive signal to the decision unit. The decision unit marks X(t) as an anomalous event in the database. It instructs this prediction model F_i to update its regression parameters from the parameter database but to keep using the old sample values to define the standard deviation for judging anomalies.

In this patent, high availability is achieved through the parallel data processing model. If one or more data processing modules fail, each failed module is replaced by an active copy provided by the multi-server host virtualization platform.

The matrices produced by the data processing modules for their respective subsets are further normalized and sent to the main decision unit, which determines the event detection result.

Fig. 3 shows the conceptual architecture of multi-server host virtualization. The Linux operating system is installed on each domU. Each domU hosts one anomaly detection unit, which communicates with the modules on other domUs through the Message Passing Interface (MPI). At the hardware level, TCP/IP is used for communication both within and between server hosts. Mirror copies of the historical data of each virtual machine and of the facility pipe network are stored in the network-attached storage unit. The network-attached storage unit is a network storage technology: its data can be accessed by every anomaly detection module for data processing, and can also be used by the virtual machine manager to migrate a virtual machine to an available server host when a virtual machine or server host fails.

Fig. 4 shows how the threads of a multi-core processor are partitioned into different virtual machines on the same server host. The first thread of the multi-core processor is designated as dom0; it handles communication between the virtual machines and the network-attached storage unit and is responsible for creating and destroying virtual machines. The remaining computing resources are used by the virtual machines that run the anomaly detection units.

With reference to Fig. 3 and Fig. 4, the architecture of the high-performance anomaly detection service system based on multi-server host virtualization can be divided into three main parts.

[1] physical machine virtualization:

In this architecture, a physical machine is simply a server host on which virtual machines are installed; it performs parallel anomaly detection on the distribution network sensor data. A hypervisor (virtual machine manager), such as IBM z/VM, VMware ESX, or XenSource or Novell Xen, is installed to host all the virtual machines. The hypervisor runs directly on the hardware without requiring a specific operating system, and can run multiple virtual machines on that hardware, as shown in Fig. 3.

The present invention uses Xen's default CPU allocation policy, under which virtual machine dom0 is assigned the first thread of the multi-core processor installed in each server host (see Fig. 4). dom0 is the first virtual machine booted by Xen. It has certain privileges: it can access the hardware directly, handle all I/O functions of the system, and interact with (create and manage) the other virtual machines, denoted domU.

The dom0 of each server host runs Heartbeat (a messaging system built on Xen that checks whether virtual machines are running properly). It performs intelligent fault detection on all domUs on its server host and exchanges information with the corresponding processes on the other server hosts. Because all server hosts are connected in a mesh network, the backup manager on each dom0 can compile the health status of all virtual machines in the group into an inventory. A lookup table defines the node to which the backup of each originally failed virtual machine is migrated, and every backup manager can access this table. The table is updated each time the distribution of virtual machines in the group changes and a backup process has run; alternatively, the system administrator may simply restore the hardware and virtual machine configuration to its original state and leave the lookup table unchanged.
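How a backup manager might combine heartbeat information with the lookup table can be sketched in Python as follows. The timeout rule, the function names and the data layout are assumptions made for illustration and do not describe the actual Heartbeat software.

```python
from typing import Dict, List


def failed_vms(last_heartbeat: Dict[str, float], now: float,
               timeout: float) -> List[str]:
    """Names of VMs whose last heartbeat is older than `timeout` seconds."""
    return sorted(vm for vm, seen in last_heartbeat.items() if now - seen > timeout)


def plan_migrations(last_heartbeat: Dict[str, float], now: float, timeout: float,
                    lookup: Dict[str, str]) -> Dict[str, str]:
    """Map each failed VM to the migration node named in the lookup table."""
    return {vm: lookup[vm]
            for vm in failed_vms(last_heartbeat, now, timeout)
            if vm in lookup}


last_heartbeat = {"domU-1": 100.0, "domU-2": 94.0, "domU-3": 99.5}
lookup = {"domU-1": "hostB", "domU-2": "hostC"}
print(plan_migrations(last_heartbeat, now=100.0, timeout=5.0, lookup=lookup))
```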

A virtual machine is a virtualized environment; each virtual machine runs its own operating system and applications. In the present invention, Linux is designated as the operating system of all virtual machines and physical machines. An anomaly detection unit is installed on each virtual machine as an instance of an MPI process.

A virtual network interface is assigned to each virtual machine. Each interface has its own MAC address and IP address.

The present invention uses only the TCP/IP communication interface, both for local data exchange within a physical machine (a physical server host carrying a number of virtual machines) and for inter-node communication. A guest virtual machine domU communicates directly with a virtual network driver, which provides the same functions as an Ethernet card driver. Rather than translating instructions into hardware signals, this driver interacts with dom0 to communicate with the corresponding back-end interface in the driver domain. As a result, every virtual machine appears on the network as an individual server host with its own MAC address. TCP/IP is not well suited to data transfer between virtual machines on the same server host when compared with the shared-memory transfer protocols provided by XenSocket and XWay; nevertheless, the computing latency of the TCP/IP layer between dom0 and domU is far smaller than the time scale on which sensor data in the water network change. In addition, using the native Xen hypervisor together with MPI code improves system stability.

The spread of multi-core processors is a favorable trend. The system of the present invention takes advantage of it by running domU and dom0 on different threads within the same server host, so that they can execute on different cores. This CPU separation is achieved by configuring the Xen hypervisor. It allows the virtual I/O control protocol in dom0 and the MPI processes in domU to run in parallel, reducing CPU resource contention. This mitigates the latency problem of the I/O-intensive MPI processes mentioned above. An IP (Internet Protocol) based service runs on every virtual machine, with the following functions:

The file distributing unit that [a] mates from IP subscribes to packet.

Transmission network sensing data subset is inputted abnormity detecting unit by [b], and it is a to NAS to make a copy for.

The prediction module of [c] abnormity detecting unit subscribes to packet from other from the file distributing unit of different IP Virtual machine exchange process data.Prediction module is to go statistically to analyze based on semi-supervised learning framework Data, thus the statistical distribution prediction of distributed network expecting varialbe state in the case of hypothesis is without exception is provided. In system in the present invention, abnormity detecting unit is compiled into MPI program in identical or different physical machine Run on virtual machine.In the present invention, we use the acquiescence ICP/IP protocol on Xen in different MPI journeys Sequence transmits data.Therefore, we need not rebuild data transfer application interface and deacclimatize different service Inside and outside device main frame, data transmit the agreement with communication control.By the tcp/ip layer between dom0 and dom U Computing relay is far smaller than the transformation period of sensing data in the network of rivers.Additionally, due to local Xen manages program With the use of MPI code, system stability improves.

[d] combines shown in Fig. 2, and the analysis module of abnormity detecting unit receives actuarial prediction data from prediction module, It potentially includes the distribution of possible range numerical value, variance, and some other statistical indicator.Each time step Residual error in length must be classified as or outlier consistent with background water quality value.Analyze module according to described The regression parameter of the data subset next time that actuarial prediction data estimation obtains from file distributing unit, to calculate The predictive value of described data subset next time.Data subset and historical data have identical time step next time Long lower and consistent background water quality value.

[e] combines shown in Fig. 2, and the judge module of abnormity detecting unit is inclined to predictive value and online sensing data Difference degree judges.Although the absolute value at first unit lower threshold value can change along with water quality index, phase Acceptable prediction distribution formula network state deviation is fixed to specific standard deviation.Subsequently, one based on The abnormal accident differentiation of machine learning is used as decision tool with sort module.This judge module can be from visit Ask the historical data being stored in Network Attached Storage unit in data base.

[f] As shown in Fig. 2, the results are imported into the master anomaly detection unit, which runs on a different virtual machine. The master anomaly detection unit analyzes the results of all the preceding concurrent anomaly detection processes and determines the category of the anomalous event and the position in the facility pipe network at which it occurred.

[2] Use of network attached storage

The server hosts do not each require a separate disk. The virtual machine images are stored on the NAS, which can be accessed by any physical machine. In this case, any virtual machine can run on any physical machine without an additional backup on a local disk.

[3] Monitoring and control of high availability

The REMUS software package in the Xen framework is responsible for providing high-availability guarantees for the ordinary virtual machines running on the Xen hypervisor. In the system of the present invention, so that live migration can be completed when a physical machine, or merely one specific virtual machine, fails (for whatever reason, hardware or software fault), REMUS takes checkpoints of the virtual machine at high frequency (20-40 checkpoints per second) and copies them to another server host. On failover, the virtual machine moved to a different server host resumes operation from the latest checkpoint as if no failure had occurred. All of the operating system's run-time state, including active TCP connections, is preserved. Running processes continue as usual, and all files, network state and disks retain their integrity. At most the TCP stack will experience packet loss, and the lost packets will be retransmitted.

Using REMUS protects the virtual machines against crash failures. This property benefits the MPI parallel computation used for anomaly detection, because all of the computing processes are kept synchronized.

Under REMUS, servers exist in pairs in active and standby modes. The server in active mode sends its checkpoint information to the standby-mode server for backup in a timely manner, based on the Heartbeat signal.

In the present invention, each server host is in active mode and backup mode at the same time. A lookup table is designed so that the checkpoint information of the virtual machines on a particular server host is distributed to another server host acting in "backup" mode, so that live migration can be performed when a failure occurs.

Because solid-state drives offer high availability at an acceptable price, the server hosts that back up virtual machines can use solid-state drives to provide a local archive of the latest checkpoints, which enables very fast (sub-second) restart of the virtual machines.

Fig. 5 illustrates the method of managing the high-availability servers used for parallel execution of the online anomaly detection algorithm. The figure shows how a failed virtual machine or server host is replicated in a configuration of 4 server hosts and 16 virtual machines. Each server host runs 4 virtual machines. The backup manager running on dom 0 of each active server host is configured to handle the backup of failed virtual machines according to the lookup table. For example, server host #1, besides running virtual machines A, B, C and D, is also responsible for backing up the checkpoint information of E, I, M and F. When one of the virtual machines E, I, M, F (say E) fails, the backup manager in server host #1 activates the corresponding checkpoint backup, so that the failed virtual machine (E) resumes operation on server host #1 (at which point A, B, C, D and E are all in active mode on server host #1).
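
The failover example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the backup manager of the invention: the names `running`, `backup_host` and `fail_over` are hypothetical, the initial placement of virtual machines E to P on server hosts #2 to #4 is inferred from the example given later in the description, and only the backup entries stated for server host #1 (E, I, M, F) are filled in.

```python
# Hypothetical sketch of lookup-table failover; all names are illustrative.

# Virtual machines running on each server host (4 hosts x 4 VMs, as in Fig. 5).
# The placement on hosts #2-#4 is an inference, not stated explicitly.
running = {
    1: ["A", "B", "C", "D"],
    2: ["E", "F", "G", "H"],
    3: ["I", "J", "K", "L"],
    4: ["M", "N", "O", "P"],
}

# Lookup table: VM -> server host holding its checkpoint backup.
# Only the entries given for server host #1 are known from the description.
backup_host = {"E": 1, "I": 1, "M": 1, "F": 1}


def fail_over(vm, running, backup_host):
    """Move a failed VM to the host that holds its checkpoint backup."""
    target = backup_host[vm]  # raises KeyError if no backup is recorded
    for vms in running.values():
        if vm in vms:
            vms.remove(vm)  # the VM no longer runs on its original host
    running[target].append(vm)  # it resumes from the checkpoint on the target
    return target


target = fail_over("E", running, backup_host)
print(target, running[target])
```

After the call, virtual machine E is listed on server host #1 alongside A, B, C and D, which matches the situation described for Fig. 5.
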
After the automatic backup process completes, if the system needs to be optimized, the system administrator can intervene to manage the operating load of each server host and migrate virtual machines online to different physical server hosts. Afterwards, the system administrator either redistributes the virtual machines among the active server hosts and updates the lookup table that controls the virtual machine backup process (assuming server host #1 has crashed, the new lookup table will be: server host #2 runs A, E, F, G, H and backs up B, I, N, O, P; server host #3 runs B, I, J, K, L and backs up A, C, D, E, M; server host #4 runs C, D, M, N, O, P and backs up I, J, K, M, N), or simply restores the hardware and virtual machine configuration to its original state and leaves the lookup table unchanged. The design principle of the lookup table is that, after any one server host crashes, all the virtual machines that were running on it are divided evenly among the other healthy server hosts and continue to run there.
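
The design principle stated above, dividing the virtual machines of a crashed server host evenly among the remaining hosts, can be sketched as follows. This is a minimal sketch under assumptions: the function name `redistribute` is hypothetical, and the rule "send each orphaned virtual machine to the currently least-loaded surviving host" is one possible way to obtain an even split, not a rule stated in the description. The resulting loads are as even as in the example above (5, 5 and 6 virtual machines), but which host receives which virtual machine differs from that example.

```python
def redistribute(running, crashed):
    """Divide the crashed host's VMs as evenly as possible among the others.

    Assumption: each orphaned VM goes to the least-loaded surviving host,
    with ties broken by the lowest host number.
    """
    new_running = {h: list(vms) for h, vms in running.items() if h != crashed}
    for vm in running[crashed]:
        target = min(new_running, key=lambda h: (len(new_running[h]), h))
        new_running[target].append(vm)
    return new_running


# Initial placement for the 4-host, 16-VM configuration of Fig. 5.
running = {
    1: ["A", "B", "C", "D"],
    2: ["E", "F", "G", "H"],
    3: ["I", "J", "K", "L"],
    4: ["M", "N", "O", "P"],
}

after = redistribute(running, crashed=1)
loads = {host: len(vms) for host, vms in after.items()}
print(after)
print(loads)
```

A real deployment would also have to regenerate the backup column of the lookup table after the move; that step is omitted here.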

Fig. 5 adopts, as one embodiment, a configuration of only 4 server hosts, each running 4 virtual machines. It should be stated that this merely illustrates one feasible embodiment of the present invention. The embodiment is not intended to limit the scope of the claims of the present invention, and the present invention includes but is not limited to the above detailed description and specific examples. The present invention includes all adjustments and modifications within the scope of its core content, and all equivalent implementations or changes that do not depart from the present invention are intended to fall within the scope of the claims of this case.

Claims

1. An online pipe network anomaly detection system based on machine learning, characterized in that it comprises:

a plurality of anomaly detection units, for receiving the corresponding data subsets in one-to-one correspondence and performing anomaly prediction on the data subsets based on a semi-supervised learning framework, the plurality of anomaly detection units processing data in parallel and transmitting data to one another via MPI;

the anomaly detection units are installed on virtual machines, each virtual machine corresponding to one anomaly detection unit;

the online pipe network anomaly detection system based on machine learning further comprises a plurality of server hosts; the server hosts are interconnected in a local area network by a fully connected topology; each server host is equipped with one or more multi-core processors, and the multi-core processor is divided by thread into a plurality of virtual machines on the same server host, wherein the first thread is designated as virtual machine dom 0 and the other threads are assigned to virtual machines dom U; the virtual machine dom 0 is used to access the hardware of the server host and to interact with the virtual machines dom U; the virtual machines dom U are used to install the anomaly detection units; and the virtual machines dom U of each running server host have corresponding backups on the other running server hosts.

2. The online pipe network anomaly detection system based on machine learning according to claim 1, characterized in that the anomaly detection unit comprises:

3. The online pipe network anomaly detection system based on machine learning according to claim 2, characterized in that the method by which the prediction module builds the prediction model comprises the following steps:

X_i(t+1) = F_i(X(t), X(t-1), X(t-2), ..., X(t-n))    (1)

Step 12: build the prediction model based on a multiple regression equation:

X_i(t+1) = A_i0*X_i(t) + A_i1*X_i(t-1) + ... + A_in*X_i(t-n) + C_i    (2)

Step 13: solve for the random error parameter C_i:

C_i = Σ_{j≠i}^{n} A_ij0*X_j(t) + Σ_{j≠i}^{n} A_ij1*X_j(t-1) + ... + Σ_{j≠i}^{n} A_ijn*X_j(t-n)    (3)
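
For illustration only, equations (2) and (3) can be evaluated with the short Python sketch below. The function names `cross_term` and `predict_next`, the list layout (index k holds the value at lag k) and all numeric values are assumptions made for the example and do not come from the claims. How the coefficients A are estimated is outside the scope of this sketch.

```python
def cross_term(cross_coeffs, other_series):
    """Equation (3): C_i = sum over j != i and lags k of A_ijk * X_j(t - k).

    cross_coeffs[j][k] holds A_ijk and other_series[j][k] holds X_j(t - k),
    where j runs over the other variables only.
    """
    return sum(
        a * x
        for coeffs_j, series_j in zip(cross_coeffs, other_series)
        for a, x in zip(coeffs_j, series_j)
    )


def predict_next(own_coeffs, own_series, c_i):
    """Equation (2): X_i(t+1) = sum over lags k of A_ik * X_i(t - k) + C_i."""
    return sum(a * x for a, x in zip(own_coeffs, own_series)) + c_i


# Illustrative numbers only: one other variable j, lags k = 0, 1, 2.
own_series = [7.0, 7.5, 8.0]           # X_i(t), X_i(t-1), X_i(t-2)
own_coeffs = [0.5, 0.25, 0.125]        # A_i0, A_i1, A_i2
other_series = [[2.0, 4.0, 8.0]]       # X_j(t), X_j(t-1), X_j(t-2)
cross_coeffs = [[0.5, 0.25, 0.125]]    # A_ij0, A_ij1, A_ij2

c_i = cross_term(cross_coeffs, other_series)
x_next = predict_next(own_coeffs, own_series, c_i)
print(c_i, x_next)
```

With these inputs the cross term is 1 + 1 + 1 = 3 and the prediction is 3.5 + 1.875 + 1 + 3 = 9.375.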

4. The online pipe network anomaly detection system based on machine learning according to claim 3, characterized in that the method by which the judgment module judges the abnormality of the next data subset comprises the following steps:

Step 34: compare the difference with the standard deviation range:

When predicting X_i(t+2), if the difference is greater than the standard deviation range of the updated sample {X_i(t+1), X_i(t), X_i(t-1), ..., X_i(t-P+1)}, the corresponding judgment module returns an affirmative signal to the decision module, and the decision module marks X_i(t+2) as an anomalous event in the database. The decision module instructs the judgment module to update the regression parameters according to the database, but to use the old sample {X_i(t+1), X_i(t), X_i(t-1), ..., X_i(t-P+1)} to define the standard deviation for the anomaly judgment of X_i(t+3).
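
The comparison in step 34 can be sketched in Python as follows. This is a sketch under stated assumptions rather than the claimed method: the function name `judge` is hypothetical, the "standard deviation range" is taken to be one population standard deviation of the sample, the "difference" is taken to be the absolute gap between the predicted and the observed value, and the rule that a normal value slides the sample forward by one step is inferred from the phrase "updated sample". The preceding steps of the method are not reproduced in this section, so the predicted value is simply passed in.

```python
from statistics import pstdev


def judge(predicted, observed, window):
    """Return (is_anomaly, next_window) for one new observation.

    window holds the P most recent accepted values, newest first:
    [X_i(t+1), X_i(t), ..., X_i(t-P+1)].
    """
    difference = abs(observed - predicted)
    is_anomaly = difference > pstdev(window)
    if is_anomaly:
        # Keep the old sample, so the outlier does not inflate the
        # standard deviation used to judge the following value.
        next_window = list(window)
    else:
        # Normal value: slide the sample forward by one time step.
        next_window = [observed] + list(window[:-1])
    return is_anomaly, next_window


window = [7.1, 7.0, 6.9, 7.0]  # illustrative values only
flag_a, win_a = judge(predicted=7.0, observed=9.0, window=window)
flag_b, win_b = judge(predicted=7.0, observed=7.05, window=window)
print(flag_a, win_a)
print(flag_b, win_b)
```

In the first call the observation lies far outside the sample spread, so it is flagged and the old sample is retained. In the second call the observation is accepted and enters the sample.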

5. The online pipe network anomaly detection system based on machine learning according to claim 4, characterized in that the online pipe network anomaly detection system based on machine learning further comprises a network attached storage unit for storing the image copies of all virtual machines and the historical data of the online pipe network; each anomaly detection unit can access the data in the network attached storage unit, and the virtual machine dom 0 handles the communication between its corresponding virtual machines dom U and the network attached storage unit.

6. The online pipe network anomaly detection system based on machine learning according to claim 5, characterized in that the backup method is: the checkpoint information of the virtual machines dom U on each running server host is distributed for backup according to the load of the other running server hosts, so as to achieve an optimally balanced mode of operation; a lookup table is generated automatically after the backup; the lookup table defines the migration node for the backup of each failed virtual machine dom U, so that live migration can be performed when a virtual machine dom U or a server host fails.

7. The online pipe network anomaly detection system based on machine learning according to claim 6, characterized in that a backup manager is provided on each virtual machine dom 0, for compiling the health status of the virtual machines dom U corresponding to that virtual machine dom 0 into a list; through the backup manager, the virtual machine dom 0 is configured to handle the backup of failed virtual machines according to the lookup table.

8. The online pipe network anomaly detection system based on machine learning according to claim 1, characterized in that the method by which the file distribution unit distributes the data subsets comprises the following steps: