Summary of the invention
To address the problems identified in the background art, the invention provides a remote diagnosis method for thermal power generation units based on distributed computing. The method extends Hadoop by defining a custom data service and developing an operation platform. The method imports existing platform data directly through an interface program and feeds back the distributed computing results. The custom data service comprises the following steps:
The ETL module synchronously imports data from the PI database into the Hadoop cluster;
The standard read/write module provides a query interface for upper-layer applications;
The data synthesis module provides professional functions;
The data order module performs remote diagnosis tasks;
The operation platform provides graphical management and system tuning for the entire Hadoop system.
Preferably, the ETL module is further used to store the imported data in HBase.
Preferably, in any of the above schemes, the ETL module extracts data from different data sources and saves it to the distributed file system.
Preferably, in any of the above schemes, the ETL module automatically synchronizes data values and attributes.
Preferably, in any of the above schemes, the standard read/write comprises basic reading and writing of single data items and/or batch data.
Preferably, in any of the above schemes, the professional functions comprise at least one of a database operation function, a common mathematical function, a professional function, a mathematical statistics function, a device characteristic function, a remote diagnosis function, and an application interface function.
Preferably, in any of the above schemes, the data order module is further used for data mining tasks and mathematical statistics tasks.
Preferably, in any of the above schemes, the data order module can directly call the native API functions of Hadoop.
Preferably, in any of the above schemes, the operation platform provides at least one of cluster deployment, operation monitoring, configuration scheduling, and permission setting.
Preferably, in any of the above schemes, the system can provide information system applications.
Preferably, in any of the above schemes, the information system applications comprise at least one of an alarm mechanism, remote diagnosis, workflow management, data monitoring, and report display.
Preferably, in any of the above schemes, an administrator of the operation platform can add, configure, or remove any node in the distributed cluster.
Preferably, in any of the above schemes, the administrator can view at least one of the status of the whole cluster and/or an individual node, its health, and a summary of task execution status.
Preferably, in any of the above schemes, the database operation comprises at least one of insert, delete, update, and query.
Preferably, in any of the above schemes, the database operation function comprises a read/write function for values and/or an operation function for data attributes.
Preferably, in any of the above schemes, the common mathematical function comprises at least one of a calculation function, an index calculation function, and a curve-fitting function.
Preferably, in any of the above schemes, the professional function comprises a steam thermodynamic property calculation function.
Preferably, in any of the above schemes, the steam thermodynamic property calculation function comprises calculating at least one of entropy, enthalpy, and saturation temperature.
Preferably, in any of the above schemes, the mathematical statistics function comprises at least one of weighting, dispersion, and variance.
Preferably, in any of the above schemes, the device characteristics comprise a boiler part and a steam turbine part.
Preferably, in any of the above schemes, the boiler part comprises at least one of a steam-water separator, an air preheater, a boiler fan, and a combustion system.
Preferably, in any of the above schemes, the steam turbine part comprises at least one of the turbine heat rate, the turbine body, the cold end system, and the heater regenerative system.
Preferably, in any of the above schemes, the application interface function is used to interact with other platforms.
The invention also provides a remote diagnosis system for thermal power generation units based on distributed computing. The system extends Hadoop by defining a custom data service and developing an operation platform. The system imports existing platform data directly through an interface program and feeds back the distributed computing results. The custom data service comprises:
An ETL module, for synchronously importing data from the PI database into the Hadoop cluster;
A standard read/write module, for providing a query interface for upper-layer applications;
A data synthesis module, for providing professional functions;
A data order module, for performing remote diagnosis tasks;
The operation platform provides graphical management and system tuning for the entire Hadoop system.
Preferably, the ETL module is further used to store the imported data in HBase.
Preferably, in any of the above schemes, the ETL module extracts data from different data sources and saves it to the distributed file system.
Preferably, in any of the above schemes, the ETL module automatically synchronizes data values and attributes.
Preferably, in any of the above schemes, the standard read/write comprises basic reading and writing of single data items and/or batch data.
Preferably, in any of the above schemes, the professional functions comprise at least one of a database operation function, a common mathematical function, a professional function, a mathematical statistics function, a device characteristic function, a remote diagnosis function, and an application interface function.
Preferably, in any of the above schemes, the data order module is further used for data mining tasks and mathematical statistics tasks.
Preferably, in any of the above schemes, the data order module can directly call the native API functions of Hadoop.
Preferably, in any of the above schemes, the operation platform provides at least one of cluster deployment, operation monitoring, configuration scheduling, and permission setting.
Preferably, in any of the above schemes, the system can provide information system applications.
Preferably, in any of the above schemes, the information system applications comprise at least one of an alarm mechanism, remote diagnosis, workflow management, data monitoring, and report display.
Preferably, in any of the above schemes, an administrator of the operation platform can add, configure, or remove any node in the distributed cluster.
Preferably, in any of the above schemes, the administrator can view at least one of the status of the whole cluster and/or an individual node, its health, and a summary of task execution status.
Preferably, in any of the above schemes, the database operation comprises at least one of insert, delete, update, and query.
Preferably, in any of the above schemes, the database operation function comprises a read/write function for values and/or an operation function for data attributes.
Preferably, in any of the above schemes, the common mathematical function comprises at least one of a calculation function, an index calculation function, and a curve-fitting function.
Preferably, in any of the above schemes, the professional function comprises a steam thermodynamic property calculation function.
Preferably, in any of the above schemes, the steam thermodynamic property calculation function comprises calculating at least one of entropy, enthalpy, and saturation temperature.
Preferably, in any of the above schemes, the mathematical statistics function comprises at least one of weighting, dispersion, and variance.
Preferably, in any of the above schemes, the device characteristics comprise a boiler part and a steam turbine part.
Preferably, in any of the above schemes, the boiler part comprises at least one of a steam-water separator, an air preheater, a boiler fan, and a combustion system.
In the present invention, the storage, network, and computing resources of the cluster unified by Hadoop are fully used to achieve high-speed computation and storage over massive data. On top of Hadoop, the calculation functions, steam property functions, mathematical statistics and probability functions, and other functions commonly used for thermal power generation units are encapsulated to build a distributed computing platform suited to thermodynamic calculation for such units. The platform makes full use of massive historical data to perform distributed thermodynamic calculation and remote diagnosis of thermal power generation units, improving the operating efficiency of the remote diagnosis platform.
Adopting a distributed architecture improves the scalability of the system. Existing resources can be organized into larger computer networks and larger clusters to meet the company's needs, giving the platform stronger computing power, wider data distribution, lower coupling, greater stability, and more flexible extensibility. Remote diagnosis becomes faster and more efficient, and the range of what can be diagnosed becomes more comprehensive.
Embodiment
The present invention is described below with reference to the accompanying drawings and exemplary embodiments.
Embodiment 1:
Fig. 1 shows the distributed architecture of the thermal power generation unit remote diagnosis system. The system extends Hadoop by defining a custom data service and developing an operation platform. A remote diagnosis task calls the custom data service and the native Hadoop API to obtain data and perform computation, while the operation platform is responsible for deploying, monitoring, and scheduling the execution of diagnosis tasks. The bottom layer of the technical architecture is a Hadoop cluster; an HBase database is built on the cluster and is operated through the existing HBase API.
Hadoop: a distributed system infrastructure developed by the Apache Software Foundation.
Users can develop distributed programs without understanding the low-level details of distribution, and can make full use of the power of the cluster for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and suits applications with large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream (streaming access).
Hadoop has the following main advantages: high reliability, in that its ability to store and process data bit by bit is trusted; high scalability, in that Hadoop distributes data and computation across available computer clusters, which can easily be extended to thousands of nodes; high efficiency, in that Hadoop can move data dynamically between nodes and keep the nodes balanced, so processing is fast; high fault tolerance, in that Hadoop automatically keeps multiple copies of data and automatically reassigns failed tasks; and low cost, in that, compared with all-in-one appliances, commercial data warehouses, and data marts such as QlikView and Yonghong Z-Suite, Hadoop is open source, so the software cost of a project is greatly reduced.
Hadoop is widely used in big data processing thanks to its inherent advantages in data extraction, transformation, and loading (ETL). Its distributed architecture places the big data processing engine as close to the storage as possible, which suits batch operations such as ETL, because the batch results of such operations can go directly to storage. Hadoop's MapReduce function breaks a single task into fragments, sends the fragment tasks (Map) to multiple nodes, and then loads the results (Reduce) into the data warehouse as a single data set.
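The Map and Reduce steps described above can be illustrated with a minimal single-process sketch. It does not use the Hadoop API; the record format `unit_id,load_mw` and the function names are assumptions made only for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (unit_id, load_mw) pair from a 'unit_id,load_mw' text record."""
    unit_id, load = record.split(",")
    yield unit_id, float(load)


def reduce_phase(unit_id, values):
    """Reduce: combine all values collected for one key into a single mean."""
    return unit_id, sum(values) / len(values)


def run_job(splits):
    """Map every split, group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for split in splits:  # on a real cluster each split is handled by a different node
        for record in split:
            for key, value in map_phase(record):
                grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two splits of load records for two units.
splits = [["unit1,300.0", "unit2,280.0"], ["unit1,290.0", "unit2,300.0"]]
result = run_job(splits)
```

Here the grouping step stands in for the shuffle that Hadoop performs between the Map and Reduce phases.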
Hadoop consists of many elements. At the bottom is the Hadoop Distributed File System (HDFS), which stores files across all storage nodes in the Hadoop cluster. The layer above HDFS (for the purposes of this document) is the MapReduce engine, which consists of JobTrackers and TaskTrackers. The distributed file system HDFS, the MapReduce processing model at the core of the Hadoop distributed computing platform, the data warehouse tool Hive, and the distributed database HBase together cover essentially all the core technologies of the Hadoop distributed platform.
HDFS: To an external client, HDFS looks like a traditional hierarchical file system: files can be created, deleted, moved, renamed, and so on. However, the HDFS architecture is built on a specific set of nodes, which follows from its own characteristics. These nodes comprise the NameNode (only one), which provides the metadata service inside HDFS, and DataNodes, which provide storage blocks for HDFS. Because there is only one NameNode, it is a weakness of HDFS (a single point of failure). A file stored in HDFS is divided into blocks, and the blocks are replicated to multiple computers (DataNodes). This differs greatly from a traditional RAID architecture. The block size (usually 64 MB) and the number of replicas are determined by the client when the file is created. The NameNode controls all file operations. All communication inside HDFS is based on the standard TCP/IP protocol.
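As a rough illustration of how a file is divided into blocks and replicated, the sketch below counts blocks and stored replicas. The 64 MB block size is the figure given above; the replication factor of 3 and the function name are assumptions.

```python
import math

BLOCK_SIZE_MB = 64  # typical block size mentioned in the text
REPLICATION = 3     # assumed replication factor


def block_layout(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, total stored block replicas) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return blocks, blocks * replication


# A hypothetical 200 MB file needs 4 blocks, stored as 12 block replicas.
blocks, replicas = block_layout(200)
```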
NameNode: The NameNode is software that usually runs on a dedicated machine in an HDFS instance. It manages the file system namespace and controls access by external clients. The NameNode decides whether to map a file to replicated blocks on DataNodes. In the most common case of three replicas, the first replica is stored on a different node in the same rack and the last replica is stored on a node in a different rack. Note that this requires an understanding of the cluster architecture. Actual I/O transactions do not pass through the NameNode; only the metadata that maps files to DataNodes and blocks does. When an external client sends a request to create a file, the NameNode responds with the block identifier and the IP address of the DataNode that holds the first replica of the block. The NameNode also notifies the other DataNodes that will receive replicas of the block. The NameNode stores all information about the file system namespace in a file called FsImage. This file, together with a log file of all transactions (here, the EditLog), is stored on the NameNode's local file system. The FsImage and EditLog files are also replicated in case of file corruption or loss of the NameNode system. The NameNode itself inevitably carries the risk of being a single point of failure (SPOF). An active-standby mode does not solve this problem; 100% uptime can be achieved only with Hadoop Non-stop NameNode.
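The rack-aware placement of three replicas can be sketched as follows. This is one reading of the description above (writer's node, another node in the same rack, then a node in a different rack) and may differ from the default policy of a given Hadoop release; the topology format and function name are hypothetical.

```python
def place_replicas(topology, writer):
    """Choose three replica locations for one block.

    topology: dict mapping rack name -> list of DataNode names (hypothetical format).
    writer:   (rack, node) of the DataNode that receives the first replica.
    """
    rack, node = writer
    same_rack = [n for n in topology[rack] if n != node]
    other_racks = [(r, n) for r in sorted(topology) if r != rack for n in topology[r]]
    if not same_rack or not other_racks:
        raise ValueError("need a second node in the writer's rack and a second rack")
    # Replica 1: writer's node; replica 2: same rack; replica 3: different rack.
    return [(rack, node), (rack, same_rack[0]), other_racks[0]]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
placement = place_replicas(topology, ("rack1", "dn1"))
```

Keeping one replica off the writer's rack is what lets the data survive the loss of an entire rack.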
DataNode: The DataNode is also software that usually runs on a dedicated machine in an HDFS instance. A Hadoop cluster comprises one NameNode and a large number of DataNodes. DataNodes are usually organized in racks, and a rack connects all its systems through a switch. One assumption of Hadoop is that transmission between nodes inside a rack is faster than transmission between nodes in different racks. DataNodes respond to read and write requests from HDFS clients. They also respond to commands from the NameNode to create, delete, and replicate blocks. The NameNode relies on regular heartbeat messages from each DataNode. Each message contains a block report, which the NameNode uses to verify the block mapping and other file system metadata. If a DataNode fails to send heartbeat messages, the NameNode takes remedial action and re-replicates the blocks lost on that node.
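The heartbeat check can be sketched as below: DataNodes whose last heartbeat is older than a timeout are treated as failed, and every block with a replica on such a node is queued for re-replication. The timeout value, data structures, and function names are assumptions for illustration, not Hadoop's own.

```python
def find_dead_nodes(last_heartbeat, now, timeout_s=30.0):
    """Return DataNodes whose last heartbeat is older than timeout_s seconds."""
    return sorted(node for node, seen in last_heartbeat.items() if now - seen > timeout_s)


def blocks_to_rereplicate(block_map, dead_nodes):
    """Return blocks that lost at least one replica on a dead DataNode.

    block_map: dict mapping block id -> set of DataNodes holding a replica.
    """
    dead = set(dead_nodes)
    return sorted(block for block, holders in block_map.items() if holders & dead)


# Hypothetical cluster state at time t = 100 s.
last_heartbeat = {"dn1": 100.0, "dn2": 60.0, "dn3": 95.0}
block_map = {"blk_1": {"dn1", "dn2"}, "blk_2": {"dn1", "dn3"}, "blk_3": {"dn2", "dn3"}}
dead = find_dead_nodes(last_heartbeat, now=100.0)
pending = blocks_to_rereplicate(block_map, dead)
```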
File operations: Clearly, HDFS is not a general-purpose file system. Its main purpose is to support streaming access to large files that are written. When a client wants to write a file to HDFS, the file is first cached in local temporary storage. When the cached data exceeds the required HDFS block size, a file-creation request is sent to the NameNode. The NameNode responds to the client with the DataNode identity and the target block, and also notifies the DataNodes that will hold replicas of the file block. When the client starts sending the temporary file to the first DataNode, the block content is immediately forwarded to the replica DataNodes in a pipeline. The client is also responsible for creating checksum files, which are stored in the same HDFS namespace. After the last file block is sent, the NameNode commits the file creation to its persistent metadata store (the EditLog and FsImage files).
Linux cluster: The Hadoop framework can be used on a single Linux platform (for development and debugging), and the project officially provides MiniCluster for unit testing, but its power is realized only with commodity servers housed in racks. These racks form a Hadoop cluster. Hadoop uses knowledge of the cluster topology to decide how to distribute jobs and files across the cluster. Hadoop assumes that nodes may fail, so it natively handles the failure of a single computer or even of an entire rack.
In the present invention, the system design comprises five modules: the operation platform, the ETL interface, standard read/write, data synthesis, and data order. The ETL design synchronously imports data from the PI database into the Hadoop cluster and stores it in HBase; this part covers reading data from the PI database and writing it to HBase. The standard read/write module provides a simple query interface for upper-layer applications. The data synthesis module provides basic, commonly used professional functions such as data statistics functions and device characteristic functions. The data order module is responsible for executing remote diagnosis tasks. The operation platform module provides graphical management and system tuning for the entire Hadoop system, mainly cluster deployment, operation monitoring, configuration scheduling, and permission setting. To reduce the complexity of platform operation and maintenance, the invention develops an operation platform that manages and monitors the running state of the system simply and effectively. An administrator can easily add, configure, or remove any node in the distributed cluster, and can view the status and health of the whole cluster or of an individual node as well as a summary of task execution status, which greatly simplifies the routine operation and maintenance of the cluster and reduces its cost.
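How the four data service modules fit together can be shown with a small sketch. An in-memory dictionary stands in for HBase, and every class, function, and tag name here is hypothetical; the operation platform is omitted because it concerns cluster management rather than data flow.

```python
from statistics import mean, pvariance


class Store:
    """In-memory stand-in for the HBase table (assumption for illustration)."""

    def __init__(self):
        self.rows = {}  # (tag, timestamp) -> value


def etl_sync(source_rows, store):
    """ETL module: copy (tag, timestamp, value) rows from the source into the store."""
    for tag, ts, value in source_rows:
        store.rows[(tag, ts)] = value


def read_range(store, tag, start, end):
    """Standard read/write module: values of one tag with start <= timestamp <= end."""
    return [v for (t, ts), v in sorted(store.rows.items()) if t == tag and start <= ts <= end]


def summarize(values):
    """Data synthesis module: one example of a mathematical statistics function."""
    return {"mean": mean(values), "variance": pvariance(values)}


def diagnose(store, tag, start, end, limit):
    """Data order module: a diagnosis task composed from the modules above."""
    stats = summarize(read_range(store, tag, start, end))
    return "abnormal" if stats["mean"] > limit else "normal"


store = Store()
etl_sync([("load", 1, 300.0), ("load", 2, 290.0), ("load", 3, 310.0)], store)
status = diagnose(store, "load", 1, 3, limit=305.0)
```

The diagnosis task never touches the store directly, which mirrors the layering in the text: upper-layer tasks go through the query interface and the shared function library.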
Embodiment 2:
Application of distributed computing to air preheater diagnosis:
Air preheaters of the same type on 300 MW units are diagnosed. All air preheaters of the same type are displayed, and all parameters and performance indices related to the air preheater are calculated and listed: flue gas temperature drop, air temperature rise, X ratio, air leakage rate, air-side pressure difference of the air preheater, flue-gas-side pressure difference of the air preheater, flue gas temperature at the steam air heater outlet, air preheater efficiency, induced draft fan efficiency, sulfur content of the coal, and ammonia efficiency. Historical data are analyzed, in particular data from the periods before and after air preheater maintenance, to find quickly the causes of changes in air preheater performance under the same operating conditions (load, heat supply, ambient temperature). The results are shown on the display interface, and any time period can be selected and switched at any time.
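The before-and-after comparison under the same operating conditions might be sketched as follows. Only load and ambient temperature are matched here, heat supply is left out for brevity, and the field names and tolerances are assumptions.

```python
from statistics import mean


def same_condition(sample, ref, load_tol=10.0, temp_tol=2.0):
    """True if the sample's load and ambient temperature are close to the reference."""
    return (abs(sample["load_mw"] - ref["load_mw"]) <= load_tol
            and abs(sample["ambient_c"] - ref["ambient_c"]) <= temp_tol)


def compare_periods(before, after, ref, key):
    """Mean change of one index (after minus before) at matching operating conditions.

    Returns None when either period has no sample matching the reference condition.
    """
    b = [s[key] for s in before if same_condition(s, ref)]
    a = [s[key] for s in after if same_condition(s, ref)]
    if not a or not b:
        return None
    return mean(a) - mean(b)


# Hypothetical samples around an air preheater maintenance outage.
ref = {"load_mw": 300.0, "ambient_c": 20.0}
before = [{"load_mw": 300.0, "ambient_c": 20.0, "leak_rate": 8.0},
          {"load_mw": 200.0, "ambient_c": 20.0, "leak_rate": 9.0}]
after = [{"load_mw": 295.0, "ambient_c": 21.0, "leak_rate": 6.0}]
change = compare_periods(before, after, ref, "leak_rate")
```

Filtering to matching conditions first keeps a change in load or weather from being mistaken for a change in equipment performance.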
Embodiment 3:
Application of distributed computing to cold end system diagnosis:
Taking a 300 MW unit as the research object, the following are calculated and displayed: actual condenser vacuum, the vacuum corresponding to the exhaust temperature, the theoretically attainable vacuum, vacuum tightness, condenser terminal temperature difference, and the vacuum and related indices and parameters of units of the same type under the same operating conditions. Historical data analysis (comparison with historical data, with the target parameter values, and with units of the same type) is used to judge whether the cold end is abnormal and to list the abnormal indices and parameters; logical comparison then identifies the main causes affecting the vacuum and the terminal temperature difference.
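A minimal sketch of the comparison step is given below. It uses the usual definition of condenser terminal temperature difference (saturation temperature at condenser pressure minus circulating water outlet temperature); the index names, reference values, and tolerances are hypothetical.

```python
def terminal_difference(t_sat_c, cw_outlet_c):
    """Condenser terminal temperature difference in degrees Celsius."""
    return t_sat_c - cw_outlet_c


def abnormal_indices(current, reference, tolerances):
    """Names of indices that deviate from their reference by more than the tolerance."""
    return sorted(name for name, value in current.items()
                  if abs(value - reference[name]) > tolerances[name])


# Hypothetical current values compared with historical reference values.
current = {"vacuum_kpa": 93.0, "terminal_diff_c": terminal_difference(40.0, 32.0)}
reference = {"vacuum_kpa": 95.0, "terminal_diff_c": 5.0}
tolerances = {"vacuum_kpa": 1.0, "terminal_diff_c": 2.0}
flagged = abnormal_indices(current, reference, tolerances)
```

The reference dictionary could equally hold target values or values from a unit of the same type, which are the other two comparisons the text names.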
Embodiment 4:
Application of distributed computing to high pressure cylinder efficiency diagnosis:
The high pressure cylinder efficiency of a three-cylinder, four-exhaust, water-cooled 300 MW steam turbine is analyzed. The following are calculated and displayed: governing stage efficiency, efficiency from the governing stage to the first extraction, efficiency from the first extraction to the high pressure exhaust, high pressure cylinder efficiency, first extraction temperature, second extraction temperature, monitoring section pressure, and first stage pressure. All indices and calculations related to high pressure cylinder efficiency are listed. By comparison with steam turbines of the same type, in particular turbines from the same manufacturer, repeated comparative analysis and calculation identify the causes of high pressure cylinder abnormality.
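Cylinder efficiency is conventionally the ratio of the actual enthalpy drop to the isentropic enthalpy drop. The sketch below assumes the enthalpies (in kJ/kg) have already been obtained from a steam property function; the function name and the sample values are hypothetical.

```python
def cylinder_efficiency(h_in, h_out, h_out_isentropic):
    """Isentropic efficiency: actual enthalpy drop divided by ideal enthalpy drop."""
    ideal_drop = h_in - h_out_isentropic
    if ideal_drop <= 0:
        raise ValueError("isentropic outlet enthalpy must be below inlet enthalpy")
    return (h_in - h_out) / ideal_drop


# Hypothetical inlet, actual exhaust, and isentropic exhaust enthalpies in kJ/kg.
eta_hp = cylinder_efficiency(3400.0, 3050.0, 3000.0)
```

The same function applies to each section listed above, such as governing stage to first extraction, by passing the enthalpies at that section's inlet and outlet.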
The present invention has been explained in detail above with reference to specific embodiments so that it may be better understood. Obviously, various modifications and variations may be made to the invention without departing from its broader spirit and scope as defined by the claims. The above embodiments are therefore illustrative and not restrictive.