Summary of the invention
To address the problems arising in the background art, the present invention provides a remote diagnosis method for thermal power generation units based on distributed computing. The method is built by extending Hadoop, customizing a data service, and developing an operation platform. The method can directly import data from an existing platform through an interface program and feed back the results of distributed computation. The custom data service comprises the following steps:
An ETL module synchronously imports the data of the PI database into the Hadoop cluster;
A standard read/write module provides query interfaces for upper-layer applications;
A data synthesis module provides specialized functions;
A data command module executes remote diagnosis tasks;
The operation platform provides graphical management, system adjustment, and optimization for the entire Hadoop system.
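The division of labor among the four data service modules can be sketched as follows. All class and method names are hypothetical illustrations, since the patent does not prescribe an API, and the in-memory dictionary merely stands in for the HBase store behind the ETL module.

```python
# Illustrative sketch only: names and the in-memory store are assumptions,
# not the patent's actual API.

class ETLModule:
    """Synchronously imports PI database records into the Hadoop cluster."""
    def import_records(self, records):
        # A real deployment would write to HBase; a dict stands in here.
        self.store = {tag: value for tag, value in records}
        return len(self.store)

class StandardReadWriteModule:
    """Provides query interfaces for upper-layer applications."""
    def __init__(self, etl):
        self.etl = etl
    def query(self, tag):
        return self.etl.store.get(tag)

class DataSynthesisModule:
    """Provides specialized functions, e.g. simple statistics over stored values."""
    def __init__(self, etl):
        self.etl = etl
    def mean(self, tags):
        values = [self.etl.store[t] for t in tags]
        return sum(values) / len(values)

class DataCommandModule:
    """Executes a remote diagnosis task by combining the other modules."""
    def __init__(self, synthesis):
        self.synthesis = synthesis
    def diagnose(self, tags, limit):
        # Trivial rule for illustration: alarm if the mean exceeds a limit.
        return "alarm" if self.synthesis.mean(tags) > limit else "normal"
```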
Preferably, the ETL module is further used to store the imported data in HBase.
In any of the above schemes, preferably, the data extracted by the ETL module from different data sources is saved in the distributed file system.
In any of the above schemes, preferably, the ETL module automatically synchronizes data values and attributes.
In any of the above schemes, preferably, the standard read/write includes basic reading and writing of individual data and/or batch data.
In any of the above schemes, preferably, the specialized functions include at least one of database operation functions, mathematical statistics functions, device characteristic functions, remote diagnosis functions, and application interface functions.
In any of the above schemes, preferably, the data command module is further used for data mining tasks and mathematical statistics tasks.
In any of the above schemes, preferably, the data command module can directly invoke the native Hadoop API functions.
In any of the above schemes, preferably, the operation platform includes at least one of cluster deployment, operation monitoring, configuration scheduling, and permission setting.
In any of the above schemes, preferably, the system can provide information system applications.
In any of the above schemes, preferably, the information system applications include at least one of an alarm mechanism, remote diagnosis, process management, data monitoring, and report presentation.
In any of the above schemes, preferably, in the operation platform, an administrator can add, configure, and remove arbitrary nodes in the distributed cluster.
In any of the above schemes, preferably, the administrator can view and collect at least one of the state and health of the whole cluster and/or of individual nodes and information on task execution status.
In any of the above schemes, preferably, the database operations include at least one of create, delete, update, and query.
In any of the above schemes, preferably, the database operation functions include read/write functions for values and/or handling functions for data attributes.
In any of the above schemes, preferably, the common mathematical functions include at least one of calculation functions, index calculation functions, and curve fitting functions.
In any of the above schemes, preferably, the specialized functions include steam thermodynamic property calculation functions.
In any of the above schemes, preferably, the steam thermodynamic property calculation functions include solving for at least one of entropy, enthalpy, and saturation temperature.
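As an illustration of a saturation-temperature calculation (not the method the patent itself prescribes, which would more likely follow an industrial steam-table formulation such as IAPWS-IF97), the Antoine equation for water can be inverted for temperature. The constants below are standard textbook values, valid roughly between 1 and 100 degrees Celsius, and are not taken from the patent:

```python
import math

# Antoine constants for water (pressure in mmHg, temperature in Celsius),
# valid roughly from 1 to 100 degrees C. Textbook values, assumed here.
A, B, C = 8.07131, 1730.63, 233.426

def saturation_temperature(p_mmhg):
    """Invert the Antoine equation log10(p) = A - B / (C + T) for T (Celsius)."""
    return B / (A - math.log10(p_mmhg)) - C
```

At atmospheric pressure (760 mmHg) this yields approximately 100 degrees Celsius, as expected.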
In any of the above schemes, preferably, the mathematical statistics functions include at least one of weighting, dispersion, and variance.
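A minimal sketch of these three statistics in plain Python follows. The exact definitions used by the platform are not specified; "dispersion" is taken here as the coefficient of variation, which is one common reading but an assumption:

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def variance(values):
    """Population variance (mean squared deviation from the mean)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def dispersion(values):
    """Coefficient of variation: standard deviation relative to the mean.
    Assumed interpretation of 'dispersion'; the patent does not define it."""
    return variance(values) ** 0.5 / (sum(values) / len(values))
```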
In any of the above schemes, preferably, the device characteristics include a boiler part and a steam turbine part.
In any of the above schemes, preferably, the boiler part includes at least one of a steam-water separator, an air preheater, boiler fans, and a combustion system.
In any of the above schemes, preferably, the steam turbine part includes at least one of turbine heat rate, the steam turbine body, a cold-end system, and a heater regenerative system.
In any of the above schemes, preferably, the application interface functions are used to interact with other platforms.
The present invention also provides a remote diagnosis system for thermal power generation units based on distributed computing. The system is built by extending Hadoop, customizing a data service, and developing an operation platform. The system can directly import data from an existing platform through an interface program and feed back the results of distributed computation. The custom data service includes:
An ETL module, for synchronously importing the data of the PI database into the Hadoop cluster;
A standard read/write module, for providing query interfaces for upper-layer applications;
A data synthesis module, for providing specialized functions;
A data command module, for executing remote diagnosis tasks.
The operation platform provides graphical management, system adjustment, and optimization for the entire Hadoop system.
Preferably, the ETL module is further used to store the imported data in HBase.
In any of the above schemes, preferably, the data extracted by the ETL module from different data sources is saved in the distributed file system.
In any of the above schemes, preferably, the ETL module automatically synchronizes data values and attributes.
In any of the above schemes, preferably, the standard read/write includes basic reading and writing of individual data and/or batch data.
In any of the above schemes, preferably, the specialized functions include at least one of database operation functions, mathematical statistics functions, device characteristic functions, remote diagnosis functions, and application interface functions.
In any of the above schemes, preferably, the data command module is further used for data mining tasks and mathematical statistics tasks.
In any of the above schemes, preferably, the data command module can directly invoke the native Hadoop API functions.
In any of the above schemes, preferably, the operation platform includes at least one of cluster deployment, operation monitoring, configuration scheduling, and permission setting.
In any of the above schemes, preferably, the system can provide information system applications.
In any of the above schemes, preferably, the information system applications include at least one of an alarm mechanism, remote diagnosis, process management, data monitoring, and report presentation.
In any of the above schemes, preferably, in the operation platform, an administrator can add, configure, and remove arbitrary nodes in the distributed cluster.
In any of the above schemes, preferably, the administrator can view and collect at least one of the state and health of the whole cluster and/or of individual nodes and information on task execution status.
In any of the above schemes, preferably, the database operations include at least one of create, delete, update, and query.
In any of the above schemes, preferably, the database operation functions include read/write functions for values and/or handling functions for data attributes.
In any of the above schemes, preferably, the common mathematical functions include at least one of calculation functions, index calculation functions, and curve fitting functions.
In any of the above schemes, preferably, the specialized functions include steam thermodynamic property calculation functions.
In any of the above schemes, preferably, the steam thermodynamic property calculation functions include solving for at least one of entropy, enthalpy, and saturation temperature.
In any of the above schemes, preferably, the mathematical statistics functions include at least one of weighting, dispersion, and variance.
In any of the above schemes, preferably, the device characteristics include a boiler part and a steam turbine part.
In any of the above schemes, preferably, the boiler part includes at least one of a steam-water separator, an air preheater, boiler fans, and a combustion system.
In the present invention, the unified storage, network, and computing resources of a Hadoop cluster are fully utilized to realize high-speed computation and storage over massive data. Based on Hadoop, common calculation functions for thermal power generation units, steam property functions, mathematical statistics and probability functions, and the like are encapsulated to build a distributed computing platform suitable for the thermodynamic calculations of thermal power generation units. The massive historical data is fully exploited to perform distributed thermodynamic computation and remote diagnosis of thermal power generation units, improving the operational efficiency of the remote diagnosis platform.
Adopting a distributed architecture improves the scalability of the system: with existing resources, a larger computer network and a bigger cluster can be organized to meet the corresponding demands of the enterprise. The platform thereby gains stronger computing power, wider data distribution, lower coupling, higher stability, and more flexible scalability, while remote diagnosis becomes faster and more efficient and the range of conditions that can be diagnosed becomes more comprehensive.
Embodiment 1:
Figure 1 shows the distributed architecture of the thermal power generation unit remote diagnosis system. The system extends Hadoop, customizes the data service, and develops the operation platform. The task of remote diagnosis is to call the custom data service and the native Hadoop API to obtain data and perform computation, while the operation platform is responsible for the deployment, monitoring, scheduling, and execution of diagnostic tasks. The bottom of the entire technical architecture consists of a Hadoop cluster; an HBase database is built on the cluster and operated through the existing HBase API.
Hadoop: a distributed system framework developed by the Apache Software Foundation. Users can develop distributed programs without knowing the low-level details of distribution, fully utilizing the power of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System, abbreviated HDFS. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware; it provides high-throughput access to application data, making it well suited to applications with very large data sets. HDFS relaxes some POSIX requirements and allows streaming access to the data in the file system. Its main advantages are as follows. High reliability: Hadoop's bit-wise ability to store and process data has earned wide trust. High scalability: Hadoop distributes data and computation across the available computer cluster, which can easily be extended to thousands of nodes. High efficiency: Hadoop can dynamically move data among nodes and maintain a dynamic balance between them, so processing is very fast. High fault tolerance: Hadoop automatically keeps multiple copies of data and automatically redistributes failed tasks. Low cost: compared with all-in-one machines, commercial data warehouses, and data marts such as QlikView and Yonghong Z-Suite, Hadoop is open source, so the software cost of a project can be greatly reduced. Hadoop's wide use in big data processing applications benefits from its inherent advantages in data extraction, transformation, and loading (ETL). The distributed architecture of Hadoop places the big data processing engine as close to the storage as possible, which suits batch operations such as ETL, since their batch results can move directly into storage. Hadoop's MapReduce function splits a single task into fragments, sends the fragment tasks (Map) to multiple nodes, and afterwards loads (Reduce) them into the data warehouse as a single data set.
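The Map/Reduce flow just described can be illustrated with the classic word-count sketch below, written in plain Python; the explicit grouping-and-summing loop stands in for the shuffle and reduce steps that Hadoop distributes across nodes:

```python
from collections import defaultdict

def map_phase(fragment):
    # Map: emit a (key, 1) pair for every word in this input fragment.
    return [(word, 1) for word in fragment.split()]

def reduce_phase(pairs):
    # Shuffle: group intermediate values by key; Reduce: sum each group.
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

# In a real cluster the fragments would be mapped in parallel on many nodes.
fragments = ["boiler steam boiler", "steam turbine"]
pairs = [p for f in fragments for p in map_phase(f)]
counts = reduce_phase(pairs)
```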
Hadoop consists of many elements. At the very bottom is the Hadoop Distributed File System (HDFS), which stores files on all storage nodes of the Hadoop cluster. The layer above HDFS (for the purposes of this description) is the MapReduce engine, which consists of JobTrackers and TaskTrackers. An introduction to the distributed file system HDFS and the MapReduce processing at the core of the Hadoop distributed computing platform, together with the data warehouse tool Hive and the distributed database HBase, essentially covers all the technical cores of the Hadoop distributed platform.
HDFS: to an external client, HDFS looks like a traditional hierarchical file system; files can be created, deleted, moved, renamed, and so on. The architecture of HDFS, however, is built on a specific set of nodes, which is determined by its own characteristics. These nodes include the NameNode (only one), which provides metadata services inside HDFS, and DataNodes, which provide storage blocks for HDFS. Since only one NameNode exists, this is a weakness of HDFS (a single point of failure). Files stored in HDFS are divided into blocks, and these blocks are replicated onto multiple computers (DataNodes). This differs greatly from a traditional RAID architecture. The block size (usually 64 MB) and the number of replicated blocks are determined by the client when the file is created. The NameNode controls all file operations. All communication inside HDFS is based on the standard TCP/IP protocol.
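The storage arithmetic implied by block splitting and replication is straightforward; assuming the usual 64 MB block size and a replication factor of 3 mentioned in this description, a sketch might be:

```python
import math

def hdfs_block_count(file_bytes, block_bytes=64 * 1024 * 1024):
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_bytes / block_bytes)

def replicated_bytes(file_bytes, replication=3):
    """Raw storage consumed across DataNodes for a given replication factor."""
    return file_bytes * replication
```

For example, a 200 MB file splits into four 64 MB blocks (the last only partly full) and, at replication factor 3, consumes 600 MB of raw cluster storage.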
NameNode: the NameNode is software usually run on a dedicated machine in an HDFS instance. It is responsible for managing the file system namespace and controlling access by external clients. The NameNode decides how files are mapped onto replicated blocks on DataNodes. For the most common case of three replicas, the first replicas are stored on different nodes of the same rack, and the last replica is stored on a node of a different rack; note that understanding this requires knowledge of the cluster structure. Actual I/O transactions do not pass through the NameNode; only the metadata mapping files to blocks on DataNodes passes through it. When an external client sends a request to create a file, the NameNode responds with the block identifier and the IP address of the DataNode holding the first copy of that block. The NameNode also notifies the other DataNodes that will receive copies of the block. The NameNode stores all information about the file system namespace in a file called FsImage. This file, together with a log file recording all transactions (here, the EditLog), is stored on the NameNode's local file system. The FsImage and EditLog files also need replicas, to guard against file corruption or loss of the NameNode system. The NameNode itself inevitably carries the risk of a single point of failure (SPOF), which an active/standby mode alone cannot solve; 100% uptime availability can be achieved with a Hadoop non-stop NameNode.
DataNode: a DataNode is likewise software usually run on a dedicated machine in an HDFS instance. A Hadoop cluster contains one NameNode and a large number of DataNodes. DataNodes are usually organized in racks, and a rack connects all of its systems through one switch. One assumption of Hadoop is that transmission between nodes inside a rack is faster than transmission between nodes of different racks. DataNodes respond to read and write requests from HDFS clients. They also respond to commands from the NameNode to create, delete, and replicate blocks. The NameNode relies on periodic heartbeat messages from each DataNode. Each message contains a block report, against which the NameNode can validate the block mapping and other file system metadata. If a DataNode fails to send its heartbeat message, the NameNode takes corrective action and re-replicates the blocks that were lost on that node.
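The heartbeat-based failure detection can be sketched as a simple timeout check; the 30-second timeout and the data shapes below are illustrative assumptions, not the values or structures HDFS actually uses:

```python
def stale_datanodes(last_heartbeat, now, timeout=30.0):
    """Return the DataNodes whose last heartbeat is older than the timeout.

    last_heartbeat maps a node name to the timestamp (in seconds) of its
    most recent heartbeat. The NameNode would schedule re-replication of
    the blocks held by any node returned here.
    """
    return sorted(node for node, t in last_heartbeat.items() if now - t > timeout)
```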
File operations: as can be seen, HDFS is not an all-purpose file system. Its main purpose is to support streaming access to large files written to it. If a client wants to write a file to HDFS, it must first cache the file in local temporary storage. Once the cached data exceeds the required HDFS block size, a file creation request is sent to the NameNode. The NameNode responds to the client with the DataNode identity and the target block, and also notifies the DataNodes that will hold replicas of the file block. When the client starts sending the temporary file to the first DataNode, the block content is immediately forwarded to the replica DataNodes in a pipelined manner. The client is also responsible for creating the checksum files stored in the same HDFS namespace. After the last file block is sent, the NameNode commits the file creation to its persistent metadata store (the EditLog and FsImage files).
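The client-side checksum idea can be sketched with CRC32. HDFS in fact checksums fixed-size chunks (typically 512 bytes); the tiny block size below is an assumption made only to keep the example readable:

```python
import zlib

def block_checksums(data, block_size=4):
    """Compute one CRC32 checksum per block of the data, mirroring the idea
    of the client-side checksum file (real HDFS chunks are 512 bytes)."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

def verify(data, checksums, block_size=4):
    """True if the data still matches the previously recorded checksums."""
    return block_checksums(data, block_size) == checksums
```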
Linux clusters: the Hadoop framework can be used on a single Linux platform (for development and debugging), and the official MiniCluster is provided for unit testing, but its strength is only fully exerted with commercial servers stored in racks. These racks form a Hadoop cluster, and knowledge of the cluster topology determines how jobs and files are distributed throughout the whole cluster. Hadoop assumes that nodes may fail, so it uses native methods to handle the failure of a single computer or even of an entire rack.
In the present invention, the design of the whole system comprises five modules: the operation platform design, the ETL interface design, the standard read/write design, the data synthesis design, and the data command design. The ETL design synchronously imports the data in the PI database into the Hadoop cluster and stores it in HBase; this part involves the design of reading data from the PI database and writing it into HBase. The standard read/write module design provides a simple query window for upper-layer applications. The data synthesis module provides basic common specialized functions such as data statistics functions and device characteristic functions. The data command module is responsible for executing remote diagnosis tasks. The operation platform module provides graphical management, system adjustment, and optimization for the entire Hadoop system, mainly including cluster deployment, operation monitoring, configuration scheduling, and permission setting. In the present invention, to reduce the complexity of platform operation and maintenance, the operation platform is developed for simple and effective management and for monitoring the operating state of the system. The administrator can easily add, configure, and remove arbitrary nodes in the distributed cluster, can view the state and health of the whole cluster or of individual nodes, and can collect information on task execution status, thereby greatly simplifying and reducing the workflow and cost of daily cluster operation and maintenance.
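One possible shape for the ETL module's PI-to-HBase mapping is sketched below as a pure transformation. The row-key layout and the column-family names (`v` for values, `a` for attributes) are hypothetical, since the patent does not fix an HBase schema; in a deployment the resulting pairs would be written through the HBase client API:

```python
def pi_to_hbase_rows(records):
    """Transform PI-style (tag, timestamp, value, attributes) records into
    HBase-style (row_key, columns) pairs.

    Schema is an assumption for illustration: the row key is "tag:timestamp",
    the value goes to column family "v", and each attribute to family "a".
    """
    rows = []
    for tag, timestamp, value, attributes in records:
        row_key = f"{tag}:{timestamp}"
        columns = {"v:value": str(value)}
        for name, attr in attributes.items():
            columns[f"a:{name}"] = str(attr)
        rows.append((row_key, columns))
    return rows
```

Keeping the transformation separate from the HBase write keeps the mapping testable without a running cluster.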