CN105608758B - Big data analysis platform device and method based on algorithm configuration and distributed stream computing


Info

Publication number
CN105608758B
Authority
CN
China
Prior art keywords
data
monitoring
services
result
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510962436.3A
Other languages
Chinese (zh)
Other versions
CN105608758A (en
Inventor
丁书耕
张建辉
孙燕
王震
丛兴滋
杨立涛
刘涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Luneng Software Technology Co Ltd
Original Assignee
Shandong Luneng Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Luneng Software Technology Co Ltd
Priority to CN201510962436.3A
Publication of CN105608758A
Application granted
Publication of CN105608758B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C3/00 Registering or indicating the condition or the working of machines or other apparatus, other than vehicles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

A big data analysis platform device and method based on algorithm configuration and distributed stream computing. The device comprises a data source acquisition device, a data integration unit, a time series data memory, a computing unit, a wireless terminal and a data service terminal connected in sequence. It can process massive real-time data quickly, efficiently and in a timely manner, keeps computation efficient, flexible and highly scalable, and helps equipment run safely, stably and efficiently.

Description

Big data analysis platform device and method based on algorithm configuration and distributed stream computing
Technical field
The present invention relates to the field of equipment monitoring and analysis applications, and in particular to a big data analysis platform device and method based on algorithm configuration and distributed stream computing.
Background
With the rapid development of computer technology, data volumes in every industry are growing quickly, data types are multiplying, and data structures are becoming more complex. Traditional databases are deployed separately for each piece of equipment and need considerable space, so they are hard to deploy, costly, and unable to meet users' general requirements.
Time series data is data carrying a time tag. It is typically generated at high frequency, depends heavily on acquisition time, and involves many measuring points and large volumes. In the power industry, to keep equipment running safely, stably and efficiently, the operating state of generation, transformation and other equipment is usually monitored in real time. The large amount of time series data collected serves as the basis for advanced applications such as equipment operating-state assessment, equipment failure early warning and equipment reliability analysis. How to process massive real-time data quickly, efficiently and in time has therefore long been a major problem for asset-heavy industries such as power, chemicals, oil and steel.
Collection and analysis of historical operating data, and immediate analysis of real-time or near-real-time data, are important parts of informatization in the power industry. They call for a complete, stable big data analysis solution that fits actual business scenarios and provides reliable, stable underlying data support for real-time analysis scenarios such as equipment fault early warning.
In recent years, with the rapid development of IT technologies such as cloud computing, big data, machine learning and data mining, distributed storage and high-performance computing have achieved key breakthroughs in both theoretical research and engineering practice, and a number of big data processing and application solutions, represented by Hadoop, have emerged in the industry.
Hadoop is a distributed system architecture whose core parts include the distributed file system HDFS (Hadoop Distributed File System), the distributed storage system HBase and the parallel computing programming model MapReduce. It greatly simplifies large-scale data processing, but it has certain limitations in functional completeness and operational stability, and the commercial big data platforms derived from Hadoop deviate from the actual needs of power business scenarios. An in-depth study of the business needs of the power industry, and the construction of a big data analysis device based on distributed time series data services, therefore has far-reaching significance and considerable value.
The diversity and complexity of business models is a core concern of informatization in the power industry. An algorithm model is broken down into a series of computing units that are dynamically arranged according to actual business needs, and this arrangement forms the business model. Developing computing units independently makes it easier to improve their accuracy, and a streaming computing engine keeps computation efficient, flexible and scalable.
However, there is currently no big data analysis platform device specifically designed to combine algorithm configuration with distributed stream computing.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and provide a big data analysis platform device and method based on algorithm configuration and distributed stream computing that can process massive real-time data quickly, efficiently and in time, keep computation efficient, flexible and highly scalable, and help equipment run safely, stably and efficiently.
The invention provides a big data analysis platform device based on algorithm configuration and distributed stream computing, comprising a data source acquisition device, a data integration unit, a time series data memory, a computing unit, a wireless terminal and a data service terminal connected in sequence, wherein the data service terminal is also connected to the time series data memory, the data source acquisition device and the wireless terminal respectively:
Data source acquisition device, for obtaining equipment monitoring data and conventional equipment data and sending them to the data integration unit;
Data integration unit, for receiving the equipment monitoring data and conventional equipment data sent by the data source acquisition device, preprocessing them, and sending the result to the time series data memory;
Time series data memory, for storing the preprocessed data and configuration data, and for caching data that is accessed frequently and has high performance requirements centrally in the memory of the time series data memory;
Computing unit, for driving a scheduling engine to call and receive the data stored in the time series data memory, processing the called and received data according to pre-programmed processing logic, and training a data mining model. The computing unit comprises a plurality of sub-computing units that are dynamically configured according to actual business needs; each sub-computing unit exists independently and can be developed independently based on the experience of industry experts. A distributed streaming computing engine computes on the called and received data and outputs the results in real time, writing them back to the time series data memory and/or the data service terminal;
Data service terminal, comprising a data service terminal processor, an interface unit and a display device. The processor reads data directly from the time series data memory and/or receives the data processed by the computing unit, and analyzes it; the processed result is shown on the display device and is also sent through the interface unit to the wireless terminal;
Wireless terminal, for receiving the processed result sent by the data service terminal and for wirelessly sending control commands to the data service terminal; after receiving a control command, the data service terminal controls the data source acquisition device and adjusts its data acquisition frequency.
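The preprocessing performed by the data integration unit (cleaning, filtering and conversion) can be illustrated with a minimal Python sketch. The `Reading` record, its field names, the Fahrenheit-to-Celsius conversion and the plausibility range are all assumptions made for illustration; the patent does not specify concrete preprocessing rules or the order in which they run.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Reading:
    point_id: str            # measuring-point identifier (hypothetical field)
    timestamp: float         # seconds since epoch
    value: Optional[float]   # None models a missing sample
    unit: str = "C"          # "C" or "F" (assumed units)


def clean(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Cleaning: drop samples whose value is missing."""
    return (r for r in readings if r.value is not None)


def convert(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Conversion: normalize Fahrenheit samples to Celsius."""
    for r in readings:
        if r.unit == "F":
            yield Reading(r.point_id, r.timestamp, (r.value - 32) * 5 / 9, "C")
        else:
            yield r


def filter_range(readings: Iterable[Reading], low: float, high: float) -> Iterator[Reading]:
    """Filtering: keep only samples inside an assumed plausible range."""
    return (r for r in readings if low <= r.value <= high)


def preprocess(readings: Iterable[Reading]) -> Iterator[Reading]:
    # Assumed order: clean, then convert, then filter on the normalized value.
    return filter_range(convert(clean(readings)), low=-50.0, high=200.0)


raw = [
    Reading("t1", 1.0, 212.0, "F"),   # converted to 100.0 C
    Reading("t1", 2.0, None),         # removed by cleaning
    Reading("t1", 3.0, 9999.0),       # removed by range filter
    Reading("t1", 4.0, 25.0),         # kept unchanged
]
processed = list(preprocess(raw))
```

Because each stage is a generator, the same chain works for both a finite batch and an unbounded stream of readings.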
Further, the data source acquisition device comprises a monitoring sensor, a monitoring data memory and an equipment data memory, wherein the monitoring data memory is connected to the monitoring sensor and the data integration unit respectively, and the data integration unit is also connected to the equipment data memory:
Monitoring sensor, for obtaining equipment monitoring data in real time or quasi-real time and transferring the collected equipment monitoring data to the monitoring data memory;
Monitoring data memory, for storing the equipment monitoring data and outputting it as a stream to the data integration unit;
Equipment data memory, for storing conventional equipment data and outputting it in batch mode to the data integration unit.
Further, all communication with the distributed streaming computing engine uses a unified standard protocol.
Further, the standard protocol is the message transmission protocol MQTT.
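As a rough illustration of what a unified message format over MQTT could look like, the sketch below packages one measurement as a topic string and a byte payload. MQTT itself only defines slash-separated topic levels and an opaque payload; the `site/device/point` topic layout, the JSON body and the field names `ts` and `value` are assumptions, and publishing through an actual MQTT client is not shown.

```python
import json


def make_message(site: str, device: str, point: str, ts: float, value: float):
    """Build a (topic, payload) pair for one measurement (assumed layout)."""
    topic = f"{site}/{device}/{point}"
    payload = json.dumps({"ts": ts, "value": value}).encode("utf-8")
    return topic, payload


def parse_message(topic: str, payload: bytes) -> dict:
    """Inverse of make_message: recover the measurement fields."""
    site, device, point = topic.split("/")
    body = json.loads(payload.decode("utf-8"))
    return {"site": site, "device": device, "point": point, **body}


topic, payload = make_message("plant1", "turbine3", "bearing_temp", 1450000000.0, 71.5)
decoded = parse_message(topic, payload)
```

Keeping every producer on one topic layout and one payload schema is what lets the stream engine treat all incoming sources uniformly.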
Further, the device also comprises a manual input device connected to the data integration unit, for entering equipment monitoring data manually where isolation measures imposed for safety reasons, or a lack of data access support, prevent automatic access.
Further, the configuration data is data describing the business meaning, storage structure and/or processing logic of the equipment monitoring data and/or conventional equipment data.
Further, the data that is accessed frequently and has high performance requirements refers to recent monitoring data, conventional equipment data, and the historical indicator data, model metadata and preprocessing rule data that receive a high level of attention.
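One simple way to keep frequently accessed data in memory is a least-recently-used (LRU) cache in front of a slower store. The patent does not name an eviction policy, so LRU is a swapped-in technique chosen here for illustration; the `HotDataCache` class, its capacity and the dictionary standing in for the backing store are all hypothetical.

```python
from collections import OrderedDict


class HotDataCache:
    """LRU cache in front of a slower backing store (illustrative only)."""

    def __init__(self, backing_store: dict, capacity: int = 2):
        self.store = backing_store      # stands in for disk-based storage
        self.capacity = capacity
        self.cache = OrderedDict()      # insertion order tracks recency
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.store[key]             # slow path: read the backing store
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used key
        return value


store = {"recent_monitoring": 1, "model_metadata": 2, "old_history": 3}
cache = HotDataCache(store, capacity=2)
for key in ["recent_monitoring", "model_metadata", "recent_monitoring", "old_history"]:
    cache.get(key)
```

After this access sequence the two cached entries are `recent_monitoring` and `old_history`, since `model_metadata` was the least recently used key when the capacity was exceeded.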
Further, the result processed by the data service terminal is a fault early-warning result and/or a load prediction result.
Further, the wireless terminal is a notebook computer, a tablet computer and/or a mobile phone.
The invention also provides a big data analysis method using the big data analysis platform device based on algorithm configuration and distributed stream computing, comprising the following steps in order:
(1) Initialize: set the initial parameters of the data service terminal; under these parameters, control the monitoring sensor to sample 6 times per second for a sampling time of 1 minute, and take the average A of the data sampled within that minute;
(2) Under the same initial parameters, repeat step (1) 3 times to obtain the averages B, C and D;
(3) Sum the averages A, B, C and D and take their mean P:
A. If the stability condition holds (the condition is given as a formula in the original that is missing from this text), the monitoring sensor performance is stable; go to step (4);
B. If the condition does not hold, the monitoring sensor performance is unstable; return to step (1);
(4) Acquire equipment monitoring data in real time or quasi-real time, transfer the collected data to the monitoring data memory for storage, and output it as a stream to the data integration unit;
(5) Using batch access, automatically obtain the conventional equipment data from the equipment data memory according to a predefined scheduling plan; apply the preprocessing rules to clean, filter and convert the equipment monitoring data and conventional equipment data; and output the preprocessed data to the time series data memory for storage;
(6) Cache recent monitoring data, conventional equipment data, and the historical indicator data, model metadata and preprocessing rule data that receive a high level of attention centrally in the memory of the time series data memory;
(7) Through the computing unit, drive the scheduling engine to call and receive the data stored in the time series data memory, process the called and received data according to pre-programmed processing logic, and train a data mining model. The computing unit comprises a plurality of sub-computing units that are dynamically configured according to actual business needs; each sub-computing unit exists independently and can be developed independently based on the experience of industry experts. A distributed streaming computing engine computes on the called and received data and outputs the results in real time, writing them back to the time series data memory and/or the data service terminal;
(8) Read data directly from the time series data memory and/or receive the data processed by the computing unit, and analyze it; show the processed result on the display device and send it through the interface unit to the wireless terminal;
(9) Receive, at the wireless terminal, the processed result sent by the data service terminal, and decide according to that result whether to send a control command to the data service terminal; after receiving a control command, the data service terminal controls the monitoring sensor and adjusts its data acquisition frequency. The processed result is a fault early-warning result and/or a load prediction result, and the decision follows these rules:
A. When the fault early-warning result and/or load prediction result is normal, reduce the data acquisition frequency of the monitoring sensor;
B. When the fault early-warning result and/or load prediction result is abnormal, increase the data acquisition frequency of the monitoring sensor and repeat steps (1)-(9); at the same time, the data service terminal raises an alarm, displays the fault early-warning result and/or load prediction result in real time on its display device, and notifies maintenance personnel.
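Steps (1)-(3) and the frequency rule of step (9) can be sketched in Python as follows. The stability formula is missing from this text, so the criterion used here (every window average within a relative tolerance of the overall mean P) is purely an assumption, as are the `tolerance`, `factor`, `min_hz` and `max_hz` parameters; the patent states only that the frequency is lowered on normal results and raised on abnormal ones.

```python
from statistics import mean


def window_average(samples):
    """Step (1): average of one sampling window (6 Hz for 60 s gives 360 samples)."""
    return mean(samples)


def sensor_is_stable(averages, tolerance=0.05):
    """Steps (2)-(3) under an ASSUMED criterion: each of A, B, C, D must lie
    within `tolerance` (relative) of their mean P. The patent's actual
    formula is not available in this text."""
    p = mean(averages)
    if p == 0:
        return all(a == 0 for a in averages)
    return all(abs(a - p) / abs(p) <= tolerance for a in averages)


def next_sampling_rate(current_hz, result_normal, factor=2.0, min_hz=1.0, max_hz=48.0):
    """Step (9): lower the rate when results are normal, raise it when abnormal.
    The scaling factor and bounds are hypothetical."""
    if result_normal:
        return max(min_hz, current_hz / factor)
    return min(max_hz, current_hz * factor)


stable = sensor_is_stable([10.0, 10.1, 9.9, 10.0])     # small spread around P
unstable = sensor_is_stable([10.0, 14.0, 6.0, 10.0])   # large spread around P
lowered = next_sampling_rate(6.0, result_normal=True)
raised = next_sampling_rate(6.0, result_normal=False)
```

Bounding the rate on both sides keeps repeated adjustments from driving the sensor to zero samples or to an unsustainable load.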
The big data analysis device and method of the invention can achieve the following:
1) With a stable, reliable and efficient open-source distributed storage system and parallel computing service at its core, and with encapsulation targeted at the time series storage and access needs of asset-heavy industries, it provides reliable, stable underlying data support for real-time analysis scenarios such as equipment fault early warning;
2) Data is collected in real time and on time, so timeliness is high; the data acquisition frequency is designed for optimization, so collection efficiency is high, with low consumption but high efficiency; the device is functionally capable, letting monitoring and maintenance personnel control and monitor equipment state remotely and respond immediately, so handling is more timely; and shorter handling times reduce equipment losses and save cost;
3) For the reliability of system data, an averaged-data confirmation scheme is provided, making monitoring data more stable and reliable; and adjusting the monitoring frequency according to the real-time state of the equipment lightens the workload of the device, giving it a longer service life and more stable performance;
4) A distributed streaming computing engine provides real-time pushing, real-time computation and real-time output for computing tasks over massive data.
Brief description of the drawings
Fig. 1 Schematic structural diagram of the big data analysis platform device
Fig. 2 Flow chart of the algorithm configuration and distributed stream computing method
Embodiments
The specific implementation of the invention is described in detail below. It must be pointed out that the following embodiments only further illustrate the invention and must not be understood as limiting its scope of protection; nonessential modifications and adaptations that those skilled in the art make to the invention based on the above summary still fall within its scope of protection.
The invention provides a big data analysis device based on distributed time series data services. As shown in Fig. 1, it comprises a data source acquisition device made up of a monitoring sensor 1, a monitoring data memory 2 and an equipment data memory 3, and further comprises a data integration unit 4, a time series data memory 5, a computing unit 6, a wireless terminal 8 and a data service terminal 7. The monitoring data memory 2 is connected to the monitoring sensor 1 and the data integration unit 4 respectively; the data integration unit 4 is also connected to the equipment data memory 3 and the time series data memory 5 respectively; the data integration unit 4, time series data memory 5, computing unit 6 and data service terminal 7 are connected in sequence; and the data service terminal 7 is also connected to the time series data memory 5, the monitoring sensor 1 and the wireless terminal 8 respectively.
The monitoring sensor obtains equipment monitoring data in real time or quasi-real time and transfers the collected data to the monitoring data memory. It may be an information acquisition sensor installed on the monitored equipment, or a sensor such as a camera or temperature detector installed in the area where the equipment is located. The monitoring data memory stores the equipment monitoring data in real time and outputs it as a stream to the data integration unit.
The equipment data memory stores conventional equipment data and outputs it in batch mode to the data integration unit. Conventional equipment data comes from system configuration management, mainly describes the business meaning, storage structure and processing logic of business data, and is typically produced during the system configuration stage.
The data integration unit receives the equipment monitoring data sent by the monitoring data memory through streaming access and, through batch access, automatically obtains the conventional equipment data from the equipment data memory according to a predefined scheduling plan. It obtains collected data in several ways, including batch access, streaming access and manual import, and can also connect directly to collection points to obtain monitoring data. Before storage, the accessed data undergoes any necessary preprocessing, such as cleaning, filtering and conversion under pre-configured preprocessing rules, and is then stored in the time series data memory, either after data integration or directly. Rules and other data with a high access frequency are generally stored in the cached data; historical business data and other data with a low access frequency are generally stored in the business data after data integration; and system-defined data such as data preprocessing rules, computation rules and model data are generally stored in the configuration data. The data access service reads data directly through a data access interface. Business data and configuration data both vary widely in access frequency and performance requirements. In a specific business scenario, the system caches data with a high access frequency and high performance requirements centrally in system memory, and this cached business and configuration data is called cached data. In general, recent business data, and data such as historical indicators, model metadata and data preprocessing rules that receive a high level of attention, have a high access frequency and can be treated as cached data. Data storage provides the basic guarantee for the data query service, supplies input to the online computing service and the offline analysis service, and also supports write-back of the corresponding computation results. The databases involved in the time series data memory 5 are mainly the distributed file system HDFS (Hadoop Distributed File System), the column-oriented database HBase (Hadoop Database), the in-memory database Redis and the relational database Oracle. The Oracle database mainly stores configuration data and part of the business data. HDFS, as the distributed file system at the bottom of the big data platform, supports the HBase layer above it and can also directly store the non-time-series part of the business data. HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system used mainly to store the time-series part of the business data. Redis is an in-memory key-value storage system used here mainly to hold cached data.
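The division of labor among the four stores can be summarized as a small routing function. This is only a sketch of the roles described above: the `choose_store` function, the `kind` labels and the boolean flags are hypothetical, and the text notes that Oracle also holds part of the business data, which this simplified rule does not model.

```python
from enum import Enum


class Store(Enum):
    ORACLE = "Oracle"   # configuration data (and part of the business data)
    HDFS = "HDFS"       # non-time-series business data
    HBASE = "HBase"     # time-series business data
    REDIS = "Redis"     # cached (hot) data


def choose_store(kind: str, is_time_series: bool = False, is_hot: bool = False) -> Store:
    """Pick the store that serves reads for a record (simplified, assumed rule)."""
    if is_hot:
        # Hot data is served from the in-memory cache; a persistent copy
        # would still live in one of the other stores.
        return Store.REDIS
    if kind == "config":
        return Store.ORACLE
    if kind == "business":
        return Store.HBASE if is_time_series else Store.HDFS
    raise ValueError(f"unknown data kind: {kind!r}")


routes = {
    "preprocessing rule": choose_store("config"),
    "sensor history": choose_store("business", is_time_series=True),
    "equipment document": choose_store("business"),
    "recent readings": choose_store("business", is_time_series=True, is_hot=True),
}
```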
The computing unit can combine the management experience of industry experts with statistical principles to develop mining algorithms. Taking historical data related to power equipment operation as input, it trains data mining models; different sample data can yield different models (for example, one per season), and the training process can be repeated on new time series data produced by equipment operation to keep improving the models. A completed data mining model can take part in online computation, analyzing the evaluation indicators of power equipment in real time or quasi-real time. Training a mining model involves batch computation and is implemented by batch computing jobs; applying a mining model involves streaming computation and is implemented by streaming computing jobs. Both computation modes can also be used for computing tasks unrelated to mining models, such as equipment evaluation indicators, speech semantic recognition and text semantic analysis.
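The split between batch training and streaming application, with one model per season, can be sketched as below. The "model" here is just a per-season mean and standard deviation with a k-sigma deviation test, which is an illustrative stand-in and not the patent's mining algorithm; the month-to-season mapping (northern hemisphere) and the threshold `k` are likewise assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

SEASONS = ["winter", "spring", "summer", "autumn"]


def season_of(month: int) -> str:
    """Map a month (1-12) to a season; December wraps to winter (assumed mapping)."""
    return SEASONS[(month % 12) // 3]


def train(history):
    """Batch job: fit one (mean, std) baseline per season from (month, value) pairs."""
    by_season = defaultdict(list)
    for month, value in history:
        by_season[season_of(month)].append(value)
    return {s: (mean(v), pstdev(v)) for s, v in by_season.items()}


def score_stream(models, stream, k=3.0):
    """Streaming job: flag each incoming value that deviates more than
    k standard deviations from its season's baseline."""
    for month, value in stream:
        mu, sigma = models[season_of(month)]
        yield month, value, abs(value - mu) > k * sigma


history = [(1, 20.0), (1, 22.0), (2, 21.0), (7, 40.0), (7, 42.0), (8, 41.0)]
models = train(history)
flags = [flag for _, _, flag in score_stream(models, [(1, 21.5), (7, 21.5)])]
```

The same reading of 21.5 is ordinary against the winter baseline but flagged against the summer one, which is the point of training separate models per sample set. Retraining amounts to calling `train` again on the extended history.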
The computing unit drives a scheduling engine to call and receive the data stored in the time series data memory, processes the called and received data according to pre-programmed processing logic, and trains a data mining model. The computing unit comprises a plurality of sub-computing units that are dynamically configured according to actual business needs; each sub-computing unit exists independently and can be developed independently based on the experience of industry experts. A distributed streaming computing engine computes on the called and received data and outputs the results in real time, writing them back to the time series data memory and/or the data service terminal. The computing unit is a component of the algorithm model. Computing units are flexibly designed and configured by their designers; each can be developed independently based on industry expert experience, exists independently, and can be upgraded version by version so that it keeps evolving and its computational accuracy improves. The big data platform must support large-scale real-time and quasi-real-time streaming computation for thousands of computation models, so the stream engine must be efficient, flexible, scalable and easy to connect to, and able in future to support computation over millions or even tens of millions of measuring points. Spark is therefore used as the stream computing engine, and all communication into the stream computing engine uses a unified standard, currently the MQTT (message transmission protocol) protocol. Spark provides a one-stack solution that supports mixed Batch, Streaming, Graph and SQL computation. Kafka is used for real-time computation results delivered through the message service; the purpose of using Kafka is to unify online and offline message processing through Hadoop's parallel loading mechanism, and to provide real-time consumption across the machines of a cluster.
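The idea of independently developed, versioned computing units that are assembled from configuration can be sketched with a small registry. The `register` decorator, the unit names `scale` and `offset`, and the list-of-pairs configuration format are hypothetical; the patent does not describe how units are registered or selected.

```python
from typing import Callable, Dict, List, Tuple

# Registry keyed by (unit name, version); each unit is developed independently.
registry: Dict[Tuple[str, int], Callable[[float], float]] = {}


def register(name: str, version: int):
    """Decorator that adds a computing unit to the registry."""
    def decorator(fn):
        registry[(name, version)] = fn
        return fn
    return decorator


@register("scale", 1)
def scale_v1(x: float) -> float:
    return x * 2.0


@register("scale", 2)           # upgraded version of the same unit
def scale_v2(x: float) -> float:
    return x * 2.5


@register("offset", 1)
def offset_v1(x: float) -> float:
    return x + 1.0


def build_pipeline(config: List[Tuple[str, int]]) -> Callable[[float], float]:
    """Assemble an algorithm model from configuration, not from code changes."""
    steps = [registry[(name, version)] for name, version in config]

    def run(x: float) -> float:
        for step in steps:
            x = step(x)
        return x

    return run


model_a = build_pipeline([("scale", 1), ("offset", 1)])
model_b = build_pipeline([("scale", 2), ("offset", 1)])  # only the config differs
```

Upgrading one unit then means registering a new version and changing a configuration entry, while models pinned to the old version keep working.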
For column-oriented distributed storage of time series data and computation results, data connectors encapsulate heterogeneous real-time databases such as PI, EDNA, Inova and shield in heptan, and HBase, the column-family-oriented distributed big data store on the Hadoop platform, is used.
For cluster caching of intermediate computation results, Redis is used. Requests for the massive data served by the big data platform must complete within a short time, which places high throughput demands on the system; caching accessed data through a distributed in-memory database mechanism greatly improves efficiency.
Redis is a key-value storage system similar to Memcached, but it supports relatively more value types, including string, list (linked list), set, zset (sorted set) and hash. These data types support push/pop, add/remove, intersection, union and difference, and richer operations, all of which are atomic. On this basis, Redis supports sorting in various ways. As with Memcached, data is held in memory for efficiency. In addition, Redis supports a cluster mode that can shard and mirror data across cluster nodes, greatly improving the reliability and scalability of the distributed in-memory store.
Different external data sources call for different integration modes: Sqoop imports data from relational databases into the big data platform; Spark Streaming imports streaming data; ordinary Spark jobs provide batch data parsing and processing. An enterprise service bus (ESB) provides service integration and management.
Batch computing jobs are driven by the scheduling engine: they read historical business data from the data storage area and compute according to pre-programmed processing logic, and the results can be written back to the data storage area or provided externally through the offline analysis service. Streaming computing jobs are also driven by the scheduling engine: data is streamed in from data storage and computed according to pre-programmed processing logic, and the results can be written back to the data storage area or provided externally through the online computing service.
A computing job (also referred to as a job node) defines the topology and execution logic of a computing task, similar to a workflow (Workflow). It can be defined in the job designer provided by the system. From the computing engine's point of view, each job node corresponds to a computing unit (Compute Unit), and the programmed logic of a computing unit is called an operator (Transformation). The system provides a visual modeling tool with a rich set of preset data processing and data display operators, and publishes an operator development specification to support secondary development for actual business scenarios.
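A job topology of this kind can be modeled as a directed acyclic graph in which each node runs its operator once all upstream nodes have finished. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) as a stand-in for the scheduling engine; the node names and the four toy operators are hypothetical.

```python
from graphlib import TopologicalSorter

# Each job node maps to the set of nodes it depends on (its upstream nodes).
topology = {
    "read": set(),
    "mean": {"read"},
    "peak": {"read"},
    "report": {"mean", "peak"},
}

# Operator (Transformation) for each node; it receives upstream results by name.
operators = {
    "read": lambda inputs: [1.0, 2.0, 3.0, 4.0],
    "mean": lambda inputs: sum(inputs["read"]) / len(inputs["read"]),
    "peak": lambda inputs: max(inputs["read"]),
    "report": lambda inputs: {"mean": inputs["mean"], "peak": inputs["peak"]},
}


def run_job(topology, operators):
    """Execute every node after all of its dependencies have produced results."""
    results = {}
    for node in TopologicalSorter(topology).static_order():
        inputs = {dep: results[dep] for dep in topology[node]}
        results[node] = operators[node](inputs)
    return results


results = run_job(topology, operators)
```

Separating the topology from the operators mirrors the description above: the designer edits the graph, while each operator remains an independently developed piece of logic.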
The data service terminal encapsulates the functions of the big data analysis platform device based on algorithm configuration and distributed stream computing, and can provide a data access service, an online computing service and an offline analysis service. The data access service reads data directly from the data storage area; its interaction involves no data computation, and it can be subdivided into a configuration information access service and an interactive query service, with comprehensive query and visual display as typical scenarios. The online computing service typically spans hundreds of milliseconds to several seconds, is highly concurrent and must return analysis results quickly; typical scenarios include fault early warning and load prediction. The offline analysis service spans tens of minutes to several hours and is mainly used for multidimensional statistical prediction, quasi-real-time analysis, and data mining applications such as clustering and classification; typical scenarios include fault pattern recognition and stability analysis. Data services interact in two modes, synchronous and asynchronous: the online computing service generally uses the synchronous mode and the offline analysis service generally uses the asynchronous mode, which can introduce message service middleware to carry computation state and result information.
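The two interaction modes can be contrasted in a few lines of Python. Here a thread pool and a `Future` stand in for the message service middleware that would carry state and results in the asynchronous case; that substitution, the function names and the 90.0 alarm threshold are all assumptions for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def online_compute(values):
    """Synchronous mode: the caller blocks and gets the answer immediately.
    Toy rule: raise a fault warning if any value exceeds an assumed threshold."""
    return max(values) > 90.0


def offline_analysis(values):
    """Long-running analysis (here just a mean, to keep the sketch short)."""
    return sum(values) / len(values)


def submit_offline(executor: ThreadPoolExecutor, values) -> Future:
    """Asynchronous mode: return a handle at once; the result arrives later."""
    return executor.submit(offline_analysis, values)


warning = online_compute([70.0, 95.0])          # result available right away

with ThreadPoolExecutor(max_workers=1) as executor:
    handle = submit_offline(executor, [1.0, 2.0, 3.0])
    # The caller could do other work here before collecting the result.
    average = handle.result(timeout=5)
```

In a deployed system the handle would more likely be a job identifier, with completion signalled through the message service rather than by polling a local object.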
The wireless terminal can be a notebook computer, a tablet computer and/or a mobile phone. Monitoring or maintenance personnel can use it to interact remotely with the data service terminal, either actively querying in real time or passively receiving monitoring results pushed by the data service terminal. They can also use it to remotely operate the data service terminal and other parts of the device, achieving real-time remote control and monitoring, and can handle abnormal situations promptly.
The present invention also provides a big data analysis method using the big data analysis platform device based on algorithm configuration and distributed stream computing, comprising the following steps in order:
(1) Initialize: set the initial parameters of the data service end; according to the set initial parameters, control the monitoring sensor so that its sampling frequency is 6 times per second and the sampling time is 1 minute, and average the data sampled in that 1 minute to obtain A;
(2) Under the same initial parameters, repeat step (1) three times to obtain three further averages B, C, and D;
(3) Sum the averages A, B, C, and D and average them again to obtain P:
A. if [condition not reproduced in the source text], the monitoring sensor's performance is stable; proceed to step (4);
B. if [condition not reproduced in the source text], the monitoring sensor's performance is unstable; return to step (1);
(4) Acquire equipment monitoring data in real time or quasi-real time, transfer the collected equipment monitoring data to the monitoring data memory for storage, and then output the equipment monitoring data to the data integration unit as a stream;
(5) Using batch access, automatically obtain conventional equipment data from the equipment data memory according to a predefined scheduling plan; preprocess the equipment monitoring data and the conventional equipment data according to preprocessing rules (data cleaning, filtering, and conversion); and output the preprocessed data to the time series data memory for storage;
(6) Cache the recent monitoring data, the conventional equipment data, the frequently accessed historical indicator data, the model metadata, and the preprocessing rule data set in the memory provided in the time series data memory;
(7) The computing unit drives the scheduling engine to call and receive the data stored in the time series data memory, processes the called and received data according to pre-programmed processing logic, and trains a data mining model. The computing unit comprises multiple sub-computing units, which are dynamically configured according to actual business requirements; each sub-computing unit exists independently and can be developed independently on the basis of industry expert experience. A distributed stream computing engine computes on the called and received data, outputs the results in real time, and returns them to the time series data memory and/or the data service end;
(8) Read data directly from the time series data memory and/or receive the data processed by the computing unit, analyze and process it, display the processed result on the display device, and at the same time send the processed result to the wireless terminal through the interface unit;
(9) Receive, at the wireless terminal, the processed result sent from the data service end, and decide according to the processed result whether to send a control command to the data service end; on receiving a control command, the data service end controls the monitoring sensor and adjusts its data acquisition frequency. The processed result is a fault early-warning result and/or a load forecasting result, and the decision whether to send a control command to the data service end satisfies:
A. when the fault early-warning result and/or load forecasting result is normal, reduce the data acquisition frequency of the monitoring sensor;
B. when the fault early-warning result and/or load forecasting result is abnormal, increase the data acquisition frequency of the monitoring sensor and repeat steps (1)-(9); at the same time, the data service end issues an alarm, displays the fault early-warning result and/or load forecasting result in real time on its display device, and notifies maintenance personnel.
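Steps (1)-(3) and step (9) of the method above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation. The stability condition of step (3) does not survive in the text, so the relative-deviation test in `is_stable` and its `tolerance` are purely hypothetical stand-ins. The scaling `factor` and the frequency bounds in `adjust_frequency` are likewise assumptions, since the text says only that the acquisition frequency is reduced or increased.

```python
from statistics import mean

SAMPLES_PER_SECOND = 6   # sampling frequency from step (1)
SAMPLING_SECONDS = 60    # sampling time from step (1): 1 minute


def window_average(samples):
    """Average of one 1-minute sampling window (A, B, C, or D)."""
    return mean(samples)


def overall_average(window_means):
    """P from step (3): the mean of the four window averages A, B, C, D."""
    if len(window_means) != 4:
        raise ValueError("expected exactly four window averages")
    return mean(window_means)


def is_stable(window_means, tolerance=0.05):
    """HYPOTHETICAL stability test, not the criterion of the source.

    Assumption: the sensor counts as stable when every window average
    lies within `tolerance` (relative) of the overall average P.
    """
    p = overall_average(window_means)
    if p == 0:
        return all(m == 0 for m in window_means)
    return all(abs(m - p) <= tolerance * abs(p) for m in window_means)


def adjust_frequency(current_hz, result_normal,
                     factor=2.0, min_hz=1.0, max_hz=48.0):
    """Step (9): lower the acquisition frequency when the result is
    normal, raise it when abnormal. `factor` and the bounds are assumed."""
    if result_normal:
        return max(min_hz, current_hz / factor)
    return min(max_hz, current_hz * factor)
```

With these assumed defaults, a sensor sampling at 6 times per second would drop to 3 after a normal result and rise to 12 after an abnormal one; a real deployment would pick the criterion and step size from the sensor's specification.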
The big data analysis platform device and method based on algorithm configuration and distributed stream computing of the present invention are implemented through the cooperation of software and hardware, but are not limited to this; under certain conditions they can also be implemented entirely in hardware.
Although illustrative embodiments of the invention have been described for purposes of illustration, those skilled in the art will understand that various modifications, additions, and substitutions in form and detail can be made without departing from the scope and spirit of the invention as disclosed in the appended claims, that all such changes fall within the scope of protection of the appended claims, and that the parts of the claimed product and the steps of the claimed method can be combined in any form. Therefore, the description of the disclosed embodiments is not intended to limit the scope of the invention but to describe it. Accordingly, the scope of the invention is not limited by the above embodiments but is defined by the claims or their equivalents.

Claims (1)

1. A big data analysis method using a big data analysis platform device based on algorithm configuration and distributed stream computing, characterized in that it comprises the following steps in order:
(1) initialize: set the initial parameters of the data service end; according to the set initial parameters, control the monitoring sensor so that its sampling frequency is 6 times per second and the sampling time is 1 minute, and average the data sampled in that 1 minute to obtain A;
(2) under the same initial parameters, repeat step (1) three times to obtain three further averages B, C, and D;
(3) sum the averages A, B, C, and D and average them again to obtain P:
A. if [condition not reproduced in the source text], the monitoring sensor's performance is stable; proceed to step (4);
B. if [condition not reproduced in the source text], the monitoring sensor's performance is unstable; return to step (1);
(4) acquire equipment monitoring data in real time or quasi-real time, transfer the collected equipment monitoring data to the monitoring data memory for storage, and then output the equipment monitoring data to the data integration unit as a stream;
(5) using batch access, automatically obtain conventional equipment data from the equipment data memory according to a predefined scheduling plan; preprocess the equipment monitoring data and the conventional equipment data according to preprocessing rules (data cleaning, filtering, and conversion); and output the preprocessed data to the time series data memory for storage;
(6) cache the recent equipment monitoring data, the conventional equipment data, the frequently accessed historical indicator data, the model metadata, and the preprocessing rule data set in the memory provided in the time series data memory;
(7) the computing unit drives the scheduling engine to call and receive the data stored in the time series data memory, processes the called and received data according to pre-programmed processing logic, and trains a data mining model, wherein the computing unit comprises multiple sub-computing units, the sub-computing units are dynamically configured according to actual business requirements, and each sub-computing unit exists independently and can be developed independently on the basis of industry expert experience; a distributed stream computing engine computes on the called and received data, outputs the results in real time, and returns them to the time series data memory and/or the data service end;
(8) read data directly from the time series data memory and/or receive the data processed by the computing unit, analyze and process it, display the processed result on the display device, and at the same time send the processed result to the wireless terminal through the interface unit;
(9) receive, at the wireless terminal, the processed result sent from the data service end, and decide according to the processed result whether to send a control command to the data service end; on receiving a control command, the data service end controls the monitoring sensor and adjusts its data acquisition frequency, wherein the processed result is a fault early-warning result and/or a load forecasting result, and the decision whether to send a control command to the data service end satisfies:
A. when the fault early-warning result and/or load forecasting result is normal, reduce the data acquisition frequency of the monitoring sensor;
B. when the fault early-warning result and/or load forecasting result is abnormal, increase the data acquisition frequency of the monitoring sensor and repeat steps (1)-(9); at the same time, the data service end issues an alarm, displays the fault early-warning result and/or load forecasting result in real time on its display device, and notifies maintenance personnel.
CN201510962436.3A 2015-12-17 2015-12-17 A kind of big data analysis platform device and method calculated based on algorithm configuration and distributed stream Active CN105608758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510962436.3A CN105608758B (en) 2015-12-17 2015-12-17 A kind of big data analysis platform device and method calculated based on algorithm configuration and distributed stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510962436.3A CN105608758B (en) 2015-12-17 2015-12-17 A kind of big data analysis platform device and method calculated based on algorithm configuration and distributed stream

Publications (2)

Publication Number Publication Date
CN105608758A CN105608758A (en) 2016-05-25
CN105608758B true CN105608758B (en) 2018-03-27

Family

ID=55988668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510962436.3A Active CN105608758B (en) 2015-12-17 2015-12-17 A kind of big data analysis platform device and method calculated based on algorithm configuration and distributed stream

Country Status (1)

Country Link
CN (1) CN105608758B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9037698B1 (en) * 2006-03-14 2015-05-19 Amazon Technologies, Inc. Method and system for collecting and analyzing time-series data
CN106371366A (en) * 2016-09-22 2017-02-01 南京中新赛克科技有限责任公司 ARM architecture-based big data acquisition and analysis platform
CN106527384B (en) * 2016-12-19 2019-03-05 华南理工大学 A kind of production regulation method based on cloud platform complementary handover strategies
CN106777243A (en) * 2016-12-27 2017-05-31 浪潮软件集团有限公司 A kind of dynamic modeling of stream data analysis
CN107016231A (en) * 2017-02-21 2017-08-04 广州七乐康药业连锁有限公司 It is a kind of that the method and system that medical data is calculated are realized based on cloud platform
CN107145467A (en) * 2017-05-13 2017-09-08 贾宏博 A kind of distributed computing hardware system in real time
CN107451663B (en) * 2017-07-06 2021-04-20 创新先进技术有限公司 Algorithm componentization, modeling method and device based on algorithm components and electronic equipment
CN111079942A (en) * 2017-08-30 2020-04-28 第四范式(北京)技术有限公司 Distributed system for performing machine learning and method thereof
CN108984279A (en) * 2018-07-02 2018-12-11 山东汇贸电子口岸有限公司 A kind of streaming computing method of internet of things oriented tradition SQL developer
CN109003459B (en) * 2018-07-17 2020-08-11 泉州装备制造研究所 Regional traffic signal control method and system based on hierarchical flow calculation
CN109862094A (en) * 2019-01-31 2019-06-07 福建智恒软件科技有限公司 A kind of water utilities device data sharing method and device based on stream calculation
CN110162515A (en) * 2019-04-30 2019-08-23 中国科学院深圳先进技术研究院 A kind of uncoupled elastic data warehouse schema
CN110377653B (en) * 2019-07-15 2021-05-07 武汉中地数码科技有限公司 Real-time big data calculation and storage method and system
CN111352872A (en) * 2020-02-20 2020-06-30 北京字节跳动网络技术有限公司 Execution engine, data processing method, apparatus, electronic device, and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685221A (en) * 2012-04-29 2012-09-19 华北电力大学(保定) Distributed storage and parallel mining method for state monitoring data
CN103761309A (en) * 2014-01-23 2014-04-30 中国移动(深圳)有限公司 Operation data processing method and system
CN105069703A (en) * 2015-08-10 2015-11-18 国家电网公司 Mass data management method of power grid

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8352495B2 (en) * 2009-12-15 2013-01-08 Chalklabs, Llc Distributed platform for network analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
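Demand analysis and application research of big data for smart power distribution and utilization (智能配用电大数据需求分析与应用研究); Wang Jiye (王继业) et al.; Proceedings of the CSEE (《中国电机工程学报》); 20150420; pp. 1829-1836 *
Big data analysis and parallel load forecasting on the electricity user side (电力用户侧大数据分析与并行负荷预测); Wang Dewen (王德文) et al.; Proceedings of the CSEE (《中国电机工程学报》); 20150205; pp. 527-537 *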

Also Published As

Publication number Publication date
CN105608758A (en) 2016-05-25

Similar Documents

Publication Publication Date Title
CN105608758B (en) A kind of big data analysis platform device and method calculated based on algorithm configuration and distributed stream
CN105427193B (en) A kind of big data analysis device and method based on distributed time series data service
CN105608144B (en) A kind of big data analysis stage apparatus and method based on multilayered model iteration
CN104769582B (en) For the real time data releasing of power grid
CN107943668B (en) Computer server cluster log monitoring method and monitor supervision platform
CN106294644B (en) A kind of magnanimity time series data collection and treatment device and method based on big data technology
CN104903894A (en) System and method for distributed database query engines
EP2932406B1 (en) System and method for storage, querying, and analysis service for time series data
CN103761309A (en) Operation data processing method and system
CN106202566A (en) A kind of magnanimity electricity consumption data mixing based on big data storage system and method
CN104881352A (en) System resource monitoring device based on mobile terminal
KR20150112357A (en) Sensor data processing system and method thereof
CN104376365A (en) Method for constructing information system running rule libraries on basis of association rule mining
CN106599197B (en) Data acquisition exchange engine
CN110719210B (en) Industrial equipment predictive maintenance method based on cloud edge cooperation
CN106874482A (en) A kind of device and method of the patterned data prediction based on big data technology
US20150213035A1 (en) Search Engine System and Method for a Utility Interface Platform
CN110460656A (en) A kind of industry environmental protection Internet of Things remotely monitors cloud platform
US20200089182A1 (en) Distributed embedded data and knowledge management system integrated with plc historian
CN110430260A (en) A kind of robot cloud platform and working method based on big data cloud computing support
CN103678522A (en) Method for acquiring and converting data of metering system of intelligent transformer substation
CN111405032A (en) General cloud platform of industry thing networking
CN109670199A (en) A kind of efficient power network topology analysis method and device
CN109507924B (en) Remote monitoring system for oil field operation equipment
CN105306279A (en) Data performance analysis platform and method of enterprise-level application system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant