CN106874482A - Device and method for graphical data preprocessing based on big data technology
- Publication number
- CN106874482A (application CN201710090025.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- unit
- monitoring
- distributed memory
- computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Abstract
A device and method for graphical data preprocessing based on big data technology. The device comprises a data acquisition device, an equipment monitoring device, a distributed storage, a Spark in-memory computing engine, a computing unit, an ETL processing unit, and a data preprocessing unit. The data acquisition device is connected to the equipment monitoring device, the equipment monitoring device is connected to the distributed storage, and the distributed storage is connected to the data preprocessing unit, which comprises the Spark in-memory computing engine, the computing unit, and the ETL processing unit. The device can process massive heterogeneous data quickly, efficiently, and in a timely manner while keeping the equipment running safely, stably, and efficiently.
Description
Technical field
The present invention relates to the field of equipment monitoring and analysis applications, and in particular to a device and method for graphical data preprocessing based on big data technology.
Background technology
With the rapid development of the smart grid, the power system has begun to move into the era of the energy internet and big data. The large volume of operating data in the power industry is increasingly characterized by large scale, many types, and high value, and the gap between lagging data analysis and processing capability and rapid data growth will become more pronounced. As data volumes and data types keep increasing, several problems have emerged: performance bottlenecks in data analysis, a lack of advanced methods for data analysis and mining, and unstructured data that is still not used effectively. These problems restrict the development of power industry informatization from digitization toward intelligence. The key big data technologies of the energy internet era cover data acquisition, transmission, storage, quality management, fusion and sharing, and deep mining.
The collection and analysis of historical operating data and the immediate analysis of real-time or near-real-time data are important parts of informatization in the power industry. This work needs a complete, stable big data analysis solution that fits actual business scenarios and provides reliable, stable underlying data support for real-time analysis scenarios such as equipment fault early warning.
In recent years, with the rapid development of IT technologies such as cloud computing, big data, machine learning, and data mining, distributed storage and high-performance computing have achieved key breakthroughs in both theoretical research and engineering practice, and the industry has produced a number of big data processing and application solutions, with Hadoop as a representative.
Hadoop is an extensible framework for reliable distributed processing of big data. The core of its design consists of HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over it. HDFS is a distributed file system characterized by low cost, high reliability, and high throughput. MapReduce is a programming model and software framework that greatly simplifies the processing of large-scale data.
Spark is a distributed big data processing tool. It does not provide data storage itself, and it can run on Hadoop's HDFS or on other distributed file systems. Spark was originally designed to address the inefficiency caused by Hadoop MapReduce repeatedly reading and writing the file system. By building the resilient distributed dataset (RDD) structure, it keeps data resident in memory and realizes an in-memory MapReduce framework, which makes up for the shortcomings of MapReduce in specific application scenarios.
General-purpose open-source components such as Hadoop and Spark have some limitations in functional completeness and operational stability, and some commercial big data platforms derived from Hadoop deviate from the actual needs of power business scenarios, so an in-depth analysis of power industry business requirements is needed. The integration of heterogeneous data sources is a practical problem that enterprise informatization frequently encounters. With the sharp increase in data volume, and in unstructured data in particular, traditional data warehouse technology and data extraction tools struggle with data preprocessing and cannot meet the performance requirements for processing massive heterogeneous data and messy, low-quality data. Building a graphical data preprocessing device and method based on big data technology therefore has far-reaching significance and considerable value.
Smart grid big data is complex in structure and varied in kind. In addition to traditional structured data, it includes a large amount of semi-structured and unstructured data, such as voice data from the 95598 customer service center system and video and image data from equipment online monitoring systems. The sampling frequency and life cycle of these data also differ, ranging from the microsecond level through the minute and hour levels up to the annual level. The massive and diverse data resources of today's grid companies provide good conditions for in-depth data analysis. How to improve data processing performance, fully mine the value of data, and manage data as an asset so that it becomes a key enterprise asset is a problem that currently needs to be solved.
In view of this, there is an urgent need for a solution that provides unified representation, flexible acquisition, centralized storage, effective evaluation, fast processing, and secure sharing of massive multi-source electric power big data, and research on a metadata-based management system for multi-source heterogeneous big data is pressing.
The distributed computing capability of big data technology can be used to address heterogeneous data integration. An ETL processing unit is built on the Spark in-memory computing engine, and the data extraction, data transformation, and data loading logic is converted into computing units that support parameter configuration and dynamic combination. Together with a graphical flow configuration tool, this allows the data preprocessing process to be flexibly customized. It not only solves the performance problem of heterogeneous data preprocessing but also effectively improves the reusability and flexibility of data preprocessing programs.
Accordingly, a device and method for graphical data preprocessing based on big data technology is built to solve the performance problem of massive heterogeneous data integration that traditional ETL tools cannot handle properly, and to improve the reusability, flexibility, and execution efficiency of data preprocessing programs.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a device and method for graphical data preprocessing based on big data technology that can process massive heterogeneous data quickly, efficiently, and in a timely manner while keeping the equipment running safely, stably, and efficiently.
The invention provides a device for graphical data preprocessing based on big data technology, comprising a data acquisition device, an equipment monitoring device, a distributed storage, a Spark in-memory computing engine, a computing unit, an ETL processing unit, and a data preprocessing unit. The data acquisition device is connected to the equipment monitoring device, the equipment monitoring device is connected to the distributed storage, and the distributed storage is connected to the data preprocessing unit, which comprises the Spark in-memory computing engine, the computing unit, and the ETL processing unit;
Data acquisition device: obtains heterogeneous equipment information data in real time or near real time and transmits the collected data to the equipment monitoring device;
Equipment monitoring device: collects and stores the heterogeneous equipment information data and pushes it to the distributed storage, and outputs the equipment monitoring data to the data preprocessing process as a data stream;
Distributed storage, also called time-series data storage: stores the real-time massive heterogeneous equipment data and the equipment data processed by the data preprocessing unit;
Spark in-memory computing engine: computes on the data by invoking the logic rules of the computing units and outputs the computed data to the distributed storage;
Computing unit: drives the scheduling rule engine to call and receive the data stored in the distributed storage, processes that data according to pre-programmed processing logic, and trains a data mining model;
The computing unit comprises multiple sub-computing units, which are graphically and dynamically configured and dynamically orchestrated into a job according to actual business requirements. Each sub-computing unit exists independently and can be extended and evolved independently according to the experience of industry experts. A distributed streaming computing engine computes on the called and received data, outputs the results in real time, and writes the output data to the distributed data storage;
ETL processing unit: forms jobs by dynamically orchestrating computing units. It is built on the Spark in-memory computing engine and converts the data extraction, data transformation, and data loading logic into computing units that support graphical parameter configuration and dynamic combination;
Data preprocessing unit: preprocesses the heterogeneous equipment information data by extraction, transformation, and loading through the ETL processing unit. It can also standardize data formats, remove abnormal data, correct errors, and remove duplicate data; combine the data from multiple data sources into unified storage; and convert the data into a form suitable for data mining by smoothing, aggregation, data generalization, and/or normalization.
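The cleaning and conversion duties of the data preprocessing unit can be sketched in a few lines. The following is a minimal, single-machine illustration only: the record layout (`device`, `ts`, `temp`), the trailing moving average, and min-max scaling are assumptions chosen for the example, since the text names the operations (duplicate removal, smoothing, normalization) but not the algorithms behind them.

```python
from statistics import mean


def drop_duplicates(records):
    """Keep the first occurrence of each (device, ts) pair (hypothetical key)."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["device"], rec["ts"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def moving_average(values, window=3):
    """Smooth a series with a trailing moving average (assumed smoothing method)."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed


def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical temperature readings, including one repeated record.
records = [
    {"device": "T1", "ts": 1, "temp": 40.0},
    {"device": "T1", "ts": 1, "temp": 40.0},
    {"device": "T1", "ts": 2, "temp": 44.0},
    {"device": "T1", "ts": 3, "temp": 48.0},
]
clean = drop_duplicates(records)
temps = [r["temp"] for r in clean]
smoothed = moving_average(temps, window=2)
normalized = min_max_normalize(temps)
```

In the device described here these steps would run as distributed operators on the Spark engine; the plain-Python version only shows the per-record logic.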
Preferably, the data acquisition device is a data sampling sensor installed on the monitored equipment;
Preferably, the data acquisition device is an infrared detector or temperature detector in the installation area of the monitored equipment;
Preferably, the device further comprises a manual input device connected to the equipment monitoring device, used to enter monitoring data when isolation measures are in place for security reasons or when the monitored equipment does not support data access;
Preferably, the data preprocessing unit is further used to call and receive the new time-series data that the equipment monitoring device pushes to the distributed storage, and to repeat the training process on the new time-series data to update the data mining model;
Preferably, the manual input device is a notebook computer, a tablet computer, or a mobile phone;
Preferably, the computing units related to the data preprocessing unit include, but are not limited to, one or more of an invalid value filtering unit, a missing value supplement unit, a data column selection unit, a data column transformation unit, a data column appending unit, and a dataset merging unit, which are combined with one another according to the specific business and support extension. Specifically:
Invalid value filtering unit: uses a rule engine to allow free configuration of combined condition rules, removes invalid records, and passes the records that meet the requirements to the next processing step;
Missing value supplement unit: uses calculation functions to allow free configuration of the missing value calculation logic; the fill-in logic can be customized in a specific calculation job, and the data that have been filled in pass to the next processing step;
Data column selection unit: the original dataset contains n fields, and m freely selected fields (m <= n) pass to the next processing step;
Data column transformation unit: changes the name or data format of certain columns of the original dataset, and the transformed data pass to the next processing step;
Data column appending unit: the original dataset contains n fields, and m fields can be freely appended; the name, data type, and value of each new field can be customized, and the data with the appended columns pass to the next processing step;
Dataset merging unit: the aggregation node for multiple datasets; it supports SQL queries, and the result dataset passes to the next processing step.
The invention also provides a processing method for the device for graphical data preprocessing based on big data technology, comprising the following steps:
(1) Initialization: set the initial parameters of the data acquisition device and, according to the set initial parameters, control the data acquisition device to sample 10 times per hour for a sampling time of 7 days; average the data sampled over the 7 days to obtain A;
(2) Under the same initial parameters, collect data in real time, taking every 4 consecutively collected data points as a group [B C D E], with the 4 data points denoted B, C, D, and E; calculate the difference score M for each using the formula [formula not reproduced in the source text], where A' is one of the values B, C, D, E;
(3) If the difference scores are within the threshold range, the collected data group is considered valid; average B, C, D, and E to obtain M, and let P' be the real-time measured value of the data acquisition device; then:
A. If [condition not reproduced in the source text], the data acquisition device is performing stably; go to step (4);
B. If [condition not reproduced in the source text], the data acquisition device is performing unstably; return to step (1);
(4) Obtain equipment information monitoring data in real time or near real time and transmit the collected data to the equipment monitoring device; then either push the data to the distributed storage (mainly account (ledger) data, historical data, and massive heterogeneous data), or output the equipment monitoring data to the data preprocessing process as a stream;
(5) In batch access mode, obtain the routine account data and historical data in the distributed storage automatically according to a predefined scheduling plan; the data preprocessing unit extracts, transforms, and loads the massive heterogeneous equipment data under the preprocessing rules, and the preprocessed data are output to the distributed storage for storage;
(6) In streaming access mode, obtain the massive heterogeneous equipment data in the distributed storage, driven by the predefined system; the data preprocessing unit extracts, transforms, and loads the data under the preprocessing rules, and the preprocessed data are output to the distributed storage for storage;
(7) During data preprocessing, the computing engine drives the scheduling rule engine to call and receive the data stored in the distributed storage, processes the called and received data according to pre-programmed processing logic, trains a data mining model, and writes the data processed by the ETL processing unit back to the distributed storage.
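Steps (1) to (3) above can be sketched as follows. The sample count follows directly from the text (10 samples per hour for 7 days is 10 × 24 × 7 = 1680 samples). The difference score, however, is an assumption: the source text does not reproduce the formula, so this sketch substitutes the relative deviation |A' − A| / |A|, and the threshold value is likewise hypothetical.

```python
from statistics import mean

SAMPLES_PER_HOUR = 10
SAMPLING_DAYS = 7
EXPECTED_SAMPLES = SAMPLES_PER_HOUR * 24 * SAMPLING_DAYS  # 1680


def baseline_average(samples):
    """Step (1): average A over the full 7-day sampling window."""
    if len(samples) != EXPECTED_SAMPLES:
        raise ValueError(f"expected {EXPECTED_SAMPLES} samples, got {len(samples)}")
    return mean(samples)


def difference_score(value, baseline):
    """ASSUMPTION: relative deviation from A; not the patent's own formula."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative score")
    return abs(value - baseline) / abs(baseline)


def group_is_valid(group, baseline, threshold):
    """Steps (2)-(3): a group [B, C, D, E] is valid if every score is in range."""
    if len(group) != 4:
        raise ValueError("a group holds exactly 4 consecutive readings")
    return all(difference_score(v, baseline) <= threshold for v in group)


# Hypothetical data: a flat 7-day history and two real-time groups.
history = [50.0] * EXPECTED_SAMPLES
A = baseline_average(history)
good_group = [49.0, 50.0, 51.0, 50.5]
bad_group = [49.0, 50.0, 80.0, 50.5]
good_ok = group_is_valid(good_group, A, threshold=0.05)
bad_ok = group_is_valid(bad_group, A, threshold=0.05)
group_mean = mean(good_group)  # the average formed from B, C, D, E in step (3)
```

The stability test in step (3) (cases A and B) is omitted because its conditions are not given in the text.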
The device and method for graphical data preprocessing based on big data technology of the invention achieve the following:
1) With a stable, reliable, and efficient open-source distributed storage system and parallel computing service at its core, the data preprocessing process is handed to distributed computing units for execution, which both reduces the complexity of data processing and improves the access throughput of time-series data;
2) The ETL processing unit is built on the Spark in-memory computing engine, and the data extraction, data transformation, and data loading logic is converted into computing units that support parameter configuration and dynamic combination. Together with a graphical flow configuration tool, this allows the data preprocessing process to be flexibly customized, which not only solves the performance problem of heterogeneous data preprocessing but also effectively improves the reusability and flexibility of data preprocessing programs;
3) For the reliability of system data, an average-based data confirmation scheme is designed, which makes the equipment monitoring data more reliable and stable, reduces the workload of the device, extends its service life, and makes its performance more stable;
4) The optimized method of evaluating the performance of the data acquisition device makes the data more reliable.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device for graphical data preprocessing based on big data.
Detailed description of the embodiments
Specific embodiments of the invention are described in detail below. It should be noted that the following embodiments only further illustrate the invention and must not be construed as limiting its scope of protection; non-essential modifications and adaptations made to the invention by those skilled in the art in light of the above content still fall within its scope of protection.
The invention provides a device and method for graphical data preprocessing based on big data technology. As shown in Fig. 1, the device comprises a data acquisition device 1, an equipment monitoring device 2, a distributed storage 3, a Spark in-memory computing engine 4, a computing unit 5, an ETL processing unit 6, and a data preprocessing unit 7. The equipment monitoring device 2 is connected to the data acquisition device 1 and to the distributed storage 3, and the distributed storage 3 is connected to the data preprocessing unit 7. The data preprocessing unit 7 comprises the ETL processing unit 6, and the ETL processing unit 6 comprises the Spark in-memory computing engine 4 and the computing unit 5;
Data acquisition device: obtains equipment monitoring data in real time or near real time and transmits the collected data to the equipment monitoring device. The data acquisition device is an information acquisition sensor installed on the monitored equipment, and may also be a sensor such as a camera or temperature detector in the installation area of the monitored equipment. The equipment monitoring device can store the equipment information monitoring data in real time and output it to the distributed storage by pushing or by streaming output.
Equipment monitoring device: collects the equipment information data and sends the equipment monitoring data to the distributed storage by pushing or by streaming output.
Distributed storage, also called time-series data storage: stores the account (ledger) data, historical data, indicator data, and massive heterogeneous data pushed by the equipment monitoring device or processed by the data preprocessing unit.
Spark in-memory computing engine: the driver of data computation. It computes on the data by invoking the logic rules of the computing units and outputs the computed data to the distributed storage.
Computing unit, also called an operator: drives the scheduling rule engine to call and receive the data stored in the distributed storage, processes the called and received data according to pre-programmed processing logic, and trains a data mining model. The data extraction, data transformation, and data loading logic is converted into computing units that support parameter configuration and dynamic combination. In the graphical flow configuration tool, the data preprocessing process is orchestrated according to actual business requirements and flexibly configured by dragging operators. The computing unit comprises multiple sub-computing units, which are graphically and dynamically configured and dynamically orchestrated into a job according to actual business requirements. Each sub-computing unit exists independently and can be extended and evolved independently according to the experience of industry experts. A distributed streaming computing engine computes on the called and received data, outputs the results in real time, and writes the output data to the distributed data storage;
A computing unit is also a component of a computing job. A computing job defines the topology and execution logic of computing tasks (also called job nodes), similar to a workflow. It can be defined in the graphical flow configuration tool provided by the system: job nodes are dragged, freely combined, and configured to form a job task. From the computing engine's point of view, each job node corresponds to a computing unit (Compute Unit), and the program logic of a computing unit is called an operator (Transformation). The system provides a visual modeling tool with a rich set of preset data processing and data display operators, and it publishes an operator development specification to support secondary development for actual business scenarios.
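The relationship between a computing job, its job nodes, and their operators can be illustrated with a small directed graph that is executed in dependency order. This is a hedged sketch, not the system's implementation: the `run_job` function, the node names, and the dictionary-based job definition are all hypothetical, and Python's standard `graphlib.TopologicalSorter` stands in for whatever scheduler the real system uses.

```python
from graphlib import TopologicalSorter


def run_job(operators, upstream, source_rows):
    """Execute a job whose nodes are operators.

    operators: node name -> function taking one row list per upstream node
    upstream:  node name -> list of upstream node names (the job topology)
    Nodes with no upstream nodes receive the source rows.
    """
    results = {}
    # static_order() yields every node after all of its upstream nodes.
    for name in TopologicalSorter(upstream).static_order():
        inputs = [results[u] for u in upstream.get(name, [])] or [source_rows]
        results[name] = operators[name](*inputs)
    return results


# A hypothetical two-node job: filter invalid rows, then keep one column.
operators = {
    "filter_invalid": lambda rows: [r for r in rows if r["temp"] is not None],
    "select_columns": lambda rows: [{"temp": r["temp"]} for r in rows],
}
upstream = {
    "filter_invalid": [],
    "select_columns": ["filter_invalid"],
}
rows = [{"id": 1, "temp": 40.0}, {"id": 2, "temp": None}]
results = run_job(operators, upstream, rows)
```

Dragging and connecting nodes in a graphical tool amounts to editing the `upstream` mapping; each operator stays independent and can be swapped or extended without touching the others.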
ETL is responsible for extracting data from scattered, heterogeneous data sources, such as relational data and unstructured data files, into a temporary intermediate layer, then cleaning, transforming, and integrating it, and finally loading it into a data warehouse or data mart, where it serves as decision-support data for data mining. The ETL processing unit integrates most of the functions of an ETL tool. It mainly forms jobs by dynamically orchestrating computing units. It builds an ETL visual processing framework on the Spark in-memory computing engine and converts the data extraction, data transformation, and data loading logic into computing units that support graphical parameter configuration and dynamic combination, so that the ETL data processing flow is shown more intuitively.
Data preprocessing unit: preprocesses the heterogeneous equipment information data by extraction, transformation, and loading through the ETL processing unit. It can also standardize data formats, remove abnormal data, correct errors, and remove duplicate data; combine the data from multiple data sources into unified storage; and convert the data into a form suitable for data mining by smoothing, aggregation, data generalization, normalization, and similar means. Before the accessed massive heterogeneous data are stored, necessary preprocessing can be carried out: operations such as data extraction, data transformation, and data loading are performed using pre-configured preprocessing rules. Equipment data (or other data) enter the data preprocessing program as data streams, through timed scheduling, or by manual import, and the results are output to the specified location according to the configuration of the specific processing job. The preprocessing logic is implemented in a configurable, visual way. Each configurable logic unit is called an operator. According to actual business requirements, data preprocessing operators are dragged in the graphical tool and dynamically orchestrated into a preprocessing flow, and the operator parameters are configured. The whole preprocessing logic is called a job, and a job is parallelized by parallelizing its operators.
The operators related to the data preprocessing unit include, but are not limited to, invalid value filtering, missing value supplement, data column selection, data column transformation, data column appending, and dataset merging. They can be combined with one another according to the specific business and support extension.
Invalid value filtering: uses a rule engine to allow free configuration of combined condition rules; invalid records can be removed, and the records that meet the requirements pass to the next processing step.
Missing value supplement: uses calculation functions to allow free configuration of the missing value calculation logic; the fill-in logic can be customized in a specific calculation job, and the data that have been filled in pass to the next processing step.
Data column selection: the original dataset contains n fields, and m freely selected fields (m <= n) pass to the next processing step.
Data column transformation: changes the name or data format of certain columns of the original dataset (for example, converting a numeric type to a string type), and the transformed data pass to the next processing step.
Data column appending: the original dataset contains n fields, and m fields can be freely appended; the name, data type, and value of each new field can be customized (for example, appending a "creation time" field to a dataset that lacks one), and the data with the appended columns pass to the next processing step.
Dataset merging: the aggregation node for multiple datasets; it supports SQL queries, and the result dataset passes to the next processing step.
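The six operators listed above can be sketched as small, composable functions over lists of records. Everything here is illustrative: the function names, the record fields, and the use of an in-memory SQLite database to stand in for the SQL support of the dataset merging operator are assumptions, and a real deployment would run equivalent logic on the Spark engine.

```python
import sqlite3


def filter_invalid(rows, is_valid):
    """Invalid value filtering: keep only rows that satisfy the rule."""
    return [r for r in rows if is_valid(r)]


def fill_missing(rows, column, compute):
    """Missing value supplement: fill None with a configurable calculation."""
    return [{**r, column: compute(r)} if r.get(column) is None else r for r in rows]


def select_columns(rows, columns):
    """Data column selection: keep m of the n original fields."""
    return [{c: r[c] for c in columns} for r in rows]


def transform_column(rows, column, new_name, convert):
    """Data column transformation: rename a column and convert its values."""
    out = []
    for r in rows:
        changed = {k: v for k, v in r.items() if k != column}
        changed[new_name] = convert(r[column])
        out.append(changed)
    return out


def append_column(rows, name, value_of):
    """Data column appending: add a new field with a custom value."""
    return [{**r, name: value_of(r)} for r in rows]


def merge_with_sql(tables, query):
    """Dataset merging: load datasets into SQLite and combine them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    for table, rows in tables.items():
        columns = list(rows[0])
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        marks = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO {table} VALUES ({marks})",
            [tuple(r[c] for c in columns) for r in rows],
        )
    result = [dict(row) for row in conn.execute(query)]
    conn.close()
    return result


# Hypothetical readings: one missing value and one sentinel error value.
readings = [
    {"device": "T1", "temp": 40.0},
    {"device": "T2", "temp": None},
    {"device": "T3", "temp": -999.0},
]
step1 = filter_invalid(readings, lambda r: r["temp"] is None or r["temp"] > -100)
step2 = fill_missing(step1, "temp", lambda r: 0.0)
step3 = append_column(step2, "created_at", lambda r: "2017-01-01")
step4 = transform_column(step3, "temp", "temp_text", str)
step5 = select_columns(step4, ["device", "temp_text"])
sites = [{"device": "T1", "site": "A"}, {"device": "T2", "site": "B"}]
merged = merge_with_sql(
    {"readings": step5, "sites": sites},
    "SELECT r.device, r.temp_text, s.site FROM readings r "
    "JOIN sites s ON r.device = s.device ORDER BY r.device",
)
```

Because each operator takes rows in and returns rows out, any of them can feed any other, which is the property that lets a graphical tool chain them freely.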
The invention also provides a method for graphical data preprocessing based on big data technology, comprising the following steps in order:
(1) Initialization: set the initial parameters of the data acquisition device and, according to the set initial parameters, control the data acquisition device to sample 10 times per hour for a sampling time of 7 days; average the data sampled over the 7 days to obtain A;
(2) Under the same initial parameters, collect data in real time, taking every 4 consecutively collected data points as a group [B C D E], with the 4 data points denoted B, C, D, and E; calculate the difference score M for each using the formula [formula not reproduced in the source text], where A' is one of the values B, C, D, E;
(3) If the difference scores are within the threshold range, the collected data group is considered valid; average B, C, D, and E to obtain M, and let P' be the real-time measured value of the data acquisition device; then:
A. If [condition not reproduced in the source text], the data acquisition device is performing stably; go to step (4);
B. If [condition not reproduced in the source text], the data acquisition device is performing unstably; return to step (1);
(4) Obtain equipment information monitoring data in real time or near real time and transmit the collected data to the equipment monitoring device; then either push the data to the distributed storage (mainly account (ledger) data, historical data, and massive heterogeneous data), or output the equipment monitoring data to the data preprocessing process as a stream;
(5) In batch access mode, obtain the routine account data and historical data in the distributed storage automatically according to a predefined scheduling plan; the data preprocessing unit extracts, transforms, and loads the massive heterogeneous equipment data under the preprocessing rules, and the preprocessed data are output to the distributed storage for storage;
(6) In streaming access mode, obtain the massive heterogeneous equipment data in the distributed storage, driven by the predefined system; the data preprocessing unit extracts, transforms, and loads the data under the preprocessing rules, and the preprocessed data are output to the distributed storage for storage;
(7) During data preprocessing, the computing engine drives the scheduling rule engine to call and receive the data stored in the distributed storage, processes the called and received data according to pre-programmed processing logic, trains a data mining model, and writes the data processed by the ETL processing unit back to the distributed storage.
The device and method for graphical data preprocessing based on big data technology of the invention are implemented through the cooperation of software and hardware devices, but are not limited to this; under certain conditions they can also be implemented entirely in software.
Although illustrative embodiments of the invention have been described for purposes of illustration, those skilled in the art will understand that various modifications, additions, and substitutions in form and detail can be made without departing from the scope and spirit of the invention disclosed in the appended claims. All such changes fall within the scope of protection of the appended claims, and the parts of the claimed product and the steps of the claimed method can be combined in any form. Therefore, the description of the disclosed embodiments is not intended to limit the scope of the invention but to describe it. Accordingly, the scope of the invention is not limited by the above embodiments but is defined by the claims or their equivalents.
Claims (8)
1. A graphical data preprocessing device based on big data technology, characterized in that it comprises a data acquisition device, an equipment monitoring device, a distributed memory, a Spark in-memory computing engine, a computing unit, an ETL processing unit, and a data pre-processing unit, wherein the data acquisition device is connected to the equipment monitoring device, the equipment monitoring device is connected to the distributed memory, the distributed memory is connected to the data pre-processing unit, and the data pre-processing unit comprises the Spark in-memory computing engine, the computing unit, and the ETL processing unit;
the data acquisition device is configured to acquire heterogeneous equipment information data in real time or near real time, and to transmit the collected heterogeneous equipment information data to the equipment monitoring device;
the equipment monitoring device is configured to collect and store the heterogeneous equipment information data and push it to the distributed memory, and to output the equipment monitoring data as a data stream to the data pre-processing unit for processing;
the distributed memory, also called the time-series data memory, is configured to store the real-time massive heterogeneous equipment data and the equipment data processed by the data pre-processing unit;
the Spark in-memory computing engine is configured to compute on the data by invoking the logic rules of the computing unit, and to output the computed data to the distributed memory;
the computing unit is configured to drive a scheduling rule engine to call and receive the data stored in the distributed memory, to process the called and received data according to pre-programmed processing logic, and to train a data mining model;
the computing unit comprises a plurality of sub-computing units, which are configured graphically and dynamically according to actual business requirements and orchestrated dynamically to form jobs; each sub-computing unit exists independently and can be extended and evolved independently according to the experience of industry experts; a distributed streaming computing engine computes on the called and received data, outputs the results in real time, and writes the output data to the distributed data memory;
the ETL processing unit is configured to form jobs through dynamic orchestration of the computing units, is built on the Spark in-memory computing engine, and converts the data extraction, data transformation, and data loading logic into a form that supports graphical parameter configuration and dynamic combination;
the data pre-processing unit is configured to pre-process the heterogeneous equipment information data by extraction, transformation, and loading according to the ETL processing unit; it can also standardize data formats, remove abnormal data, correct errors, and remove duplicate data; combine data from multiple data sources into unified storage; and convert the data into a form suitable for data mining by smoothing, aggregation, data generalization, and/or normalization.
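The cleaning operations named for the data pre-processing unit (format standardization, abnormal-data removal, deduplication, smoothing, normalization) can be sketched as follows. This is a minimal plain-Python illustration, not the patented Spark implementation; the record schema (`device_id`, `ts`, `value`), the valid range, and the window size are assumptions made only for the example.

```python
from statistics import mean

def standardize(record):
    """Coerce one raw record to a common format (hypothetical schema)."""
    return {
        "device_id": str(record["device_id"]).strip().upper(),
        "ts": int(record["ts"]),
        "value": float(record["value"]),
    }

def deduplicate(records):
    """Drop records that repeat an already seen (device_id, ts) pair."""
    seen, out = set(), []
    for r in records:
        key = (r["device_id"], r["ts"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def remove_abnormal(records, low, high):
    """Keep only readings inside an assumed valid range [low, high]."""
    return [r for r in records if low <= r["value"] <= high]

def moving_average(values, window=3):
    """Trailing moving average used here as a simple smoothing step."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]

def min_max_normalize(values):
    """Scale values linearly to [0, 1]; constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw = [
    {"device_id": " t1 ", "ts": "1", "value": "20.0"},
    {"device_id": "T1", "ts": 1, "value": 20.0},   # duplicate once standardized
    {"device_id": "T1", "ts": 2, "value": 9999},   # outside the assumed range
    {"device_id": "T1", "ts": 3, "value": 22.0},
    {"device_id": "T1", "ts": 4, "value": 24.0},
]

clean = remove_abnormal(deduplicate([standardize(r) for r in raw]), low=-50.0, high=150.0)
values = [r["value"] for r in clean]
smoothed = moving_average(values, window=2)
normalized = min_max_normalize(smoothed)
```

Each function takes and returns plain lists, so the steps can be reordered or dropped, which mirrors the claim's point that the operations are optional capabilities of the unit.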
2. The device of claim 1, characterized in that the data acquisition device is a data acquisition sensor installed on the monitored equipment.
3. The device of claim 1, characterized in that the data acquisition device is an infrared detector or a temperature detector in the installation area of the monitored equipment.
4. The device of claim 1, characterized in that it further comprises a manual input device connected to the equipment monitoring device, for entering monitored equipment data where isolation measures are enforced for safety reasons or data access is not supported.
5. The device of claim 1, characterized in that the data pre-processing unit is further configured to call and receive the newly generated time-series data that the equipment monitoring device pushes to the distributed memory, to repeat the training process on the new time-series data, and to update the data mining model.
6. The device of claim 1, characterized in that the manual input device is a notebook computer, a tablet computer, or a mobile phone.
7. The device of claim 1, characterized in that the computing units of the data pre-processing unit include, but are not limited to, one or more of an invalid-value filtering unit, a missing-value supplement unit, a data column selection unit, a data column conversion unit, a data column append unit, and a data set merging unit, which are combined with one another according to the specific business and support extension, specifically:
Invalid-value filtering unit: uses a rule engine to allow free configuration of combined condition judgment rules, removes invalid records, and passes the conforming data to the next processing step;
Missing-value supplement unit: uses calculation functions to allow free configuration of the missing-value calculation logic; the fill logic can be customized for a specific calculation operation, and the data for which filling is complete pass to the next processing step;
Data column selection unit: the original data set contains n fields, of which m fields are freely selected to pass to the next processing step, where m <= n;
Data column conversion unit: changes the names or data formats of certain columns of the original data set, and the converted data pass to the next processing step;
Data column append unit: the original data set contains n fields, and m fields are freely appended; the names, data types, and values of the new fields can be customized, and the data with the appended columns pass to the next processing step;
Data set merging unit: a convergence node for multiple data sets that supports SQL queries; the result data set passes to the next processing step.
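The six units of claim 7 can be pictured as small functions chained into a job. The sketch below is a hedged, plain-Python stand-in: the function names, the `run_job` helper, and the sample fields are hypothetical, and the standard-library `sqlite3` module replaces whatever SQL engine the actual system uses for the merging unit.

```python
import sqlite3

def invalid_value_filter(rows, rules):
    """Keep rows that satisfy every configured rule (a predicate per rule)."""
    return [r for r in rows if all(rule(r) for rule in rules)]

def missing_value_fill(rows, field, fill_fn):
    """Replace None in `field` with a value computed by a user-defined function."""
    fill = fill_fn(rows)
    return [{**r, field: fill if r.get(field) is None else r[field]} for r in rows]

def select_columns(rows, fields):
    """Keep m of the n original fields."""
    return [{f: r[f] for f in fields} for r in rows]

def convert_column(rows, old, new, cast):
    """Rename column `old` to `new` and convert its values with `cast`."""
    return [{(new if k == old else k): (cast(v) if k == old else v)
             for k, v in r.items()} for r in rows]

def append_column(rows, name, value_fn):
    """Append a new field whose value is computed per row."""
    return [{**r, name: value_fn(r)} for r in rows]

def merge_datasets(tables, query):
    """Load named data sets into in-memory SQLite and run one SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    for name, rows in tables.items():
        cols = list(rows[0].keys())
        conn.execute(f"CREATE TABLE {name} ({', '.join(cols)})")
        conn.executemany(
            f"INSERT INTO {name} VALUES ({', '.join('?' for _ in cols)})",
            [tuple(r[c] for c in cols) for r in rows],
        )
    result = [dict(row) for row in conn.execute(query)]
    conn.close()
    return result

def run_job(rows, steps):
    """Run the configured units in order; each output feeds the next unit."""
    for unit, kwargs in steps:
        rows = unit(rows, **kwargs)
    return rows

def mean_temp(rows):
    """Example fill logic: mean of the known temperature readings."""
    known = [float(r["temp"]) for r in rows if r["temp"] is not None]
    return sum(known) / len(known)

readings = [
    {"device": "T1", "temp": "21.5", "note": "ok"},
    {"device": "T2", "temp": None, "note": "ok"},
    {"device": "", "temp": "30.0", "note": "bad id"},
]
job = [
    (invalid_value_filter, {"rules": [lambda r: r["device"] != ""]}),
    (missing_value_fill, {"field": "temp", "fill_fn": mean_temp}),
    (select_columns, {"fields": ["device", "temp"]}),
    (convert_column, {"old": "temp", "new": "temp_c", "cast": float}),
    (append_column, {"name": "source", "value_fn": lambda r: "sensor"}),
]
processed = run_job(readings, job)

sites = [{"device": "T1", "site": "north"}, {"device": "T2", "site": "south"}]
merged = merge_datasets(
    {"readings": processed, "sites": sites},
    "SELECT r.device AS device, r.temp_c AS temp_c, s.site AS site "
    "FROM readings r JOIN sites s ON r.device = s.device ORDER BY r.device",
)
```

Representing the job as a list of `(unit, parameters)` pairs is one way to model a graphically configured flow: a front end would only need to emit that list.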
8. A processing method using the graphical data preprocessing device based on big data technology according to any one of claims 1-6, characterized in that it comprises the following steps in order:
(1) initialization: set the initial parameters of the data acquisition device; according to the set initial parameters, control the data acquisition device to sample 10 times per hour for a sampling duration of 7 days, and take the mean of the data sampled over the 7 days as A;
(2) under the same initial parameters, collect data in real time, taking every 4 consecutively collected data points as one group [B C D E], denoted B, C, D, and E respectively, and compute a difference score M for each using the formula (formula not reproduced here), where A' in the formula is one of the values B, C, D, E;
(3) if the difference scores are within the threshold range, the collected data group is considered valid; B, C, D, and E are averaged to give M, and P' is taken as the real-time measured value of the data acquisition device; then:
A. if (condition not reproduced here), the performance of the data acquisition device is stable, and the method proceeds to step (4);
B. if (condition not reproduced here), the performance of the data acquisition device is unstable, and the method returns to step (1);
(4) acquire equipment information monitoring data in real time or near real time and transmit the collected data to the equipment monitoring device; then either push the data to the distributed memory (mainly massive heterogeneous data comprising ledger data and historical data), or output the equipment monitoring data as a stream to the data pre-processing unit for processing;
(5) in batch access mode, automatically obtain the routine ledger data and historical data in the distributed memory according to a predefined scheduling plan; the data pre-processing unit extracts, transforms, and loads the massive heterogeneous equipment data under the preprocessing rules, and outputs the pre-processed data to the distributed memory for storage;
(6) in streaming access mode, obtain the massive heterogeneous equipment data in the distributed memory under a predefined system drive; the data pre-processing unit extracts, transforms, and loads the data under the preprocessing rules, and outputs the pre-processed data to the distributed memory for storage;
(7) during processing in the data pre-processing unit, the computing engine drives the scheduling rule engine to call and receive the data stored in the distributed memory, processes the called and received data according to pre-programmed processing logic, trains a data mining model, and writes the data processed by the ETL processing unit back to the distributed memory.
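Steps (1) to (3) describe a baseline-then-validate check on the data acquisition device. The sketch below shows only the control flow. Because the difference-score formula and the stability conditions are not available in this text, `relative_deviation` and `is_stable` are hypothetical stand-ins passed in as parameters, and the threshold and tolerance values are likewise assumed.

```python
from statistics import mean

SAMPLES_PER_HOUR = 10   # from step (1)
SAMPLING_DAYS = 7       # from step (1)

def relative_deviation(sample, baseline):
    """Hypothetical difference score; the claimed formula is not reproduced."""
    return abs(sample - baseline) / abs(baseline)

def check_group(group, baseline, score_fn, threshold):
    """Step (2)/(3): return the group mean M if all scores pass, else None."""
    scores = [score_fn(x, baseline) for x in group]
    if all(s <= threshold for s in scores):
        return mean(group)
    return None

def is_stable(measured, group_mean, tolerance):
    """Hypothetical stability condition comparing P' with the group mean M."""
    return abs(measured - group_mean) <= tolerance * abs(group_mean)

# Step (1): baseline A is the mean of all samples in the 7-day window.
expected_count = SAMPLES_PER_HOUR * 24 * SAMPLING_DAYS
baseline_samples = [20.0 + 0.5 * (i % 3 - 1) for i in range(expected_count)]  # synthetic data
A = mean(baseline_samples)

# Step (2): four consecutive real-time readings B, C, D, E.
group = [20.1, 19.9, 20.2, 19.8]
M = check_group(group, A, relative_deviation, threshold=0.02)

# Step (3): compare the real-time measured value P' with M.
P_prime = 20.3
stable = M is not None and is_stable(P_prime, M, tolerance=0.05)

# A group containing an outlier is rejected as invalid.
rejected = check_group([20.1, 35.0, 20.2, 19.8], A, relative_deviation, threshold=0.02)
```

In this arrangement a `stable` result would let the method proceed to step (4), while an unstable result would restart the baseline collection of step (1).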
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710090025.9A CN106874482A (en) | 2017-02-20 | 2017-02-20 | A kind of device and method of the patterned data prediction based on big data technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710090025.9A CN106874482A (en) | 2017-02-20 | 2017-02-20 | A kind of device and method of the patterned data prediction based on big data technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106874482A true CN106874482A (en) | 2017-06-20 |
Family
ID=59167415
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710090025.9A Pending CN106874482A (en) | 2017-02-20 | 2017-02-20 | A kind of device and method of the patterned data prediction based on big data technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106874482A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108052574A (en) * | 2017-12-08 | 2018-05-18 | 南京中新赛克科技有限责任公司 | Slave ftp server based on Kafka technologies imports the ETL system and implementation method of mass data |
CN108242149A (en) * | 2018-03-16 | 2018-07-03 | 成都智达万应科技有限公司 | A kind of big data analysis method based on traffic data |
CN108628931A (en) * | 2018-03-15 | 2018-10-09 | 阿里巴巴集团控股有限公司 | A kind of method, apparatus and equipment of data-driven business |
CN109143017A (en) * | 2018-07-31 | 2019-01-04 | 成都天衡智造科技有限公司 | A kind of semicon industry production test data processing method |
CN109165202A (en) * | 2018-07-04 | 2019-01-08 | 华南理工大学 | A kind of preprocess method of multi-source heterogeneous big data |
CN109241112A (en) * | 2018-08-28 | 2019-01-18 | 北京明朝万达科技股份有限公司 | A kind of data processing method and device |
CN109491651A (en) * | 2018-10-24 | 2019-03-19 | 东软集团股份有限公司 | Data preprocessing method, device, storage medium and electronic equipment |
CN109614205A (en) * | 2018-10-18 | 2019-04-12 | 阿里巴巴集团控股有限公司 | A kind of method for processing business, device, equipment and system |
CN112433998A (en) * | 2020-11-20 | 2021-03-02 | 广东电网有限责任公司佛山供电局 | Multisource heterogeneous data acquisition and convergence system and method based on power system |
CN112925838A (en) * | 2019-12-06 | 2021-06-08 | 阿里巴巴集团控股有限公司 | Data processing method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130047161A1 (en) * | 2011-08-19 | 2013-02-21 | Alkiviadis Simitsis | Selecting processing techniques for a data flow task |
CN103617554A (en) * | 2013-10-22 | 2014-03-05 | 芜湖大学科技园发展有限公司 | Flexible drive system for grid data evaluation |
CN105427193A (en) * | 2015-12-17 | 2016-03-23 | 山东鲁能软件技术有限公司 | Device and method for big data analysis based on distributed time sequence data service |
CN106202566A (en) * | 2016-08-02 | 2016-12-07 | 山东鲁能软件技术有限公司 | A kind of magnanimity electricity consumption data mixing based on big data storage system and method |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108052574A (en) * | 2017-12-08 | 2018-05-18 | 南京中新赛克科技有限责任公司 | Slave ftp server based on Kafka technologies imports the ETL system and implementation method of mass data |
CN108628931A (en) * | 2018-03-15 | 2018-10-09 | 阿里巴巴集团控股有限公司 | A kind of method, apparatus and equipment of data-driven business |
CN108628931B (en) * | 2018-03-15 | 2022-08-30 | 创新先进技术有限公司 | Method, device and equipment for data driving service |
CN108242149A (en) * | 2018-03-16 | 2018-07-03 | 成都智达万应科技有限公司 | A kind of big data analysis method based on traffic data |
CN109165202A (en) * | 2018-07-04 | 2019-01-08 | 华南理工大学 | A kind of preprocess method of multi-source heterogeneous big data |
CN109143017B (en) * | 2018-07-31 | 2021-03-30 | 成都天衡智造科技有限公司 | Production test data processing method for semiconductor industry |
CN109143017A (en) * | 2018-07-31 | 2019-01-04 | 成都天衡智造科技有限公司 | A kind of semicon industry production test data processing method |
CN109241112A (en) * | 2018-08-28 | 2019-01-18 | 北京明朝万达科技股份有限公司 | A kind of data processing method and device |
CN109614205A (en) * | 2018-10-18 | 2019-04-12 | 阿里巴巴集团控股有限公司 | A kind of method for processing business, device, equipment and system |
CN109491651A (en) * | 2018-10-24 | 2019-03-19 | 东软集团股份有限公司 | Data preprocessing method, device, storage medium and electronic equipment |
CN112925838A (en) * | 2019-12-06 | 2021-06-08 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN112433998A (en) * | 2020-11-20 | 2021-03-02 | 广东电网有限责任公司佛山供电局 | Multisource heterogeneous data acquisition and convergence system and method based on power system |
CN112433998B (en) * | 2020-11-20 | 2022-01-21 | 广东电网有限责任公司佛山供电局 | Multisource heterogeneous data acquisition and convergence system and method based on power system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106874482A (en) | A kind of device and method of the patterned data prediction based on big data technology | |
CN105608758B (en) | A kind of big data analysis platform device and method calculated based on algorithm configuration and distributed stream | |
CN105427193B (en) | A kind of big data analysis device and method based on distributed time series data service | |
CN105608144B (en) | A kind of big data analysis stage apparatus and method based on multilayered model iteration | |
CN106874483A (en) | A kind of device and method of the patterned quality of data evaluation and test based on big data technology | |
CN106651188A (en) | Electric transmission and transformation device multi-source state assessment data processing method and application thereof | |
CN105786912B (en) | Data collecting conversion method and device | |
CN108804630B (en) | Industry application-oriented big data intelligent analysis service system | |
CN108170655A (en) | Production method, device, terminal device and the storage medium of Visual Report Forms | |
CN104376365A (en) | Method for constructing information system running rule libraries on basis of association rule mining | |
CN112181960B (en) | Intelligent operation and maintenance framework system based on AIOps | |
CN113741883B (en) | RPA lightweight data middling station system | |
CN105467953A (en) | Knowledge representation facing industrial big data and automation application method thereof | |
CN105069025A (en) | Intelligent aggregation visualization and management control system for big data | |
CN106202566A (en) | A kind of magnanimity electricity consumption data mixing based on big data storage system and method | |
CN107862392B (en) | Equipment account management and control method based on power distribution network intelligent operation and maintenance management and control platform | |
CN102750367A (en) | Big data checking system and method thereof on cloud platform | |
CN115271648B (en) | Project visual supervision system, method, equipment and storage medium | |
CN109582837A (en) | A kind of visualized data processing method based on cloud and system | |
WO2021233160A1 (en) | Data presentation system, method and device, and computer-readable storage medium | |
CN113642850A (en) | Data fusion method and terminal for power distribution network planning | |
CN113535831A (en) | Report form analysis method, device, equipment and medium based on big data | |
WO2024066683A1 (en) | Industrial internet operating system and product processing method | |
CN116484651B (en) | Digital twinning-based system parameter adjusting method and device and electronic equipment | |
CN108874818A (en) | A kind of data intelligence visualization system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170620 | |