CN106874483A - Device and method for graphical data quality evaluation based on big data technology
- Publication number
- CN106874483A CN201710090356.2A
- Authority
- CN
- China
- Prior art keywords
- data
- monitoring
- quality
- unit
- distributed memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
Abstract
A device and method for graphical data quality evaluation based on big data technology. The device comprises a data acquisition device, an equipment monitoring device, a distributed storage, a Spark in-memory computing engine, a computing unit, a data quality check unit, and a data preprocessing unit. The data acquisition device is connected to the equipment monitoring device, the equipment monitoring device is connected to the distributed storage, and the distributed storage is connected to the data preprocessing unit, which comprises the Spark in-memory computing engine, the computing unit, and the data quality check unit. The device can process and assess massive data quickly, efficiently, and in a timely manner, while keeping the equipment running safely, stably, and efficiently.
Description
Technical field
The present invention relates to the field of equipment monitoring and analysis applications, and in particular to a device and method for graphical data quality evaluation based on big data technology.
Background
With the rapid development of the smart grid, the power system has begun to move into the era of the energy internet and "big data". The large volume of operating data in the power industry increasingly shows the characteristics of large scale, many types, and high value, and the contradiction between lagging data analysis and processing capability and rapid data growth is becoming more prominent. As data volumes and data types keep increasing, problems have emerged such as performance bottlenecks in data analysis, a lack of advanced methods for data analysis and mining, and unstructured data that is still not used effectively, all of which restrict the development of power industry informatization from digitization toward intelligence. The key big data technologies of the energy internet era cover data acquisition, transmission, storage, quality management, fusion and sharing, and deep mining.
The collection and analysis of historical operating data and the immediate analysis of real-time or near-real-time data are important parts of informatization in the power industry. They require a complete, stable big data analysis solution that fits actual business scenarios and provides reliable and stable underlying data support for real-time analysis scenarios such as equipment fault early warning.
In recent years, with the rapid development of IT technologies such as cloud computing, big data, machine learning, and data mining, distributed storage and high-performance computing have achieved key breakthroughs in both theoretical research and engineering practice, and the industry has seen a number of big data processing and application solutions emerge, with Hadoop as a representative.
Hadoop is a scalable open-source software framework that performs reliable distributed processing of big data. Its core designs are HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over it. HDFS is a distributed file system characterized by low cost, high reliability, and high throughput. MapReduce is a programming model and software framework that greatly simplifies large-scale data processing. Spark is a distributed big data processing tool that does not itself provide data storage; it can run on Hadoop's HDFS or on other distributed file systems. Spark was designed to address the inefficiency caused by Hadoop MapReduce repeatedly reading and writing the file system. By building the resilient distributed dataset (RDD) structure, it keeps data resident in memory and implements an in-memory MapReduce architecture, which makes up for the shortcomings of MapReduce in specific application scenarios.
General-purpose open-source components such as Hadoop and Spark have certain limitations in functional completeness and operational stability, and some commercial big data platforms derived from Hadoop deviate from the actual needs of power business scenarios. Integrating heterogeneous data sources is therefore a practical problem frequently encountered in enterprise informatization, and it calls for in-depth analysis of power industry business requirements. As data volumes grow sharply, and unstructured data in particular, traditional data warehouse technology and data extraction tools can no longer keep up in data quality evaluation and cannot meet the processing performance requirements of massive heterogeneous data and messy low-quality data. Building a graphical data quality evaluation device and method based on big data technology is therefore of far-reaching significance and considerable value.
Data is a key asset of a power enterprise's data center, and obtaining and maintaining high-quality data is essential to efficient IT and business operations. Strengthening data quality management is the precondition for collecting, analyzing, and using data effectively. How to comprehensively guarantee data quality in the face of business data that keeps growing in volume and complexity is a key issue that cannot be avoided when mining data for value.
Data quality assurance is the key to and foundation of successful big data. Data quality management (Data Quality Management) covers every stage of the data life cycle, including planning, acquisition, storage, sharing, maintenance, use, and retirement. At every stage, a series of measures such as identification, measurement, monitoring, and early warning must be applied to hidden risks that may cause data quality problems, so that such problems are avoided throughout. This ensures that big data can be analyzed effectively and used fully, and allows the enterprise to gain a real advantage from big data applications. Data quality problems can be summarized as "missing, duplicated, scattered, slow, and poor". These factors seriously affect the results of big data analysis and application, and describe the severe situation that power data quality management currently faces. Data quality management stands like a mountain in the path of big data development in the power industry. It is a problem every power informatization service provider must face and solve when developing big data, and the research and development of a related standard system and toolkit is imperative.
Combining quality inspection standards and management systems in the big data context, big data processing technology is used to improve the efficiency of quality checks and to provide a decision basis for data governance. The work studies the capability maturity model for core business systems (Data Management Maturity, DMM); builds a data quality management system for the big data context around the idea of finding problems, solving problems, and avoiding problems; develops data quality management and improvement mechanisms that evaluate, prevent, and repair data defects; and, oriented toward evaluation dimensions such as completeness, consistency, accuracy, and timeliness, develops a data quality assessment (Data Quality Assessment) system based on big data processing technology together with a metadata system.
Comprehensive data quality control is provided for an enterprise's massive data: data quality checks are carried out to find data quality problems and to monitor fluctuations in data quality. A data quality check unit is built on the Spark in-memory computing engine, and quality index rules such as data integrity, conformance, consistency, and accuracy are converted into computing units that support parameter configuration and dynamic combination. Together with a graphical flow configuration tool, this allows the data preprocessing process to be customized flexibly.
Building a device and method for graphical data quality evaluation based on big data technology thus helps an enterprise carry out data quality checks, uses big data technology to break through the performance bottleneck of evaluating the quality of massive data, standardizes the management of data quality evaluation, and effectively reduces the cost of data quality management.
Summary of the invention
An object of the present invention is to overcome the deficiencies of the prior art by providing a device and method for graphical data quality evaluation based on big data technology that can process and assess massive data quickly, efficiently, and in a timely manner, while keeping the equipment running safely, stably, and efficiently.
The invention provides a device for graphical data quality evaluation based on big data technology, comprising a data acquisition device, an equipment monitoring device, a distributed storage, a Spark in-memory computing engine, a computing unit, a data quality check unit, and a data preprocessing unit, wherein the data acquisition device is connected to the equipment monitoring device, the equipment monitoring device is connected to the distributed storage, the distributed storage is connected to the data preprocessing unit, and the data preprocessing unit comprises the Spark in-memory computing engine, the computing unit, and the data quality check unit;
The data acquisition device is used to obtain heterogeneous equipment information data in real time or near real time and to transmit the collected data to the equipment monitoring device;
The equipment monitoring device is used to collect and store the heterogeneous equipment information data and push it to the distributed storage, and to output the equipment monitoring data to the data preprocessing process as a data stream;
The distributed storage, also called the time series data storage, is used to store the massive real-time heterogeneous equipment data and the equipment data that has passed through the data preprocessing unit;
The Spark in-memory computing engine is used to compute on data by invoking the logic rules of the computing units, and to output the computed data to the distributed storage;
The computing unit is used to drive a scheduling rule engine to call and receive the data stored in the distributed storage, to process that data according to pre-programmed processing logic, and to train a data mining model;
The computing unit comprises a plurality of sub-computing units, which are graphically and dynamically configured according to actual business requirements and dynamically orchestrated into a job; each sub-computing unit exists independently and can be extended and evolved independently according to the experience of industry experts; a distributed streaming computing engine computes on the called and received data, outputs the results in real time, and writes the output data to the distributed data storage;
The data quality check unit is a job formed by dynamically orchestrating computing units; it is built on the Spark in-memory computing engine, and converts the quality index rules for data integrity, conformance, consistency, and accuracy into computing units that support graphical parameter configuration and dynamic combination;
The data preprocessing unit is used to preprocess the heterogeneous equipment information data for integrity, conformance, consistency, and accuracy according to the data quality check unit, while standardizing data formats, removing abnormal data, correcting errors, and removing duplicate data; to combine the data from multiple data sources into unified storage; to convert the data, by generalization and/or normalization, into a form suitable for data mining; and to output the preprocessed data to the distributed storage.
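The cleaning operations listed for the data preprocessing unit (standardizing formats, removing abnormal data, removing duplicates) can be sketched roughly as follows. This is a minimal illustration only: the field names device_id, ts, and value, the timestamp format, and the valid range are hypothetical and do not come from the patent, and a real deployment would presumably run this logic as Spark operators rather than plain Python.

```python
from datetime import datetime

def standardize(record):
    """Bring one raw record into a uniform format (hypothetical field names)."""
    return {
        "device_id": record["device_id"].strip().upper(),
        "ts": datetime.strptime(record["ts"].strip(), "%Y-%m-%d %H:%M:%S"),
        "value": float(record["value"]),
    }

def preprocess(raw_records, low, high):
    """Standardize records, then drop malformed, out-of-range, and duplicate ones."""
    cleaned, seen = [], set()
    for raw in raw_records:
        try:
            rec = standardize(raw)
        except (KeyError, ValueError, TypeError, AttributeError):
            continue  # malformed record: missing field or unparseable content
        if not (low <= rec["value"] <= high):
            continue  # abnormal value outside the assumed valid range
        key = (rec["device_id"], rec["ts"])
        if key in seen:
            continue  # duplicate reading for the same device and timestamp
        seen.add(key)
        cleaned.append(rec)
    return cleaned

raw = [
    {"device_id": " t1 ", "ts": "2017-02-20 10:00:00", "value": "35.5"},
    {"device_id": "T1", "ts": "2017-02-20 10:00:00", "value": "35.5"},
    {"device_id": "T1", "ts": "2017-02-20 10:06:00", "value": "999"},
    {"device_id": "T1", "ts": "bad", "value": "36"},
    {"device_id": "T2", "ts": "2017-02-20 10:00:00", "value": 40},
]
cleaned = preprocess(raw, low=-40.0, high=150.0)
```

Error correction is left out of the sketch because the source does not say how errors are corrected; here a record that cannot be parsed is simply dropped.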
Preferably, the data acquisition device is a data acquisition sensor installed on the monitored equipment.
Preferably, the data acquisition device is an infrared detector or a temperature detector in the area where the monitored equipment is installed.
Preferably, the device further comprises a manual input device connected to the equipment monitoring device, used to enter monitored equipment data where isolation measures are in force because of safety requirements or where data access is not supported.
Preferably, the data preprocessing unit is further used to call and receive new time series data pushed to the distributed storage by the equipment monitoring device, and to repeat the training process on the new time series data so as to update the data mining model.
Preferably, the manual input device is a notebook computer, a tablet computer, or a mobile phone.
The present invention also provides a method for graphical data quality evaluation based on big data technology, comprising the following steps in order:
(1) Initialization: set the initial parameters of the data acquisition device; according to the initial parameters, control the data acquisition device to sample 10 times per hour for 7 days, and take the average A of the data sampled over the 7 days;
(2) Under the same initial parameters, collect data in real time, taking every 4 consecutively collected values as a group [B C D E], with the 4 values denoted B, C, D, and E; calculate a difference score M for each value using the formula:
[formula not reproduced]
where A' in the formula is one of the values B, C, D, E;
(3) If the difference scores are within the threshold range, the collected data group is considered valid; take the average M of B, C, D, and E, and let P' be the real-time measured value of the data acquisition device, then:
A. if [condition not reproduced], the data acquisition device is performing stably, and the method proceeds to step (4);
B. if [condition not reproduced], the data acquisition device is performing unstably, and the method returns to step (1);
(4) Obtain equipment information monitoring data in real time or near real time and transmit the collected data to the equipment monitoring device; the data is then either pushed to the distributed storage (mainly ledger data, historical data, and massive heterogeneous data), or output as a stream to the data preprocessing process;
(5) In batch access mode, automatically obtain the routine ledger data and historical data in the distributed storage according to a predefined job schedule, have the data preprocessing unit preprocess the massive heterogeneous equipment data for integrity, conformance, consistency, and accuracy according to the preprocessing rules, and store the preprocessed data in the distributed storage;
(6) In streaming access mode, obtain the massive heterogeneous equipment data in the distributed storage through a predefined system driver, have the data preprocessing unit preprocess it for integrity, conformance, consistency, and accuracy according to the preprocessing rules, and store the preprocessed data in the distributed storage, wherein the consistency evaluation index is parameterized as follows:
The relevant parameters are configured in the consistency operator, where M is the number of problem data items, Q is the number of missing data items, C is the number of records in the data set, and P is the number of data items defined by the metadata;
The consistency of the data is:
[formula not reproduced]
where n is the number of data sets;
(7) In the data preprocessing process, the computing engine drives the scheduling rule engine to call and receive the data stored in the distributed storage, processes that data according to pre-programmed processing logic, trains a data mining model, and writes the data processed by the data quality check unit back to the distributed storage.
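Steps (1) to (3) amount to a baseline-and-deviation check on the data acquisition device, which can be sketched as follows. The formulas for the difference score and for the stability conditions are not available in this text, so the sketch substitutes a plain relative deviation for both; that substitution, the threshold and tolerance values, and all function names are assumptions made for illustration, not the patent's own definitions.

```python
from statistics import mean

SAMPLES_PER_HOUR = 10
SAMPLING_DAYS = 7
BASELINE_SIZE = SAMPLES_PER_HOUR * 24 * SAMPLING_DAYS  # 1680 samples in 7 days

def baseline_average(samples):
    """Step (1): average A over the full 7-day sampling window."""
    if len(samples) != BASELINE_SIZE:
        raise ValueError(f"expected {BASELINE_SIZE} samples, got {len(samples)}")
    return mean(samples)

def difference_score(value, baseline):
    """ASSUMPTION: relative deviation from A stands in for the missing formula.
    Requires a non-zero baseline."""
    return abs(value - baseline) / abs(baseline)

def group_is_valid(group, baseline, threshold):
    """Steps (2)-(3): a group [B, C, D, E] is valid if every score is within threshold."""
    if len(group) != 4:
        raise ValueError("a group holds exactly 4 consecutive readings")
    return all(difference_score(v, baseline) <= threshold for v in group)

def is_stable(p_now, group, tolerance):
    """ASSUMPTION: the device counts as stable when the real-time value P' stays
    within a relative tolerance of the group average M."""
    m = mean(group)
    return abs(p_now - m) / abs(m) <= tolerance

A = baseline_average([50.0] * BASELINE_SIZE)
good_group = [49.0, 50.0, 51.0, 50.5]
bad_group = [49.0, 50.0, 60.0, 50.0]
```

Under these assumptions, a device that fails is_stable would send the method back to step (1) to re-establish the baseline, as the source describes.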
The device and method for graphical data quality evaluation based on big data technology of the invention can achieve the following:
1) With a stable, reliable, and efficient open-source distributed storage system and parallel computing services at its core, the data preprocessing process is handed to distributed computing units for execution, which both reduces the complexity of data processing and improves the access throughput of time series data;
2) A data quality check unit is built on the Spark in-memory computing engine, and quality index rules such as data integrity, conformance, consistency, and accuracy are converted into computing units that support graphical parameter configuration and dynamic combination; together with a graphical flow configuration tool, this allows the data preprocessing process to be customized flexibly, breaks through the performance bottleneck of evaluating the quality of massive data, standardizes the management of data quality evaluation, and effectively reduces the cost of data quality management;
3) For the reliability of system data, an average-based data confirmation scheme is designed, which makes the equipment monitoring data more reliable and stable, lightens the workload of the device, extends its service life, and makes its performance more stable;
4) The optimized method of evaluating the performance of the data acquisition device makes the data more reliable.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the data preprocessing device for graphical data quality evaluation based on big data.
Detailed description
Specific implementations of the invention are described in detail below. It should be noted that the following embodiments only further illustrate the invention and are not to be construed as limiting its scope of protection; non-essential modifications and adaptations made to the invention by those skilled in the art on the basis of the above summary still fall within its scope of protection.
The invention provides a device and method for graphical data quality evaluation based on big data technology. As shown in Fig. 1, the device comprises a data acquisition device 1, an equipment monitoring device 2, a distributed storage 3, a Spark in-memory computing engine 4, a computing unit 5, a data quality check unit 6, and a data preprocessing unit 7. The equipment monitoring device 2 is connected to the data acquisition device 1 and to the distributed storage 3; the distributed storage 3 is connected to the data preprocessing unit 7; the data preprocessing unit 7 comprises the data quality check unit 6; and the data quality check unit 6 comprises the Spark in-memory computing engine 4 and the computing unit 5;
The data acquisition device is used to obtain heterogeneous equipment information data in real time or near real time and to transmit the collected data to the equipment monitoring device. The data acquisition device is an information acquisition sensor installed on the monitored equipment; it can also be a sensor in the area where the monitored equipment is installed, such as an infrared imager, a camera, or a temperature detector. The equipment monitoring device can store the equipment information monitoring data in real time and output it to the distributed storage either by pushing or as a stream.
The equipment monitoring device is used to collect and store the heterogeneous equipment information data and push it to the distributed storage, and to output the equipment monitoring data to the data preprocessing process as a data stream;
The distributed storage, also called the time series data storage, is used to store the massive real-time heterogeneous equipment data and the equipment data that has passed through the data preprocessing unit.
The Spark in-memory computing engine is the driver of data computation: it computes on data by invoking the logic rules of the computing units and outputs the computed data to the distributed storage.
The computing unit, also called an operator, is used to drive the scheduling rule engine to call and receive the data stored in the distributed storage, to process that data according to pre-programmed processing logic, and to train a data mining model. The logic for data integrity, conformance, consistency, and accuracy is converted into computing units that support parameter configuration and dynamic combination. In the graphical flow configuration tool, the data preprocessing process is orchestrated according to actual business requirements and configured flexibly by dragging operators. The computing unit comprises a plurality of sub-computing units, which are graphically and dynamically configured according to actual business requirements and dynamically orchestrated into a job. Each sub-computing unit exists independently and can be extended and evolved independently according to the experience of industry experts. A distributed streaming computing engine performs the data quality check on the called and received data, outputs the results in real time, and writes the output data to the distributed data storage;
The computing unit is also a component of a computing job. A computing job defines the topology and execution logic of computing tasks (also called job nodes), similar to a workflow (Workflow). It can be defined in the graphical flow configuration tool provided by the system: job nodes are dragged, freely combined, and configured to form a job task. From the point of view of the computing engine, each job node corresponds to a computing unit (Compute Unit), and the program logic of a computing unit is called an operator (Transformation). The system provides a visual modeling tool with a rich set of preset data processing and data display operators, and publishes an operator development specification to support secondary development for actual business scenarios.
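The description of a computing job as a topology of job nodes, each backed by an operator, can be illustrated with a small sketch. The Job class, its method names, and the example operators below are hypothetical; the sketch uses Python's standard graphlib module to order the nodes and is not meant to reflect how the patented system or Spark actually schedules work.

```python
from graphlib import TopologicalSorter

class Job:
    """A computing job: named job nodes, each wrapping an operator, plus their upstream links."""

    def __init__(self):
        self.operators = {}  # node name -> callable implementing the operator
        self.upstream = {}   # node name -> set of node names it depends on

    def add_node(self, name, operator, after=()):
        """Register a job node; `after` lists the nodes whose output it consumes."""
        self.operators[name] = operator
        self.upstream[name] = set(after)
        return self

    def run(self, data):
        """Execute every node in dependency order and return all node outputs."""
        results = {}
        for name in TopologicalSorter(self.upstream).static_order():
            parents = sorted(self.upstream[name])
            # Source nodes receive the job input; others receive their parents' outputs.
            inputs = [results[p] for p in parents] if parents else [data]
            results[name] = self.operators[name](*inputs)
        return results

job = Job()
job.add_node("source", lambda rows: list(rows))
job.add_node("drop_missing", lambda rows: [r for r in rows if r is not None], after=["source"])
job.add_node("count", len, after=["drop_missing"])
results = job.run([1, None, 3])
```

In a graphical configuration tool, dragging and connecting nodes would correspond to the add_node calls here, with the parameters of each operator supplied through the tool's configuration panel.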
The data quality check unit is essentially a job formed by dynamically orchestrating computing units. It is built on the Spark in-memory computing engine and converts quality index rules such as data integrity, conformance, consistency, and accuracy into computing units that support graphical parameter configuration and dynamic combination.
Data quality management is a cyclic management process whose ultimate goal is to raise the value of data in use through reliable data. Check functions are freely combined into check methods, and parallelizing the check functions gives the system big data processing capability, which can greatly improve the efficiency of quality checks and provide a decision basis for data governance.
The data quality check logic corresponding to a check method can itself be regarded as a computing job, defined through the visual designer. Running check methods on a periodic schedule produces a data quality trend, which is used to monitor the effect of data quality management work while ensuring that data stays high in quality and does not degrade over time.
A draggable logic unit (operator) is essentially a rule engine implemented in parallel. Check logic for properties of data such as duplication, association, correctness, completeness, consistency, and compliance is predefined as a set of functions. Users can freely select these functions in the configuration interface of the rule engine operator and configure them into various check methods and decision logic. The decision logic of a check method works by "labelling": it appends supplementary data columns to the data set, and the new data set is then passed through data filtering and data statistics operators to produce customized evaluation output.
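The "labelling" pattern described above, in which a check appends a result column rather than discarding rows and later operators filter or summarize on that column, might look like the following sketch. The function names, the column names, and the example rules are hypothetical choices for illustration.

```python
def label(rows, column, rule):
    """Check operator: append a boolean label column instead of dropping rows."""
    return [{**row, column: bool(rule(row))} for row in rows]

def filter_rows(rows, column, keep=True):
    """Data filtering operator: select the rows whose label equals `keep`."""
    return [row for row in rows if row[column] is keep]

def pass_rate(rows, column):
    """Data statistics operator: share of rows that passed the labelled check."""
    return sum(row[column] for row in rows) / len(rows) if rows else 0.0

rows = [
    {"id": 1, "temp": 35.0},
    {"id": 2, "temp": None},
    {"id": 3, "temp": 420.0},
]
# Two example checks: a completeness rule and a value-range (correctness) rule.
labelled = label(rows, "complete", lambda r: r["temp"] is not None)
labelled = label(labelled, "in_range",
                 lambda r: r["temp"] is not None and 0.0 <= r["temp"] <= 100.0)
incomplete = filter_rows(labelled, "complete", keep=False)
```

Because every row keeps its labels, the same labelled data set can feed several different reports without re-running the checks.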
The data preprocessing unit is used to preprocess the heterogeneous equipment information data for integrity, conformance, consistency, and accuracy according to the data quality check unit. Massive heterogeneous data can be given the necessary preprocessing before it is stored, using pre-configured preprocessing rules to perform operations such as data extraction, data conversion, and data loading. Equipment data (or other data) enters the data preprocessing program as a data stream, by timed scheduling, or by manual import, and the results are output to a specified location according to the configuration of the particular processing job. The preprocessing logic is configurable and visual. Each configurable logic unit is called an operator; according to actual business requirements, data preprocessing operators are dragged in the graphical tool and dynamically orchestrated into a preprocessing process, and the relevant operator parameters are configured. The whole preprocessing logic is called a job, and the job is parallelized by parallelizing its operators.
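The idea that a job is parallelized by parallelizing its operators can be illustrated by applying one operator to separate partitions of the data at the same time. The sketch below uses a thread pool from Python's standard library purely as a stand-in; in the described system this role would fall to the Spark engine, and the function names and partition count here are assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def split_into_partitions(rows, n_partitions):
    """Cut the rows into contiguous chunks so that output order is preserved."""
    size = max(1, math.ceil(len(rows) / n_partitions))
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def run_operator_parallel(operator, rows, n_partitions=4):
    """Apply `operator` (a function from a list of rows to a list of rows)
    to each partition concurrently, then concatenate the partial results."""
    partitions = split_into_partitions(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        outputs = list(pool.map(operator, partitions))
    return [row for part in outputs for row in part]

def drop_negative_and_double(part):
    """Example operator: discard negative readings, then scale the rest."""
    return [x * 2 for x in part if x >= 0]

result = run_operator_parallel(drop_negative_and_double, list(range(-2, 8)))
```

This only works as written for operators that treat each row independently; checks that need a view of the whole data set, such as duplicate detection, would need a combining step after the per-partition work.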
The present invention also provides a method for graphical data quality evaluation based on big data technology, comprising the following steps in order:
(1) Initialization: set the initial parameters of the data acquisition device; according to the initial parameters, control the data acquisition device to sample 10 times per hour for 7 days, and take the average A of the data sampled over the 7 days;
(2) Under the same initial parameters, collect data in real time, taking every 4 consecutively collected values as a group [B C D E], with the 4 values denoted B, C, D, and E; calculate a difference score M for each value using the formula:
[formula not reproduced]
where A' in the formula is one of the values B, C, D, E;
(3) If the difference scores are within the threshold range, the collected data group is considered valid; take the average M of B, C, D, and E, and let P' be the real-time measured value of the data acquisition device, then:
A. if [condition not reproduced], the data acquisition device is performing stably, and the method proceeds to step (4);
B. if [condition not reproduced], the data acquisition device is performing unstably, and the method returns to step (1);
(4) Obtain equipment information monitoring data in real time or near real time and transmit the collected data to the equipment monitoring device; the data is then either pushed to the distributed storage (mainly ledger data, historical data, and massive heterogeneous data), or output as a stream to the data preprocessing process;
(5) In batch access mode, automatically obtain the routine ledger data and historical data in the distributed storage according to a predefined job schedule, have the data preprocessing unit preprocess the massive heterogeneous equipment data for integrity, conformance, consistency, and accuracy according to the preprocessing rules, and store the preprocessed data in the distributed storage;
(6) In streaming access mode, obtain the massive heterogeneous equipment data in the distributed storage through a predefined system driver, have the data preprocessing unit preprocess it for integrity, conformance, consistency, and accuracy according to the preprocessing rules, and store the preprocessed data in the distributed storage, wherein the consistency evaluation index is parameterized as follows:
The relevant parameters are configured in the consistency operator, where M is the number of problem data items, Q is the number of missing data items, C is the number of records in the data set, and P is the number of data items defined by the metadata;
The consistency of the data is:
[formula not reproduced]
where n is the number of data sets;
(7) In the data preprocessing process, the computing engine drives the scheduling rule engine to call and receive the data stored in the distributed storage, processes that data according to pre-programmed processing logic, trains a data mining model, and writes the data processed by the data quality check unit back to the distributed storage.
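To make the roles of the consistency parameters in step (6) concrete, the sketch below computes a score from M, Q, C, and P over n data sets. The consistency formula itself is not available in this text, so the expression used here (one minus the share of problem and missing items among the C x P expected items, averaged over the data sets) is only an assumed stand-in chosen to be dimensionally sensible; it should not be read as the patent's formula.

```python
def dataset_consistency(m, q, c, p):
    """ASSUMED stand-in: share of the C * P expected data items that are
    neither problem items (M) nor missing items (Q)."""
    expected_items = c * p
    if expected_items <= 0:
        raise ValueError("C and P must both be positive")
    return 1.0 - (m + q) / expected_items

def overall_consistency(datasets):
    """Average the per-data-set score over the n data sets (an assumption)."""
    n = len(datasets)
    if n == 0:
        raise ValueError("at least one data set is required")
    return sum(dataset_consistency(d["M"], d["Q"], d["C"], d["P"]) for d in datasets) / n

datasets = [
    {"M": 2, "Q": 3, "C": 100, "P": 5},  # 5 bad items out of 500 expected
    {"M": 0, "Q": 0, "C": 50, "P": 4},   # no problems found
]
score = overall_consistency(datasets)
```

With the two example data sets, the first scores 0.99 and the second 1.0 under this assumed definition, so the averaged value is 0.995.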
The device and method for graphical data quality evaluation based on big data technology of the invention are implemented through the cooperation of software and hardware devices, but are not limited to this; under certain conditions they can also be implemented entirely in software.
Although illustrative embodiments of the invention have been described for the purpose of illustration, those skilled in the art will understand that various modifications, additions, and substitutions in form and detail can be made without departing from the scope and spirit of the invention as disclosed in the appended claims, that all such changes fall within the scope of protection of the appended claims, and that the parts of the claimed product and the steps of the claimed method can be combined in any form. The description of the disclosed embodiments is therefore not intended to limit the scope of the invention but to describe it. Accordingly, the scope of the invention is not limited by the above embodiments but is defined by the claims or their equivalents.
Claims (7)
1. A device for graphical data quality evaluation based on big data technology, characterized in that it comprises a data acquisition device, an equipment monitoring device, a distributed memory, a Spark in-memory computing engine, a computing unit, a data quality check unit, and a data preprocessing unit, wherein the data acquisition device is connected to the equipment monitoring device, the equipment monitoring device is connected to the distributed memory, the distributed memory is connected to the data preprocessing unit, and the data preprocessing unit comprises the Spark in-memory computing engine, the computing unit, and the data quality check unit;
the data acquisition device is configured to obtain heterogeneous equipment information data in real time or quasi-real time and to transmit the collected heterogeneous equipment information data to the equipment monitoring device;
the equipment monitoring device is configured to collect the heterogeneous equipment information data and push it to the distributed memory for storage, and to output the equipment monitoring data, as a data stream, to the data preprocessing unit for processing;
the distributed memory, also called the time-series data memory, is configured to store the real-time massive heterogeneous equipment data and the equipment data processed by the data preprocessing unit;
the Spark in-memory computing engine is configured to compute on the data by calling the logic rules of the computing unit and to output the computed data to the distributed memory;
the computing unit is configured to drive the scheduling rule engine to call and receive the data stored in the distributed memory, to process the called and received data according to pre-programmed processing logic, and to train a data mining model;
the computing unit comprises a plurality of sub-computing units, which are configured graphically and dynamically according to actual business requirements and dynamically orchestrated to form jobs; each sub-computing unit exists independently and can be extended and evolved independently according to the experience of industry experts; a distributed streaming computing engine computes on the called and received data, outputs the results in real time, and outputs the data to the distributed data memory;
the data quality check unit is formed as jobs through dynamic orchestration of the computing units; it is built on the Spark in-memory computing engine, and the quality index rules for data integrity, normalization, consistency, and accuracy are converted into computing units that support graphical parameter configuration and dynamic combination;
the data preprocessing unit is configured to preprocess the heterogeneous equipment information data for integrity, normalization, consistency, and accuracy according to the data quality check unit, while standardizing the data format, removing abnormal data, correcting errors, and removing duplicate data; to combine the data from multiple data sources into unified storage; to convert the data, by data generalization and/or normalization, into a form suitable for data mining; and to output the preprocessed data to the distributed memory.
2. The device as claimed in claim 1, characterized in that the data acquisition device is a data acquisition sensor installed on the monitoring equipment.
3. The device as claimed in claim 1, characterized in that the data acquisition device is an infrared detector or a temperature detector in the area where the monitoring equipment is installed.
4. The device as claimed in claim 1, characterized in that it further comprises a manual input device connected to the equipment monitoring device, for entering monitoring equipment data when isolation measures are implemented because of safety requirements or when data access is not supported.
5. The device as claimed in claim 1, characterized in that the data preprocessing unit is further configured to call and receive the newly generated time-series data pushed by the equipment monitoring device into the distributed memory, to repeat the training process on the new time-series data, and to update the data mining model.
6. The device as claimed in claim 1, characterized in that the manual input device is a notebook computer, a tablet computer, or a mobile phone.
7. An evaluation method using the device for graphical data quality evaluation based on big data technology as claimed in any one of claims 1-6, characterized in that it comprises the following steps in sequence:
(1) Initialization: set the initial parameters of the data acquisition device; according to the set initial parameters, control the data acquisition device to sample 10 times per hour over a sampling time of 7 days, and take the average A of the data sampled in the 7 days;
(2) Under the same initial parameters, collect data in real time, taking every 4 consecutively collected data as one group [B C D E], the 4 data being denoted B, C, D, and E respectively, and calculate the difference score M for each using the formula (formula not reproduced in this text), where A′ is one of the values B, C, D, E;
(3) If the difference scores are within the threshold range, this group of collected data is considered valid; take the average M of B, C, D, and E, and let P′ be the real-time measured value of the data acquisition device; then:
A. if (condition not reproduced in this text), the performance of the data acquisition device is stable, and the method proceeds to step (4);
B. if (condition not reproduced in this text), the performance of the data acquisition device is unstable, and the method returns to step (1);
(4) Obtain equipment information monitoring data in real time or quasi-real time and transmit the collected equipment information monitoring data to the equipment monitoring device; either push it to the distributed memory by data push (mainly massive heterogeneous data comprising account data and historical data), or output the equipment monitoring data by streaming to the data preprocessing unit for processing;
(5) By batch access, automatically obtain the routine account data and historical data in the distributed memory according to a predefined scheduling plan; the data preprocessing unit preprocesses the massive heterogeneous equipment data for data integrity, normalization, consistency, and accuracy under the preprocessing rules, and outputs the preprocessed data to the distributed memory for storage;
(6) By streaming access, obtain the massive heterogeneous equipment data in the distributed memory, driven by the predefined system; the data preprocessing unit preprocesses the data for integrity, normalization, consistency, and accuracy under the preprocessing rules and outputs the preprocessed data to the distributed memory for storage, wherein the consistency evaluation index parameters are specified as follows:
The relevant parameters are configured in the consistency operator: M is the number of problem data items, Q is the number of missing data items, C is the number of records in the data set, and P is the number of data items defined by the metadata.
The consistency of the data is: (formula not reproduced in this text) where n is the number of data sets;
(7) During processing by the data preprocessing unit, the computing engine drives the scheduling rule engine to call and receive the data stored in the distributed memory, processes the called and received data according to pre-programmed processing logic, and trains a data mining model; the data processed by the data quality check unit is written back to the distributed memory.
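Steps (1) to (3) of the method can be sketched in a few lines, under clearly stated assumptions. The sampling schedule (10 samples per hour for 7 days) and the grouping of every 4 consecutive readings come from the text; the function names, the sample readings, the threshold, and above all the `relative_deviation` score are hypothetical placeholders, since the difference-score formula and the stability conditions are not reproduced in this text.

```python
from statistics import mean
from typing import Callable, List, Sequence

SAMPLES_PER_HOUR = 10
SAMPLING_DAYS = 7
BASELINE_SIZE = SAMPLES_PER_HOUR * 24 * SAMPLING_DAYS  # 1680 samples


def baseline_average(samples: Sequence[float]) -> float:
    """Step (1): average A over the full 7-day sampling window."""
    if len(samples) != BASELINE_SIZE:
        raise ValueError(f"expected {BASELINE_SIZE} samples, got {len(samples)}")
    return mean(samples)


def groups_of_four(stream: Sequence[float]) -> List[List[float]]:
    """Step (2): split consecutive readings into complete [B, C, D, E] groups."""
    usable = len(stream) - len(stream) % 4  # drop an incomplete trailing group
    return [list(stream[i:i + 4]) for i in range(0, usable, 4)]


def relative_deviation(value: float, baseline: float) -> float:
    # ASSUMPTION: placeholder score, not the formula of the claim.
    return abs(value - baseline) / abs(baseline)


def group_is_valid(
    group: Sequence[float],
    baseline: float,
    threshold: float,
    score: Callable[[float, float], float] = relative_deviation,
) -> bool:
    """Step (3), first part: a group is valid if every score is within range."""
    return all(score(v, baseline) <= threshold for v in group)


# Hypothetical data: a flat 7-day baseline and nine real-time readings.
A = baseline_average([20.0] * BASELINE_SIZE)
readings = [20.1, 19.9, 20.2, 19.8, 25.0, 20.0, 20.1, 19.9, 20.0]

groups = groups_of_four(readings)
flags = [group_is_valid(g, A, threshold=0.05) for g in groups]
print(A, groups, flags)
```

Passing the score as a parameter keeps the placeholder isolated, so the actual formula can be substituted without changing the grouping or validity logic.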
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710090356.2A CN106874483A (en) | 2017-02-20 | 2017-02-20 | A kind of device and method of the patterned quality of data evaluation and test based on big data technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710090356.2A CN106874483A (en) | 2017-02-20 | 2017-02-20 | A kind of device and method of the patterned quality of data evaluation and test based on big data technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106874483A (en) | 2017-06-20 |
Family ID: 59167167
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710090356.2A Pending CN106874483A (en) | 2017-02-20 | 2017-02-20 | A kind of device and method of the patterned quality of data evaluation and test based on big data technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106874483A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107361396A (en) * | 2017-07-10 | 2017-11-21 | 红云红河烟草(集团)有限责任公司 | Tobacco based on big data dries the prediction of silk moisture and control system |
CN107807972A (en) * | 2017-10-19 | 2018-03-16 | 北京科技大学 | A kind of test data consistency detecting method |
CN108416067A (en) * | 2018-03-29 | 2018-08-17 | 重庆大学 | Mass data processing and the optimization of storing process execute evaluation method in industrial process |
CN108564260A (en) * | 2018-03-29 | 2018-09-21 | 重庆沐信润喆网络科技有限公司 | Appraisal procedure for industrial process mass data processing and storage |
CN109656917A (en) * | 2018-12-18 | 2019-04-19 | 深圳前海微众银行股份有限公司 | Data detection method, device, equipment and the readable storage medium storing program for executing of multi-data source |
CN110007654A (en) * | 2019-04-10 | 2019-07-12 | 华夏天信(北京)智能低碳技术研究院有限公司 | A kind of production big data service system based on Red-Sensor sensor |
CN110136789A (en) * | 2019-05-14 | 2019-08-16 | 浪潮软件集团有限公司 | A kind of data governance quality detection method based on electronic health record application |
CN110442567A (en) * | 2019-07-30 | 2019-11-12 | 郑州航管科技有限公司 | A kind of data fusion method for airport automatic observing system |
CN111313958A (en) * | 2020-02-12 | 2020-06-19 | 深圳航天宏图信息技术有限公司 | Satellite data quality inspection report generation system |
CN111400299A (en) * | 2020-06-04 | 2020-07-10 | 成都四方伟业软件股份有限公司 | Method and system for testing fusion quality of multiple data |
CN111400288A (en) * | 2019-01-02 | 2020-07-10 | 中国移动通信有限公司研究院 | Data quality inspection method and system |
CN112784208A (en) * | 2021-01-19 | 2021-05-11 | 深圳市紫衡技术有限公司 | Building energy consumption calculation method and system, electronic equipment and storage medium |
CN112925838A (en) * | 2019-12-06 | 2021-06-08 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN111401064B (en) * | 2019-01-02 | 2024-04-19 | 中国移动通信有限公司研究院 | Named entity identification method and device and terminal equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103034928A (en) * | 2012-12-11 | 2013-04-10 | 清华大学 | Self-regulation dispersive plug-and-play data platform and management method and application |
CN103247008A (en) * | 2013-05-07 | 2013-08-14 | 国家电网公司 | Quality evaluation method of electricity statistical index data |
CN103414601A (en) * | 2013-07-19 | 2013-11-27 | 广东电网公司电力调度控制中心 | Method and system for detecting data for communication resource management system |
CN103560910A (en) * | 2013-10-25 | 2014-02-05 | 广东电网公司电力调度控制中心 | Communication resource management system accuracy detection system dynamic configuration method and device |
CN103617554A (en) * | 2013-10-22 | 2014-03-05 | 芜湖大学科技园发展有限公司 | Flexible drive system for grid data evaluation |
CN105427193A (en) * | 2015-12-17 | 2016-03-23 | 山东鲁能软件技术有限公司 | Device and method for big data analysis based on distributed time sequence data service |
CN105741196A (en) * | 2016-03-01 | 2016-07-06 | 万达信息股份有限公司 | Four-dimension-based data quality monitoring and evaluating method |
2017-02-20: CN application CN201710090356.2A (patent/CN106874483A/en), active, Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103034928A (en) * | 2012-12-11 | 2013-04-10 | 清华大学 | Self-regulation dispersive plug-and-play data platform and management method and application |
CN103247008A (en) * | 2013-05-07 | 2013-08-14 | 国家电网公司 | Quality evaluation method of electricity statistical index data |
CN103414601A (en) * | 2013-07-19 | 2013-11-27 | 广东电网公司电力调度控制中心 | Method and system for detecting data for communication resource management system |
CN103617554A (en) * | 2013-10-22 | 2014-03-05 | 芜湖大学科技园发展有限公司 | Flexible drive system for grid data evaluation |
CN103560910A (en) * | 2013-10-25 | 2014-02-05 | 广东电网公司电力调度控制中心 | Communication resource management system accuracy detection system dynamic configuration method and device |
CN105427193A (en) * | 2015-12-17 | 2016-03-23 | 山东鲁能软件技术有限公司 | Device and method for big data analysis based on distributed time sequence data service |
CN105741196A (en) * | 2016-03-01 | 2016-07-06 | 万达信息股份有限公司 | Four-dimension-based data quality monitoring and evaluating method |
Non-Patent Citations (1)
Title |
---|
黄刚 等: "元数据驱动的数据质量评估体系架构研究", 《计算机工程与应用》 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107361396A (en) * | 2017-07-10 | 2017-11-21 | 红云红河烟草(集团)有限责任公司 | Tobacco based on big data dries the prediction of silk moisture and control system |
CN107807972A (en) * | 2017-10-19 | 2018-03-16 | 北京科技大学 | A kind of test data consistency detecting method |
CN107807972B (en) * | 2017-10-19 | 2020-12-22 | 北京科技大学 | Test data consistency detection method |
CN108416067A (en) * | 2018-03-29 | 2018-08-17 | 重庆大学 | Mass data processing and the optimization of storing process execute evaluation method in industrial process |
CN108564260A (en) * | 2018-03-29 | 2018-09-21 | 重庆沐信润喆网络科技有限公司 | Appraisal procedure for industrial process mass data processing and storage |
CN109656917A (en) * | 2018-12-18 | 2019-04-19 | 深圳前海微众银行股份有限公司 | Data detection method, device, equipment and the readable storage medium storing program for executing of multi-data source |
CN111401064B (en) * | 2019-01-02 | 2024-04-19 | 中国移动通信有限公司研究院 | Named entity identification method and device and terminal equipment |
CN111400288A (en) * | 2019-01-02 | 2020-07-10 | 中国移动通信有限公司研究院 | Data quality inspection method and system |
CN110007654A (en) * | 2019-04-10 | 2019-07-12 | 华夏天信(北京)智能低碳技术研究院有限公司 | A kind of production big data service system based on Red-Sensor sensor |
CN110136789A (en) * | 2019-05-14 | 2019-08-16 | 浪潮软件集团有限公司 | A kind of data governance quality detection method based on electronic health record application |
CN110442567A (en) * | 2019-07-30 | 2019-11-12 | 郑州航管科技有限公司 | A kind of data fusion method for airport automatic observing system |
CN112925838A (en) * | 2019-12-06 | 2021-06-08 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN111313958A (en) * | 2020-02-12 | 2020-06-19 | 深圳航天宏图信息技术有限公司 | Satellite data quality inspection report generation system |
CN111400299A (en) * | 2020-06-04 | 2020-07-10 | 成都四方伟业软件股份有限公司 | Method and system for testing fusion quality of multiple data |
CN112784208A (en) * | 2021-01-19 | 2021-05-11 | 深圳市紫衡技术有限公司 | Building energy consumption calculation method and system, electronic equipment and storage medium |
CN112784208B (en) * | 2021-01-19 | 2023-08-15 | 深圳市紫衡技术有限公司 | Building energy consumption calculation method, system, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106874483A (en) | A kind of device and method of the patterned quality of data evaluation and test based on big data technology | |
CN112348339A (en) | Power distribution network planning method based on big data analysis | |
CN106934720A (en) | Equipment insurance intelligent pricing method and system based on Internet of Things | |
CN109840157A (en) | Method, apparatus, electronic equipment and the storage medium of fault diagnosis | |
CN107967485A (en) | Electro-metering equipment fault analysis method and device | |
CN109784758B (en) | Engineering quality supervision early warning system and method based on BIM model | |
CN106874482A (en) | A kind of device and method of the patterned data prediction based on big data technology | |
CN107479540B (en) | Method for diagnosing faults and system | |
US20230326010A1 (en) | Defective picture generation method and apparatus applied to industrial quality inspection | |
US11704186B2 (en) | Analysis of deep-level cause of fault of storage management | |
CN111680855A (en) | Automatic risk detection and early warning method and system for whole process of project | |
CN110262975A (en) | Test data management method, device, equipment and computer readable storage medium | |
CN109685275A (en) | Dispense team's load pressure prediction technique, device, electronic equipment and storage medium | |
CN109492699A (en) | Passway for transmitting electricity method for three-dimensional measurement and device | |
CN105868956A (en) | Data processing method and device | |
CN111524026A (en) | Power resource allocation method and device | |
CN111080484A (en) | Method and device for monitoring abnormal data of power distribution network | |
CN103617447A (en) | Evaluation system and method for intelligent substation | |
CN110287114B (en) | Method and device for testing performance of database script | |
CN116822926A (en) | Delay statistics and analysis method and device, electronic equipment and storage medium | |
CN107093018A (en) | Communication engineering project information method for visualizing and device based on health model | |
CN104035342A (en) | Real-time alarm intelligent aided analysis system and real-time alarm intelligent aided analysis method based on IFIX platform | |
CN114312930A (en) | Train operation abnormity diagnosis method and device based on log data | |
CN115169578A (en) | AI model production method and system based on meta-space data markers | |
CN112598142B (en) | Wind turbine maintenance working quality inspection auxiliary method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170620 |