CN106874483A - Device and method for graphical data quality evaluation based on big data technology

Device and method for graphical data quality evaluation based on big data technology

Info

Publication number
CN106874483A
Authority
CN
China
Prior art keywords
data
monitoring
quality
unit
distributed memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710090356.2A
Other languages
Chinese (zh)
Inventor
杨立涛
王庆刚
刘涛
丛兴滋
李书明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Luneng Software Technology Co Ltd
Original Assignee
Shandong Luneng Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Luneng Software Technology Co Ltd
Priority to CN201710090356.2A
Publication of CN106874483A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems

Abstract

A device and method for graphical data quality evaluation based on big data technology. The device comprises a data acquisition device, an equipment monitoring device, distributed storage, a Spark in-memory computing engine, a computing unit, a data quality check unit and a data pre-processing unit. The data acquisition device is connected to the equipment monitoring device, the equipment monitoring device is connected to the distributed storage, and the distributed storage is connected to the data pre-processing unit, which comprises the Spark in-memory computing engine, the computing unit and the data quality check unit. The device can process and evaluate massive data quickly, efficiently and in a timely manner, while ensuring that equipment runs safely, stably and efficiently.

Description

Device and method for graphical data quality evaluation based on big data technology
Technical field
The present invention relates to the field of equipment monitoring and analysis applications, and in particular to a device and method for graphical data quality evaluation based on big data technology.
Background
With the rapid development of the smart grid, the power system has begun to move into the era of the energy internet and "big data". The operating data of the power industry is increasingly large in volume, varied in type and high in value, and the contradiction between lagging data analysis and processing capability and rapid data growth is becoming more prominent. As data volume and the number of data types keep increasing, further problems have emerged: performance bottlenecks in data analysis, a lack of advanced methods for data analysis and mining, and unstructured data that is still not used effectively. These problems restrict the development of power industry informatization from digitalization to intelligence. The key big data technologies of the energy internet era cover data acquisition, transmission, storage, quality management, fusion and sharing, and deep mining.
In the power industry, the collection and analysis of historical operating data and the immediate analysis of real-time or near-real-time data are important parts of informatization. They require a complete, stable big data analysis solution that fits actual business scenarios and provides reliable, stable underlying data support for real-time analysis scenarios such as equipment fault early warning.
In recent years, with the rapid development of IT technologies such as cloud computing, big data, machine learning and data mining, distributed storage and high-performance computing have achieved key breakthroughs in both theoretical research and engineering practice, and a number of big data processing and application solutions, represented by Hadoop, have emerged in the industry.
Hadoop is a scalable open source software framework for reliable distributed processing of big data. Its core components are HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over it. HDFS is a distributed file system characterized by low cost, high reliability and high throughput. MapReduce is a programming model and software framework that greatly simplifies large-scale data processing. Spark is a distributed big data processing tool that does not itself provide data storage; it can run on Hadoop's HDFS or on other distributed file systems. Spark was designed to solve the inefficiency caused by Hadoop MapReduce repeatedly reading and writing the file system. By building the resilient distributed dataset (RDD) structure, it keeps data resident in memory and implements an in-memory MapReduce framework, making up for the shortcomings of MapReduce in certain application scenarios.
General-purpose open source components such as Hadoop and Spark have certain limitations in functional completeness and operational stability, and some commercial big data platforms derived from Hadoop deviate from the actual needs of power business scenarios, so the business requirements of the power industry need in-depth analysis and study. Integrating heterogeneous data sources is a practical problem frequently encountered in enterprise informatization. With the sharp increase in data volume, particularly of unstructured data, traditional data warehouse technology and data extraction tools struggle with data quality evaluation and cannot meet the processing performance required for massive heterogeneous data and messy, low-quality data. Building a graphical data quality evaluation device and method based on big data technology is therefore of far-reaching significance and considerable value.
Data is a key asset of a power enterprise's data center. Obtaining and maintaining high-quality data is essential to efficient IT and business operations, and strengthening data quality management is the precondition for collecting, analyzing and using data effectively. Faced with massive business data of ever-increasing complexity, how to comprehensively guarantee data quality is a key issue that cannot be avoided in mining the value of data.
Data quality assurance is the key to and foundation of successful big data. Data quality management (Data Quality Management) covers every stage of the data life cycle, including planning, acquisition, storage, sharing, maintenance, use and retirement. At any stage, a series of measures such as identification, measurement, monitoring and early warning must be applied to hidden risks that may cause data quality problems, so that such problems are avoided throughout, big data can be analyzed effectively and used fully, and enterprises genuinely benefit from big data applications. Data quality problems can be summarized as "missing, duplicated, scattered, slow, poor". These factors seriously affect the results of big data analysis and application, and they describe the severe situation currently facing power data quality management. Data quality management stands like a mountain in the way of big data development in the power industry. It is a problem that every power informatization service provider must face and solve when developing big data, and the development of related standards and toolkits is imperative.
Combining quality inspection standards and management systems under the big data background, big data processing technology is used to improve the efficiency of quality checks and to provide a basis for data governance decisions. This involves studying the capability maturity model for core business systems (Data Management Maturity, DMM); building a data quality management system for the big data context around the idea of finding problems, solving problems and avoiding problems; developing data quality management and improvement mechanisms that evaluate, prevent and repair data defects; and, for evaluation dimensions such as completeness, consistency, accuracy and timeliness, developing a data quality assessment (Data Quality Assessment) system based on big data processing technology in coordination with a metadata system.
Comprehensive data quality control is provided for an enterprise's massive data: by carrying out data quality checks, data quality problems are found and fluctuations in data quality are monitored. A data quality check unit is built on the Spark in-memory computing engine. Quality indicator rules such as data completeness, normalization, consistency and accuracy are converted into computing units that support parameter configuration and dynamic combination, and a graphical flow configuration tool allows the data pre-processing flow to be customized flexibly.
Building a device and method for graphical data quality evaluation based on big data technology therefore helps enterprises carry out data quality checks, uses big data technology to break through the performance bottleneck of evaluating the quality of massive data, standardizes the management of data quality evaluation, and effectively reduces the cost of data quality management.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a device and method for graphical data quality evaluation based on big data technology that can process and evaluate massive data quickly, efficiently and in a timely manner, while ensuring that equipment runs safely, stably and efficiently.
The invention provides a device for graphical data quality evaluation based on big data technology, comprising a data acquisition device, an equipment monitoring device, distributed storage, a Spark in-memory computing engine, a computing unit, a data quality check unit and a data pre-processing unit. The data acquisition device is connected to the equipment monitoring device, the equipment monitoring device is connected to the distributed storage, the distributed storage is connected to the data pre-processing unit, and the data pre-processing unit comprises the Spark in-memory computing engine, the computing unit and the data quality check unit;
The data acquisition device is used to obtain heterogeneous equipment information data in real time or near real time and to transfer the collected data to the equipment monitoring device;
The equipment monitoring device is used to collect and store the heterogeneous equipment information data and push it to the distributed storage, and to output the equipment monitoring data to the data pre-processing flow as a data stream;
The distributed storage, also called time series data storage, is used to store the real-time massive heterogeneous equipment data and the equipment data that has passed through the data pre-processing unit.
The Spark in-memory computing engine is used to perform calculations on the data by invoking the logic rules of the computing unit, and to output the calculated data to the distributed storage;
The computing unit is used to drive the scheduling rule engine to call and receive the data stored in the distributed storage, to process that data according to pre-programmed processing logic, and to train a data mining model;
The computing unit comprises multiple sub computing units, which are dynamically configured in a graphical manner according to actual business requirements and dynamically orchestrated into a job. Each sub computing unit exists independently and can be extended and evolved independently based on the experience of industry experts. A distributed stream computing engine performs calculations on the called and received data, outputs the results in real time, and writes them to the distributed data storage;
The data quality check unit is the job formed by dynamically orchestrating computing units. It is built on the Spark in-memory computing engine, and converts the quality indicator rules for data completeness, normalization, consistency and accuracy into computing units that support graphical parameter configuration and dynamic combination;
The data pre-processing unit is used to pre-process the heterogeneous equipment information data for completeness, normalization, consistency and accuracy according to the data quality check unit, while standardizing data formats, removing abnormal data, correcting errors and removing duplicate data; to combine data from multiple data sources into unified storage; to convert the data, by generalization and/or normalization, into a form suitable for data mining; and to output the pre-processed data to the distributed storage.
Preferably, the data acquisition device is a data acquisition sensor installed on the monitored equipment.
Preferably, the data acquisition device is an infrared detector or temperature detector in the area where the monitored equipment is installed.
Preferably, the device further comprises a manual input device connected to the equipment monitoring device, used to enter monitoring data manually where isolation measures are in force for security reasons or where the monitoring device does not support data access.
Preferably, the data pre-processing unit is further used to call and receive new time series data pushed by the equipment monitoring device to the distributed storage, to repeat the training process on the new time series data, and to update the data mining model.
Preferably, the manual input device is a notebook computer, tablet computer or mobile phone.
The present invention also provides a method for graphical data quality evaluation based on big data technology, comprising the following steps in order:
(1) Initialization: set the initial parameters of the data acquisition device. According to these parameters, the device samples 10 times per hour over a sampling time of 7 days, and the data sampled over the 7 days is averaged to give A;
(2) Under the same initial parameters, collect data in real time, treating every 4 consecutively collected values as one group [B C D E], with the 4 values denoted B, C, D and E. A difference score M is calculated for each value using a formula (not reproduced in this text), wherein:
A′ in the formula is one of the values B, C, D, E;
(3) If the difference scores are within the threshold range, the collected data group is considered valid. B, C, D and E are averaged to give M, and P′ denotes the real-time measured value of the data acquisition device; then:
A. If [condition not reproduced in this text], the data acquisition device is performing stably; go to step (4);
B. If [condition not reproduced in this text], the data acquisition device is performing unstably; return to step (1);
(4) Obtain equipment information monitoring data in real time or near real time and transfer it to the equipment monitoring device. The data is either pushed to the distributed storage (mainly ledger data, historical data and massive heterogeneous data) or output as a stream to the data pre-processing flow;
(5) In batch access mode, routine ledger data and historical data in the distributed storage are obtained automatically according to a predefined job schedule. The massive heterogeneous equipment data is pre-processed by the data pre-processing unit for completeness, normalization, consistency and accuracy under the pre-processing rules, and the pre-processed data is output to the distributed storage for storage;
(6) In stream access mode, the massive heterogeneous equipment data in the distributed storage is obtained through a predefined system driver and pre-processed by the data pre-processing unit for completeness, normalization, consistency and accuracy under the pre-processing rules, and the pre-processed data is output to the distributed storage for storage. The consistency evaluation index is parameterized as follows:
The relevant parameters are configured in the consistency operator: M is the number of problem data items, Q is the number of missing data items, C is the number of records in the data set, and P is the number of data items defined in the metadata,
The consistency of the data is given by a formula (not reproduced in this text), where n is the number of data sets;
(7) Within the data pre-processing flow, the computing engine drives the scheduling rule engine to call and receive the data stored in the distributed storage, processes that data according to pre-programmed processing logic, trains a data mining model, and writes the data processed by the data quality check unit back to the distributed storage.
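Steps (1) to (3) can be illustrated with a minimal Python sketch. The text does not reproduce the difference-score formula or the stability conditions A and B, so the relative deviation used in `difference_score` is only an assumed stand-in, the function names and the threshold are hypothetical, and the stability test on P′ is deliberately left out.

```python
from statistics import mean


def baseline_average(samples):
    """Step (1): average of the baseline samples (10 per hour for 7 days = 1680 values)."""
    return mean(samples)


def difference_score(value, baseline):
    # ASSUMPTION: relative deviation from the baseline A. The patent's own
    # formula is not available in the text, so this is only a stand-in.
    # A baseline of zero is not handled here.
    return abs(value - baseline) / abs(baseline)


def validate_group(group, baseline, threshold):
    """Steps (2)-(3): score each of the 4 readings [B, C, D, E] against the baseline.

    Returns the group average when every score is within the threshold,
    otherwise None (the group is treated as invalid).
    """
    if len(group) != 4:
        raise ValueError("a group must contain exactly 4 consecutive readings")
    scores = [difference_score(v, baseline) for v in group]
    if all(score <= threshold for score in scores):
        return mean(group)
    return None


# Hypothetical usage: a flat baseline and two groups of readings.
A = baseline_average([10.0] * (10 * 24 * 7))
valid_mean = validate_group([10.0, 10.5, 9.5, 10.0], A, threshold=0.1)
rejected = validate_group([10.0, 12.0, 9.5, 10.0], A, threshold=0.1)
```

Returning `None` for an invalid group keeps the validity decision separate from whatever the caller does next, such as discarding the group or re-initializing the device.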
The device and method for graphical data quality evaluation based on big data technology of the invention achieve the following:
1) With a stable, reliable and efficient open source distributed storage system and parallel computing services at the core, data pre-processing is handed over to distributed computing units, which reduces the complexity of data processing and improves time series data access throughput;
2) A data quality check unit is built on the Spark in-memory computing engine. Quality indicator rules such as data completeness, normalization, consistency and accuracy are converted into computing units that support graphical parameter configuration and dynamic combination, and a graphical flow configuration tool allows the data pre-processing flow to be customized flexibly. This breaks through the performance bottleneck of evaluating the quality of massive data, standardizes the management of data quality evaluation, and effectively reduces the cost of data quality management;
3) For the reliability of system data, an averaging-based data confirmation scheme is designed, which makes equipment monitoring data more reliable and stable, reduces the workload of the device, extends its service life and makes its performance more stable.
4) The optimized way of evaluating the performance of the data acquisition device makes the data more reliable.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device, including the data pre-processing unit, for graphical data quality evaluation based on big data.
Specific embodiments
Specific implementations of the invention are described in detail below. It should be noted that the following embodiments only further illustrate the invention and are not to be construed as limiting its scope of protection; non-essential modifications and adaptations made to the invention by a person skilled in the art on the basis of the above summary still fall within the scope of protection of the invention.
The invention provides a device and method for graphical data quality evaluation based on big data technology. As shown in Fig. 1, the device comprises a data acquisition device 1, an equipment monitoring device 2, distributed storage 3, a Spark in-memory computing engine 4, a computing unit 5, a data quality check unit 6 and a data pre-processing unit 7. The equipment monitoring device 2 is connected to the data acquisition device 1 and to the distributed storage 3; the distributed storage 3 is connected to the data pre-processing unit 7; the data pre-processing unit 7 includes the data quality check unit 6; and the data quality check unit 6 includes the Spark in-memory computing engine 4 and the computing unit 5;
The data acquisition device is used to obtain heterogeneous equipment information data in real time or near real time and to transfer the collected data to the equipment monitoring device. The data acquisition device is an information acquisition sensor installed on the monitored equipment; it may also be a sensor such as an infrared imager, a camera or a temperature detector in the area where the monitored equipment is installed. The equipment monitoring device can store equipment information monitoring data in real time and output it to the distributed storage either by pushing it or as a stream.
The equipment monitoring device is used to collect and store the heterogeneous equipment information data and push it to the distributed storage, and to output the equipment monitoring data to the data pre-processing flow as a data stream;
The distributed storage, also called time series data storage, is used to store the real-time massive heterogeneous equipment data and the equipment data that has passed through the data pre-processing unit.
The Spark in-memory computing engine is the driver of data calculation: it performs calculations on the data by invoking the logic rules of the computing unit, and outputs the calculated data to the distributed storage.
The computing unit, also called an operator, is used to drive the scheduling rule engine to call and receive the data stored in the distributed storage; it can process the called and received data according to pre-programmed processing logic and train a data mining model. The logic for data completeness, normalization, consistency and accuracy is converted into computing units that support parameter configuration and dynamic combination. In the graphical flow configuration tool, the data pre-processing flow is orchestrated according to actual business requirements and configured flexibly by dragging and dropping operators. The computing unit comprises multiple sub computing units, which are dynamically configured in a graphical manner according to actual business requirements and dynamically orchestrated into a job. Each sub computing unit exists independently and can be extended and evolved independently based on the experience of industry experts. A distributed stream computing engine performs the data quality check on the called and received data, outputs the results in real time, and writes them to the distributed data storage;
The computing unit is also a component of a computing job. A computing job defines the topology and execution logic of computing tasks (also called job nodes), similar to a workflow (Workflow). It is defined in the graphical flow configuration tool provided by the system: job nodes are dragged, freely combined and configured to form a job task. From the point of view of the computing engine, each job node corresponds to a computing unit (Compute Unit), and the program logic of a computing unit is called an operator (Transformation). The system provides a visual modeling tool with a rich set of preset data processing and data display operators, and publishes an operator development specification to support secondary development for actual business scenarios.
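The relationship between a computing job, its job nodes and their operators can be sketched as a small directed acyclic graph. This is only an illustration under stated assumptions: the `Job` class, the node names and the two operators are hypothetical, plain Python lists stand in for distributed data sets, and the standard-library `graphlib` module (Python 3.9 or later) replaces the graphical configuration tool and the Spark engine.

```python
from graphlib import TopologicalSorter


class Job:
    """A computing job: job nodes (operators) plus the topology that links them."""

    def __init__(self):
        self.operators = {}  # node name -> callable taking and returning a list of records
        self.upstream = {}   # node name -> names of the nodes it depends on

    def add_node(self, name, operator, after=()):
        self.operators[name] = operator
        self.upstream[name] = list(after)
        return self  # allow chained configuration

    def run(self, records):
        results = {}
        # static_order() yields every node after all of its upstream nodes
        for name in TopologicalSorter(self.upstream).static_order():
            deps = self.upstream[name]
            # A source node reads the job input; other nodes read upstream outputs
            inputs = records if not deps else [r for d in deps for r in results[d]]
            results[name] = self.operators[name](inputs)
        return results


# Hypothetical operators: drop rows with no value, then double the remaining values.
job = (
    Job()
    .add_node("drop_nulls", lambda rows: [r for r in rows if r["value"] is not None])
    .add_node("double", lambda rows: [{**r, "value": r["value"] * 2} for r in rows],
              after=["drop_nulls"])
)
results = job.run([{"id": 1, "value": 1.0}, {"id": 2, "value": None}])
```

Keeping the topology separate from the operator callables mirrors the text's distinction between the job definition and the program logic of each computing unit.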
The data quality check unit is essentially the job formed by dynamically orchestrating computing units. It is built on the Spark in-memory computing engine, and converts quality indicator rules such as data completeness, normalization, consistency and accuracy into computing units that support graphical parameter configuration and dynamic combination.
Data quality management is a cyclic management process whose ultimate goal is to increase the value of data in use through reliable data. Check methods are formed by freely combining check functions, and parallelizing the check functions gives them big data processing capability, which can greatly improve the efficiency of quality checks and provide a basis for data governance decisions.
The data quality check logic behind a check method can itself be regarded as a computing job, defined in the visual designer. Executing check methods on a periodic schedule produces a data quality trend, which is used to monitor the effect of data quality management work, ensure that data stays at high quality, and prevent data quality from declining over time.
A draggable logic unit (operator) is essentially a parallelized rule engine. Check logic for properties such as duplication, correlation, correctness, completeness, consistency and compliance is predefined as a set of functions that the user can freely select in the configuration interface of the rule engine operator to configure various check methods and decision logic. The decision logic of a check method uses a "labelling" pattern: it appends supplementary data columns to the data set, and data filtering and data statistics operators are then applied to the new data set to produce customized evaluation output.
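The "labelling" pattern described above can be sketched in a few lines of Python. This is an illustration only: the rule names, the `ok_` column prefix and the sample records are hypothetical, and lists of dictionaries stand in for the distributed data sets the text has in mind.

```python
def label(records, rules):
    """Append one boolean column per rule to every record; no row is dropped."""
    return [
        {**r, **{f"ok_{name}": bool(check(r)) for name, check in rules.items()}}
        for r in records
    ]


def filter_failed(labelled, rule_name):
    """Filtering operator: rows that failed the given rule."""
    return [r for r in labelled if not r[f"ok_{rule_name}"]]


def pass_rate(labelled, rule_name):
    """Statistics operator: share of rows that passed the given rule."""
    if not labelled:
        return 0.0
    return sum(r[f"ok_{rule_name}"] for r in labelled) / len(labelled)


# Hypothetical check functions selected by the user.
rules = {
    "complete": lambda r: r["value"] is not None,  # completeness
    "compliant": lambda r: r["unit"] == "C",       # compliance with the expected unit
}
rows = [
    {"id": 1, "value": 20.5, "unit": "C"},
    {"id": 2, "value": None, "unit": "C"},
    {"id": 3, "value": 19.0, "unit": "F"},
]
labelled = label(rows, rules)
incomplete_ids = [r["id"] for r in filter_failed(labelled, "complete")]
compliance = pass_rate(labelled, "compliant")
```

Because labelling never removes rows, the same labelled data set can feed several downstream filter and statistics operators without re-running the checks.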
The data pre-processing unit is used to pre-process the heterogeneous equipment information data for completeness, normalization, consistency and accuracy according to the data quality check unit. Massive heterogeneous data can undergo the necessary pre-processing before it is stored, with operations such as data extraction, data transformation and data loading carried out under pre-configured pre-processing rules. Equipment data (or other data) enters the pre-processing program as a data stream, through timed scheduling or by manual import, and the results are output to a specified location according to the configuration of the specific processing job. The pre-processing logic is configurable and visual: each configurable logic unit is called an operator. According to actual business requirements, pre-processing operators are dragged in the graphical tool and dynamically orchestrated into a pre-processing flow, and their parameters are configured. The whole pre-processing logic is called a job, and the job is parallelized by parallelizing its operators.
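The kind of pre-processing described here (standardizing formats, removing abnormal and duplicate data, and merging several sources) can be sketched as follows. The field names `device`, `ts` and `value`, the valid range and the duplicate key are all assumptions made for illustration; the patent does not specify a record layout.

```python
def standardise(record):
    """Normalise field formats: trimmed upper-case device id, int timestamp, float value."""
    return {
        "device": str(record["device"]).strip().upper(),
        "ts": int(record["ts"]),
        "value": float(record["value"]),
    }


def preprocess(sources, low, high):
    """Merge several sources into one cleaned, de-duplicated list of records."""
    seen = set()
    cleaned = []
    for source in sources:
        for raw in source:
            try:
                rec = standardise(raw)
            except (KeyError, TypeError, ValueError):
                continue  # malformed record: missing field or unparsable value
            if not (low <= rec["value"] <= high):
                continue  # abnormal value outside the configured range
            key = (rec["device"], rec["ts"])
            if key in seen:
                continue  # duplicate reading for the same device and timestamp
            seen.add(key)
            cleaned.append(rec)
    return sorted(cleaned, key=lambda r: (r["device"], r["ts"]))


# Hypothetical input from two sources with mixed formats.
source_a = [
    {"device": " t1 ", "ts": "1", "value": "20.5"},
    {"device": "T1", "ts": 1, "value": 20.5},   # duplicate after standardisation
    {"device": "T1", "ts": 2, "value": 999},    # out of range
]
source_b = [
    {"device": "t2", "ts": 1, "value": "bad"},  # unparsable value
    {"device": "t2", "ts": 2, "value": 18},
]
unified = preprocess([source_a, source_b], low=-50, high=150)
```

Standardizing before de-duplicating matters: the first two records in `source_a` only become recognizable as duplicates once the device id and timestamp share one format.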
The present invention also provides a method for graphical data quality evaluation based on big data technology, comprising the following steps in order:
(1) Initialization: set the initial parameters of the data acquisition device. According to these parameters, the device samples 10 times per hour over a sampling time of 7 days, and the data sampled over the 7 days is averaged to give A;
(2) Under the same initial parameters, collect data in real time, treating every 4 consecutively collected values as one group [B C D E], with the 4 values denoted B, C, D and E. A difference score M is calculated for each value using a formula (not reproduced in this text), wherein:
A′ in the formula is one of the values B, C, D, E;
(3) If the difference scores are within the threshold range, the collected data group is considered valid. B, C, D and E are averaged to give M, and P′ denotes the real-time measured value of the data acquisition device; then:
A. If [condition not reproduced in this text], the data acquisition device is performing stably; go to step (4);
B. If [condition not reproduced in this text], the data acquisition device is performing unstably; return to step (1);
(4) Obtain equipment information monitoring data in real time or near real time and transfer it to the equipment monitoring device. The data is either pushed to the distributed storage (mainly ledger data, historical data and massive heterogeneous data) or output as a stream to the data pre-processing flow;
(5) In batch access mode, routine ledger data and historical data in the distributed storage are obtained automatically according to a predefined job schedule. The massive heterogeneous equipment data is pre-processed by the data pre-processing unit for completeness, normalization, consistency and accuracy under the pre-processing rules, and the pre-processed data is output to the distributed storage for storage;
(6) In stream access mode, the massive heterogeneous equipment data in the distributed storage is obtained through a predefined system driver and pre-processed by the data pre-processing unit for completeness, normalization, consistency and accuracy under the pre-processing rules, and the pre-processed data is output to the distributed storage for storage. The consistency evaluation index is parameterized as follows:
The relevant parameters are configured in the consistency operator: M is the number of problem data items, Q is the number of missing data items, C is the number of records in the data set, and P is the number of data items defined in the metadata,
The consistency of the data is given by a formula (not reproduced in this text), where n is the number of data sets;
(7) Within the data pre-processing flow, the computing engine drives the scheduling rule engine to call and receive the data stored in the distributed storage, processes that data according to pre-programmed processing logic, trains a data mining model, and writes the data processed by the data quality check unit back to the distributed storage.
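The consistency operator of step (6) can be sketched with the parameters M, Q, C, P and n defined above. The patent's own consistency formula is not reproduced in the text, so the expression below is purely an assumption chosen for illustration: each data set scores 1 − (M + Q) / (C × P), and the overall consistency is the mean over the n data sets. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSetStats:
    problem_items: int   # M: number of problem data items
    missing_items: int   # Q: number of missing data items
    records: int         # C: number of records in the data set
    defined_items: int   # P: number of data items defined in the metadata


def dataset_consistency(stats):
    # ASSUMPTION: share of expected data items (C * P) that are neither
    # problematic nor missing. This is not the patent's formula.
    expected = stats.records * stats.defined_items
    if expected == 0:
        return 1.0  # nothing expected, so nothing can be inconsistent
    return 1.0 - (stats.problem_items + stats.missing_items) / expected


def overall_consistency(datasets):
    """Mean of the per-data-set scores over the n data sets."""
    if not datasets:
        raise ValueError("at least one data set is required")
    return sum(dataset_consistency(d) for d in datasets) / len(datasets)


# Hypothetical figures for n = 2 data sets.
score = overall_consistency([
    DataSetStats(problem_items=5, missing_items=5, records=100, defined_items=10),
    DataSetStats(problem_items=0, missing_items=0, records=50, defined_items=4),
])
```

Whatever the actual formula is, keeping the four counts in one configurable structure matches the text's description of parameters being set in the consistency operator.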
The device and method for graphical data quality evaluation based on big data technology of the invention are implemented through the cooperation of software and hardware devices, but are not limited to this; under certain conditions they can also be implemented entirely in software.
Although illustrative embodiments of the invention have been described for purposes of illustration, those skilled in the art will understand that various changes in form and detail, such as modifications, additions and substitutions, may be made without departing from the scope and spirit of the invention disclosed in the appended claims. All such changes fall within the scope of protection of the appended claims, and the parts of the claimed product and the steps of the claimed method may be combined in any form. The description of the disclosed embodiments is therefore intended to describe the invention, not to limit its scope. Accordingly, the scope of the invention is not limited by the above embodiments but is defined by the claims or their equivalents.

Claims (7)

1. A device for graphical data quality evaluation based on big data technology, characterized in that it comprises a data acquisition device, an equipment monitoring device, a distributed memory, a Spark in-memory computing engine, a computing unit, a data quality check unit, and a data preprocessing unit, wherein the data acquisition device is connected to the equipment monitoring device, the equipment monitoring device is connected to the distributed memory, the distributed memory is connected to the data preprocessing unit, and the data preprocessing unit comprises the Spark in-memory computing engine, the computing unit, and the data quality check unit;
the data acquisition device is configured to acquire heterogeneous equipment information data in real time or quasi-real time and to transmit the collected data to the equipment monitoring device;
the equipment monitoring device is configured to collect and store the heterogeneous equipment information data, push it to the distributed memory, and output the equipment monitoring data as an incoming data stream to the data preprocessing unit;
the distributed memory, also called the time-series data memory, is configured to store the real-time massive heterogeneous equipment data and the equipment data processed by the data preprocessing unit;
the Spark in-memory computing engine is configured to compute on the data by invoking the logic rules of the computing unit and to output the computed data to the distributed memory;
the computing unit is configured to drive the scheduling rule engine to call and receive the data stored in the distributed memory, to process that data according to pre-programmed processing logic, and to train a data mining model;
the computing unit comprises a plurality of sub-computing units, which are graphically and dynamically configured and dynamically orchestrated into jobs according to actual business requirements; each sub-computing unit exists independently and can be extended and evolved independently based on the experience of industry experts; a distributed streaming computing engine computes on the called and received data, outputs the results in real time, and writes the data to the distributed data memory;
the data quality check unit is built on the Spark in-memory computing engine from jobs formed by dynamic orchestration of computing units, and converts the quality index rules for data integrity, normalization, consistency, and accuracy into computing units that support graphical parameter configuration and dynamic combination;
the data preprocessing unit is configured to preprocess the heterogeneous equipment information data for integrity, normalization, consistency, and accuracy according to the data quality check unit, while standardizing data formats, removing abnormal data, correcting errors, and removing duplicate data; to combine data from multiple data sources into unified storage; to transform the data, by generalization and/or normalization, into a form suitable for data mining; and to output the preprocessed data to the distributed memory.
2. The device according to claim 1, characterized in that the data acquisition device is a data acquisition sensor installed in the monitoring equipment.
3. The device according to claim 1, characterized in that the data acquisition device is an infrared detector or a temperature detector for the area where the monitoring equipment is installed.
4. The device according to claim 1, characterized in that it further comprises a manual input device connected to the equipment monitoring device, for entering monitoring equipment data when isolation measures are imposed for safety reasons or when data access is not supported.
5. The device according to claim 1, characterized in that the data preprocessing unit is further configured to call and receive the newly generated time-series data pushed by the equipment monitoring device to the distributed memory, and to repeat the training process on the new time-series data so as to update the data mining model.
6. The device according to claim 1, characterized in that the manual input device is a notebook computer, a tablet computer, or a mobile phone.
7. An evaluation method using the device for graphical data quality evaluation based on big data technology according to any one of claims 1 to 6, characterized in that it comprises the following steps in sequence:
(1) Initialization: set the initial parameters of the data acquisition device; according to the set initial parameters, control the data acquisition device to sample 10 times per hour over a sampling duration of 7 days, and average the data sampled in the 7 days to obtain A;
(2) Under the same initial parameters, collect data in real time, taking every 4 consecutively collected values as one group [B C D E], with the 4 values denoted B, C, D, and E; compute the difference score N for each value using the formula:
$$N = \frac{4A'}{B + C + D + E}$$
where A' is one of the values B, C, D, E;
(3) If the difference scores are within the threshold range, the current group of collected data is considered valid, and B, C, D, E are averaged to obtain M; let P' be the real-time measured value of the data acquisition device, then:
A. If [condition not reproduced in this text], the performance of the data acquisition device is stable; proceed to step (4);
B. If [condition not reproduced in this text], the performance of the data acquisition device is unstable; return to step (1);
(4) Equipment information monitoring data is acquired in real time or quasi-real time, and the collected monitoring data is transmitted to the equipment monitoring device. The data is then either pushed to the distributed memory by data push (mainly account data, historical data, and massive heterogeneous data), or output as a stream to the data preprocessing unit;
(5) In batch access mode, the conventional account data and historical data in the distributed memory are obtained automatically according to a predefined scheduling plan; the data preprocessing unit applies the preprocessing rules to the massive heterogeneous equipment data to check data integrity, normalization, consistency, and accuracy, and the preprocessed data is output to the distributed memory for storage;
(6) In streaming access mode, the massive heterogeneous equipment data in the distributed memory is obtained as driven by the predefined system; the data preprocessing unit checks data integrity, normalization, consistency, and accuracy under the preprocessing rules, and the preprocessed data is output to the distributed memory for storage, wherein the consistency evaluation index is parameterized as follows:
The relevant parameters are configured in the consistency operator: M is the number of problem data items, Q is the number of missing data items, C is the number of records in the data set, and P is the number of data items defined by the metadata.
The consistency of the data is: [formula not reproduced in this text], where n is the number of data sets;
(7) During data preprocessing, the computing engine drives the scheduling rule engine to call and receive the data stored in the distributed memory; the data is processed according to pre-programmed processing logic and used to train a data mining model, and the data processed by the data quality check unit is written back to the distributed memory.
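The group validation of steps (2) and (3) can be sketched in Python as follows. The sketch assumes the difference-score formula reads N = 4A' / (B + C + D + E), that is, each value in the group divided by the group mean. The threshold range (0.9, 1.1) and the function names are hypothetical, since no threshold values are given in the text, and the stability conditions A and B of step (3) are not covered because their formulas are not available.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def difference_scores(group: Sequence[float]) -> List[float]:
    """Return N = 4 * A' / (B + C + D + E) for each A' in the group."""
    if len(group) != 4:
        raise ValueError("a group must contain exactly 4 samples")
    total = sum(group)
    if total == 0:
        raise ValueError("the sum of the group must be non-zero")
    return [4 * value / total for value in group]


def validate_group(
    group: Sequence[float],
    threshold: Tuple[float, float] = (0.9, 1.1),  # ASSUMPTION: illustrative range
) -> Optional[float]:
    """Return the group mean M if every difference score lies within the
    threshold range, otherwise None (the group is treated as invalid)."""
    low, high = threshold
    if all(low <= n <= high for n in difference_scores(group)):
        return mean(group)
    return None


# A tightly clustered group is accepted; a group with an outlier is rejected.
print(validate_group([20.0, 21.0, 19.0, 20.0]))
print(validate_group([20.0, 21.0, 19.0, 40.0]))
```

Under this reading, a score close to 1 means the sample agrees with the rest of its group, so a single outlier pushes at least one score outside the range and invalidates the whole group.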

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710090356.2A CN106874483A (en) 2017-02-20 2017-02-20 A kind of device and method of the patterned quality of data evaluation and test based on big data technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710090356.2A CN106874483A (en) 2017-02-20 2017-02-20 A kind of device and method of the patterned quality of data evaluation and test based on big data technology

Publications (1)

Publication Number Publication Date
CN106874483A true CN106874483A (en) 2017-06-20

Family

ID=59167167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710090356.2A Pending CN106874483A (en) 2017-02-20 2017-02-20 A kind of device and method of the patterned quality of data evaluation and test based on big data technology

Country Status (1)

Country Link
CN (1) CN106874483A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107361396A (en) * 2017-07-10 2017-11-21 红云红河烟草(集团)有限责任公司 Tobacco based on big data dries the prediction of silk moisture and control system
CN107807972A (en) * 2017-10-19 2018-03-16 北京科技大学 A kind of test data consistency detecting method
CN108416067A (en) * 2018-03-29 2018-08-17 重庆大学 Mass data processing and the optimization of storing process execute evaluation method in industrial process
CN108564260A (en) * 2018-03-29 2018-09-21 重庆沐信润喆网络科技有限公司 Appraisal procedure for industrial process mass data processing and storage
CN109656917A (en) * 2018-12-18 2019-04-19 深圳前海微众银行股份有限公司 Data detection method, device, equipment and the readable storage medium storing program for executing of multi-data source
CN110007654A (en) * 2019-04-10 2019-07-12 华夏天信(北京)智能低碳技术研究院有限公司 A kind of production big data service system based on Red-Sensor sensor
CN110136789A (en) * 2019-05-14 2019-08-16 浪潮软件集团有限公司 A kind of data governance quality detection method based on electronic health record application
CN110442567A (en) * 2019-07-30 2019-11-12 郑州航管科技有限公司 A kind of data fusion method for airport automatic observing system
CN111313958A (en) * 2020-02-12 2020-06-19 深圳航天宏图信息技术有限公司 Satellite data quality inspection report generation system
CN111400299A (en) * 2020-06-04 2020-07-10 成都四方伟业软件股份有限公司 Method and system for testing fusion quality of multiple data
CN111400288A (en) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 Data quality inspection method and system
CN112784208A (en) * 2021-01-19 2021-05-11 深圳市紫衡技术有限公司 Building energy consumption calculation method and system, electronic equipment and storage medium
CN112925838A (en) * 2019-12-06 2021-06-08 阿里巴巴集团控股有限公司 Data processing method and device
CN111401064B (en) * 2019-01-02 2024-04-19 中国移动通信有限公司研究院 Named entity identification method and device and terminal equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034928A (en) * 2012-12-11 2013-04-10 清华大学 Self-regulation dispersive plug-and-play data platform and management method and application
CN103247008A (en) * 2013-05-07 2013-08-14 国家电网公司 Quality evaluation method of electricity statistical index data
CN103414601A (en) * 2013-07-19 2013-11-27 广东电网公司电力调度控制中心 Method and system for detecting data for communication resource management system
CN103617554A (en) * 2013-10-22 2014-03-05 芜湖大学科技园发展有限公司 Flexible drive system for grid data evaluation
CN103560910A (en) * 2013-10-25 2014-02-05 广东电网公司电力调度控制中心 Communication resource management system accuracy detection system dynamic configuration method and device
CN105427193A (en) * 2015-12-17 2016-03-23 山东鲁能软件技术有限公司 Device and method for big data analysis based on distributed time sequence data service
CN105741196A (en) * 2016-03-01 2016-07-06 万达信息股份有限公司 Four-dimension-based data quality monitoring and evaluating method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
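Huang Gang et al.: "Research on a metadata-driven data quality assessment system architecture", Computer Engineering and Applications (《计算机工程与应用》) *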

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107361396A (en) * 2017-07-10 2017-11-21 红云红河烟草(集团)有限责任公司 Tobacco based on big data dries the prediction of silk moisture and control system
CN107807972A (en) * 2017-10-19 2018-03-16 北京科技大学 A kind of test data consistency detecting method
CN107807972B (en) * 2017-10-19 2020-12-22 北京科技大学 Test data consistency detection method
CN108416067A (en) * 2018-03-29 2018-08-17 重庆大学 Mass data processing and the optimization of storing process execute evaluation method in industrial process
CN108564260A (en) * 2018-03-29 2018-09-21 重庆沐信润喆网络科技有限公司 Appraisal procedure for industrial process mass data processing and storage
CN109656917A (en) * 2018-12-18 2019-04-19 深圳前海微众银行股份有限公司 Data detection method, device, equipment and the readable storage medium storing program for executing of multi-data source
CN111401064B (en) * 2019-01-02 2024-04-19 中国移动通信有限公司研究院 Named entity identification method and device and terminal equipment
CN111400288A (en) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 Data quality inspection method and system
CN110007654A (en) * 2019-04-10 2019-07-12 华夏天信(北京)智能低碳技术研究院有限公司 A kind of production big data service system based on Red-Sensor sensor
CN110136789A (en) * 2019-05-14 2019-08-16 浪潮软件集团有限公司 A kind of data governance quality detection method based on electronic health record application
CN110442567A (en) * 2019-07-30 2019-11-12 郑州航管科技有限公司 A kind of data fusion method for airport automatic observing system
CN112925838A (en) * 2019-12-06 2021-06-08 阿里巴巴集团控股有限公司 Data processing method and device
CN111313958A (en) * 2020-02-12 2020-06-19 深圳航天宏图信息技术有限公司 Satellite data quality inspection report generation system
CN111400299A (en) * 2020-06-04 2020-07-10 成都四方伟业软件股份有限公司 Method and system for testing fusion quality of multiple data
CN112784208A (en) * 2021-01-19 2021-05-11 深圳市紫衡技术有限公司 Building energy consumption calculation method and system, electronic equipment and storage medium
CN112784208B (en) * 2021-01-19 2023-08-15 深圳市紫衡技术有限公司 Building energy consumption calculation method, system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106874483A (en) A kind of device and method of the patterned quality of data evaluation and test based on big data technology
CN112348339A (en) Power distribution network planning method based on big data analysis
CN106934720A (en) Equipment insurance intelligent pricing method and system based on Internet of Things
CN109840157A (en) Method, apparatus, electronic equipment and the storage medium of fault diagnosis
CN107967485A (en) Electro-metering equipment fault analysis method and device
CN109784758B (en) Engineering quality supervision early warning system and method based on BIM model
CN106874482A (en) A kind of device and method of the patterned data prediction based on big data technology
CN107479540B (en) Method for diagnosing faults and system
US20230326010A1 (en) Defective picture generation method and apparatus applied to industrial quality inspection
US11704186B2 (en) Analysis of deep-level cause of fault of storage management
CN111680855A (en) Automatic risk detection and early warning method and system for whole process of project
CN110262975A (en) Test data management method, device, equipment and computer readable storage medium
CN109685275A (en) Dispense team's load pressure prediction technique, device, electronic equipment and storage medium
CN109492699A (en) Passway for transmitting electricity method for three-dimensional measurement and device
CN105868956A (en) Data processing method and device
CN111524026A (en) Power resource allocation method and device
CN111080484A (en) Method and device for monitoring abnormal data of power distribution network
CN103617447A (en) Evaluation system and method for intelligent substation
CN110287114B (en) Method and device for testing performance of database script
CN116822926A (en) Delay statistics and analysis method and device, electronic equipment and storage medium
CN107093018A (en) Communication engineering project information method for visualizing and device based on health model
CN104035342A (en) Real-time alarm intelligent aided analysis system and real-time alarm intelligent aided analysis method based on IFIX platform
CN114312930A (en) Train operation abnormity diagnosis method and device based on log data
CN115169578A (en) AI model production method and system based on meta-space data markers
CN112598142B (en) Wind turbine maintenance working quality inspection auxiliary method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170620