CN114154317A - Model and system for hydrological sensor data quality evaluation - Google Patents
- Publication number
- CN114154317A
- Authority
- CN
- China
- Prior art keywords
- data
- evaluation
- index
- rule
- quality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/04—Constraint-based CAD
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Strategic Management (AREA)
- Educational Administration (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- Testing Or Calibration Of Command Recording Devices (AREA)
Abstract
The invention provides a model and a system for evaluating the data quality of hydrological sensors, and belongs to the technical field of data processing. The model and the system address the problem of evaluating hydrological sensor data quality and make that evaluation more standardized and objective. The hydrological sensor data quality evaluation model comprises: a data set to be evaluated; an evaluation index set, from which suitable evaluation indexes are selected, the indexes comprising correctness, completeness, uniqueness and effectiveness; an evaluation rule set, from which a number of evaluation rules are selected for each index; and the evaluation score of each index under its evaluation rules. The hydrological sensor data quality evaluation system comprises: a data collection module, which collects hydrological sensor data and removes invalid data; a data evaluation module, which performs quality evaluation on the data set to be evaluated; and an evaluation result display module, which displays the summary information of the evaluation task and the results obtained after evaluation.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a model and a system for evaluating data quality of a hydrological sensor.
Background
With the advance of Internet of Things technology, sensors are increasingly used in water resource management, for example to monitor conductivity, pH, turbidity, ammonia and chloride. However, sensor devices are often deployed in harsh environments where, in addition to fouling, they are subject to clogging, debris, ice and device failure. A device whose function is impaired introduces uncertainty into the data it records, which reduces the trustworthiness of that data. High-quality data is the precondition for fully exploiting the information the data contains. Data analysis techniques and algorithms are important means of realizing the value of data, but they can extract implicit, accurate and useful information only from high-quality data; applied to a big data environment full of erroneous data, they extract useless information. The data quality of sensors therefore needs to be improved in order to support and optimize applications and complex business decisions.
Data quality assessment is the starting point for solving data quality problems, and sensor data must be assessed in order to avoid wrong business decisions caused by dirty data. Most existing data quality evaluation techniques target relational databases, and the evaluation techniques that have been proposed for sensor data are not suitable for hydrological sensor data, so the problem of hydrological sensor data quality evaluation requires dedicated study.
Disclosure of Invention
(I) Technical problem to be solved
The invention aims to overcome the shortcomings of the prior art by providing a model and a system for evaluating the quality of hydrological sensor data. By proposing a data quality evaluation model framework and building a data quality evaluation system, it gives enterprises a guarantee of high-quality data.
(II) Technical solution
1. Data quality evaluation model
The invention provides a quality evaluation model for hydrological sensor data. The model consists of four parts: a data set to be evaluated, an evaluation index set, an evaluation rule set and the comprehensive score after evaluation. The meaning of each part is:
M (model): the data quality evaluation model, i.e.
M=<D,I,R,G>
D (dataset): a hydrological sensor data set requiring evaluation.
I (indicator): the set of indexes selected when evaluating the data set D. The index set comprises four indexes: correctness, completeness, uniqueness and effectiveness. Several indexes can be selected for the same data set. The meaning of each index is as follows:
Correctness: describes whether errors occur during the acquisition, transmission and storage of the sensor data, and whether the recorded data conform to objective facts.
Completeness: describes whether the sensor data contain missing records or missing fields, reflecting how completely the data record the objective facts.
Uniqueness: describes the extent to which matching hydrological data are present only once in the data source, i.e. whether duplicate records exist.
Effectiveness: describes whether the sensor data values lie within a user-defined range of values.
R (rule): the set of specific rules defined for each index in I; one index may contain several rules.
G (goal): the score calculated according to a calculation formula after the data set has been evaluated under each index and rule. The score reflects the quality of the evaluated data set: the larger the score, the better the data quality, and the smaller the score, the worse the data quality. The calculation formula of the comprehensive evaluation score of each index is as follows:
where G_t is the evaluation score of the t-th index; r is the number of rules contained in that index; n is the total number of records in the data set; d_i is the i-th record; and F_ij is the rule function of the j-th rule applied to the i-th record, whose value is 1 if the record satisfies the rule and 0 otherwise.
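As an illustration of how a per-index score could be computed from the symbols defined above, here is a minimal Python sketch. It assumes the score is the fraction of record-rule checks that pass, that is G_t = (sum over all i and j of F_ij(d_i)) / (n * r). This normalization is an assumption inferred from the symbol definitions, not a form confirmed by the text, and the function name `index_score` and the example rules and records are hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Rule = Callable[[Record], bool]  # plays the role of F_ij: True (1) if the record satisfies the rule


def index_score(records: List[Record], rules: List[Rule]) -> float:
    """Assumed form of G_t: number of passed checks divided by n * r."""
    n, r = len(records), len(rules)
    if n == 0 or r == 0:
        raise ValueError("need at least one record and one rule")
    passed = sum(1 for d in records for rule in rules if rule(d))
    return passed / (n * r)


# Hypothetical example: one index with two rules, applied to three records.
records = [{"level": 40.0}, {"level": 60.0}, {"level": None}]
rules = [
    lambda d: d["level"] is not None,
    lambda d: d["level"] is not None and 32.3 <= d["level"] <= 51.5,
]
score = index_score(records, rules)
```

Under this assumed form the score always lies between 0 and 1, which matches the statement that a larger score indicates better data quality.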
Several indexes can be selected in the quality evaluation of a data set, so the comprehensive quality score of the data set must be calculated according to how much importance the user attaches to each index. It is calculated as follows:
where C is the number of indexes selected when evaluating the data set and W_j is the weight of the j-th index. The weight values are defined by the user and reflect the importance the user attaches to each index.
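A minimal sketch of such a comprehensive score, assuming it is the weighted sum G = W_1 * G_1 + ... + W_C * G_C of the per-index scores and that the user-defined weights sum to 1. Both the weighted-sum form and the sum-to-one constraint are assumptions; the function name `composite_score` and the example values are hypothetical.

```python
from typing import Dict


def composite_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-index scores; assumes the weights sum to 1."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same indexes")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical per-index scores and user-defined weights for C = 3 indexes.
scores = {"correctness": 0.9, "completeness": 0.8, "effectiveness": 0.5}
weights = {"correctness": 0.5, "completeness": 0.25, "effectiveness": 0.25}
total = composite_score(scores, weights)
```

Keeping the weights normalized means the comprehensive score stays on the same scale as the individual index scores.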
2. Data quality evaluation system
The invention provides a hydrological sensor data evaluation system comprising a data collection module, a data evaluation module and an evaluation result display module.
Data collection module: collects hydrological sensor data, removes invalid data and then persists the data in a database.
Data evaluation module: selects evaluation indexes, sets rules for each index and evaluates the quality of the data set to be evaluated. The specific steps are:
Step 1: select the data set to be evaluated;
Step 2: select suitable indexes from the index set and determine the analysis task;
Step 3: determine the constraint rules for each index from the rule set;
Step 4: check each rule and record the check results;
Step 5: evaluate the quality of the data set according to the data quality evaluation model and the check results of Step 4, and record the evaluation result.
Evaluation result display module: displays the evaluation results, including the summary information of the evaluation task, rule violation information, evaluation index information and evaluation scores.
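The rule-checking step and the information handed to the display module can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Rule` class, the `run_checks` function and the layout of the returned report are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class Rule:
    name: str
    check: Callable[[Record], bool]  # True if the record satisfies the rule


def run_checks(dataset: List[Record], rules_by_index: Dict[str, List[Rule]]) -> Dict[str, Any]:
    """Check every rule of every selected index and record the results."""
    violations: List[Tuple[int, str, str]] = []  # (record position, index, rule name)
    passed = {index: 0 for index in rules_by_index}
    for pos, record in enumerate(dataset):
        for index, rules in rules_by_index.items():
            for rule in rules:
                if rule.check(record):
                    passed[index] += 1
                else:
                    violations.append((pos, index, rule.name))
    return {
        "total_records": len(dataset),
        "passed_checks": passed,
        "violations": violations,
    }


# Hypothetical data set and a single rule under one index.
dataset = [{"id": "A00000001", "level": 40.0}, {"id": "A01", "level": 40.0}]
rules_by_index = {"correctness": [Rule("id_length_9", lambda d: len(d["id"]) == 9)]}
report = run_checks(dataset, rules_by_index)
```

Recording violations as (record position, index, rule name) tuples keeps enough detail for a display layer to list which records broke which rules.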
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) an effective data quality evaluation model is provided for hydrological sensor data;
(2) a data quality evaluation framework is provided, and the overall process of data quality evaluation is shown;
(3) the quality evaluation can be objectively carried out on the data set, and the subjectivity of the traditional data quality evaluation is avoided.
Drawings
FIG. 1 is a diagram of a data quality assessment model.
Fig. 2 is an overall architecture diagram of the data quality evaluation system.
Detailed Description
The embodiments of the present invention will be described more fully hereinafter with reference to the accompanying drawings, in which examples of the invention are shown. It should be understood that the following examples are illustrative of the present invention only and are not intended to limit the scope of the present invention. Various equivalent modifications of the invention, which will occur to those skilled in the art after reading the disclosure herein, are within the scope of the invention as defined by the appended claims.
Fig. 2 shows the architecture of the data quality evaluation system proposed in the present invention. This example illustrates the invention using real sensor data from the river basin.
Data collection module: wireless sensors are placed at selected geographic locations along the Chuzhou river. The sensor devices continuously transmit measurements to the gateway module over the ZigBee protocol. The gateway module collects these measurements and forwards the raw data to an industrial PC over a WiFi communication interface. Before storage, the raw data are processed on the Flink platform in two stages, data cleaning followed by data transformation. The data cleaning stage uses the acquisition time of each sensor as the criterion for deleting duplicate data recorded at the same time, and also deletes data whose water level value is lower than 0, so that duplicate and invalid data do not affect the subsequent data quality evaluation. The data transformation stage converts the format of the sensor data into the format of the table fields defined in the database, and then packages each sensor reading into a standard water level station record to be stored in the database. The main process of data processing is as follows:
S1: Define the case class Sensor(id: Int, num: String, time: Long, level: Double, name: String)
S2: readTextFile(filePath) to obtain the raw sensor data stream, called in_streaming
S3: in_streaming.map(package the input stream).filter(remove redundant and erroneous data), called out_streaming
S4: Create a HiveCatalog to connect to Hive
S5: Load out_streaming into Hive
S6: Submit the task to Flink
S7: Record the task results
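The cleaning logic of the filter step can be sketched in plain Python, without Flink. This is a minimal illustration, assuming that the `num` field identifies the sensor and that a duplicate is a reading with the same (`num`, `time`) pair; both assumptions, the function name `clean` and the sample readings are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Sensor(NamedTuple):
    id: int
    num: str
    time: int
    level: float
    name: str


def clean(stream: Iterable[Sensor]) -> Iterator[Sensor]:
    """Drop readings with a negative water level and repeated (num, time) pairs."""
    seen: Set[Tuple[str, int]] = set()
    for reading in stream:
        if reading.level < 0:
            continue  # invalid data: water level below 0
        key = (reading.num, reading.time)  # assumed duplicate key
        if key in seen:
            continue  # duplicate reading for the same sensor and time
        seen.add(key)
        yield reading


# Hypothetical input stream with one duplicate and one invalid reading.
in_stream = [
    Sensor(1, "S01", 1000, 35.2, "A"),
    Sensor(2, "S01", 1000, 35.2, "A"),
    Sensor(3, "S01", 1060, -1.0, "A"),
    Sensor(4, "S02", 1000, 36.0, "A"),
]
out_stream = list(clean(in_stream))
```

In this sketch invalid readings are dropped before the duplicate check, so a negative reading never blocks a later valid reading taken at the same time.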
Data evaluation module: selects evaluation indexes, sets evaluation rules for the indexes and evaluates the quality of the data set to be evaluated. The specific steps are:
Step 1: select the sensor data of site A for quality evaluation;
Step 2: select three indexes, correctness, completeness and effectiveness, to evaluate the quality of the sensor data of site A;
Step 3: determine the constraint rules from the rule set. Add a value-domain constraint rule on the 'level' field: the value of level must lie in the range [32.3, 51.5]. Add a lexical constraint rule on the 'id' field: the length of the id field must be 9. Add an equivalence consistency constraint rule on the 'name' field: the value of the name field must be 2BDB 7C;
Step 4: check each rule of each index and record the check results;
Step 5: evaluate the quality of the data set according to the data quality evaluation model and the check results of Step 4, and record the evaluation result.
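The three constraint rules of this example can be written as simple predicates. The following Python sketch is an illustration only: the rule names, the `pass_rates` helper and the sample records are hypothetical, the length of `id` is taken to mean the length of its string form, and the assignment of each rule to an index is left open because the text does not state it.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Rules from the site A example; the dictionary keys are hypothetical names.
SITE_A_RULES: Dict[str, Callable[[Record], bool]] = {
    # value-domain constraint: level must lie in [32.3, 51.5]
    "level_in_range": lambda d: d.get("level") is not None and 32.3 <= d["level"] <= 51.5,
    # lexical constraint: the id must be 9 characters long (string form assumed)
    "id_length_is_9": lambda d: len(str(d.get("id", ""))) == 9,
    # equivalence consistency constraint on the name field
    "name_equals_expected": lambda d: d.get("name") == "2BDB 7C",
}


def pass_rates(records: List[Record], rules: Dict[str, Callable[[Record], bool]]) -> Dict[str, float]:
    """Fraction of records that satisfy each rule."""
    if not records:
        raise ValueError("need at least one record")
    return {
        name: sum(1 for d in records if check(d)) / len(records)
        for name, check in rules.items()
    }


# Hypothetical sample records, not real measurements.
sample = [
    {"id": 100000001, "level": 40.1, "name": "2BDB 7C"},
    {"id": 100000002, "level": 55.0, "name": "2BDB 7C"},
    {"id": 42, "level": 33.0, "name": "other"},
    {"id": 100000004, "level": 32.3, "name": "2BDB 7C"},
]
rates = pass_rates(sample, SITE_A_RULES)
```

Per-rule pass rates like these are one possible form for the check results recorded in Step 4 before the scores are calculated in Step 5.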
A quality evaluation model of the sensor data of site A is obtained from the data quality evaluation model and the recorded check results. Table 1 shows the quality evaluation model of the site A sensor data.
Table 1. Quality evaluation model of site A sensor data
Evaluation result display module: records the summary information of the evaluation task, the data that violate the rules, the evaluation index information and the evaluation scores produced by the data evaluation module, and displays them to the user.
Finally, it should be noted that: the above examples are intended only to illustrate the technical process of the invention, and not to limit it; although the invention has been described in detail with reference to the foregoing examples, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing examples can be modified, or some technical features can be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions.
Claims (2)
1. A model for hydrological sensor data quality assessment, comprising:
M (model): the data quality evaluation model, i.e.
M=<D,I,R,G>
D (dataset): a hydrological sensor data set requiring evaluation.
I (indicator): the set of indexes selected when evaluating the data set D, the index set comprising four indexes, namely correctness, completeness, uniqueness and effectiveness, wherein several indexes can be selected for the same data set, and the meaning of each index is as follows:
Correctness: describes whether errors occur during the acquisition, transmission and storage of the sensor data, and whether the recorded data conform to objective facts.
Completeness: describes whether the sensor data contain missing records or missing fields, reflecting how completely the data record the objective facts.
Uniqueness: describes the extent to which matching hydrological data are present only once in the data source, i.e. whether duplicate records exist.
Effectiveness: describes whether the sensor data values lie within a user-defined range of values.
R (rule): the set of specific rules defined for each index in I; one index may contain several rules.
G (goal): the score calculated according to a calculation formula after the data set has been evaluated under each index and rule, the score reflecting the quality of the evaluated data set. The calculation formula of the comprehensive evaluation score of each index is as follows:
where G_t is the evaluation score of the t-th index; r is the number of rules contained in that index; n is the total number of records in the data set; d_i is the i-th record; and F_ij is the rule function of the j-th rule applied to the i-th record, whose value is 1 if the record satisfies the rule and 0 otherwise.
Several indexes can be selected in the quality evaluation of a data set, so the comprehensive quality score of the data set must be calculated according to how much importance the user attaches to each index. It is calculated as follows:
2. A system for hydrological sensor data quality assessment, comprising:
a data collection module, which collects hydrological sensor data and removes invalid data;
a data evaluation module, which selects evaluation indexes, sets rules for the indexes and, once the rules are set, evaluates the quality of the data set to be evaluated, specifically by the following steps:
Step 1: select the data set to be evaluated;
Step 2: select suitable indexes from the index set and determine the analysis task;
Step 3: determine the constraint rules for each index from the rule set;
Step 4: check each rule and record the check results;
Step 5: evaluate the quality of the data set according to the data quality evaluation model and the check results of Step 4, and record the evaluation result;
an evaluation result display module, which displays the evaluation results, including the summary information of the evaluation task, rule violation information, evaluation index information and evaluation scores.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111389462.3A CN114154317A (en) | 2021-11-22 | 2021-11-22 | Model and system for hydrological sensor data quality evaluation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111389462.3A CN114154317A (en) | 2021-11-22 | 2021-11-22 | Model and system for hydrological sensor data quality evaluation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114154317A (en) | 2022-03-08 |
Family
ID=80457183
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111389462.3A Pending CN114154317A (en) | 2021-11-22 | 2021-11-22 | Model and system for hydrological sensor data quality evaluation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114154317A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||