CN110955650B - Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory - Google Patents
- Publication number
- CN110955650B
- Authority
- CN
- China
- Prior art keywords
- data
- abnormal
- humidity
- temperature
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Testing Resistance To Weather, Investigating Materials By Mechanical Methods (AREA)
Abstract
The application discloses a method for cleaning out-of-tolerance data from digital hygrothermographs in a standard laboratory. A cluster analysis method is used to cluster the data from all data sources into a temperature data set and a humidity data set. A trend analysis method is used to perform linear analysis on the temperature data in the temperature data set and the humidity data in the humidity data set to obtain pre-abnormal temperature data and pre-abnormal humidity data. A box plot algorithm is used to identify abnormal data in the temperature data set and the humidity data set, and the abnormal data are collected into an abnormal data set. When the pre-abnormal temperature data or the pre-abnormal humidity data also belong to the abnormal data set, they are cleaned. Because abnormal data are identified by two methods and a data point is treated as dirty data only when both methods flag it, the method reduces the demand on personnel, saves a large amount of manual labor, and, owing to the objectivity of the detection methods, improves the accuracy with which dirty data are identified.
Description
Technical Field
The application relates to the technical field of data processing, and in particular to a method for cleaning out-of-tolerance data from digital hygrothermographs in a standard laboratory.
Background
With the use of various high-precision metering sensors in metering standard equipment, much of the metering standard equipment in electric power systems places increasingly strict requirements on the ambient temperature and humidity of the laboratory. To control and acquire the laboratory temperature and humidity accurately and effectively, current practice is to install a large number of digital hygrothermographs. Their temperature and humidity sensors monitor the temperature and humidity of the whole laboratory, and the laboratory air conditioning is controlled on the basis of these data so that the environment meets the test requirements. Ensuring that the out-of-tolerance data (dirty data) among these readings do not affect the overall temperature and humidity control of the laboratory is a difficulty in laboratory temperature and humidity control. According to the "garbage in, garbage out" principle, out-of-tolerance data can cause errors in the control of the air conditioning and thereby affect the ambient temperature and humidity.
Therefore, the large volume of digital hygrothermograph data in a laboratory needs to be analyzed, and the out-of-tolerance data, i.e., the dirty data, need to be cleaned. Conventional dirty data cleaning still relies mainly on processing the different databases manually or on simple data extraction, transmission and loading applications or tools. Such methods not only consume a great deal of effort, but the many uncontrollable factors involved also increase the error rate of cleaning the temperature and humidity data.
Disclosure of Invention
The application provides a method for cleaning out-of-tolerance data from digital hygrothermographs in a standard laboratory, in order to solve the technical problems that conventional dirty data cleaning methods are time-consuming and identify dirty data with low accuracy.
In order to solve the above problems, the present application provides the following technical solutions:
The method for cleaning out-of-tolerance data from digital hygrothermographs in a standard laboratory comprises the following steps: clustering the data from all data sources by using a cluster analysis method to obtain a temperature data set and a humidity data set; performing linear analysis on the temperature data in the temperature data set and the humidity data in the humidity data set by using a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data, wherein the pre-abnormal temperature data are temperature data that deviate from the linear temperature curve and the pre-abnormal humidity data are humidity data that deviate from the linear humidity curve; identifying abnormal data in the temperature data set and the humidity data set by using a box plot algorithm and collecting the abnormal data into an abnormal data set, wherein the abnormal data comprise abnormal temperature data and abnormal humidity data; and, when the pre-abnormal temperature data or the pre-abnormal humidity data are data in the abnormal data set, cleaning the pre-abnormal temperature data or the pre-abnormal humidity data.
Optionally, identifying abnormal data in the temperature data set and the humidity data set by using a box plot algorithm includes: analyzing the temperature data of the temperature data set and the humidity data of the humidity data set by using the box plot algorithm to obtain preliminary abnormal data; judging whether the preliminary abnormal data exceed an out-of-tolerance threshold; if yes, the preliminary abnormal data are abnormal data; if not, the preliminary abnormal data are not abnormal data.
Optionally, analyzing the temperature data of the temperature data set and the humidity data of the humidity data set by using the box plot algorithm to obtain preliminary abnormal data includes: calculating the median, the 25% quantile, the 75% quantile, the upper boundary and the lower boundary of the temperature data of the temperature data set and of the humidity data of the humidity data set; temperature data or humidity data above the 75% quantile or below the 25% quantile are preliminary abnormal data.
Optionally, the out-of-tolerance threshold is the upper boundary or the lower boundary.
Optionally, the calculation formula of the upper boundary is as follows:
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \tag{1}$$
where $Q_3$ is the upper quartile, i.e., the 75% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles.
Optionally, the calculation formula of the lower boundary is as follows:
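$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \tag{2}$$
where $Q_1$ is the lower quartile, i.e., the 25% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles.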
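As a purely illustrative worked example of formulas (1) and (2), suppose that the temperature data set has $Q_1 = 21.0$ and $Q_3 = 23.0$ and that $k = 1.5$. These quartile values and this choice of $k$ are assumptions made for illustration only; the application states only that $k$ is an empirical coefficient.

$$\mathrm{IQR} = Q_3 - Q_1 = 23.0 - 21.0 = 2.0$$
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} = 23.0 + 1.5 \times 2.0 = 26.0$$
$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} = 21.0 - 1.5 \times 2.0 = 18.0$$

Under these assumed values, a reading of 24.0 lies above the 75% quantile and is therefore preliminary abnormal data, but it does not exceed the upper boundary and so is not abnormal data. A reading of 27.5 exceeds the upper boundary and is abnormal data.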
The beneficial effects are as follows. The application provides a method for cleaning out-of-tolerance data from digital hygrothermographs in a standard laboratory. First, the data from all data sources are clustered by using a cluster analysis method to obtain a temperature data set and a humidity data set. Second, linear analysis is performed on the temperature data in the temperature data set and the humidity data in the humidity data set by using a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data. Third, abnormal data in the temperature data set and the humidity data set are identified by using a box plot algorithm and collected into an abnormal data set. Finally, when the pre-abnormal temperature data or the pre-abnormal humidity data are data in the abnormal data set, they are cleaned. In this method, the data in the temperature data set and the data in the humidity data set are each examined by two methods: when pre-abnormal data identified by the trend analysis method are also among the abnormal data identified by the box plot algorithm, the pre-abnormal data are dirty data and are cleaned. Because a data point is treated as dirty data only when both methods flag it, the method reduces the demand on personnel, saves a large amount of manual labor, and, owing to the objectivity of the detection methods, improves the accuracy with which dirty data are identified.
Drawings
In order to illustrate the technical solutions of the present application more clearly, the drawings needed in the embodiments are briefly described below. It will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of the method for cleaning out-of-tolerance data from digital hygrothermographs in a standard laboratory provided by the application;
FIG. 2 is a flowchart of the method for obtaining abnormal data provided by the application.
Detailed Description
Referring to FIG. 1, which is a flowchart of the method for cleaning out-of-tolerance data from digital hygrothermographs in a standard laboratory provided by the present application, the cleaning method includes:
S01: Cluster the data from all data sources by using a cluster analysis method to obtain a temperature data set and a humidity data set.
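As a non-limiting illustration of step S01, the following Python sketch separates a mixed stream of readings into two groups with a basic one-dimensional k-means (k = 2). The application does not name a specific clustering algorithm, so k-means is a stand-in chosen for this sketch. The function name `two_means_1d`, the sample readings, and the assumption that temperature values (in °C) are numerically lower than humidity values (in %RH) are all hypothetical.

```python
from statistics import mean


def two_means_1d(values, max_iter=20):
    """Split 1-D readings into a low-valued and a high-valued cluster.

    Basic k-means with k = 2; the initial centroids are the smallest
    and largest readings. This is a stand-in for the unspecified
    cluster analysis method of step S01.
    """
    low_c, high_c = min(values), max(values)
    low, high = list(values), []
    for _ in range(max_iter):
        # Assign each reading to the nearer centroid (ties go to the low cluster).
        low = [v for v in values if abs(v - low_c) <= abs(v - high_c)]
        high = [v for v in values if abs(v - low_c) > abs(v - high_c)]
        if not high:  # all readings identical: nothing to split
            break
        new_centroids = (mean(low), mean(high))
        if new_centroids == (low_c, high_c):  # assignments have converged
            break
        low_c, high_c = new_centroids
    return low, high


# Hypothetical mixed readings from several instruments.
readings = [22.1, 55.0, 21.8, 54.2, 22.4, 56.1, 22.0, 55.5]
# Assumption: the low-valued cluster is temperature, the high-valued one humidity.
temperature_set, humidity_set = two_means_1d(readings)
```

In a real deployment the readings would usually carry a channel label, in which case grouping by label may be preferable to clustering on value alone.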
S02: Perform linear analysis on the temperature data in the temperature data set and the humidity data in the humidity data set by using a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data.
The pre-abnormal temperature data are temperature data that deviate from the linear temperature curve, and the pre-abnormal humidity data are humidity data that deviate from the linear humidity curve.
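As a non-limiting illustration of step S02, the sketch below fits an ordinary least-squares line to a sequence of readings and flags the readings that deviate from it. The application does not state how large a deviation must be, so the criterion used here (a residual larger than `n_sigma` times the standard deviation of all residuals) is an assumption, as are the function names and the sample data.

```python
from statistics import mean, pstdev


def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def pre_abnormal_indices(ys, n_sigma=2.0):
    """Indices of readings that deviate from the fitted linear trend.

    Assumed criterion: |residual| > n_sigma * (std. dev. of residuals).
    """
    if len(ys) < 2:  # a line cannot be fitted to fewer than two points
        return []
    slope, intercept = fit_line(ys)
    residuals = [y - (slope * x + intercept) for x, y in enumerate(ys)]
    spread = pstdev(residuals)
    if spread == 0:  # all readings lie exactly on the line
        return []
    return [i for i, r in enumerate(residuals) if abs(r) > n_sigma * spread]


# Hypothetical temperature readings with one spike at index 4.
temps = [22.0, 22.1, 22.0, 22.2, 27.5, 22.1, 22.2, 22.3, 22.2, 22.4]
pre_abnormal = pre_abnormal_indices(temps)
```

Because the outlier itself inflates the residual spread, a small `n_sigma` or a robust spread estimate may be needed when several spikes occur close together.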
S03: Identify abnormal data in the temperature data set and the humidity data set by using a box plot algorithm, and collect the abnormal data into an abnormal data set.
The abnormal data include abnormal temperature data and abnormal humidity data.
Referring to FIG. 2, which is a flowchart of the method for obtaining abnormal data provided by the application, the specific process is as follows:
S031: Analyze the temperature data of the temperature data set and the humidity data of the humidity data set by using the box plot algorithm to obtain preliminary abnormal data.
S0311: Calculate the median, the 25% quantile, the 75% quantile, the upper boundary and the lower boundary of the temperature data of the temperature data set and of the humidity data of the humidity data set.
S0312: Temperature data or humidity data above the 75% quantile or below the 25% quantile are preliminary abnormal data.
Temperature data or humidity data between the 25% quantile and the 75% quantile are normal data.
S032: Judge whether the preliminary abnormal data exceed the out-of-tolerance threshold.
The out-of-tolerance threshold is the upper boundary or the lower boundary.
That is, judge whether the preliminary abnormal data lie above the upper boundary or below the lower boundary.
The calculation formula of the upper boundary is as follows:
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \tag{1}$$
where $Q_3$ is the upper quartile, i.e., the 75% quantile; $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles; and $k$ is an empirical coefficient, which is a constant.
The calculation formula of the lower boundary is as follows:
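$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \tag{2}$$
where $Q_1$ is the lower quartile, i.e., the 25% quantile; $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles; and $k$ is an empirical coefficient, which is a constant.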
S033: If yes, the preliminary abnormal data are abnormal data.
S034: If not, the preliminary abnormal data are not abnormal data.
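As a non-limiting illustration of steps S031 to S034, the following Python sketch computes the quartiles and the two boundaries of formulas (1) and (2), selects the preliminary abnormal data, and then keeps only those that exceed the out-of-tolerance threshold. The function name, the sample data, the default `k = 1.5` and the use of the "inclusive" quantile method of the standard `statistics` module are assumptions of this sketch; the application only describes `k` as an empirical constant.

```python
from statistics import quantiles


def box_plot_abnormal(values, k=1.5):
    """Return the indices of abnormal readings per the box plot steps.

    k is the empirical coefficient of formulas (1) and (2);
    1.5 is an assumed value, not one given in the application.
    """
    # S0311: quartiles and boundaries.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    upper_limit = q3 + k * iqr  # formula (1)
    lower_limit = q1 - k * iqr  # formula (2)

    # S0312: readings outside [Q1, Q3] are preliminary abnormal data.
    preliminary = [i for i, v in enumerate(values) if v < q1 or v > q3]

    # S032 to S034: keep only preliminary data beyond a boundary.
    return [
        i for i in preliminary
        if values[i] > upper_limit or values[i] < lower_limit
    ]


# Hypothetical temperature readings with one spike at index 4.
temps = [22.0, 22.1, 22.0, 22.2, 27.5, 22.1, 22.2, 22.3, 22.2, 22.4]
abnormal = box_plot_abnormal(temps)
```

Since both boundaries lie outside the interval from the 25% quantile to the 75% quantile whenever `k` is non-negative, the preliminary selection does not change the final result; it is kept here only to mirror the two-stage flow of FIG. 2.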
S04: When the pre-abnormal temperature data or the pre-abnormal humidity data are data in the abnormal data set, clean the pre-abnormal temperature data or the pre-abnormal humidity data.
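As a non-limiting illustration of step S04, the sketch below treats a reading as dirty data only when it appears both among the pre-abnormal data and in the abnormal data set, and then cleans it. The application does not say whether cleaning means deleting or repairing a reading; deletion is assumed here. The function name, the index-based representation and the sample values are likewise hypothetical.

```python
def clean(values, pre_abnormal, abnormal):
    """Remove readings flagged by both detection methods.

    pre_abnormal: indices flagged by the trend analysis (S02).
    abnormal:     indices flagged by the box plot algorithm (S03).
    Cleaning is assumed to mean deletion of the reading.
    """
    dirty = set(pre_abnormal) & set(abnormal)  # flagged by both methods
    return [v for i, v in enumerate(values) if i not in dirty]


# Hypothetical readings: index 2 is flagged by both methods,
# index 4 only by the trend analysis, so only index 2 is removed.
values = [22.0, 22.1, 27.5, 22.2, 22.6]
cleaned = clean(values, pre_abnormal=[2, 4], abnormal=[2])
```

Requiring agreement between the two methods trades some sensitivity for fewer false removals, which matches the stated aim of improving the accuracy of dirty data identification.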
The application provides a method for cleaning out-of-tolerance data from digital hygrothermographs in a standard laboratory. First, the data from all data sources are clustered by using a cluster analysis method to obtain a temperature data set and a humidity data set. Second, linear analysis is performed on the temperature data in the temperature data set and the humidity data in the humidity data set by using a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data. Third, abnormal data in the temperature data set and the humidity data set are identified by using a box plot algorithm and collected into an abnormal data set. Finally, when the pre-abnormal temperature data or the pre-abnormal humidity data are data in the abnormal data set, they are cleaned. In this method, the data in the temperature data set and the data in the humidity data set are each examined by two methods: when pre-abnormal data identified by the trend analysis method are also among the abnormal data identified by the box plot algorithm, the pre-abnormal data are dirty data and are cleaned. Because a data point is treated as dirty data only when both methods flag it, the method reduces the demand on personnel, saves a large amount of manual labor, and, owing to the objectivity of the detection methods, improves the accuracy with which dirty data are identified.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
The above-described embodiments of the present application are not intended to limit the scope of the present application.
Claims (5)
1. A method for cleaning out-of-tolerance data from digital hygrothermographs in a standard laboratory, characterized by comprising the following steps:
clustering the data from all data sources by using a cluster analysis method to obtain a temperature data set and a humidity data set;
performing linear analysis on the temperature data in the temperature data set and the humidity data in the humidity data set by using a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data, wherein the pre-abnormal temperature data are temperature data that deviate from the linear temperature curve, and the pre-abnormal humidity data are humidity data that deviate from the linear humidity curve;
identifying abnormal data in the temperature data set and the humidity data set by using a box plot algorithm, and collecting the abnormal data into an abnormal data set, wherein the abnormal data comprise abnormal temperature data and abnormal humidity data;
when the pre-abnormal temperature data or the pre-abnormal humidity data are data in the abnormal data set, cleaning the pre-abnormal temperature data or the pre-abnormal humidity data;
wherein identifying abnormal data in the temperature data set and the humidity data set by using the box plot algorithm comprises the following steps:
analyzing the temperature data of the temperature data set and the humidity data of the humidity data set by using the box plot algorithm to obtain preliminary abnormal data;
judging whether the preliminary abnormal data exceed an out-of-tolerance threshold;
if yes, the preliminary abnormal data are abnormal data; if not, the preliminary abnormal data are not abnormal data.
2. The cleaning method according to claim 1, wherein analyzing the temperature data of the temperature data set and the humidity data of the humidity data set by using the box plot algorithm to obtain preliminary abnormal data comprises:
calculating the median, the 25% quantile, the 75% quantile, the upper boundary and the lower boundary of the temperature data of the temperature data set and of the humidity data of the humidity data set;
wherein temperature data or humidity data above the 75% quantile or below the 25% quantile are preliminary abnormal data.
3. The cleaning method according to claim 1, wherein the out-of-tolerance threshold is the upper boundary or the lower boundary.
4. A cleaning method according to claim 3, wherein the upper boundary is calculated as:
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \tag{1}$$
where $Q_3$ is the upper quartile, i.e., the 75% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles.
5. A cleaning method according to claim 3, wherein the lower boundary is calculated as:
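$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \tag{2}$$
where $Q_1$ is the lower quartile, i.e., the 25% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles.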
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911142890.9A CN110955650B (en) | 2019-11-20 | 2019-11-20 | Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911142890.9A CN110955650B (en) | 2019-11-20 | 2019-11-20 | Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110955650A (en) | 2020-04-03 |
CN110955650B (en) | 2023-06-23 |
Family
ID=69978026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911142890.9A Active CN110955650B (en) | 2019-11-20 | 2019-11-20 | Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110955650B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106097138A (en) * | 2016-06-03 | 2016-11-09 | 合肥工业大学 | A kind of electricity consumption anomaly data detection System and method for based on statistical model |
CN108412710A (en) * | 2018-01-30 | 2018-08-17 | 同济大学 | A kind of Wind turbines wind power data cleaning method |
CN108830510A (en) * | 2018-07-16 | 2018-11-16 | 国网上海市电力公司 | A kind of electric power data preprocess method based on mathematical statistics |
CN109918364A (en) * | 2019-02-28 | 2019-06-21 | 华北电力大学 | A kind of data cleaning method based on two-dimensional probability density estimation and quartile method |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7904279B2 (en) * | 2004-04-02 | 2011-03-08 | Test Advantage, Inc. | Methods and apparatus for data analysis |
CN106204335A (en) * | 2016-07-21 | 2016-12-07 | 广东工业大学 | A kind of electricity price performs abnormality judgment method, Apparatus and system |
US10528533B2 (en) * | 2017-02-09 | 2020-01-07 | Adobe Inc. | Anomaly detection at coarser granularity of data |
EP3364157A1 (en) * | 2017-02-16 | 2018-08-22 | Fundación Tecnalia Research & Innovation | Method and system of outlier detection in energy metering data |
CN109766331A (en) * | 2018-12-06 | 2019-05-17 | 中科恒运股份有限公司 | Method for processing abnormal data and device |
Non-Patent Citations (1)
Title |
---|
Yang Zheng (杨政). Research on a model-driven data cleaning component. Yunnan Electric Power Technology (云南电力技术), 2017, (No. 06), full text. * |
Also Published As
Publication number | Publication date |
---|---|
CN110955650A (en) | 2020-04-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111046564B (en) | Residual life prediction method for two-stage degraded product | |
TWI539298B (en) | Metrology sampling method with sampling rate decision scheme and computer program product thereof | |
CN112800616B (en) | Equipment residual life self-adaptive prediction method based on proportional acceleration degradation modeling | |
CN106844901B (en) | Structural part residual strength evaluation method based on multi-factor fusion correction | |
CN116520236B (en) | Abnormality detection method and system for intelligent ammeter | |
CN110738346A (en) | batch electric energy meter reliability prediction method based on Weibull distribution | |
CN106950507A (en) | A kind of intelligent clock battery high reliability lifetime estimation method | |
CN110955650B (en) | Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory | |
CN114691661B (en) | Assimilation-based cloud air guide and temperature and humidity profile pretreatment analysis method and system | |
US7043970B2 (en) | Method for monitoring wood-drying kiln state | |
CN108169565B (en) | Nonlinear temperature compensation method for conductivity measurement | |
CN116503025B (en) | Business work order flow processing method based on workflow engine | |
CN107843215B (en) | Based on the roughness value fractal evaluation model building method under optional sampling spacing condition | |
CN104267270B (en) | Transformer key parameters extracting method based on vector similitude | |
CN113934536A (en) | Data acquisition method facing edge calculation | |
CN115598309B (en) | Method and system for monitoring and early warning of lead content in atmospheric environment | |
CN108109675B (en) | Laboratory quality control data management system | |
CN116029165A (en) | Power cable reliability analysis method and system considering lightning influence | |
CN108124442B (en) | Elevator element parameter calibration method, device, equipment and storage medium | |
CN113378309B (en) | Rolling bearing health state online monitoring and residual life prediction method | |
CN112287302B (en) | Method for detecting pH value of oil, computing equipment and computer storage medium | |
CN112685912A (en) | Multivariate generalized Wiener process performance degradation reliability analysis method | |
CN108459948B (en) | Method for determining failure data distribution type in system reliability evaluation | |
CN108108864B (en) | Laboratory quality control data management method | |
CN109307524B (en) | Sensor measurement data spot detection and repair technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||