CN110955650B - Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory - Google Patents


Info

Publication number
CN110955650B
Authority
CN
China
Prior art keywords
data
abnormal
humidity
temperature
data set
Prior art date
Legal status
Active
Application number
CN201911142890.9A
Other languages
Chinese (zh)
Other versions
CN110955650A (en)
Inventor
唐标
李博
于辉
王恩
朱梦梦
朱全聪
林中爱
杨明
Current Assignee
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date
2019-11-20
Filing date
2019-11-20
Publication date
2023-06-23
Application filed by Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority to CN201911142890.9A
Publication of CN110955650A
Application granted
Publication of CN110955650B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Testing Resistance To Weather, Investigating Materials By Mechanical Methods (AREA)

Abstract

The application discloses a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory. A cluster analysis method clusters the data from all data sources into a temperature dataset and a humidity dataset. A trend analysis method performs linear analysis on the temperature data in the temperature dataset and the humidity data in the humidity dataset to obtain pre-abnormal temperature data and pre-abnormal humidity data. A box-plot algorithm is applied to the temperature dataset and the humidity dataset to identify abnormal data, which are collected into an abnormal dataset. When pre-abnormal temperature data or pre-abnormal humidity data also belong to the abnormal dataset, they are cleaned. Because abnormal data are identified by two methods, and data are classed as dirty data only when both methods flag them as abnormal, the method reduces labor consumption, saves a large amount of manpower, and, owing to the objectivity of the detection methods, improves the accuracy of dirty-data identification.

Description

Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory
Technical Field
The application relates to the technical field of data processing, and in particular to a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory.
Background
As more and more high-precision measurement sensors are used in metrological standard equipment, the large amount of metrological standard equipment in electric power systems places ever higher requirements on the ambient temperature and humidity of the laboratory. To control and acquire laboratory temperature and humidity accurately and effectively, current practice is to install a large number of digital hygrothermographs. Their temperature and humidity sensors monitor the temperature and humidity of the whole laboratory, and the laboratory air conditioning is controlled from these data so that the environment meets the test requirements. Ensuring that out-of-tolerance data (dirty data) among these readings do not disturb overall temperature and humidity control has become a difficulty in laboratory environment control. Following the principle of "garbage in, garbage out", out-of-tolerance data can cause the air conditioner to be controlled incorrectly and thereby affect the ambient temperature and humidity.
Therefore, the large volume of digital hygrothermograph data in a laboratory needs to be analyzed, and the out-of-tolerance data, i.e., dirty data, need to be cleaned. In conventional dirty-data cleaning, the main approach is still to process the different databases manually or with simple data extraction, transformation, and loading (ETL) applications or tools. Such approaches not only consume a great deal of effort, but also raise the error rate of temperature and humidity data cleaning because too many factors are uncontrollable.
Disclosure of Invention
The application provides a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory, to solve the technical problems that traditional dirty-data cleaning methods are time-consuming and identify dirty data with low accuracy.
In order to solve the above problems, the present application provides the following technical solutions:
The method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory comprises the following steps: clustering the data from all data sources using a cluster analysis method to obtain a temperature dataset and a humidity dataset; performing linear analysis on the temperature data in the temperature dataset and the humidity data in the humidity dataset using a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data, where the pre-abnormal temperature data are temperature data that deviate from the linear temperature curve and the pre-abnormal humidity data are humidity data that deviate from the linear humidity curve; identifying abnormal data in the temperature dataset and the humidity dataset using a box-plot algorithm and collecting the abnormal data into an abnormal dataset, where the abnormal data comprise abnormal temperature data and abnormal humidity data; and, when pre-abnormal temperature data or pre-abnormal humidity data belong to the abnormal dataset, cleaning those pre-abnormal temperature data or pre-abnormal humidity data.
Optionally, identifying abnormal data in the temperature dataset and the humidity dataset using a box-plot algorithm includes: analyzing the temperature data of the temperature dataset and the humidity data of the humidity dataset with the box-plot algorithm to obtain preliminary abnormal data; determining whether the preliminary abnormal data exceed an out-of-tolerance threshold; if so, the preliminary abnormal data are abnormal data; if not, they are not abnormal data.
Optionally, analyzing the temperature data of the temperature dataset and the humidity data of the humidity dataset with the box-plot algorithm to obtain preliminary abnormal data includes: calculating the median, the 25% quantile, the 75% quantile, the upper boundary, and the lower boundary of the temperature data of the temperature dataset and of the humidity data of the humidity dataset; any temperature or humidity data above the 75% quantile or below the 25% quantile are preliminary abnormal data.
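The quartile step can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the helper names `quartiles` and `preliminary_anomalies` are hypothetical, and the quantile interpolation rule (`statistics.quantiles` with `method="inclusive"`) is an assumption, since the text does not say how the 25% and 75% quantiles are computed.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) of a sequence of readings.

    Assumption: linear interpolation via statistics.quantiles(method="inclusive");
    the text does not specify a quantile rule.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3

def preliminary_anomalies(values):
    """Flag readings above the 75% quantile or below the 25% quantile."""
    q1, _median, q3 = quartiles(values)
    return [v for v in values if v > q3 or v < q1]

# Hypothetical laboratory temperature readings in degrees Celsius.
temps = [20.1, 20.3, 19.9, 20.0, 20.2, 21.4, 20.1, 19.8, 20.2]
print(preliminary_anomalies(temps))
```

By construction this rule flags roughly half of any dataset, so it is only meaningful together with the out-of-tolerance threshold check in the following paragraphs.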
Optionally, the out-of-tolerance threshold is the upper boundary or the lower boundary.
Optionally, the calculation formula of the upper boundary is as follows:
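$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \tag{1}$$
where $Q_3$ is the upper quartile, i.e., the 75% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range, i.e., the range between the upper and lower quartiles.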
Optionally, the lower boundary is calculated as follows:
$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \tag{2}$$
where $Q_1$ is the lower quartile, i.e., the 25% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range.
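As a worked illustration of Eqs. (1) and (2), the sketch below computes both boundaries. The function name `box_plot_bounds` is hypothetical, and the default k = 1.5 is Tukey's conventional fence coefficient, assumed here because the text only describes k as an empirical coefficient.

```python
def box_plot_bounds(q1, q3, k=1.5):
    """Return (LowerLimit, UpperLimit) from Eqs. (2) and (1).

    k = 1.5 is an assumed default (Tukey's usual choice); the text leaves k
    as an unspecified empirical constant.
    """
    iqr = q3 - q1          # interquartile range, Q3 - Q1
    lower = q1 - k * iqr   # Eq. (2)
    upper = q3 + k * iqr   # Eq. (1)
    return lower, upper

# Hypothetical quartiles of a humidity dataset, in %RH.
print(box_plot_bounds(45.0, 55.0))   # IQR = 10, so (30.0, 70.0)
```

A larger k widens the boundaries and flags fewer readings as out of tolerance; k = 3 is another commonly used value for marking only "far out" points.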
Beneficial effects: the application provides a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory. First, a cluster analysis method clusters the data from all data sources to obtain a temperature dataset and a humidity dataset. Second, a trend analysis method performs linear analysis on the temperature data in the temperature dataset and the humidity data in the humidity dataset to obtain pre-abnormal temperature data and pre-abnormal humidity data. Third, a box-plot algorithm identifies abnormal data in the temperature dataset and the humidity dataset, and the abnormal data are collected into an abnormal dataset. Finally, when pre-abnormal temperature data or pre-abnormal humidity data belong to the abnormal dataset, they are cleaned. The data in each dataset are thus examined by two methods: when pre-abnormal data identified by the trend analysis method also appear among the abnormal data identified by the box-plot algorithm, they are treated as dirty data and cleaned. Because data are classed as dirty only when both methods flag them, the method reduces labor consumption, saves a large amount of manpower, and, owing to the objectivity of the detection methods, improves the accuracy of dirty-data identification.
Drawings
To illustrate the technical solutions of the application more clearly, the drawings needed in the embodiments are briefly described below. Those skilled in the art can derive other drawings from these drawings without inventive effort.
FIG. 1 is a flowchart of the method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory provided by the application;
FIG. 2 is a flowchart of the method for obtaining abnormal data provided by the application.
Detailed Description
Referring to FIG. 1, which is a flowchart of the method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory provided by the application, the cleaning method includes the following steps:
S01: Cluster the data from all data sources using a cluster analysis method to obtain a temperature dataset and a humidity dataset.
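The text does not name a clustering algorithm. As one hypothetical reading, if temperature and humidity readings arrive unlabeled in a single pooled stream and occupy clearly separated value ranges (for example, about 20 °C versus about 50 %RH), a one-dimensional k-means with k = 2 would split them. The sketch below implements only that assumption; `split_temperature_humidity` is an invented name, and if the data sources already label each channel, those labels may be a simpler way to separate the two quantities.

```python
def split_temperature_humidity(values, max_iter=100):
    """Split pooled 1-D readings into (low, high) clusters with k-means, k = 2.

    Hypothetical: assumes the low cluster is temperature (degC) and the high
    cluster is relative humidity (%RH).
    """
    if min(values) == max(values):
        return list(values), []           # nothing to separate
    centers = [min(values), max(values)]  # deterministic initial centroids
    for _ in range(max_iter):
        # Assignment step: each reading joins its nearest centroid.
        low = [v for v in values if abs(v - centers[0]) <= abs(v - centers[1])]
        high = [v for v in values if abs(v - centers[0]) > abs(v - centers[1])]
        # Update step: move each centroid to the mean of its cluster.
        new_centers = [sum(low) / len(low), sum(high) / len(high)]
        if new_centers == centers:        # assignments stable: converged
            break
        centers = new_centers
    return low, high

temperature_set, humidity_set = split_temperature_humidity(
    [20.1, 55.0, 19.8, 54.2, 20.3, 56.1])
```

Seeding the centroids at the minimum and maximum keeps the result deterministic and, in one dimension, guarantees that neither cluster becomes empty.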
S02: Perform linear analysis on the temperature data in the temperature dataset and the humidity data in the humidity dataset using a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data.
The pre-abnormal temperature data are temperature data that deviate from the linear temperature curve, and the pre-abnormal humidity data are humidity data that deviate from the linear humidity curve.
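The text does not specify what the linear curve is fitted against or how far a reading must deviate to count as pre-abnormal. The sketch below assumes an ordinary least-squares line over the sample index (i.e., time order) and a hypothetical absolute residual tolerance `max_residual`; the function names `fit_line` and `pre_abnormal_indices` and that parameter are illustrative assumptions.

```python
def fit_line(ys):
    """Ordinary least-squares fit y = a + b*t over sample index t = 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two readings to fit a line")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

def pre_abnormal_indices(ys, max_residual):
    """Indices of readings deviating from the fitted line by more than max_residual."""
    intercept, slope = fit_line(ys)
    return [t for t, y in enumerate(ys)
            if abs(y - (intercept + slope * t)) > max_residual]

# Hypothetical hourly temperatures (degC) with one spike at index 3.
print(pre_abnormal_indices([20.0, 20.1, 20.2, 23.0, 20.4, 20.5], max_residual=1.0))  # [3]
```

Because a single spike also pulls a least-squares line toward itself, a robust fit such as a median-based slope may separate outliers more cleanly; that choice is likewise an assumption outside the text.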
S03: Identify abnormal data in the temperature dataset and the humidity dataset using a box-plot algorithm, and collect the abnormal data into an abnormal dataset.
The abnormal data include abnormal temperature data and abnormal humidity data.
Referring to FIG. 2, which is a flowchart of the method for obtaining abnormal data provided by the application, the specific process is as follows:
S031: Analyze the temperature data of the temperature dataset and the humidity data of the humidity dataset using the box-plot algorithm to obtain preliminary abnormal data.
S0311: Calculate the median, 25% quantile, 75% quantile, upper boundary, and lower boundary of the temperature data of the temperature dataset and of the humidity data of the humidity dataset.
S0312: Any temperature or humidity data above the 75% quantile or below the 25% quantile are preliminary abnormal data.
Temperature or humidity data between the 25% quantile and the 75% quantile are normal data.
S032: Determine whether the preliminary abnormal data exceed the out-of-tolerance threshold.
The out-of-tolerance threshold is either the upper boundary or the lower boundary.
For example, determine whether the preliminary abnormal data exceed the upper boundary.
The upper boundary is calculated as follows:
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \tag{1}$$
where $Q_3$ is the upper quartile, i.e., the 75% quantile; $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range; and $k$ is a constant empirical coefficient.
The lower boundary is calculated as follows:
$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \tag{2}$$
where $Q_1$ is the lower quartile, i.e., the 25% quantile; $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range; and $k$ is a constant empirical coefficient.
S033: If so, the preliminary abnormal data are abnormal data.
S034: If not, the preliminary abnormal data are not abnormal data.
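Steps S031 to S034 can be combined into a single filter, sketched below. The function name `box_plot_abnormal` is hypothetical, the quartiles use inclusive linear interpolation, and k defaults to 1.5; the text specifies neither the interpolation rule nor the value of k. A preliminary anomaly is kept as abnormal only if it lies above the upper boundary or below the lower boundary.

```python
import statistics

def box_plot_abnormal(values, k=1.5):
    """Return the readings flagged as abnormal by steps S031-S034.

    Assumptions: inclusive quantile interpolation and k = 1.5; the text
    gives neither.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr               # Eqs. (2) and (1)
    preliminary = [v for v in values if v > q3 or v < q1]   # S031
    # S032-S034: keep only preliminary anomalies beyond a boundary.
    return [v for v in preliminary if v > upper or v < lower]

print(box_plot_abnormal([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))  # [30]
```

Since UpperLimit is at least Q3 and LowerLimit at most Q1 whenever k is non-negative, the result equals a direct boundary check on all readings; the preliminary stage mirrors the structure of the described method rather than changing its outcome.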
S04: When pre-abnormal temperature data or pre-abnormal humidity data belong to the abnormal dataset, clean those pre-abnormal temperature data or pre-abnormal humidity data.
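Step S04 treats a reading as dirty only when both detectors flag it. The sketch below assumes each detector reports positions (indices) into the same reading series and that "cleaning" means removing the reading; the text does not say whether dirty data are deleted or replaced, and `clean_dirty_data` is a hypothetical name.

```python
def clean_dirty_data(readings, pre_abnormal_idx, abnormal_idx):
    """Drop readings flagged by both the trend analysis and the box plot.

    Returns (cleaned_readings, dirty_indices). Removal is an assumed cleaning
    action; replacement or interpolation would be equally consistent with S04.
    """
    dirty = set(pre_abnormal_idx) & set(abnormal_idx)  # flagged by both methods
    cleaned = [r for i, r in enumerate(readings) if i not in dirty]
    return cleaned, sorted(dirty)

readings = [20.0, 20.1, 23.0, 20.2, 17.5]
cleaned, dirty = clean_dirty_data(readings, pre_abnormal_idx=[2, 4], abnormal_idx=[2])
print(cleaned, dirty)  # [20.0, 20.1, 20.2, 17.5] [2]
```

Matching by index rather than by value avoids treating two equal readings taken at different times as the same data point.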
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
The above-described embodiments of the present application are not intended to limit the scope of the present application.

Claims (5)

1. A method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory, characterized by comprising the following steps:
clustering a plurality of data items from all data sources using a cluster analysis method to obtain a temperature dataset and a humidity dataset;
performing linear analysis on the temperature data in the temperature dataset and the humidity data in the humidity dataset using a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data, wherein the pre-abnormal temperature data are temperature data deviating from the linear temperature curve and the pre-abnormal humidity data are humidity data deviating from the linear humidity curve;
identifying abnormal data in the temperature dataset and the humidity dataset using a box-plot algorithm, and collecting the abnormal data into an abnormal dataset, wherein the abnormal data comprise abnormal temperature data and abnormal humidity data;
when the pre-abnormal temperature data or the pre-abnormal humidity data belong to the abnormal dataset, cleaning the pre-abnormal temperature data or the pre-abnormal humidity data;
wherein identifying abnormal data in the temperature dataset and the humidity dataset using the box-plot algorithm comprises the following steps:
analyzing the temperature data of the temperature dataset and the humidity data of the humidity dataset using the box-plot algorithm to obtain preliminary abnormal data;
determining whether the preliminary abnormal data exceed an out-of-tolerance threshold;
if so, the preliminary abnormal data are abnormal data; if not, the preliminary abnormal data are not abnormal data.
2. The cleaning method according to claim 1, wherein analyzing the temperature data of the temperature dataset and the humidity data of the humidity dataset using the box-plot algorithm to obtain preliminary abnormal data comprises:
calculating the median, 25% quantile, 75% quantile, upper boundary, and lower boundary of the temperature data of the temperature dataset and of the humidity data of the humidity dataset;
when the temperature data or the humidity data are above the 75% quantile or below the 25% quantile, the temperature data or the humidity data are preliminary abnormal data.
3. The cleaning method according to claim 1, wherein the out-of-tolerance threshold is the upper boundary or the lower boundary.
4. The cleaning method according to claim 3, wherein the upper boundary is calculated as:
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \tag{1}$$
where $Q_3$ is the upper quartile, i.e., the 75% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range.
5. The cleaning method according to claim 3, wherein the lower boundary is calculated as:
$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \tag{2}$$
where $Q_1$ is the lower quartile, i.e., the 25% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911142890.9A CN110955650B (en) 2019-11-20 2019-11-20 Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911142890.9A CN110955650B (en) 2019-11-20 2019-11-20 Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory

Publications (2)

Publication Number Publication Date
CN110955650A (en) 2020-04-03
CN110955650B (en) 2023-06-23

Family

ID=69978026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911142890.9A Active CN110955650B (en) 2019-11-20 2019-11-20 Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory

Country Status (1)

Country Link
CN (1) CN110955650B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106097138A (en) * 2016-06-03 2016-11-09 合肥工业大学 Electricity consumption abnormal data detection system and method based on a statistical model
CN108412710A (en) * 2018-01-30 2018-08-17 同济大学 Wind turbine wind power data cleaning method
CN108830510A (en) * 2018-07-16 2018-11-16 国网上海市电力公司 Electric power data preprocessing method based on mathematical statistics
CN109918364A (en) * 2019-02-28 2019-06-21 华北电力大学 Data cleaning method based on two-dimensional probability density estimation and the quartile method

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US7904279B2 (en) * 2004-04-02 2011-03-08 Test Advantage, Inc. Methods and apparatus for data analysis
CN106204335A (en) * 2016-07-21 2016-12-07 广东工业大学 Electricity price execution abnormality judgment method, apparatus, and system
US10528533B2 (en) * 2017-02-09 2020-01-07 Adobe Inc. Anomaly detection at coarser granularity of data
EP3364157A1 (en) * 2017-02-16 2018-08-22 Fundación Tecnalia Research & Innovation Method and system of outlier detection in energy metering data
CN109766331A (en) * 2018-12-06 2019-05-17 中科恒运股份有限公司 Abnormal data processing method and device

Non-Patent Citations (1)

Title
杨政. 基于模型驱动的数据清洗组件研究 (Research on model-driven data cleaning components). 云南电力技术 (Yunnan Electric Power Technology), 2017, No. 06, full text. *

Also Published As

Publication number Publication date
CN110955650A (en) 2020-04-03

Similar Documents

Publication Publication Date Title
CN111046564B (en) Residual life prediction method for two-stage degraded product
TWI539298B (en) Metrology sampling method with sampling rate decision scheme and computer program product thereof
CN112800616B (en) Equipment residual life self-adaptive prediction method based on proportional acceleration degradation modeling
CN106844901B (en) Structural part residual strength evaluation method based on multi-factor fusion correction
CN116520236B (en) Abnormality detection method and system for intelligent ammeter
CN110738346A Batch electric energy meter reliability prediction method based on Weibull distribution
CN106950507A Intelligent clock battery high-reliability lifetime estimation method
CN110955650B (en) Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory
CN114691661B (en) Assimilation-based cloud air guide and temperature and humidity profile pretreatment analysis method and system
US7043970B2 (en) Method for monitoring wood-drying kiln state
CN108169565B (en) Nonlinear temperature compensation method for conductivity measurement
CN116503025B (en) Business work order flow processing method based on workflow engine
CN107843215B (en) Based on the roughness value fractal evaluation model building method under optional sampling spacing condition
CN104267270B (en) Transformer key parameters extracting method based on vector similitude
CN113934536A (en) Data acquisition method facing edge calculation
CN115598309B (en) Method and system for monitoring and early warning of lead content in atmospheric environment
CN108109675B (en) Laboratory quality control data management system
CN116029165A (en) Power cable reliability analysis method and system considering lightning influence
CN108124442B (en) Elevator element parameter calibration method, device, equipment and storage medium
CN113378309B (en) Rolling bearing health state online monitoring and residual life prediction method
CN112287302B (en) Method for detecting pH value of oil, computing equipment and computer storage medium
CN112685912A (en) Multivariate generalized Wiener process performance degradation reliability analysis method
CN108459948B (en) Method for determining failure data distribution type in system reliability evaluation
CN108108864B (en) Laboratory quality control data management method
CN109307524B (en) Sensor measurement data spot detection and repair technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant