CN110955650A - Cleaning method for out-of-tolerance data of digital hygrothermograph in standard laboratory - Google Patents


Publication number
CN110955650A
Authority
CN
China
Prior art keywords
data
abnormal
humidity
temperature
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911142890.9A
Other languages
Chinese (zh)
Other versions
CN110955650B (en)
Inventor
唐标
李博
于辉
王恩
朱梦梦
朱全聪
林中爱
杨明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority to CN201911142890.9A
Publication of CN110955650A
Application granted
Publication of CN110955650B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Testing Resistance To Weather, Investigating Materials By Mechanical Methods (AREA)

Abstract

The application discloses a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory. The method clusters the data from all data sources by cluster analysis to obtain a temperature data set and a humidity data set; performs linear analysis on the temperature data in the temperature data set and the humidity data in the humidity data set by trend analysis to obtain pre-abnormal temperature data and pre-abnormal humidity data; identifies abnormal data in the temperature data set and the humidity data set with a box plot algorithm and collects the abnormal data into an abnormal data set; and, when the pre-abnormal temperature data or the pre-abnormal humidity data belong to the abnormal data set, cleans the pre-abnormal temperature data or the pre-abnormal humidity data. Because abnormal data are identified by two methods, and data are determined to be dirty data only when both methods flag them as abnormal, the method reduces the demand on personnel, saves a large amount of manual work, and, being an objective test, improves the accuracy of dirty-data identification.

Description

Cleaning method for out-of-tolerance data of digital hygrothermograph in standard laboratory
Technical Field
The application relates to the technical field of data processing, in particular to a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory.
Background
As high-accuracy measurement sensors come into wider use in measurement standard equipment, the large amount of measurement standard equipment in electric power systems places increasingly strict requirements on the ambient temperature and humidity of the laboratory. To monitor and record laboratory temperature and humidity accurately and effectively, the current practice is to install a large number of digital thermo-hygrometers. The temperature and humidity sensors of these meters monitor the whole laboratory, and their data are used to control the laboratory air conditioning so that the environment meets the test requirements. Ensuring that outliers (dirty data) among these readings do not disturb temperature and humidity control for the whole laboratory is a difficult point of laboratory temperature and humidity control. By the "garbage in, garbage out" principle, out-of-tolerance data cause air-conditioning control errors, which in turn affect the ambient temperature and humidity.
Therefore, the large volume of digital thermo-hygrometer data in a laboratory must be analyzed so that the out-of-tolerance data, i.e., the dirty data, can be cleaned out. Traditional dirty-data cleaning relies mainly on people processing the different databases by hand, or on simple data extraction, transmission, and loading applications or tools. Such approaches are labor-intensive, and the many uncontrolled factors increase the error rate of temperature and humidity data cleaning.
Disclosure of Invention
The application provides a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory, which aims to solve the technical problems that conventional dirty-data cleaning methods are time-consuming and identify dirty data with low accuracy.
To solve the above problems, the application provides the following technical solution:
The method for cleaning out-of-tolerance data of the digital hygrothermograph in the standard laboratory comprises the following steps: clustering the data from all data sources by cluster analysis to obtain a temperature data set and a humidity data set; performing linear analysis on the temperature data in the temperature data set and the humidity data in the humidity data set by trend analysis to obtain pre-abnormal temperature data and pre-abnormal humidity data, wherein the pre-abnormal temperature data are temperature data that deviate from the linear temperature curve, and the pre-abnormal humidity data are humidity data that deviate from the linear humidity curve; identifying abnormal data in the temperature data set and the humidity data set with a box plot algorithm and collecting the abnormal data into an abnormal data set, wherein the abnormal data comprise abnormal temperature data and abnormal humidity data; and, when the pre-abnormal temperature data or the pre-abnormal humidity data belong to the abnormal data set, cleaning the pre-abnormal temperature data or the pre-abnormal humidity data.
Optionally, identifying abnormal data in the temperature data set and the humidity data set with a box plot algorithm comprises: analyzing the temperature data of the temperature data set and the humidity data of the humidity data set with the box plot algorithm to obtain preliminary abnormal data; judging whether the preliminary abnormal data exceed an out-of-tolerance threshold; if so, the preliminary abnormal data are abnormal data; if not, the preliminary abnormal data are not abnormal data.
Optionally, analyzing the temperature data of the temperature data set and the humidity data of the humidity data set with the box plot algorithm to obtain preliminary abnormal data comprises: calculating the median, 25% quantile, 75% quantile, upper boundary, and lower boundary of the temperature data in the temperature data set and of the humidity data in the humidity data set; and, when a temperature or humidity value lies above the 75% quantile or below the 25% quantile, treating that value as preliminary abnormal data.
Optionally, the out-of-tolerance threshold is the upper boundary or the lower boundary.
Optionally, the upper boundary is calculated as follows:
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \tag{1}$$
where $Q_3$ is the upper quartile, i.e., the 75% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles.
Optionally, the lower boundary is calculated as follows:
$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \tag{2}$$
where $Q_1$ is the lower quartile, i.e., the 25% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles.
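As a worked illustration of formulas (1) and (2), take hypothetical humidity quartiles of 50 %RH (lower) and 54 %RH (upper). The coefficient k = 1.5 used here is the value conventional for box plots and is an assumption; the application only states that k is an empirical coefficient.

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 = 54 - 50 = 4 \\
\mathrm{UpperLimit} &= Q_3 + k \cdot \mathrm{IQR} = 54 + 1.5 \times 4 = 60 \\
\mathrm{LowerLimit} &= Q_1 - k \cdot \mathrm{IQR} = 50 - 1.5 \times 4 = 44
\end{aligned}
```

Under these assumed values, a humidity reading of 70 %RH would exceed the upper boundary, while a reading of 55 %RH would lie above the upper quartile but within the boundaries.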
Beneficial effects: the application provides a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory. First, the data from all data sources are clustered by cluster analysis to obtain a temperature data set and a humidity data set. Second, linear analysis is performed on the temperature data in the temperature data set and the humidity data in the humidity data set by trend analysis to obtain pre-abnormal temperature data and pre-abnormal humidity data. Third, abnormal data in the temperature data set and the humidity data set are identified with a box plot algorithm and collected into an abnormal data set. Finally, when the pre-abnormal temperature data or the pre-abnormal humidity data belong to the abnormal data set, the pre-abnormal temperature data or the pre-abnormal humidity data are cleaned. The data in the temperature data set and the humidity data set are thus each examined by two methods: when pre-abnormal data identified by trend analysis also appear among the abnormal data identified by the box plot algorithm, the pre-abnormal data are taken to be dirty data and are cleaned. Because data are determined to be dirty data only when both methods flag them as abnormal, the method reduces the demand on personnel, saves a large amount of manual work, and, being an objective test, improves the accuracy of dirty-data identification.
Drawings
To explain the technical solution of the application more clearly, the drawings used in the embodiments are briefly described below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of the method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory provided by the application;
FIG. 2 is a flowchart of the method for acquiring abnormal data.
Detailed Description
Referring to FIG. 1, a flowchart of the method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory provided by the application, the method includes:
S01: Cluster the data from all data sources by cluster analysis to obtain a temperature data set and a humidity data set.
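The application does not name a particular clustering algorithm for S01. The following is a minimal sketch that assumes a plain one-dimensional two-means clustering and assumes that temperature readings (in °C) are numerically lower than humidity readings (in %RH). The function name `two_means_1d` and the sample readings are hypothetical.

```python
from statistics import mean

def two_means_1d(values, iterations=20):
    """Split values into a low and a high cluster with a basic 1-D k-means (k = 2)."""
    lo_center, hi_center = min(values), max(values)  # simple initialization
    low, high = [], []
    for _ in range(iterations):
        # Assign each value to the nearer of the two cluster centers.
        low = [v for v in values if abs(v - lo_center) <= abs(v - hi_center)]
        high = [v for v in values if abs(v - lo_center) > abs(v - hi_center)]
        if not low or not high:
            break  # degenerate case: all values fell into one cluster
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo_center, hi_center):
            break  # centers stopped moving
        lo_center, hi_center = new_lo, new_hi
    return low, high

# Hypothetical mixed readings: temperatures near 20 degC, humidities near 50 %RH.
readings = [20.1, 49.8, 20.3, 50.2, 19.9, 51.0, 20.0, 50.5]
temperature_set, humidity_set = two_means_1d(readings)
print(temperature_set)
print(humidity_set)
```

If each reading already carries a channel or unit label, grouping by that label would be simpler than clustering on values; the value-based split above is only one possible reading of the step.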
S02: Perform linear analysis on the temperature data in the temperature data set and the humidity data in the humidity data set by trend analysis to obtain pre-abnormal temperature data and pre-abnormal humidity data.
The pre-abnormal temperature data are temperature data that deviate from the linear temperature curve, and the pre-abnormal humidity data are humidity data that deviate from the linear humidity curve.
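One way to read S02 is to fit a straight line to the readings and flag the points that lie far from it. The sketch below assumes an ordinary least-squares fit over the sample index and a fixed residual tolerance; the application specifies neither, and the names `fit_line`, `pre_abnormal_indices`, and `tolerance` are hypothetical.

```python
from statistics import mean

def fit_line(ys):
    """Least-squares line y = intercept + slope * x over x = 0, 1, 2, ...

    Requires at least two readings.
    """
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def pre_abnormal_indices(ys, tolerance):
    """Indices of readings whose distance from the fitted line exceeds tolerance."""
    intercept, slope = fit_line(ys)
    return [i for i, y in enumerate(ys)
            if abs(y - (intercept + slope * i)) > tolerance]

# Hypothetical temperature series with one spike at index 3.
temps = [20.0, 20.1, 20.2, 25.0, 20.4, 20.5, 20.6]
flagged = pre_abnormal_indices(temps, tolerance=2.0)
print(flagged)
```

Because the spike itself pulls the fitted line toward it, a robust fit or a tolerance derived from the residual spread may work better on real data; the fixed tolerance is kept here only for brevity.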
S03: Identify abnormal data in the temperature data set and the humidity data set with a box plot algorithm, and collect the abnormal data into an abnormal data set.
The abnormal data include abnormal temperature data and abnormal humidity data.
Referring to FIG. 2, a flowchart of the method for acquiring abnormal data provided by the application, the specific process is as follows:
S031: Analyze the temperature data of the temperature data set and the humidity data of the humidity data set with the box plot algorithm to obtain preliminary abnormal data.
S0311: Calculate the median, 25% quantile, 75% quantile, upper boundary, and lower boundary of the temperature data in the temperature data set and of the humidity data in the humidity data set.
S0312: When a temperature or humidity value lies above the 75% quantile or below the 25% quantile, that value is preliminary abnormal data.
When a temperature or humidity value lies between the 25% quantile and the 75% quantile, that value is normal data.
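Steps S0311 and S0312 can be sketched with the standard library as follows. The quantile interpolation rule (`method="inclusive"`) is an assumption, since the application does not say how quantiles are computed; the function name `preliminary_abnormal` and the sample data are hypothetical.

```python
from statistics import quantiles

def preliminary_abnormal(values):
    """Return the quartiles and the values lying outside [Q1, Q3]."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    flagged = [v for v in values if v < q1 or v > q3]
    return (q1, median, q3), flagged

# Hypothetical humidity readings in %RH.
humidity = [48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 70.0]
quartiles, flagged = preliminary_abnormal(humidity)
print(quartiles)
print(flagged)
```

By construction this stage flags roughly half of any data set, so it acts only as a coarse pre-filter for the threshold check in S032.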
S032: Judge whether the preliminary abnormal data exceed the out-of-tolerance threshold.
The out-of-tolerance threshold is the upper boundary or the lower boundary.
That is, judge whether the preliminary abnormal data exceed the upper boundary or fall below the lower boundary.
The upper boundary is calculated as follows:
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \tag{1}$$
where $Q_3$ is the upper quartile, i.e., the 75% quantile; $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles; and $k$ is an empirical coefficient, which is a constant.
The lower boundary is calculated as follows:
$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \tag{2}$$
where $Q_1$ is the lower quartile, i.e., the 25% quantile; $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles; and $k$ is an empirical coefficient, which is a constant.
S033: If so, the preliminary abnormal data are abnormal data.
S034: If not, the preliminary abnormal data are not abnormal data.
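The threshold check of S032 to S034 follows directly from formulas (1) and (2). The sketch below assumes k = 1.5, the value conventional for box plots (the application only calls k an empirical constant), and the same assumed quantile rule as Python's `method="inclusive"`. The function names are hypothetical.

```python
from statistics import quantiles

def box_plot_limits(values, k=1.5):
    """Lower and upper boundaries Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr

def abnormal_data(values, k=1.5):
    """Values below the lower boundary or above the upper boundary."""
    lower, upper = box_plot_limits(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical humidity readings in %RH.
humidity = [48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 70.0]
limits = box_plot_limits(humidity)
abnormal = abnormal_data(humidity)
print(limits)
print(abnormal)
```

For any k of zero or more, a value beyond either boundary also lies outside the quartile range, so checking the boundaries directly selects the same values as the two-stage procedure.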
S04: When the pre-abnormal temperature data or the pre-abnormal humidity data belong to the abnormal data set, clean the pre-abnormal temperature data or the pre-abnormal humidity data.
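S04 amounts to keeping only the readings that both detectors agree on. The sketch below makes two assumptions that the application does not state: that readings are matched by their position in the series, and that "cleaning" means removing the reading. The function name `clean` and the index lists are hypothetical stand-ins for the outputs of S02 and S03.

```python
def clean(values, pre_abnormal_idx, abnormal_idx):
    """Remove readings flagged by both trend analysis and the box plot check."""
    dirty = set(pre_abnormal_idx) & set(abnormal_idx)  # flagged by both methods
    cleaned = [v for i, v in enumerate(values) if i not in dirty]
    return cleaned, sorted(dirty)

# Hypothetical temperature series and detector outputs.
temps = [20.0, 20.1, 20.2, 25.0, 20.4, 20.5, 20.6]
pre_abnormal_idx = [3, 5]  # assumed result of trend analysis (S02)
abnormal_idx = [3]         # assumed result of the box plot check (S03)
cleaned, removed = clean(temps, pre_abnormal_idx, abnormal_idx)
print(cleaned)
print(removed)
```

Index 5 is flagged by only one method in this example, so it is kept; only the reading at index 3, flagged by both, is removed.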
The application provides a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory. First, the data from all data sources are clustered by cluster analysis to obtain a temperature data set and a humidity data set. Second, linear analysis is performed on the temperature data in the temperature data set and the humidity data in the humidity data set by trend analysis to obtain pre-abnormal temperature data and pre-abnormal humidity data. Third, abnormal data in the temperature data set and the humidity data set are identified with a box plot algorithm and collected into an abnormal data set. Finally, when the pre-abnormal temperature data or the pre-abnormal humidity data belong to the abnormal data set, the pre-abnormal temperature data or the pre-abnormal humidity data are cleaned. The data in the temperature data set and the humidity data set are thus each examined by two methods: when pre-abnormal data identified by trend analysis also appear among the abnormal data identified by the box plot algorithm, the pre-abnormal data are taken to be dirty data and are cleaned. Because data are determined to be dirty data only when both methods flag them as abnormal, the method reduces the demand on personnel, saves a large amount of manual work, and, being an objective test, improves the accuracy of dirty-data identification.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the application, including departures from the present disclosure that come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the application being indicated by the following claims.
The above-described embodiments of the present application do not limit the scope of the present application.

Claims (6)

1. A method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory, characterized by comprising the following steps:
clustering the data from all data sources by cluster analysis to obtain a temperature data set and a humidity data set;
performing linear analysis on the temperature data in the temperature data set and the humidity data in the humidity data set by trend analysis to obtain pre-abnormal temperature data and pre-abnormal humidity data, wherein the pre-abnormal temperature data are temperature data that deviate from the linear temperature curve, and the pre-abnormal humidity data are humidity data that deviate from the linear humidity curve;
identifying abnormal data in the temperature data set and the humidity data set with a box plot algorithm, and collecting the abnormal data into an abnormal data set, wherein the abnormal data comprise abnormal temperature data and abnormal humidity data;
and, when the pre-abnormal temperature data or the pre-abnormal humidity data belong to the abnormal data set, cleaning the pre-abnormal temperature data or the pre-abnormal humidity data.
2. The cleaning method of claim 1, wherein identifying abnormal data in the temperature data set and the humidity data set with a box plot algorithm comprises:
analyzing the temperature data of the temperature data set and the humidity data of the humidity data set with the box plot algorithm to obtain preliminary abnormal data;
judging whether the preliminary abnormal data exceed an out-of-tolerance threshold;
if so, the preliminary abnormal data are abnormal data; and if not, the preliminary abnormal data are not abnormal data.
3. The cleaning method of claim 2, wherein analyzing the temperature data of the temperature data set and the humidity data of the humidity data set with the box plot algorithm to obtain preliminary abnormal data comprises:
calculating the median, 25% quantile, 75% quantile, upper boundary, and lower boundary of the temperature data in the temperature data set and of the humidity data in the humidity data set;
and, when a temperature or humidity value lies above the 75% quantile or below the 25% quantile, treating that value as preliminary abnormal data.
4. The cleaning method of claim 2, wherein the out-of-tolerance threshold is the upper boundary or the lower boundary.
5. The cleaning method of claim 4, wherein the upper boundary is calculated as follows:
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \tag{1}$$
where $Q_3$ is the upper quartile, i.e., the 75% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles.
6. The cleaning method of claim 4, wherein the lower boundary is calculated as follows:
$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \tag{2}$$
where $Q_1$ is the lower quartile, i.e., the 25% quantile; $k$ is an empirical coefficient; and $\mathrm{IQR}$ is the interquartile range, i.e., the difference between the upper and lower quartiles.
CN201911142890.9A 2019-11-20 2019-11-20 Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory Active CN110955650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911142890.9A CN110955650B (en) 2019-11-20 2019-11-20 Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911142890.9A CN110955650B (en) 2019-11-20 2019-11-20 Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory

Publications (2)

Publication Number Publication Date
CN110955650A (en) 2020-04-03
CN110955650B (en) 2023-06-23

Family

ID=69978026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911142890.9A Active CN110955650B (en) 2019-11-20 2019-11-20 Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory

Country Status (1)

Country Link
CN (1) CN110955650B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080091977A1 (en) * 2004-04-02 2008-04-17 Emilio Miguelanez Methods and apparatus for data analysis
CN106097138A (en) * 2016-06-03 2016-11-09 合肥工业大学 A kind of electricity consumption anomaly data detection System and method for based on statistical model
CN106204335A (en) * 2016-07-21 2016-12-07 广东工业大学 A kind of electricity price performs abnormality judgment method, Apparatus and system
US20180225320A1 (en) * 2017-02-09 2018-08-09 Adobe Systems Incorporated Anomaly Detection at Coarser Granularity of Data
CN108412710A (en) * 2018-01-30 2018-08-17 同济大学 A kind of Wind turbines wind power data cleaning method
EP3364157A1 (en) * 2017-02-16 2018-08-22 Fundación Tecnalia Research & Innovation Method and system of outlier detection in energy metering data
CN108830510A (en) * 2018-07-16 2018-11-16 国网上海市电力公司 A kind of electric power data preprocess method based on mathematical statistics
CN109766331A (en) * 2018-12-06 2019-05-17 中科恒运股份有限公司 Method for processing abnormal data and device
CN109918364A (en) * 2019-02-28 2019-06-21 华北电力大学 A kind of data cleaning method based on two-dimensional probability density estimation and quartile method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨政: "基于模型驱动的数据清洗组件研究" (Yang Zheng, "Research on Model-Driven Data Cleaning Components") *

Also Published As

Publication number Publication date
CN110955650B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN113255795B (en) Equipment state monitoring method based on multi-index cluster analysis
CN112381476B (en) Method and device for determining electric energy meter with abnormal state
CN110083803B (en) Method and system for detecting water taking abnormality based on time sequence ARIMA model
CN113838054B (en) Mechanical part surface damage detection method based on artificial intelligence
KR20170122043A (en) Real-time indoor air quality outlier smoothing method and apparatus
CN115876258B (en) Livestock and poultry breeding environment abnormity monitoring and alarming system based on multi-source data
CN116520236B (en) Abnormality detection method and system for intelligent ammeter
US7043970B2 (en) Method for monitoring wood-drying kiln state
CN115937595A (en) Bridge apparent anomaly identification method and system based on intelligent data processing
CN112417371A (en) Method for monitoring running state of intelligent electric energy meter in distribution network area
CN110955650A (en) Cleaning method for out-of-tolerance data of digital hygrothermograph in standard laboratory
CN117330156A (en) Automatic fault detection and analysis device for gas flowmeter
CN113934536A (en) Data acquisition method facing edge calculation
CN116859875B (en) Steel pipe production process adjusting and controlling method and system based on use requirements
CN116503025B (en) Business work order flow processing method based on workflow engine
CN117115169A (en) Intelligent recognition method for abnormal deformation of surface of die-casting die of automobile part
CN117113104A (en) Intelligent management system and method applying data analysis technology
CN108109675B (en) Laboratory quality control data management system
CN108124442B (en) Elevator element parameter calibration method, device, equipment and storage medium
CN108108864B (en) Laboratory quality control data management method
CN115586321A (en) Method, system, memory and equipment for identifying online monitoring data of dissolved gas in oil
CN114626758A (en) Effect evaluation system for medical equipment maintenance
CN108459948B (en) Method for determining failure data distribution type in system reliability evaluation
CN111061257B (en) Industrial process monitoring method based on dynamic global LPP
CN112763678A (en) PCA-based sewage treatment process monitoring method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant