CN110955650A - Cleaning method for out-of-tolerance data of digital hygrothermograph in standard laboratory - Google Patents
- Publication number
- CN110955650A
- Authority
- CN
- China
- Prior art keywords
- data
- abnormal
- humidity
- temperature
- data set
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Testing Resistance To Weather, Investigating Materials By Mechanical Methods (AREA)
Abstract
The application discloses a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory. Multiple data from all data sources are clustered by cluster analysis to obtain a temperature data set and a humidity data set. Linear analysis is performed on the temperature data in the temperature data set and the humidity data in the humidity data set by a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data. The temperature data set and the humidity data set are examined with a box plot algorithm to identify abnormal data, which are formed into an abnormal data set. When pre-abnormal temperature data or pre-abnormal humidity data also belong to the abnormal data set, those data are cleaned. Because abnormal data are identified by two methods and data are classified as dirty data only when both methods flag them as abnormal, the method reduces labor costs, saves a large amount of manpower, and, being objective, improves the accuracy of dirty-data identification.
Description
Technical Field
The application relates to the technical field of data processing, in particular to a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory.
Background
As high-accuracy measurement sensors are increasingly used in measurement standard equipment, the large amount of measurement standard equipment in power systems places ever higher demands on the ambient temperature and humidity of the laboratory. To monitor and control laboratory temperature and humidity accurately and effectively, the current approach is to install a large number of digital thermo-hygrometers. Their temperature and humidity sensors monitor conditions throughout the laboratory, and the readings are used to control the laboratory air conditioning so that the environment meets test requirements. A key difficulty in laboratory temperature and humidity control is ensuring that scattered outliers in the data (dirty data) do not disturb control of the laboratory as a whole. Following the principle of "garbage in, garbage out", out-of-tolerance data can cause control errors in the air conditioning and, in turn, affect the ambient temperature and humidity.
Therefore, the large volume of digital thermo-hygrometer data in a laboratory must be analyzed so that out-of-tolerance data, i.e., dirty data, can be cleaned out. Traditional dirty-data cleaning relies mainly on people processing different databases, or on simple data extraction, transfer, and loading programs or tools. Such approaches are labor-intensive, and the many uncontrollable factors involved raise the error rate of temperature and humidity data cleaning.
Disclosure of Invention
The application provides a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory. It aims to solve two technical problems: conventional dirty-data cleaning methods are time-consuming, and they identify dirty data with low accuracy.
In order to solve the above problems, the present application provides the following technical solutions:
The method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory comprises the following steps: clustering multiple data from all data sources by cluster analysis to obtain a temperature data set and a humidity data set; performing linear analysis on the temperature data in the temperature data set and the humidity data in the humidity data set by a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data, where the pre-abnormal temperature data are temperature data that deviate from the temperature linear curve and the pre-abnormal humidity data are humidity data that deviate from the humidity linear curve; identifying abnormal data in the temperature data set and the humidity data set by a box plot algorithm and forming the abnormal data into an abnormal data set, where the abnormal data comprise abnormal temperature data and abnormal humidity data; and, when pre-abnormal temperature data or pre-abnormal humidity data belong to the abnormal data set, cleaning those pre-abnormal data.
Optionally, identifying abnormal data in the temperature data set and the humidity data set by the box plot algorithm includes: analyzing the temperature data of the temperature data set and the humidity data of the humidity data set by the box plot algorithm to obtain preliminary abnormal data; judging whether the preliminary abnormal data exceed an out-of-tolerance threshold; if so, the preliminary abnormal data are abnormal data; if not, they are not abnormal data.
Optionally, analyzing the temperature data of the temperature data set and the humidity data of the humidity data set by the box plot algorithm to obtain preliminary abnormal data includes: calculating the median, the 25% quantile, the 75% quantile, the upper bound, and the lower bound of the temperature data set and of the humidity data set; and, when a temperature or humidity value is above the 75% quantile or below the 25% quantile, treating that value as preliminary abnormal data.
Optionally, the out-of-tolerance threshold is an upper bound or a lower bound.
Optionally, the calculation formula of the upper bound is as follows:
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \qquad (1)$$
where $Q_3$ is the upper quartile, i.e. the 75% quantile; $k$ is an empirical coefficient; and IQR is the interquartile range, i.e. the difference between the upper and lower quartiles ($\mathrm{IQR} = Q_3 - Q_1$).
Optionally, the calculation formula of the lower bound is as follows:
$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \qquad (2)$$
where $Q_1$ is the lower quartile, i.e. the 25% quantile; $k$ is an empirical coefficient; and IQR is the interquartile range, i.e. the difference between the upper and lower quartiles ($\mathrm{IQR} = Q_3 - Q_1$).
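As a worked illustration of formulas (1) and (2), take hypothetical temperature quartiles $Q_1 = 21.0\ ^\circ\mathrm{C}$ and $Q_3 = 23.0\ ^\circ\mathrm{C}$, and assume $k = 1.5$. That is the conventional Tukey value; the patent only calls $k$ an empirical coefficient.

$$\mathrm{IQR} = Q_3 - Q_1 = 23.0 - 21.0 = 2.0\ ^\circ\mathrm{C}$$
$$\mathrm{UpperLimit} = 23.0 + 1.5 \times 2.0 = 26.0\ ^\circ\mathrm{C}, \qquad \mathrm{LowerLimit} = 21.0 - 1.5 \times 2.0 = 18.0\ ^\circ\mathrm{C}$$

Under these assumed values, a reading of 24.0 °C lies above $Q_3$ and is therefore preliminary abnormal data. It does not exceed the upper bound, however, so it is not abnormal data. A reading of 27.2 °C exceeds 26.0 °C and is abnormal data.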
Beneficial effects: the application provides a method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory, which proceeds in four steps:
- First, multiple data from all data sources are clustered by cluster analysis to obtain a temperature data set and a humidity data set.
- Second, linear analysis is performed on the temperature data in the temperature data set and the humidity data in the humidity data set by a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data.
- Third, abnormal data in the temperature data set and the humidity data set are identified by a box plot algorithm and formed into an abnormal data set.
- Finally, when pre-abnormal temperature data or pre-abnormal humidity data belong to the abnormal data set, those data are cleaned.

The data in each data set are therefore examined by two methods. Pre-abnormal data found by the trend analysis method are treated as dirty data, and cleaned, only if they also appear among the abnormal data identified by the box plot algorithm. Requiring both methods to agree reduces labor costs, saves a large amount of manpower, and, because the test is objective, improves the accuracy of dirty-data identification.
Drawings
To explain the technical solution of the present application more clearly, the drawings used in the embodiments are briefly described below. Those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of the method for cleaning out-of-tolerance data of a standard laboratory digital hygrothermograph provided in the application;
FIG. 2 is a flowchart of a method for acquiring abnormal data.
Detailed Description
FIG. 1 is a flowchart of the method for cleaning out-of-tolerance data of a standard laboratory digital hygrothermograph provided in the application. As shown in FIG. 1, the method includes:
S01: Cluster multiple data from all data sources by cluster analysis to obtain a temperature data set and a humidity data set.
S02: Perform linear analysis on the temperature data in the temperature data set and the humidity data in the humidity data set by a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data.
The pre-abnormal temperature data are temperature data that deviate from the temperature linear curve, and the pre-abnormal humidity data are humidity data that deviate from the humidity linear curve.
S03: Identify abnormal data in the temperature data set and the humidity data set by a box plot algorithm, and form the abnormal data into an abnormal data set.
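Step S01 could be sketched in Python as follows. The patent does not say which clustering method is used or how a cluster is recognized as temperature or humidity. The one-dimensional two-means split, the function name `two_means_1d`, and the rule that the lower-valued cluster holds the temperature readings are all assumptions for illustration only.

```python
# Hypothetical sketch of step S01. The patent does not name a clustering
# algorithm; a one-dimensional two-means (k-means, k = 2) split is assumed here.

def two_means_1d(values, max_iter=100):
    """Split 1-D readings into (low_cluster, high_cluster) by two-means clustering."""
    if len(values) < 2:
        raise ValueError("need at least two readings")
    # Start the two centroids at the smallest and largest readings.
    c_low, c_high = min(values), max(values)
    for _ in range(max_iter):
        # Assign each reading to its nearer centroid (ties go to the low cluster).
        low = [v for v in values if abs(v - c_low) <= abs(v - c_high)]
        high = [v for v in values if abs(v - c_low) > abs(v - c_high)]
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        new_low = sum(low) / len(low) if low else c_low
        new_high = sum(high) / len(high) if high else c_high
        if (new_low, new_high) == (c_low, c_high):
            break  # centroids did not move, so the clustering has converged
        c_low, c_high = new_low, new_high
    return low, high


# Mixed readings from several data sources (illustrative values only).
readings = [21.3, 48.0, 22.1, 51.5, 20.9, 49.2, 23.0, 50.7]
# Assumption: temperatures in degrees Celsius sit well below relative humidity
# in %RH, so the low cluster is taken as the temperature data set.
temperature_set, humidity_set = two_means_1d(readings)
print(temperature_set)  # [21.3, 22.1, 20.9, 23.0]
print(humidity_set)     # [48.0, 51.5, 49.2, 50.7]
```

Separating readings by value alone only works while the two quantities occupy clearly distinct numeric ranges. If the data sources carry channel or unit metadata, that would be a more reliable basis for the split.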
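The following is a minimal sketch of the trend analysis in S02. It assumes evenly spaced samples, an ordinary least-squares line fitted against the sample index, and a fixed residual tolerance. The function name `pre_abnormal_by_trend` and the `tolerance` parameter are hypothetical; the patent does not state how the linear curve is fitted or how large a deviation must be.

```python
# Hypothetical sketch of step S02: fit a least-squares line to the readings in
# time order and flag readings that deviate from it. The tolerance is an assumed
# parameter; the patent does not say how far from the line counts as "deviating".

def pre_abnormal_by_trend(values, tolerance):
    """Return indices of readings whose distance from the fitted line exceeds tolerance."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings to fit a line")
    # Use the sample index 0..n-1 as the time axis (evenly spaced samples assumed).
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A reading is pre-abnormal when its residual from the line exceeds tolerance.
    return [i for i, y in enumerate(values)
            if abs(y - (intercept + slope * i)) > tolerance]


temperatures = [21.0, 21.1, 21.2, 25.0, 21.4, 21.5]  # illustrative, degrees Celsius
print(pre_abnormal_by_trend(temperatures, tolerance=1.0))  # [3]
```

The same function would be applied separately to the humidity data set, with a tolerance in %RH. A large outlier also pulls a plain least-squares line toward itself; a robust fit would avoid this, but the sketch keeps ordinary least squares for brevity.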
The abnormal data includes abnormal temperature data and abnormal humidity data.
FIG. 2 is a flowchart of the abnormal data acquisition method provided in the application. As shown in FIG. 2, the specific process is as follows:
S031: Analyze the temperature data of the temperature data set and the humidity data of the humidity data set by the box plot algorithm to obtain preliminary abnormal data.
S0311: Calculate the median, the 25% quantile, the 75% quantile, the upper bound, and the lower bound of the temperature data set and of the humidity data set.
S0312: When a temperature or humidity value is above the 75% quantile or below the 25% quantile, that value is preliminary abnormal data.
When the temperature data or the humidity data is between the 25% quantile and the 75% quantile, the temperature data or the humidity data is normal data.
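Steps S0311 and S0312 might be sketched with Python's standard `statistics` module. The `inclusive` quantile method, the default `k = 1.5`, and the helper names `box_plot_stats` and `preliminary_abnormal` are assumptions; the patent specifies none of them.

```python
import statistics

# Hypothetical sketch of S0311-S0312. The quantile method ("inclusive") and the
# default k = 1.5 are assumptions; the patent specifies neither.

def box_plot_stats(values, k=1.5):
    """Median, quartiles, and the upper/lower bounds of formulas (1) and (2)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {"median": median, "q1": q1, "q3": q3,
            "upper": q3 + k * iqr, "lower": q1 - k * iqr}


def preliminary_abnormal(values):
    """Indices of readings above the 75% quantile or below the 25% quantile (S0312)."""
    stats = box_plot_stats(values)
    return [i for i, v in enumerate(values) if v > stats["q3"] or v < stats["q1"]]


temperatures = [20.8, 21.0, 21.2, 21.4, 21.6, 21.8, 22.0, 22.2, 27.5]
stats = box_plot_stats(temperatures)
print(stats["q1"], stats["median"], stats["q3"])  # 21.2 21.6 22.0
print(preliminary_abnormal(temperatures))         # [0, 1, 7, 8]
```

Taken literally, S0312 marks every reading outside the interquartile range, which is roughly half of any data set. The threshold check in S032 is what narrows this preliminary set down to the readings treated as abnormal.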
S032: Judge whether the preliminary abnormal data exceed the out-of-tolerance threshold.
The out-of-tolerance threshold is an upper bound or a lower bound.
That is, judge whether the preliminary abnormal data lie above the upper bound or below the lower bound.
The upper bound is calculated as follows:
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \qquad (1)$$
where $Q_3$ is the upper quartile, i.e. the 75% quantile; IQR is the difference between the upper and lower quartiles ($\mathrm{IQR} = Q_3 - Q_1$); and $k$ is the empirical coefficient, a constant.
The lower bound is calculated as follows:
$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \qquad (2)$$
where $Q_1$ is the lower quartile, i.e. the 25% quantile; IQR is the difference between the upper and lower quartiles ($\mathrm{IQR} = Q_3 - Q_1$); and $k$ is the empirical coefficient, a constant.
S033: If so, the preliminary abnormal data are abnormal data.
S034: If not, the preliminary abnormal data are non-abnormal data.
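The threshold check in S032 to S034 could be sketched as below. It applies formulas (1) and (2) to readings already flagged as preliminary abnormal. The function `confirm_abnormal` and the default `k = 1.5` are illustrative assumptions. The quartiles are passed in rather than recomputed.

```python
# Hypothetical sketch of S032-S034: keep a preliminary abnormal reading as
# abnormal only if it lies outside [LowerLimit, UpperLimit] from formulas (1)
# and (2). The quartiles are passed in; k = 1.5 is an assumed default.

def confirm_abnormal(values, preliminary, q1, q3, k=1.5):
    """Split preliminary indices into (abnormal, non_abnormal) using the bounds."""
    iqr = q3 - q1
    upper = q3 + k * iqr  # formula (1)
    lower = q1 - k * iqr  # formula (2)
    abnormal, non_abnormal = [], []
    for i in preliminary:
        if values[i] > upper or values[i] < lower:
            abnormal.append(i)      # S033: exceeds the out-of-tolerance threshold
        else:
            non_abnormal.append(i)  # S034: within the threshold
    return abnormal, non_abnormal


temperatures = [20.8, 21.0, 21.2, 21.4, 21.6, 21.8, 22.0, 22.2, 27.5]
# Quartiles and preliminary indices as a box-plot pass over this list would give
# them (inclusive quantile method assumed).
print(confirm_abnormal(temperatures, [0, 1, 7, 8], q1=21.2, q3=22.0))
# ([8], [0, 1, 7])
```

In this example, only the 27.5 °C reading clears the upper bound of about 23.2 °C. The other three preliminary readings are reclassified as non-abnormal.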
S04: When pre-abnormal temperature data or pre-abnormal humidity data belong to the abnormal data set, clean those pre-abnormal data.
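Step S04 amounts to intersecting the two sets of flagged readings. This sketch assumes that each flagged reading is identified by its index within one data set and that "cleaning" means deleting it. The patent does not say whether dirty readings are deleted or replaced, and the name `clean` is hypothetical.

```python
# Hypothetical sketch of S04: a reading is treated as dirty data, and removed,
# only when the trend analysis (S02) AND the box-plot check (S03) both flag it.
# Deleting the reading is an assumed form of "cleaning".

def clean(values, pre_abnormal, abnormal):
    """Return (cleaned_values, dirty_indices) for one temperature or humidity data set."""
    dirty = set(pre_abnormal) & set(abnormal)  # flagged by both methods
    cleaned = [v for i, v in enumerate(values) if i not in dirty]
    return cleaned, sorted(dirty)


temperatures = [21.0, 21.1, 21.2, 25.0, 21.4, 21.5]
# Index 5 is flagged by the trend analysis only, so it is kept.
cleaned, dirty = clean(temperatures, pre_abnormal=[3, 5], abnormal=[3])
print(cleaned)  # [21.0, 21.1, 21.2, 21.4, 21.5]
print(dirty)    # [3]
```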
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
The above-described embodiments of the present application do not limit the scope of the present application.
Claims (6)
1. A method for cleaning out-of-tolerance data of a digital hygrothermograph in a standard laboratory, characterized by comprising the following steps:
clustering multiple data from all data sources by cluster analysis to obtain a temperature data set and a humidity data set;
performing linear analysis on the temperature data in the temperature data set and the humidity data in the humidity data set by a trend analysis method to obtain pre-abnormal temperature data and pre-abnormal humidity data, wherein the pre-abnormal temperature data are temperature data deviating from the temperature linear curve, and the pre-abnormal humidity data are humidity data deviating from the humidity linear curve;
identifying abnormal data in the temperature data set and the humidity data set by a box plot algorithm, and forming the abnormal data into an abnormal data set, wherein the abnormal data comprise abnormal temperature data and abnormal humidity data; and
when the pre-abnormal temperature data or the pre-abnormal humidity data belong to the abnormal data set, cleaning the pre-abnormal temperature data or the pre-abnormal humidity data.
2. The cleaning method of claim 1, wherein identifying abnormal data in the temperature data set and the humidity data set by the box plot algorithm comprises:
analyzing the temperature data of the temperature data set and the humidity data of the humidity data set by the box plot algorithm to obtain preliminary abnormal data;
judging whether the preliminary abnormal data exceed an out-of-tolerance threshold; and
if so, determining that the preliminary abnormal data are abnormal data, and if not, determining that they are non-abnormal data.
3. The cleaning method of claim 2, wherein analyzing the temperature data of the temperature data set and the humidity data of the humidity data set by the box plot algorithm to obtain preliminary abnormal data comprises:
calculating a median, a 25% quantile, a 75% quantile, an upper bound, and a lower bound of the temperature data set and of the humidity data set; and
when a temperature or humidity value is above the 75% quantile or below the 25% quantile, determining that value to be preliminary abnormal data.
4. The cleaning method of claim 2, wherein the out-of-tolerance threshold is an upper bound or a lower bound.
5. The cleaning method according to claim 4, wherein the upper bound is calculated as follows:
$$\mathrm{UpperLimit} = Q_3 + k \cdot \mathrm{IQR} \qquad (1)$$
where $Q_3$ is the upper quartile, i.e. the 75% quantile; $k$ is an empirical coefficient; and IQR is the difference between the upper and lower quartiles ($\mathrm{IQR} = Q_3 - Q_1$).
6. The cleaning method according to claim 4, wherein the lower bound is calculated as follows:
$$\mathrm{LowerLimit} = Q_1 - k \cdot \mathrm{IQR} \qquad (2)$$
where $Q_1$ is the lower quartile, i.e. the 25% quantile; $k$ is an empirical coefficient; and IQR is the difference between the upper and lower quartiles ($\mathrm{IQR} = Q_3 - Q_1$).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911142890.9A CN110955650B (en) | 2019-11-20 | 2019-11-20 | Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911142890.9A CN110955650B (en) | 2019-11-20 | 2019-11-20 | Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110955650A | 2020-04-03 |
CN110955650B | 2023-06-23 |
Family
ID=69978026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911142890.9A Active CN110955650B (en) | 2019-11-20 | 2019-11-20 | Method for cleaning out-of-tolerance data of digital hygrothermograph in standard laboratory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110955650B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080091977A1 (en) * | 2004-04-02 | 2008-04-17 | Emilio Miguelanez | Methods and apparatus for data analysis |
CN106097138A (en) * | 2016-06-03 | 2016-11-09 | 合肥工业大学 | A kind of electricity consumption anomaly data detection System and method for based on statistical model |
CN106204335A (en) * | 2016-07-21 | 2016-12-07 | 广东工业大学 | A kind of electricity price performs abnormality judgment method, Apparatus and system |
US20180225320A1 (en) * | 2017-02-09 | 2018-08-09 | Adobe Systems Incorporated | Anomaly Detection at Coarser Granularity of Data |
CN108412710A (en) * | 2018-01-30 | 2018-08-17 | 同济大学 | A kind of Wind turbines wind power data cleaning method |
EP3364157A1 (en) * | 2017-02-16 | 2018-08-22 | Fundación Tecnalia Research & Innovation | Method and system of outlier detection in energy metering data |
CN108830510A (en) * | 2018-07-16 | 2018-11-16 | 国网上海市电力公司 | A kind of electric power data preprocess method based on mathematical statistics |
CN109766331A (en) * | 2018-12-06 | 2019-05-17 | 中科恒运股份有限公司 | Method for processing abnormal data and device |
CN109918364A (en) * | 2019-02-28 | 2019-06-21 | 华北电力大学 | A kind of data cleaning method based on two-dimensional probability density estimation and quartile method |
Non-Patent Citations (1)
Title |
---|
Yang Zheng: "Research on Model-Driven Data Cleaning Components" (基于模型驱动的数据清洗组件研究) * |
Also Published As
Publication number | Publication date |
---|---|
CN110955650B (en) | 2023-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113255795B (en) | Equipment state monitoring method based on multi-index cluster analysis | |
CN112381476B (en) | Method and device for determining electric energy meter with abnormal state | |
CN110083803B (en) | Method and system for detecting water taking abnormality based on time sequence ARIMA model | |
CN113838054B (en) | Mechanical part surface damage detection method based on artificial intelligence | |
KR20170122043A (en) | Real-time indoor air quality outlier smoothing method and apparatus | |
CN115876258B (en) | Livestock and poultry breeding environment abnormity monitoring and alarming system based on multi-source data | |
CN116520236B (en) | Abnormality detection method and system for intelligent ammeter | |
US7043970B2 (en) | Method for monitoring wood-drying kiln state | |
CN115937595A (en) | Bridge apparent anomaly identification method and system based on intelligent data processing | |
CN112417371A (en) | Method for monitoring running state of intelligent electric energy meter in distribution network area | |
CN110955650A (en) | Cleaning method for out-of-tolerance data of digital hygrothermograph in standard laboratory | |
CN117330156A (en) | Automatic fault detection and analysis device for gas flowmeter | |
CN113934536A (en) | Data acquisition method facing edge calculation | |
CN116859875B (en) | Steel pipe production process adjusting and controlling method and system based on use requirements | |
CN116503025B (en) | Business work order flow processing method based on workflow engine | |
CN117115169A (en) | Intelligent recognition method for abnormal deformation of surface of die-casting die of automobile part | |
CN117113104A (en) | Intelligent management system and method applying data analysis technology | |
CN108109675B (en) | Laboratory quality control data management system | |
CN108124442B (en) | Elevator element parameter calibration method, device, equipment and storage medium | |
CN108108864B (en) | Laboratory quality control data management method | |
CN115586321A (en) | Method, system, memory and equipment for identifying online monitoring data of dissolved gas in oil | |
CN114626758A (en) | Effect evaluation system for medical equipment maintenance | |
CN108459948B (en) | Method for determining failure data distribution type in system reliability evaluation | |
CN111061257B (en) | Industrial process monitoring method based on dynamic global LPP | |
CN112763678A (en) | PCA-based sewage treatment process monitoring method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||