CN114116689A - Big data cleaning method based on building structure safety monitoring - Google Patents

Big data cleaning method based on building structure safety monitoring

Info

Publication number
CN114116689A
Authority
CN
China
Prior art keywords
data
sensor
monitoring
processed
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111240237.3A
Other languages
Chinese (zh)
Inventor
陈聪
王晋
吴为民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Ruibangkete Testing Co ltd
Original Assignee
Zhejiang Ruibangkete Testing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Ruibangkete Testing Co ltd filed Critical Zhejiang Ruibangkete Testing Co ltd
Priority to CN202111240237.3A priority Critical patent/CN114116689A/en
Publication of CN114116689A publication Critical patent/CN114116689A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01D MEASURING NOT SPECIALLY ADAPTED FOR A SPECIFIC VARIABLE; ARRANGEMENTS FOR MEASURING TWO OR MORE VARIABLES NOT COVERED IN A SINGLE OTHER SUBCLASS; TARIFF METERING APPARATUS; MEASURING OR TESTING NOT OTHERWISE PROVIDED FOR
    • G01D21/00 Measuring or testing not otherwise provided for
    • G01D21/02 Measuring two or more variables by means not covered by a single other subclass

Abstract

The invention relates to big data cleaning technology and provides a big data cleaning method based on building structure safety monitoring, comprising the following steps: (1) acquiring the sensor type, the sensor monitoring data, the sensor temperature data and the sensor layout on a building; (2) calculating the data missing rate and judging whether it is greater than 20%; if the data missing rate is less than or equal to 20%, performing missing value completion on the monitoring data of the sensor to be processed; if the data missing rate is greater than 20%, proceeding to step (3); (3) identifying abnormal drift data and cleaning it; (4) identifying data jump points and cleaning the mutation data. The method has a simple operation flow and a small amount of calculation, is easy to implement in code, and enables automatic processing of building engineering safety monitoring data.

Description

Big data cleaning method based on building structure safety monitoring
Technical Field
The invention relates to a big data cleaning technology, in particular to a big data cleaning method.
Background
In recent years, with the digital transformation of the building industry, automated digital monitoring has been increasingly applied to engineering structure safety monitoring. Automated monitoring at a high acquisition frequency generates a large amount of data, and abnormal data inevitably arise from factors such as sensor instability, power failure and leakage, sudden faults and construction interference. Cleaning abnormal data usually takes up a large share of big data processing time, so a method for handling it automatically is needed. Chinese patent document CN103198147A discloses a method for discriminating and processing abnormal data from automated monitoring, which divides abnormal data into accidental data, abrupt-change data and slowly changing data for discrimination and processing. However, that patent does not distinguish dynamic data from static data, and, in theory, data collected by a single sensor cannot show whether an anomaly is caused by abnormal or normal construction factors. Chinese patent document CN11391982A discloses an anomaly monitoring method, device and equipment for monitoring data, which determines anomaly points in the original time series data through secondary processing. However, that patent requires a target threshold to be set in the secondary processing, so human factors have a large influence. Chinese patent document CN113297744A discloses a charging pile data cleaning method suitable for error monitoring and calculation, and a charging station, aimed mainly at cleaning charging pile information data. Chinese patent document CN113377750A discloses a method and a system for cleaning hydrological data, used to clean acquired hydrological data such as point rainfall, flow and evaporation.
Abnormal data cleaning for engineering structure safety monitoring has focused mainly on road and bridge structures. Chinese patent CN112700622A discloses a Storm-based big data preprocessing method and system for railway geological disaster monitoring, which builds a Storm cluster topology and performs extraction, cleaning, conversion and synchronous integration of the various types of sensor data from each monitoring point. However, the data cleaning in that patent targets only dynamic acceleration and velocity data from railway geological disaster monitoring, which differs from static data cleaning. Chinese patent document CN110287178A discloses a bridge progressive drift data cleaning method based on data differences, which cleans drift data by computing the differences between adjacent values of the original data and of the trend data calculated by the Super Smoother algorithm and applying interval estimation theory. However, that patent addresses only progressive drift data and does not distinguish noisy dynamic data from low-noise static data. Neither of these two patents addresses missing data. In view of these shortcomings, a systematic method for processing abnormal data in building structure safety monitoring is urgently needed.
Disclosure of Invention
In view of the above disadvantages of the prior art, the present invention is directed to a big data cleaning method based on building structure safety monitoring, which is used to solve the technical problems in the prior art.
In order to achieve the purpose, the invention provides the following technical scheme:
the invention provides a big data cleaning method based on building structure safety monitoring, which is characterized by comprising the following steps of:
(1) acquiring the type of a sensor, monitoring data of the sensor, temperature data of the sensor and a sensor layout on a building;
(2) calculating the data missing rate, and judging whether the data missing rate is greater than 20%; if the data missing rate is less than or equal to 20%, performing missing value completion on the monitoring data of the sensor to be processed; if the data missing rate is greater than 20%, proceeding to step (3);
(3) judging abnormal drift data, and cleaning the abnormal drift data;
(4) judging the data jump points, and cleaning the mutation data.
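The control flow of steps (2) to (4) can be sketched as follows. This is only an illustrative outline, not the patented implementation: the function name `clean_series`, the callables passed in, and the choice to represent missing values as NaN are all assumptions made for the example.

```python
import math
from typing import Callable, List

Series = List[float]

MISSING_RATE_THRESHOLD = 0.20  # the 20% limit stated in step (2)


def clean_series(
    data: Series,
    fill_missing: Callable[[Series], Series],
    clean_drift: Callable[[Series], Series],
    clean_jumps: Callable[[Series], Series],
) -> Series:
    """Run steps (2)-(4) in order on one sensor's monitoring data.

    All three callables are hypothetical placeholders for the
    step-specific routines; missing values are assumed to be NaN.
    """
    n = len(data)
    missing = sum(1 for v in data if math.isnan(v))
    rate = missing / n if n else 0.0
    # Step (2): complete missing values only when at most 20% are missing.
    if rate <= MISSING_RATE_THRESHOLD:
        data = fill_missing(data)
    # Steps (3) and (4) run in either case.
    data = clean_drift(data)
    return clean_jumps(data)
```

Passing the step routines in as parameters keeps the threshold branch separate from how each step is actually carried out.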
Further, the sensor layout is an engineering position corresponding to the sensor data number, and the sensor layout is used for correlation analysis of the sensors.
Further, the sensor monitoring data is monitoring data corresponding to the sensor type.
Furthermore, the temperature data of the sensor are the temperature data measured by the temperature collector built into a sensor such as a strain, stress, crack or displacement sensor.
Further, the sensor types include strain gauges, crack gauges, displacement gauges, static levels, fixed inclinometers, water level gauges, soil pressure gauges, pore water pressure gauges, anemometers, inclinometers and other static data sensors.
Further, the formula for calculating the data missing rate is as follows:
$$\text{data missing rate} = \frac{\mathrm{num}(\mathrm{NaN})}{n} \times 100\%$$ (equation 1)
where NaN is a data missing value;
num(NaN) is the number of data missing values NaN, which can be determined from the number of monitoring data values equal to 0;
and n is the number of monitoring data of the sensor to be processed.
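A minimal sketch of the missing rate calculation, assuming the data arrive as a list of floats. The function name `missing_rate` and the `zero_is_missing` flag are illustrative; the flag reflects the statement that missing values can be counted as monitoring values equal to 0.

```python
import math
from typing import Sequence


def missing_rate(data: Sequence[float], zero_is_missing: bool = True) -> float:
    """Return num(NaN) / n as a fraction in [0, 1]."""
    n = len(data)
    if n == 0:
        raise ValueError("empty monitoring data set")

    def is_missing(v: float) -> bool:
        # A value counts as missing if it is NaN or, optionally, exactly 0.
        return math.isnan(v) or (zero_is_missing and v == 0)

    return sum(1 for v in data if is_missing(v)) / n
```

Multiplying the result by 100 gives the percentage that is compared against the 20% limit.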
Further, performing missing value completion on the monitoring data of the sensor to be processed in step (2) comprises the following steps:
(2.1) recording the monitoring data set of the sensor to be processed as $\{a_i\}$ $(i = 1, 2, \dots, n)$;
(2.2) dividing the data set $\{a_i\}$ $(i = 1, 2, \dots, n)$: all data missing values NaN are taken as the test set y {test_y} and the position of each missing value is recorded, and the remaining values are taken as the training set y {train_y};
(2.3) traversing the monitoring data of the other sensors of the same type in the engineering project, calculating the Pearson correlation coefficient of each with the data set $\{a_i\}$ $(i = 1, 2, \dots, n)$, and recording the sensor monitoring data set with the largest Pearson correlation coefficient as $\{a'_j\}$ $(j = 1, 2, \dots, m)$, where m is the number of monitoring data of that sensor; because data transmission times differ between sensors, m is generally not equal to n;
preferably, Pearson correlation coefficient calculation is prior art and not a protection point of the present application, so it is not described further.
(2.4) taking the number n of monitoring data of the sensor to be processed as the reference, adding data to or removing data from the data set $\{a'_j\}$ $(j = 1, 2, \dots, m)$ with the largest correlation coefficient:
if m < n, appending (n - m) zeros to the end of the data sequence of $\{a'_j\}$ $(j = 1, 2, \dots, m)$;
if m ≥ n, deleting the last (m - n) elements of the data sequence of $\{a'_j\}$ $(j = 1, 2, \dots, m)$;
the data set obtained from $\{a'_j\}$ $(j = 1, 2, \dots, m)$ after adding or removing data is denoted $\{a'_i\}$ $(i = 1, 2, \dots, n)$;
(2.5) finding the positions of the data missing values in $\{a_i\}$, taking the values at the corresponding positions in $\{a'_i\}$ as the test set x {test_x}, and taking the values at the remaining positions in $\{a'_i\}$ as the training set x {train_x};
(2.6) training an algorithm classifier on the training set x {train_x} and the training set y {train_y} using the K-nearest-neighbor algorithm;
preferably, the K-nearest-neighbor algorithm and the training of the classifier are both prior art and not protection points of the present application, so they are not described further.
(2.7) using the trained classifier to predict the target set {pred-y} from the test set x {test_x};
(2.8) replacing the data in the test set y {test_y} with the data in the target set {pred-y}: according to the recorded missing value positions, each missing value in $\{a_i\}$ is filled with the value at the corresponding position in {pred-y}, and the data set after filling is denoted $\{y_i\}$ $(i = 1, 2, \dots, n)$.
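Steps (2.5) to (2.8) can be illustrated with a small pure-Python sketch. Several things here are assumptions rather than details given in the text: the function name `knn_fill`, the default `k = 3`, the use of NaN to mark missing values, and averaging the k nearest training targets (a regression-style prediction chosen because the monitoring values are continuous, whereas the text speaks of a classifier).

```python
import math
from typing import List


def knn_fill(a: List[float], ref: List[float], k: int = 3) -> List[float]:
    """Fill NaN entries of `a` using K nearest neighbours found on `ref`.

    a   : monitoring data of the sensor to be processed (NaN = missing)
    ref : aligned data of the most correlated sensor, same length as `a`
    """
    if len(a) != len(ref):
        raise ValueError("series must be aligned to the same length")
    missing_pos = [i for i, v in enumerate(a) if math.isnan(v)]
    # (train_x, train_y) pairs come from the positions where `a` is observed.
    train = [(ref[i], a[i]) for i, v in enumerate(a) if not math.isnan(v)]
    if not train:
        raise ValueError("no observed values to train on")
    k = min(k, len(train))
    filled = list(a)
    for pos in missing_pos:
        x = ref[pos]  # test_x: reference value at the missing position
        nearest = sorted(train, key=lambda xy: abs(xy[0] - x))[:k]
        # pred_y: mean of the k nearest training targets
        filled[pos] = sum(y for _, y in nearest) / k
    return filled
```

For example, `knn_fill([1.0, 2.0, float("nan"), 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0], k=2)` fills the gap with the mean of the two targets whose reference values lie closest to 30.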
Further, the specific steps of step (3) are as follows:
(3.1) calculating the element-wise mean of the sensor monitoring data set $\{y_i\}$ $(i = 1, 2, \dots, n)$ and the sensor monitoring data set $\{a'_i\}$ $(i = 1, 2, \dots, n)$, denoted as the mean data set $\{\mu_i\}$ $(i = 1, 2, \dots, n)$;
(3.2) calculating the difference between the sensor monitoring data set $\{y_i\}$ $(i = 1, 2, \dots, n)$ and the mean data set $\{\mu_i\}$ $(i = 1, 2, \dots, n)$, denoted as the data difference set $\{dy_i\}$ $(i = 1, 2, \dots, n)$;
(3.3) applying the 3σ criterion: calculating the mean μ″ and standard deviation σ″ of the data difference set $\{dy_i\}$ $(i = 1, 2, \dots, n)$, and finding the data in $\{dy_i\}$ outside the range (μ″ - 3σ″, μ″ + 3σ″), which form the abnormal drift data set $\{\text{dy-abnormal}_r\}$ $(r = 1, 2, \dots, s)$, where s is the number of abnormal drift data;
(3.4) recording as NaN′ the data in $\{y_i\}$ $(i = 1, 2, \dots, n)$ that correspond to the abnormal drift data set $\{\text{dy-abnormal}_r\}$, to obtain the processed monitoring data set $\{x_i\}$ $(i = 1, 2, \dots, n)$.
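The drift check in steps (3.1) to (3.4) can be sketched as below. The function name `mark_drift`, the use of the population standard deviation, and the inclusive bounds (so that a constant difference series flags nothing) are assumptions of this example; the text does not specify them.

```python
import math
from statistics import mean, pstdev
from typing import List


def mark_drift(y: List[float], ref: List[float]) -> List[float]:
    """Replace abnormal drift values in `y` with NaN using the 3-sigma rule."""
    # (3.1) element-wise mean of the two series
    mu = [(yi + ri) / 2 for yi, ri in zip(y, ref)]
    # (3.2) difference between the processed series and the mean series
    dy = [yi - mi for yi, mi in zip(y, mu)]
    # (3.3) mean and standard deviation of the finite differences
    finite = [d for d in dy if not math.isnan(d)]
    m, s = mean(finite), pstdev(finite)
    lo, hi = m - 3 * s, m + 3 * s
    # (3.4) values whose difference lies outside [lo, hi] become NaN;
    # entries that were already NaN stay NaN.
    return [yi if lo <= d <= hi else math.nan for yi, d in zip(y, dy)]
```

With nineteen zeros followed by a single value of 100 against an all-zero reference, only the last value is replaced by NaN.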
Further, the specific steps of step (4) are as follows:
(4.1) recording the raw temperature data set of the sensor to be processed as $\{t_i\}$ $(i = 1, 2, \dots, n)$;
(4.2) using one-dimensional wavelet decomposition to remove the short-period components of the monitoring data set $\{x_i\}$ and of the raw temperature data of the sensor to be processed, to obtain the smoothed monitoring data set $\{x'_i\}$ and the smoothed temperature data set $\{t'_i\}$ $(i = 1, 2, \dots, n)$;
(4.3) calculating the difference between each smoothed data set and its original data set:
the monitoring data difference set $\{dv_i\}$ is:
$$\{dv_i\} = \{x'_i\} - \{x_i\} \quad (i = 1, 2, \dots, n)$$ (equation 2)
the temperature data difference set $\{tv_i\}$ is:
$$\{tv_i\} = \{t'_i\} - \{t_i\} \quad (i = 1, 2, \dots, n)$$ (equation 3)
(4.4) calculating the mean μ and standard deviation σ of the monitoring data difference set $\{dv_i\}$ $(i = 1, 2, \dots, n)$, finding the data in $\{dv_i\}$ outside the range (μ - 3σ, μ + 3σ), and recording their positions in $\{dv_i\}$ as the monitoring abnormal jump point data position set $\{\text{dv-location}_k\}$ $(k = 1, 2, \dots, p)$, where p is the number of monitoring abnormal jump point data;
calculating the mean μ′ and standard deviation σ′ of the temperature data difference set $\{tv_i\}$ $(i = 1, 2, \dots, n)$, finding the data in $\{tv_i\}$ outside the range (μ′ - 3σ′, μ′ + 3σ′), and recording their positions in $\{tv_i\}$ as the temperature abnormal jump point data position set $\{\text{tv-location}_l\}$ $(l = 1, 2, \dots, q)$, where q is the number of temperature abnormal jump point data;
(4.5) judging whether the elements of the monitoring abnormal jump point data position set $\{\text{dv-location}_k\}$ $(k = 1, 2, \dots, p)$ and the elements of the temperature abnormal jump point data position set $\{\text{tv-location}_l\}$ $(l = 1, 2, \dots, q)$ are the same;
elements that are the same are not processed;
elements that are in $\{\text{dv-location}_k\}$ but not in $\{\text{tv-location}_l\}$ are marked as jump point data positions; the data values at the corresponding positions in the data set $\{x_i\}$ $(i = 1, 2, \dots, n)$ are recorded as NaN″, giving the processed data set $\{z_i\}$ $(i = 1, 2, \dots, n)$.
Here i, j, k and l all denote serial numbers.
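Steps (4.2) to (4.5) can be sketched as below under strong simplifying assumptions. The text does not name a wavelet or a decomposition level; this example assumes a single-level Haar decomposition with the detail coefficients discarded, which amounts to averaging neighbouring pairs. The function names are illustrative, the inputs are assumed to contain no NaN values, and the population standard deviation is used.

```python
import math
from statistics import mean, pstdev
from typing import List, Set


def haar_smooth(x: List[float]) -> List[float]:
    """One-level Haar decomposition with the detail coefficients set to zero.

    Reconstructing from the approximation alone replaces each pair of
    neighbouring samples by their average; an odd trailing sample is kept.
    """
    out = list(x)
    for i in range(0, len(x) - 1, 2):
        out[i] = out[i + 1] = (x[i] + x[i + 1]) / 2
    return out


def outlier_positions(x: List[float]) -> Set[int]:
    """1-based positions where smoothed-minus-raw lies outside mean +/- 3 sigma."""
    diff = [s - r for s, r in zip(haar_smooth(x), x)]
    m, s = mean(diff), pstdev(diff)
    return {i + 1 for i, d in enumerate(diff) if abs(d - m) > 3 * s}


def mark_jumps(x: List[float], t: List[float]) -> List[float]:
    """Set to NaN the jumps in `x` that have no matching jump in temperature `t`."""
    jump_pos = outlier_positions(x) - outlier_positions(t)
    return [math.nan if i + 1 in jump_pos else v for i, v in enumerate(x)]
```

Because pairwise averaging spreads a spike over its pair partner, this simplified smoother flags both samples of the pair. A multi-level decomposition, for instance with a wavelet library such as PyWavelets, would behave differently and is likely closer to what a production implementation would use.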
The invention has the following beneficial effects:
(1) aiming at static data obtained by the safety automatic monitoring of the construction engineering, the invention systematically carries out data cleaning on mutation data, drift data and missing data in the static data, and provides an effective data preprocessing method for the safety monitoring of the static data of the construction engineering;
(2) the method has the advantages of simple and convenient operation flow, small calculated amount, easy programming realization and capability of realizing automatic processing of the safety monitoring data of the construction engineering;
(3) in combination with engineering practice, the invention has been applied to an actual old-house monitoring project; it provides a useful reference for data processing in enterprises' automated building engineering monitoring projects and has good popularization value.
Drawings
FIG. 1 is a flow chart of the cleaning method of the present invention.
Fig. 2 is a layout diagram of the tilt sensor of the present embodiment.
Fig. 3 is tilt sensor monitoring data of observation point 5 in the present embodiment.
Fig. 4 is a comparison diagram before and after processing of the inclination sensor data missing value of observation point 5 in the present embodiment.
Fig. 5 is a comparison chart before and after processing of the tilt sensor data drift value of observation point 5 in the present embodiment.
Fig. 6 is a comparison graph before and after processing of the tilt sensor data jump point value of observation point 5 in the present embodiment.
Detailed Description
The following detailed description of the embodiments of the present invention is provided in conjunction with the accompanying drawings, and it should be noted that the embodiments are merely illustrative of the present invention and should not be considered as limiting the invention, and the purpose of the embodiments is to make those skilled in the art better understand and reproduce the technical solutions of the present invention, and the protection scope of the present invention should be subject to the scope defined by the claims.
The building structure safety monitoring mainly monitors static data such as deformation and internal force of a structure, and abnormal monitoring data of the building structure safety monitoring mainly comprises mutation data, drift data and missing data. At present, methods which can be used for cleaning mutation data and drift data mainly comprise a 3 sigma criterion, a Super Smoother algorithm, a wavelet method and the like, and methods which are used for processing missing data mainly comprise interpolation point complementation, fitting point complementation, artificial intelligence point complementation and the like. When the method is applied to actual building engineering, the complexity and the accuracy of the processing method need to be balanced, and the abnormal building structure safety monitoring data needs to be cleaned systematically by combining the characteristics of the static building structure safety monitoring data.
The invention provides a big data cleaning method based on building structure safety monitoring, which is characterized by comprising the following steps of:
S1, acquiring the type of each sensor on the building, the sensor monitoring data, the sensor temperature data and the sensor layout;
The sensor layout is the engineering position corresponding to each sensor data number, and is used for correlation analysis of the sensors.
The sensor monitoring data are the monitoring data corresponding to the sensor type.
The temperature data of the sensor are the temperature data measured by the temperature collector built into a sensor such as a strain, stress, crack or displacement sensor.
The sensor types include strain gauges, crack gauges, displacement gauges, static levels, fixed inclinometers, water level gauges, soil pressure gauges, pore water pressure gauges, anemometers, inclinometers and other static data sensors.
S2, calculating the data missing rate and judging whether it is greater than 20%;
$$\text{data missing rate} = \frac{\mathrm{num}(\mathrm{NaN})}{n} \times 100\%$$ (equation 1)
where NaN is a data missing value;
num(NaN) is the number of data missing values NaN, which can be determined from the number of monitoring data values equal to 0;
and n is the number of monitoring data of the sensor to be processed.
If the data missing rate is less than or equal to 20%, missing value completion is performed on the monitoring data of the sensor to be processed;
S2.1, recording the monitoring data set of the sensor to be processed as $\{a_i\}$ $(i = 1, 2, \dots, n)$;
S2.2, dividing the data set $\{a_i\}$ $(i = 1, 2, \dots, n)$: all data missing values NaN are taken as the test set y {test_y} and the position of each missing value is recorded, and the remaining values are taken as the training set y {train_y};
S2.3, traversing the monitoring data of the other sensors of the same type in the engineering project, calculating the Pearson correlation coefficient of each with the data set $\{a_i\}$ $(i = 1, 2, \dots, n)$, and recording the sensor monitoring data set with the largest Pearson correlation coefficient as $\{a'_j\}$ $(j = 1, 2, \dots, m)$, where m is the number of monitoring data of that sensor; because data transmission times differ between sensors, m is generally not equal to n;
preferably, Pearson correlation coefficient calculation is prior art and not a protection point of the present application, so it is not described further.
S2.4, taking the number n of monitoring data of the sensor to be processed as the reference, adding data to or removing data from the data set $\{a'_j\}$ $(j = 1, 2, \dots, m)$ with the largest correlation coefficient:
if m < n, appending (n - m) zeros to the end of the data sequence of $\{a'_j\}$ $(j = 1, 2, \dots, m)$;
if m ≥ n, deleting the last (m - n) elements of the data sequence of $\{a'_j\}$ $(j = 1, 2, \dots, m)$;
the data set obtained from $\{a'_j\}$ $(j = 1, 2, \dots, m)$ after adding or removing data is denoted $\{a'_i\}$ $(i = 1, 2, \dots, n)$;
S2.5, finding the positions of the data missing values in $\{a_i\}$, taking the values at the corresponding positions in $\{a'_i\}$ as the test set x {test_x}, and taking the values at the remaining positions in $\{a'_i\}$ as the training set x {train_x};
S2.6, training an algorithm classifier on the training set x {train_x} and the training set y {train_y} using the K-nearest-neighbor algorithm;
S2.7, using the trained classifier to predict the target set {pred-y} from the test set x {test_x};
S2.8, replacing the data in the test set y {test_y} with the data in the target set {pred-y}: according to the recorded missing value positions, each missing value in $\{a_i\}$ is filled with the value at the corresponding position in {pred-y}, and the data set after filling is denoted $\{y_i\}$ $(i = 1, 2, \dots, n)$.
If the data missing rate is greater than 20%, proceed directly to step S3;
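The reference sensor selection in S2.3 and the length alignment in S2.4 can be sketched as follows. This is an illustrative sketch: the function names are invented for the example, missing values are assumed to be NaN and are skipped when computing the correlation, and each candidate series is aligned to length n before correlating (the text computes the correlation first and aligns afterwards).

```python
import math
from typing import Dict, List, Tuple


def align(ref: List[float], n: int) -> List[float]:
    """Pad with zeros or truncate at the end so that `ref` has length n (S2.4)."""
    return ref[:n] + [0.0] * max(0, n - len(ref))


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation over positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    if len(pairs) < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(vx * vy)


def most_correlated(
    target: List[float], others: Dict[str, List[float]]
) -> Tuple[str, List[float]]:
    """Return the name and aligned data of the sensor with the largest coefficient."""
    n = len(target)
    best = max(others, key=lambda name: pearson(target, align(others[name], n)))
    return best, align(others[best], n)
```

The selection uses the signed coefficient rather than its absolute value, matching the wording "largest Pearson correlation coefficient".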
S3, judging abnormal drift data and cleaning the abnormal drift data;
S3.1, calculating the element-wise mean of the sensor monitoring data set $\{y_i\}$ $(i = 1, 2, \dots, n)$ and the sensor monitoring data set $\{a'_i\}$ $(i = 1, 2, \dots, n)$, denoted as the mean data set $\{\mu_i\}$ $(i = 1, 2, \dots, n)$;
S3.2, calculating the difference between the sensor monitoring data set $\{y_i\}$ $(i = 1, 2, \dots, n)$ and the mean data set $\{\mu_i\}$ $(i = 1, 2, \dots, n)$, denoted as the data difference set $\{dy_i\}$ $(i = 1, 2, \dots, n)$;
S3.3, applying the 3σ criterion: calculating the mean μ″ and standard deviation σ″ of the data difference set $\{dy_i\}$ $(i = 1, 2, \dots, n)$, and finding the data in $\{dy_i\}$ outside the range (μ″ - 3σ″, μ″ + 3σ″), which form the abnormal drift data set $\{\text{dy-abnormal}_r\}$ $(r = 1, 2, \dots, s)$, where s is the number of abnormal drift data;
S3.4, recording as NaN′ the data in $\{y_i\}$ $(i = 1, 2, \dots, n)$ that correspond to the abnormal drift data set $\{\text{dy-abnormal}_r\}$, to obtain the processed monitoring data set $\{x_i\}$ $(i = 1, 2, \dots, n)$.
S4, judging the data jump points and cleaning the mutation data.
S4.1, recording the raw temperature data set of the sensor to be processed as $\{t_i\}$ $(i = 1, 2, \dots, n)$;
S4.2, using one-dimensional wavelet decomposition to remove the short-period components of the monitoring data set $\{x_i\}$ and of the raw temperature data of the sensor to be processed, to obtain the smoothed monitoring data $\{x'_i\}$ and the smoothed temperature data $\{t'_i\}$ $(i = 1, 2, \dots, n)$;
S4.3, calculating the difference between the smoothed data and the original data:
the monitoring data difference set $\{dv_i\}$ is:
$$\{dv_i\} = \{x'_i\} - \{x_i\} \quad (i = 1, 2, \dots, n)$$ (equation 2)
the temperature data difference set $\{tv_i\}$ is:
$$\{tv_i\} = \{t'_i\} - \{t_i\} \quad (i = 1, 2, \dots, n)$$ (equation 3)
S4.4, calculating the mean μ and standard deviation σ of the monitoring data difference set $\{dv_i\}$ $(i = 1, 2, \dots, n)$, finding the data in $\{dv_i\}$ outside the range (μ - 3σ, μ + 3σ), and recording their positions in $\{dv_i\}$ as the monitoring abnormal jump point data position set $\{\text{dv-location}_k\}$ $(k = 1, 2, \dots, p)$, where p is the number of monitoring abnormal jump point data;
calculating the mean μ′ and standard deviation σ′ of the temperature data difference set $\{tv_i\}$ $(i = 1, 2, \dots, n)$, finding the data in $\{tv_i\}$ outside the range (μ′ - 3σ′, μ′ + 3σ′), and recording their positions in $\{tv_i\}$ as the temperature abnormal jump point data position set $\{\text{tv-location}_l\}$ $(l = 1, 2, \dots, q)$, where q is the number of temperature abnormal jump point data;
S4.5, judging whether the elements of the monitoring abnormal jump point data position set $\{\text{dv-location}_k\}$ $(k = 1, 2, \dots, p)$ and the elements of the temperature abnormal jump point data position set $\{\text{tv-location}_l\}$ $(l = 1, 2, \dots, q)$ are the same;
elements that are the same are not processed;
elements that are in $\{\text{dv-location}_k\}$ but not in $\{\text{tv-location}_l\}$ are marked as jump point data positions;
the data values at the corresponding positions in the data set $\{x_i\}$ $(i = 1, 2, \dots, n)$ are recorded as NaN″, giving the processed data set $\{z_i\}$ $(i = 1, 2, \dots, n)$.
Taking a dilapidated house in a certain residential area as an example, an overview of the buildings is given in Table 1. The buildings in this project are close to or beyond their service life, and the main structures show varying degrees of aging damage. To ensure that the buildings remain safe to use, the owner has placed the project under 24-hour dynamic monitoring.
Table 1 project building overview
Figure BDA0003319195690000131
As shown in FIG. 2, the data cleaning is carried out on the building house inclination monitoring point 5 of No. 57-61 of the head tap in the embodiment;
As shown in FIG. 3, three months of monitoring data from the tilt sensor at observation point 5 are selected. The figure shows 18 monitoring data values equal to 0, so num(NaN) = 18;
the original monitoring data set is denoted $\{a_i\}$ $(i = 1, 2, \dots, 275)$,
$$\text{data missing rate} = \frac{18}{275} \times 100\% \approx 6.55\%$$
The missing rate is less than 20%, so the missing data are completed.
Among the tilt sensors at observation points 1, 2, 3, 4, 6 and 7, the monitoring data of each sensor are traversed and the Pearson correlation coefficient with the tilt sensor monitoring data of observation point 5 is calculated; the coefficients are 0.3421, -0.2916, 0.4488, 0.2139, -0.0166 and 0.1125, respectively. The coefficient for observation point 3 (0.4488) is the largest, so the tilt sensor monitoring data of observation point 3 are selected and denoted as the sensor monitoring data set $\{a'_j\}$ $(j = 1, 2, \dots, 288)$.
Taking the monitoring data set to be processed $\{a_i\}$ $(i = 1, 2, \dots, 275)$ as the reference, data are added to or removed from the tilt sensor monitoring data set $\{a'_j\}$ $(j = 1, 2, \dots, 288)$ of observation point 3: the last (288 - 275) elements of the data sequence are deleted, giving $\{a'_i\}$ $(i = 1, 2, \dots, 275)$.
The monitoring data set to be processed $\{a_i\}$ $(i = 1, 2, \dots, 275)$ is divided: the set of data missing values is taken as the test set y {test_y}, 18 values in total, and the positions of the missing values are recorded; the remaining values are taken as the training set y {train_y}, 257 values in total.
The positions of the data missing values in $\{a_i\}$ are found; the values at the corresponding positions in $\{a'_i\}$ are taken as the test set x {test_x}, and the values at the remaining positions in $\{a'_i\}$ are taken as the training set x {train_x}.
An algorithm classifier is trained on the training set x {train_x} and the training set y {train_y} using the K-nearest-neighbor algorithm.
The trained classifier is used to predict the target set {pred-y} from the test set x {test_x}. The data in the test set y {test_y} are replaced with the data in the target set {pred-y}: according to the recorded missing value positions, each missing value in $\{a_i\}$ is filled with the value at the corresponding position in {pred-y}, and the data set after filling is denoted $\{y_i\}$ $(i = 1, 2, \dots, 275)$. A comparison of the data before and after completion is shown in FIG. 4.
The element-wise mean of the sensor monitoring data set $\{y_i\}$ and the sensor monitoring data set $\{a'_i\}$ is calculated and denoted as the mean data set $\{\mu_i\}$ $(i = 1, 2, \dots, 275)$;
The difference between the sensor monitoring data set $\{y_i\}$ and the mean data set $\{\mu_i\}$ is calculated and denoted as the data difference set $\{dy_i\}$ $(i = 1, 2, \dots, 275)$.
The mean μ″ and standard deviation σ″ of the data difference set $\{dy_i\}$ $(i = 1, 2, \dots, n)$ are calculated:
μ″ = 3.3259;
σ″ = 17.7849;
The data in $\{dy_i\}$ $(i = 1, 2, \dots, n)$ outside the range (μ″ - 3σ″, μ″ + 3σ″) are found as the abnormal drift data set $\{\text{dy-abnormal}_r\}$ $(r = 1, 2, \dots, 18)$. The data in $\{y_i\}$ $(i = 1, 2, \dots, 257)$ that correspond to the abnormal drift data set $\{\text{dy-abnormal}_r\}$ are recorded as NaN′, giving the processed monitoring data set $\{x_i\}$ $(i = 1, 2, \dots, 257)$. A comparison before and after drift processing is shown in FIG. 5.
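As a check on the range used in this example, and taking the reported values μ″ = 3.3259 and σ″ = 17.7849 at face value, the 3σ bounds work out as:

```latex
\begin{aligned}
3\sigma'' &= 3 \times 17.7849 = 53.3547,\\
\mu'' - 3\sigma'' &= 3.3259 - 53.3547 = -50.0288,\\
\mu'' + 3\sigma'' &= 3.3259 + 53.3547 = 56.6806.
\end{aligned}
```

Differences below -50.0288 or above 56.6806 therefore fall outside the range and are treated as abnormal drift data.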
The raw temperature data set of the tilt sensor for obtaining the observation point 5 to be processed is recorded as ti}(i=1,2,…,257);
Removing short-period components of the tilt sensor monitoring data and the temperature data of the observation point 5 to be processed by using one-dimensional wavelet decomposition to obtain a smoothed monitoring data set { x'iAnd smoothed temperature data set t'i}(i=1,2,…,257)。
Calculating a monitoring data difference set { dviThe method is as follows:
{dvi}={x’i}-{xi}(i=1,2,…,257)
calculating a temperature data difference set { tviThe method is as follows:
{tvi}={t’i}-{ti}(i=1,2,…,257)
Calculate the mean μ and standard deviation σ of the monitoring data difference set {dv_i}:
μ=-0.0032;
σ=23923;
Find the data in the monitoring data difference set {dv_i} that fall outside the range (μ - 3σ, μ + 3σ); the resulting set of monitoring abnormal jump point data positions is {dv-location_k} = {4, 22, 36, 197, 211, 260}.
Calculate the mean μ′ and standard deviation σ′ of the temperature data difference set {tv_i}:
μ′=0.0023;
σ′=0.219;
Find the data in the temperature data difference set {tv_i} (i = 1, 2, …, n) that fall outside the range (μ′ - 3σ′, μ′ + 3σ′); the resulting set of temperature abnormal jump point data positions is {tv-location_l} = {4, 22, 36, 260, 262, 264}. The elements {4, 22, 36, 260} of the monitoring abnormal jump point data position set {dv-location_k} are the same as elements of the temperature abnormal jump point data position set {tv-location_l}, while the elements {197, 211} of {dv-location_k} are the data jump point positions. The data at the corresponding positions in the data set {x_i} are set to NaN″. Fig. 6 compares the data before and after jump point processing.
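The comparison of the two position sets amounts to a set intersection and a set difference, which can be sketched in Python as follows. The function names, the placeholder series, and the treatment of positions as 1-based indices are assumptions made for illustration; the two position lists are the ones reported above.

```python
import math

def split_jump_points(dv_location, tv_location):
    """Separate temperature-related positions from genuine jump point positions."""
    # Positions flagged in both series coincide with temperature changes and are left alone
    shared = sorted(set(dv_location) & set(tv_location))
    # Positions flagged only in the monitoring data are treated as jump points
    jumps = sorted(set(dv_location) - set(tv_location))
    return shared, jumps

def mask_positions(x, positions):
    """Return a copy of x with the given 1-based positions (assumed) set to NaN."""
    z = list(x)
    for p in positions:
        z[p - 1] = math.nan
    return z

dv_location = [4, 22, 36, 197, 211, 260]
tv_location = [4, 22, 36, 260, 262, 264]
shared, jumps = split_jump_points(dv_location, tv_location)

# Placeholder series of arbitrary length standing in for {x_i}; not measured data
x = [0.0] * 300
z = mask_positions(x, jumps)
```

Positions that appear only in the temperature set (262 and 264 here) play no role, since only the monitoring data are being cleaned.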
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

Claims (9)

1. A big data cleaning method based on building structure safety monitoring is characterized by comprising the following steps:
(1) acquiring the type of a sensor, monitoring data of the sensor, temperature data of the sensor and a sensor layout on a building;
(2) calculating the data missing rate and judging whether it is greater than 20%;
if the data missing rate is less than or equal to 20%, performing missing value completion on the monitoring data of the sensor to be processed;
if the data missing rate is greater than 20%, proceeding to step (3);
(3) identifying abnormal drift data and cleaning the abnormal drift data;
(4) identifying data jump points and cleaning the abrupt-change data.
2. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the sensor layout is an engineering position corresponding to a sensor data number.
3. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the sensor monitoring data is the monitoring data corresponding to the sensor type.
4. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the sensor temperature data is temperature data measured by a built-in temperature collector of the sensor.
5. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the sensor types include strain gauge, stress gauge, crack gauge, displacement gauge, static level gauge, fixed inclinometer, water level gauge, soil pressure gauge, pore water pressure gauge, anemometer or inclinometer.
6. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the formula for calculating the data missing rate is as follows:
Figure FDA0003319195680000021
wherein NaN is a data missing value;
num(NaN) is the number of data missing values NaN;
and n is the number of monitoring data of the sensor to be processed.
7. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the step of performing missing value completion on the monitoring data of the sensor to be processed in the step (2) is as follows:
(2.1) recording the sensor monitoring data set to be processed as {a_i} (i = 1, 2, …, n);
(2.2) dividing the sensor monitoring data set to be processed {a_i} (i = 1, 2, …, n): all data missing values NaN are taken as the test set y {test_y}, the position of each data missing value NaN is recorded, and the remaining values are taken as the training set y {train_y};
(2.3) traversing the monitoring data of the other sensors of the same type in the engineering project, calculating the Pearson correlation coefficient between each of them and the sensor monitoring data set to be processed {a_i} (i = 1, 2, …, n), and denoting the sensor monitoring data set with the largest Pearson correlation coefficient as {a'_j} (j = 1, 2, …, m), where m is the number of data in the sensor monitoring data set having the largest correlation coefficient with the sensor monitoring data set to be processed;
(2.4) taking the number n of the sensor monitoring data to be processed as a reference, adding data to or removing data from the sensor monitoring data set with the largest correlation coefficient {a'_j} (j = 1, 2, …, m):
if m < n, appending (n - m) zeros to the end of the data sequence of {a'_j} (j = 1, 2, …, m);
if m ≥ n, deleting the last (m - n) elements of the data sequence of {a'_j} (j = 1, 2, …, m);
the data set obtained from {a'_j} (j = 1, 2, …, m) after adding or removing data is denoted {a'_i} (i = 1, 2, …, n);
(2.5) finding the positions of the data missing values in the sensor monitoring data set to be processed {a_i}; the data values at the corresponding positions in the adjusted sensor monitoring data set {a'_i} are taken as the test set x {test_x}, and the data values at the remaining positions of {a'_i} are taken as the training set x {train_x};
(2.6) training a classifier on the training set x {train_x} and the training set y {train_y} using the K-nearest-neighbor algorithm;
(2.7) predicting the target set {pred-y} from the test set x {test_x} using the trained classifier;
(2.8) replacing the data in the test set y {test_y} with the data in the target set {pred-y}, and, according to the recorded positions of the data missing values, filling each data missing value in the sensor monitoring data set to be processed {a_i} with the data at the corresponding position in the target set {pred-y}; the sensor monitoring data set after filling is denoted {y_i} (i = 1, 2, …, n).
8. The big data cleaning method based on building structure safety monitoring as claimed in claim 7, wherein the concrete steps of step (3) are:
(3.1) computing the element-wise mean of the sensor monitoring data set {y_i} (i = 1, 2, …, n) and the sensor monitoring data set {a'_i} (i = 1, 2, …, n), denoted as the mean data set {μ_i} (i = 1, 2, …, n);
(3.2) computing the difference between the sensor monitoring data set {y_i} (i = 1, 2, …, n) and the mean data set {μ_i} (i = 1, 2, …, n), denoted as the data difference set {dy_i} (i = 1, 2, …, n);
(3.3) using the 3σ criterion: calculating the mean μ″ and standard deviation σ″ of the data difference set {dy_i} (i = 1, 2, …, n), and finding the data in {dy_i} outside the range (μ″ - 3σ″, μ″ + 3σ″) as the abnormal drift data set {dy-abnormal_r} (r = 1, 2, …, s), where s is the number of abnormal drift data;
(3.4) in the sensor monitoring data set {y_i} (i = 1, 2, …, n), recording the data corresponding to the abnormal drift data set {dy-abnormal_r} as NaN′ to obtain the processed monitoring data set {x_i} (i = 1, 2, …, n).
9. The big data cleaning method based on building structure safety monitoring as claimed in claim 8, wherein the concrete steps of step (4) are:
(4.1) recording the raw temperature data set of the sensor to be processed as {t_i} (i = 1, 2, …, n);
(4.2) using one-dimensional wavelet decomposition to remove the short-period components of the monitoring data set {x_i} and of the raw temperature data of the sensor to be processed, obtaining the smoothed monitoring data set {x'_i} (i = 1, 2, …, n) and the smoothed temperature data set {t'_i} (i = 1, 2, …, n);
(4.3) calculating the difference between the smoothed data sets and the original data sets:
the monitoring data difference set {dv_i} is:
{dv_i} = {x'_i} - {x_i} (i = 1, 2, …, n) (equation 2)
the temperature data difference set {tv_i} is:
{tv_i} = {t'_i} - {t_i} (i = 1, 2, …, n) (equation 3)
(4.4) calculating the mean μ and standard deviation σ of the monitoring data difference set {dv_i} (i = 1, 2, …, n), finding the data in {dv_i} outside the range (μ - 3σ, μ + 3σ), and recording their positions in {dv_i} as the set of monitoring abnormal jump point data positions {dv-location_k} (k = 1, 2, …, p), where p is the number of monitoring abnormal jump point data;
calculating the mean μ′ and standard deviation σ′ of the temperature data difference set {tv_i} (i = 1, 2, …, n), finding the data in {tv_i} outside the range (μ′ - 3σ′, μ′ + 3σ′), and recording their positions in {tv_i} as the set of temperature abnormal jump point data positions {tv-location_l} (l = 1, 2, …, q), where q is the number of temperature abnormal jump point data;
(4.5) judging whether the elements of the monitoring abnormal jump point data position set {dv-location_k} (k = 1, 2, …, p) and the elements of the temperature abnormal jump point data position set {tv-location_l} (l = 1, 2, …, q) are the same;
elements that are the same are not processed; elements of {dv-location_k} that differ from those of {tv-location_l} are recorded as jump point data positions; the data values at the corresponding positions in the data set {x_i} (i = 1, 2, …, n) are recorded as NaN″, giving the processed data set {z_i} (i = 1, 2, …, n).
CN202111240237.3A 2021-10-25 2021-10-25 Big data cleaning method based on building structure safety monitoring Pending CN114116689A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111240237.3A CN114116689A (en) 2021-10-25 2021-10-25 Big data cleaning method based on building structure safety monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111240237.3A CN114116689A (en) 2021-10-25 2021-10-25 Big data cleaning method based on building structure safety monitoring

Publications (1)

Publication Number Publication Date
CN114116689A true CN114116689A (en) 2022-03-01

Family

ID=80377317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111240237.3A Pending CN114116689A (en) 2021-10-25 2021-10-25 Big data cleaning method based on building structure safety monitoring

Country Status (1)

Country Link
CN (1) CN114116689A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114967613A (en) * 2022-05-11 2022-08-30 杭州康吉森自动化科技有限公司 Method and device for monitoring state of production equipment with multiple sensors
CN115618273A (en) * 2022-09-15 2023-01-17 哈尔滨工业大学 Railway track state evaluation method and system based on parallel graph convolution neural network
CN116186634A (en) * 2023-04-26 2023-05-30 青岛新航农高科产业发展有限公司 Intelligent management system for construction data of building engineering

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114967613A (en) * 2022-05-11 2022-08-30 杭州康吉森自动化科技有限公司 Method and device for monitoring state of production equipment with multiple sensors
CN115618273A (en) * 2022-09-15 2023-01-17 哈尔滨工业大学 Railway track state evaluation method and system based on parallel graph convolution neural network
CN115618273B (en) * 2022-09-15 2023-06-30 哈尔滨工业大学 Railway track state evaluation method and system based on parallel graph convolution neural network
CN116186634A (en) * 2023-04-26 2023-05-30 青岛新航农高科产业发展有限公司 Intelligent management system for construction data of building engineering


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination