CN114116689A

CN114116689A - Big data cleaning method based on building structure safety monitoring

Info

Publication number: CN114116689A
Application number: CN202111240237.3A
Authority: CN
Inventors: 陈聪; 王晋; 吴为民
Original assignee: Zhejiang Ruibangkete Testing Co ltd
Current assignee: Zhejiang Ruibangkete Testing Co ltd
Priority date: 2021-10-25
Filing date: 2021-10-25
Publication date: 2022-03-01

Abstract

The invention relates to big data cleaning technology and provides a big data cleaning method based on building structure safety monitoring, comprising the following steps: (1) acquiring the sensor type, the sensor monitoring data, the sensor temperature data and the sensor layout on a building; (2) calculating the data missing rate and judging whether it is more than 20%; if the data missing rate is less than or equal to 20%, performing missing value completion on the monitoring data of the sensor to be processed; if the data missing rate is more than 20%, proceeding to step (3); (3) identifying abnormal drift data and cleaning the abnormal drift data; (4) identifying data jump points and cleaning the mutation data. The method has a simple operation flow and a small amount of calculation, is easy to program, and enables automatic processing of construction engineering safety monitoring data.

Description

Big data cleaning method based on building structure safety monitoring

Technical Field

The invention relates to a big data cleaning technology, in particular to a big data cleaning method.

Background

In recent years, with the digital transformation of the building industry, digital automatic monitoring has gradually been applied to engineering structure safety monitoring. Automatic monitoring with a dense acquisition frequency generates a large amount of data, and abnormal data inevitably arise from factors such as the stability of the sensing instruments, power failure and leakage, sudden faults and construction interference. Cleaning abnormal data usually takes up a large part of the time spent processing big data, so a method for automatic processing is needed. Chinese patent document CN103198147A discloses a method for discriminating and processing automatically monitored abnormal data, which divides abnormal data into accidental data, abrupt change data and slowly changing data for discrimination and processing; however, that patent does not distinguish dynamic data from static data, and data collected by a single sensor theoretically cannot show whether the abnormal data are caused by abnormal construction factors or by normal construction factors. Chinese patent document CN11391982A discloses an anomaly monitoring method, device and equipment for monitoring data, which determines abnormal points in the original time series data through secondary processing; however, that patent requires a target threshold to be set in the secondary processing, so human factors have a large influence. Chinese patent document CN113297744A discloses a charging pile data cleaning method and a charging station suitable for error monitoring and calculation, which mainly targets the cleaning of charging pile information data. Chinese patent document CN113377750A discloses a method and a system for cleaning hydrological data, which are used for cleaning acquired hydrological data such as point rainfall, flow and evaporation.

Abnormal data cleaning for engineering structure safety monitoring has mainly focused on road and bridge structures. Chinese patent CN112700622A discloses a Storm-based big data preprocessing method and system for railway geological disaster monitoring, which builds a Storm cluster topology and performs data extraction, cleaning, conversion and synchronous integration on various types of sensor data from each monitoring point; however, its data cleaning only targets the dynamic acceleration and velocity data of railway geological disaster monitoring and differs from static data cleaning methods. Chinese patent document CN110287178A discloses a data-difference-based method for cleaning progressive drift data of bridges, which cleans drift data by calculating the adjacent-data differences between the original data and the trend data computed by the Super Smoother algorithm and applying interval estimation theory; however, it only targets progressive drift data and does not distinguish dynamic data with large noise from static data with small noise. Neither of the above patents addresses missing data. Therefore, in view of these technical shortcomings, a systematic method for processing abnormal building structure safety monitoring data is urgently needed.

Disclosure of Invention

In view of the above disadvantages of the prior art, the present invention is directed to a big data cleaning method based on building structure safety monitoring, which is used to solve the technical problems in the prior art.

In order to achieve the purpose, the invention provides the following technical scheme:

the invention provides a big data cleaning method based on building structure safety monitoring, which is characterized by comprising the following steps of:

(1) acquiring the type of a sensor, monitoring data of the sensor, temperature data of the sensor and a sensor layout on a building;

(2) calculating the data missing rate, and judging whether the data missing rate is more than 20%; if the data missing rate is less than or equal to 20%, performing missing value completion on the monitoring data of the sensor to be processed; if the data missing rate is more than 20%, proceeding to step (3);

(3) identifying abnormal drift data, and cleaning the abnormal drift data;

(4) identifying data jump points, and cleaning the mutation data.

Further, the sensor layout is an engineering position corresponding to the sensor data number, and the sensor layout is used for correlation analysis of the sensors.

Further, the sensor monitoring data is monitoring data corresponding to the sensor type.

Furthermore, the temperature data of the sensor is the temperature data measured by the built-in temperature collector of a sensor such as a strain, stress, crack or displacement sensor.

Further, the sensor types include static data sensors such as strain gauges, crack gauges, displacement gauges, static levels, fixed inclinometers, water level gauges, soil pressure gauges, pore water pressure gauges, anemometers and inclinometers.

Further, the formula for calculating the data missing rate is as follows:

$$\text{data missing rate} = \frac{\operatorname{num}(\mathrm{NaN})}{n} \times 100\% \qquad \text{(equation 1)}$$

wherein NaN is a data missing value;

num(NaN) is the number of data missing values NaN, which can be determined by counting the monitoring data equal to 0;

and n is the number of the monitoring data of the sensor to be processed.
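The missing-rate check can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the helper names `is_missing` and `missing_rate` and the sample readings are hypothetical, and treating a reading of exactly 0 as missing follows the remark above that num(NaN) can be counted from the zero-valued monitoring data.

```python
import math


def is_missing(value):
    """Treat None, NaN and a reading of exactly 0 as a missing value (assumption)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == 0


def missing_rate(values):
    """Return num(NaN) / n as a fraction between 0 and 1."""
    if not values:
        raise ValueError("the data set is empty")
    return sum(1 for v in values if is_missing(v)) / len(values)


# Hypothetical readings: two of the ten samples are missing.
readings = [0.12, 0.0, 0.15, float("nan"), 0.11, 0.13, 0.14, 0.12, 0.10, 0.16]
rate = missing_rate(readings)
# Step (2) branch: complete missing values only when the rate is at or below 20%.
complete_missing_values = rate <= 0.20
```

Returning a fraction rather than a percentage keeps the threshold comparison in one unit; multiplying by 100 gives the percentage form used in the text.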

Further, the step (2) of performing missing value completion on the monitoring data of the sensor to be processed comprises the following steps:

(2.1) recording the monitoring data set of the sensor to be processed as {a_i} (i = 1, 2, …, n);

(2.2) dividing the monitoring data set {a_i} (i = 1, 2, …, n) of the sensor to be processed, wherein all data missing values NaN are used as test set y {test_y} and the position of each data missing value NaN is recorded, and the remaining values are used as training set y {train_y};

(2.3) traversing the monitoring data of the other sensors of the same type in the engineering project, calculating the Pearson correlation coefficient between each of them and the monitoring data set {a_i} (i = 1, 2, …, n) of the sensor to be processed, and recording the sensor monitoring data set with the largest Pearson correlation coefficient as {a'_j} (j = 1, 2, …, m), where m is the number of monitoring data of the sensor having the largest correlation coefficient with the monitoring data set to be processed; because data transmission times differ between sensors, m is generally not equal to n;

The calculation of the Pearson correlation coefficient is prior art and is not a protection point of the present application, so it is not described further.

(2.4) taking the number n of monitoring data of the sensor to be processed as a reference, adding data to or deleting data from the sensor monitoring data set {a'_j} (j = 1, 2, …, m) with the largest correlation coefficient:

if m < n, appending (n - m) zeros to the end of the data sequence of {a'_j} (j = 1, 2, …, m);

if m ≥ n, deleting the last (m - n) elements of the data sequence of {a'_j} (j = 1, 2, …, m);

the data set obtained after adding data to or deleting data from {a'_j} (j = 1, 2, …, m) is denoted {a'_i} (i = 1, 2, …, n);

(2.5) finding the positions of the data missing values in the monitoring data set {a_i} of the sensor to be processed; the data values at the corresponding positions of the sensor monitoring data set {a'_i} are used as test set x {test_x}, and the data values at the remaining positions of {a'_i} are used as training set x {train_x};

(2.6) training an algorithm classifier on training set x {train_x} and training set y {train_y} using the K-nearest-neighbor algorithm;

The K-nearest-neighbor algorithm and the training of the algorithm classifier are both prior art and are not protection points of the present application, so they are not described further.

(2.7) predicting the target set {pred-y} from test set x {test_x} using the trained algorithm classifier;

(2.8) replacing the data in test set y {test_y} with the data in the target set {pred-y}, and, according to the recorded positions of the data missing values, filling each data missing value in the monitoring data set {a_i} of the sensor to be processed with the data at the corresponding position of the target set {pred-y}; the sensor monitoring data set after filling is denoted {y_i} (i = 1, 2, …, n).
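Steps (2.1) to (2.8) can be sketched as follows in plain Python. This is an illustrative sketch under stated assumptions, not the applicant's code: the function names, the choice k = 2 and the sample series are hypothetical; the Pearson coefficient is computed here on the length-aligned series at the non-missing positions, which the text does not specify; and the prediction averages the k nearest training targets (K-nearest-neighbor regression), since the monitored values are continuous while the text speaks of a classifier.

```python
import math
from statistics import mean


def is_missing(value):
    """Assumption: None, NaN and a reading of exactly 0 mark a missing value."""
    return value is None or value == 0 or (isinstance(value, float) and math.isnan(value))


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def align_length(series, n):
    """Step (2.4): drop trailing elements or append zeros so the length is n."""
    return list(series[:n]) + [0.0] * max(0, n - len(series))


def knn_complete(a, other_sensors, k=2):
    """Fill the missing values of `a` using the best-correlated sensor series."""
    n = len(a)
    missing = [i for i, v in enumerate(a) if is_missing(v)]   # positions of test_y
    known = [i for i in range(n) if not is_missing(a[i])]     # positions of train_y
    train_y = [a[i] for i in known]
    aligned = [align_length(s, n) for s in other_sensors]
    # Step (2.3): keep the series with the largest Pearson coefficient.
    best = max(aligned, key=lambda s: pearson([s[i] for i in known], train_y))
    train_x = [best[i] for i in known]
    filled = list(a)
    for i in missing:                                         # test_x is best[i]
        order = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - best[i]))
        filled[i] = mean(train_y[j] for j in order[:k])       # pred-y for this position
    return filled


# Hypothetical data: positions 2 and 5 of `a` are missing.
a = [1.0, 2.0, float("nan"), 4.0, 5.0, 0, 7.0, 8.0]
sensor_b = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]  # m = 9 > n = 8
sensor_c = [5.0, 1.0, 4.0, 2.0, 3.0, 9.0, 0.5, 1.0]
y = knn_complete(a, [sensor_b, sensor_c])
```

In this toy case `sensor_b` tracks `a` exactly, so it is selected and the two gaps are filled from the neighbors on either side.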

Further, the specific steps of the step (3) are as follows:

(3.1) calculating the element-wise average of the sensor monitoring data set {y_i} (i = 1, 2, …, n) and the sensor monitoring data set {a'_i} (i = 1, 2, …, n), recorded as the mean data set {μ_i} (i = 1, 2, …, n);

(3.2) calculating the difference between the sensor monitoring data set {y_i} (i = 1, 2, …, n) and the mean data set {μ_i} (i = 1, 2, …, n), recorded as the data difference set {dy_i} (i = 1, 2, …, n);

(3.3) applying the 3σ criterion: calculating the mean μ″ and standard deviation σ″ of the data difference set {dy_i} (i = 1, 2, …, n), and finding the data in {dy_i} (i = 1, 2, …, n) outside the range (μ″ - 3σ″, μ″ + 3σ″), which form the abnormal drift data set {dy-abnormal_r} (r = 1, 2, …, s), where s is the number of abnormal drift data;

(3.4) recording the data in the sensor monitoring data set {y_i} (i = 1, 2, …, n) that correspond to the abnormal drift data set {dy-abnormal_r} as NaN′, to obtain the processed monitoring data set {x_i} (i = 1, 2, …, n).
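A minimal Python sketch of steps (3.1) to (3.4) is given below. The function name `flag_drift` and the sample data are hypothetical, and two choices are assumptions that the text leaves open: the population standard deviation (`statistics.pstdev`) is used for σ″, and a value is flagged only when it lies strictly more than 3σ″ from μ″, so that a constant difference series flags nothing.

```python
import math
from statistics import mean, pstdev


def flag_drift(y, reference):
    """Return the cleaned series {x_i} and the positions flagged as abnormal drift."""
    mu = [(yi + ri) / 2 for yi, ri in zip(y, reference)]   # (3.1) mean data set
    dy = [yi - mi for yi, mi in zip(y, mu)]                # (3.2) difference set
    centre, sigma = mean(dy), pstdev(dy)                   # (3.3) statistics of dy
    # A value is abnormal when it lies more than 3 sigma away from the mean.
    abnormal = [i for i, d in enumerate(dy) if abs(d - centre) > 3 * sigma]
    flagged = set(abnormal)
    x = [math.nan if i in flagged else v for i, v in enumerate(y)]   # (3.4) mark NaN'
    return x, abnormal


# Hypothetical data: one reading drifts far away from the reference sensor.
y = [1.0] * 20
y[7] = 50.0
reference = [1.0] * 20
x, abnormal = flag_drift(y, reference)
```

Note that a single outlier can only exceed 3σ when the series is long enough; with n samples the largest possible deviation is about sqrt(n - 1) population standard deviations.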

Further, the specific steps of the step (4) are as follows:

(4.1) recording the raw temperature data set of the sensor to be processed as {t_i} (i = 1, 2, …, n);

(4.2) removing the short-period components of the monitoring data set {x_i} (i = 1, 2, …, n) and of the raw temperature data of the sensor to be processed by one-dimensional wavelet decomposition, to obtain the smoothed monitoring data set {x'_i} (i = 1, 2, …, n) and the smoothed temperature data set {t'_i} (i = 1, 2, …, n);

(4.3) calculating the difference between the smoothed data set and the original data set:

the monitoring data difference set {dv_i} is:

$$\{dv_i\} = \{x'_i\} - \{x_i\}, \quad i = 1, 2, \dots, n \qquad \text{(equation 2)}$$

the temperature data difference set {tv_i} is:

$$\{tv_i\} = \{t'_i\} - \{t_i\}, \quad i = 1, 2, \dots, n \qquad \text{(equation 3)}$$

(4.4) calculating the mean μ and standard deviation σ of the monitoring data difference set {dv_i} (i = 1, 2, …, n), finding the data in {dv_i} (i = 1, 2, …, n) outside the range (μ - 3σ, μ + 3σ), and recording the positions of these data in {dv_i} (i = 1, 2, …, n) as the monitoring abnormal jump point data position set {dv-location_k} (k = 1, 2, …, p), where p is the number of monitoring abnormal jump point data;

calculating the mean μ′ and standard deviation σ′ of the temperature data difference set {tv_i} (i = 1, 2, …, n), finding the data in {tv_i} (i = 1, 2, …, n) outside the range (μ′ - 3σ′, μ′ + 3σ′), and recording the positions of these data in {tv_i} (i = 1, 2, …, n) as the temperature abnormal jump point data position set {tv-location_l} (l = 1, 2, …, q), where q is the number of temperature abnormal jump point data;

(4.5) judging whether the elements of the monitoring abnormal jump point data position set {dv-location_k} (k = 1, 2, …, p) and the elements of the temperature abnormal jump point data position set {tv-location_l} (l = 1, 2, …, q) are the same;

the same elements are not processed;

the elements that are in the monitoring abnormal jump point data position set {dv-location_k} but not in the temperature abnormal jump point data position set {tv-location_l} are marked as jump point data positions; the data values at the corresponding positions of the data set {x_i} (i = 1, 2, …, n) are recorded as NaN″, to obtain the processed data set {z_i} (i = 1, 2, …, n).
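Steps (4.1) to (4.5) can be sketched in plain Python as follows. Everything named here is an assumption for illustration: the text does not state which wavelet or decomposition level is used, so the sketch uses an unnormalized Haar wavelet with two levels and discards all detail coefficients; σ is the population standard deviation; and the input series are assumed to contain no NaN values (entries already marked NaN′ in step (3) would need to be handled first).

```python
import math
from statistics import mean, pstdev


def haar_smooth(values, levels=2):
    """Haar decomposition with all detail coefficients dropped, then reconstruction."""
    if not values:
        return []
    n, block = len(values), 2 ** levels
    approx = list(values) + [values[-1]] * (-n % block)   # pad to a multiple of 2**levels
    for _ in range(levels):
        # One analysis step: keep pairwise averages, discard pairwise differences.
        approx = [(approx[i] + approx[i + 1]) / 2 for i in range(0, len(approx), 2)]
    # Synthesis with zero details repeats each approximation value over its block.
    return [value for value in approx for _ in range(block)][:n]


def outlier_positions(diff):
    """Positions lying more than 3 sigma from the mean of `diff` (step 4.4)."""
    centre, sigma = mean(diff), pstdev(diff)
    return {i for i, d in enumerate(diff) if abs(d - centre) > 3 * sigma}


def flag_jumps(x, t, levels=2):
    """Return the cleaned series {z_i} and the jump point positions."""
    dv = [s - v for s, v in zip(haar_smooth(x, levels), x)]   # equation 2
    tv = [s - v for s, v in zip(haar_smooth(t, levels), t)]   # equation 3
    dv_location = outlier_positions(dv)
    tv_location = outlier_positions(tv)
    # Step (4.5): positions shared with the temperature set are left untouched.
    jumps = sorted(dv_location - tv_location)
    z = [math.nan if i in jumps else v for i, v in enumerate(x)]   # mark NaN''
    return z, jumps


# Hypothetical data: a spike at 13 in the monitoring data only, and a spike at 29
# that coincides with a temperature spike.
x = [10.0] * 40
x[13] = 30.0
x[29] = 30.0
t = [20.0] * 40
t[29] = 28.0
z, jumps = flag_jumps(x, t)
```

Only position 13 is cleaned in this example; position 29 is kept because the temperature series jumps at the same position, which the method treats as a real, temperature-driven response.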

Wherein i, j, k and l all represent serial numbers.

The invention has the following beneficial effects:

(1) aiming at static data obtained by the safety automatic monitoring of the construction engineering, the invention systematically carries out data cleaning on mutation data, drift data and missing data in the static data, and provides an effective data preprocessing method for the safety monitoring of the static data of the construction engineering;

(2) the method has the advantages of simple and convenient operation flow, small calculated amount, easy programming realization and capability of realizing automatic processing of the safety monitoring data of the construction engineering;

(3) in combination with engineering practice, the invention has been applied to an actual old-house monitoring project; it provides a reference for data processing in enterprises' automatic monitoring projects in building engineering and has good popularization value.

Drawings

FIG. 1 is a flow chart of the cleaning method of the present invention.

Fig. 2 is a layout diagram of the tilt sensor of the present embodiment.

Fig. 3 is tilt sensor monitoring data of observation point 5 in the present embodiment.

Fig. 4 is a comparison diagram before and after processing of the inclination sensor data missing value of observation point 5 in the present embodiment.

Fig. 5 is a comparison chart before and after processing of the tilt sensor data drift value of observation point 5 in the present embodiment.

Fig. 6 is a comparison graph before and after processing of the tilt sensor data jump point value of observation point 5 in the present embodiment.

Detailed Description

The following detailed description of the embodiments of the present invention is provided in conjunction with the accompanying drawings, and it should be noted that the embodiments are merely illustrative of the present invention and should not be considered as limiting the invention, and the purpose of the embodiments is to make those skilled in the art better understand and reproduce the technical solutions of the present invention, and the protection scope of the present invention should be subject to the scope defined by the claims.

The building structure safety monitoring mainly monitors static data such as deformation and internal force of a structure, and abnormal monitoring data of the building structure safety monitoring mainly comprises mutation data, drift data and missing data. At present, methods which can be used for cleaning mutation data and drift data mainly comprise a 3 sigma criterion, a Super Smoother algorithm, a wavelet method and the like, and methods which are used for processing missing data mainly comprise interpolation point complementation, fitting point complementation, artificial intelligence point complementation and the like. When the method is applied to actual building engineering, the complexity and the accuracy of the processing method need to be balanced, and the abnormal building structure safety monitoring data needs to be cleaned systematically by combining the characteristics of the static building structure safety monitoring data.

S1, acquiring the sensor type, the sensor monitoring data, the sensor temperature data and the sensor layout on the building;

The sensor layout is the engineering position corresponding to the sensor data number, and the sensor layout is used for correlation analysis of the sensors.

The sensor monitoring data is monitoring data corresponding to the sensor type.

The temperature data of the sensor is the temperature data measured by the built-in temperature collector of a sensor such as a strain, stress, crack or displacement sensor.

The sensor types include static data sensors such as strain gauges, crack gauges, displacement gauges, static levels, fixed inclinometers, water level gauges, soil pressure gauges, pore water pressure gauges, anemometers and inclinometers.

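S2, calculating the data missing rate and judging whether the data missing rate is more than 20%;

$$\text{data missing rate} = \frac{\operatorname{num}(\mathrm{NaN})}{n} \times 100\% \qquad \text{(equation 1)}$$

wherein NaN is a data missing value, and num(NaN) is the number of data missing values NaN;

and n is the number of the monitoring data of the sensor to be processed.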

If the data missing rate is less than or equal to 20%, performing missing value completion on the monitoring data of the sensor to be processed;

S2.1, recording the monitoring data set of the sensor to be processed as {a_i} (i = 1, 2, …, n);

S2.2, dividing the monitoring data set {a_i} (i = 1, 2, …, n) of the sensor to be processed, wherein all data missing values NaN are used as test set y {test_y} and the position of each data missing value NaN is recorded, and the remaining values are used as training set y {train_y};

S2.3, traversing the monitoring data of the other sensors of the same type in the engineering project, calculating the Pearson correlation coefficient between each of them and the monitoring data set {a_i} (i = 1, 2, …, n) of the sensor to be processed, and recording the sensor monitoring data set with the largest Pearson correlation coefficient as {a'_j} (j = 1, 2, …, m), where m is the number of monitoring data of the sensor having the largest correlation coefficient with the monitoring data set to be processed; because data transmission times differ between sensors, m is generally not equal to n;

S2.4, taking the number n of monitoring data of the sensor to be processed as a reference, adding data to or deleting data from the sensor monitoring data set {a'_j} (j = 1, 2, …, m) with the largest correlation coefficient in the manner of step (2.4) above, to obtain the data set {a'_i} (i = 1, 2, …, n);

S2.5, finding the positions of the data missing values in the monitoring data set {a_i} of the sensor to be processed; the data values at the corresponding positions of the sensor monitoring data set {a'_i} are used as test set x {test_x}, and the data values at the remaining positions of {a'_i} are used as training set x {train_x};

S2.6, training an algorithm classifier on training set x {train_x} and training set y {train_y} using the K-nearest-neighbor algorithm;

S2.7, predicting the target set {pred-y} from test set x {test_x} using the trained algorithm classifier;

S2.8, replacing the data in test set y {test_y} with the data in the target set {pred-y}, and, according to the recorded positions of the data missing values, filling each data missing value in the monitoring data set {a_i} of the sensor to be processed with the data at the corresponding position of the target set {pred-y}; the sensor monitoring data set after filling is denoted {y_i} (i = 1, 2, …, n).

If the data missing rate is more than 20%, proceeding directly to step S3;

S3, identifying abnormal drift data and cleaning the abnormal drift data;

S3.1, calculating the element-wise average of the sensor monitoring data set {y_i} (i = 1, 2, …, n) and the sensor monitoring data set {a'_i} (i = 1, 2, …, n), recorded as the mean data set {μ_i} (i = 1, 2, …, n);

S3.2, calculating the difference between the sensor monitoring data set {y_i} (i = 1, 2, …, n) and the mean data set {μ_i} (i = 1, 2, …, n), recorded as the data difference set {dy_i} (i = 1, 2, …, n);

S3.3, applying the 3σ criterion: calculating the mean μ″ and standard deviation σ″ of the data difference set {dy_i} (i = 1, 2, …, n), and finding the data in {dy_i} (i = 1, 2, …, n) outside the range (μ″ - 3σ″, μ″ + 3σ″), which form the abnormal drift data set {dy-abnormal_r} (r = 1, 2, …, s), where s is the number of abnormal drift data;

S3.4, recording the data in the sensor monitoring data set {y_i} (i = 1, 2, …, n) that correspond to the abnormal drift data set {dy-abnormal_r} as NaN′, to obtain the processed monitoring data set {x_i} (i = 1, 2, …, n).

S4, identifying data jump points and cleaning the mutation data.

S4.1, recording the raw temperature data of the sensor to be processed as {t_i} (i = 1, 2, …, n);

S4.2, removing the short-period components of the monitoring data set {x_i} (i = 1, 2, …, n) and of the raw temperature data of the sensor to be processed by one-dimensional wavelet decomposition, to obtain the smoothed monitoring data {x'_i} (i = 1, 2, …, n) and the smoothed temperature data {t'_i} (i = 1, 2, …, n);

S4.3, calculating the difference between the smoothed data and the original data:

the monitoring data difference set {dv_i} is:

$$\{dv_i\} = \{x'_i\} - \{x_i\}, \quad i = 1, 2, \dots, n \qquad \text{(equation 2)}$$

the temperature data difference set {tv_i} is:

$$\{tv_i\} = \{t'_i\} - \{t_i\}, \quad i = 1, 2, \dots, n \qquad \text{(equation 3)}$$

S4.4, calculating the mean μ and standard deviation σ of the monitoring data difference set {dv_i} (i = 1, 2, …, n), finding the data in {dv_i} (i = 1, 2, …, n) outside the range (μ - 3σ, μ + 3σ), and recording the positions of these data in {dv_i} (i = 1, 2, …, n) as the monitoring abnormal jump point data position set {dv-location_k} (k = 1, 2, …, p), where p is the number of monitoring abnormal jump point data;

calculating the mean μ′ and standard deviation σ′ of the temperature data difference set {tv_i} (i = 1, 2, …, n), finding the data in {tv_i} (i = 1, 2, …, n) outside the range (μ′ - 3σ′, μ′ + 3σ′), and recording the positions of these data in {tv_i} (i = 1, 2, …, n) as the temperature abnormal jump point data position set {tv-location_l} (l = 1, 2, …, q), where q is the number of temperature abnormal jump point data;

S4.5, judging whether the elements of the monitoring abnormal jump point data position set {dv-location_k} (k = 1, 2, …, p) and the elements of the temperature abnormal jump point data position set {tv-location_l} (l = 1, 2, …, q) are the same;

the same elements are not processed;

the elements that are in the monitoring abnormal jump point data position set {dv-location_k} but not in the temperature abnormal jump point data position set {tv-location_l} are marked as jump point data positions;

the data values at the corresponding positions of the data set {x_i} (i = 1, 2, …, n) are recorded as NaN″, to obtain the processed data set {z_i} (i = 1, 2, …, n).

Taking dilapidated housing in a family residential area as an example, the general building overview is shown in Table 1. The buildings in the project are close to or beyond their service life, and their main structures show varying degrees of aging damage; to ensure the safe use of the buildings, the building project is dynamically monitored 24 hours a day by the entrepreneur.

Table 1 project building overview

As shown in FIG. 2, the data cleaning is carried out on the building house inclination monitoring point 5 of No. 57-61 of the head tap in the embodiment;

As shown in Fig. 3, three months of tilt sensor monitoring data from observation point 5 are selected; the figure shows that 18 monitoring values are equal to 0, so num(NaN) = 18;

the raw monitoring data set is denoted {a_i} (i = 1, 2, …, 275), so the data missing rate is 18/275 × 100% ≈ 6.55%.

The missing rate is less than 20%, so the missing data are completed.

For the tilt sensors of observation points 1, 2, 3, 4, 6 and 7, the monitoring data of each sensor are traversed and the Pearson correlation coefficients with the tilt sensor monitoring data of observation point 5 are calculated; they are 0.3421, -0.2916, 0.4488, 0.2139, -0.0166 and 0.1125, respectively. The coefficient for observation point 3 (0.4488) is the largest, so the tilt sensor monitoring data of observation point 3 are selected and recorded as the sensor monitoring data set {a'_j} (j = 1, 2, …, 288).

Taking the monitoring data set to be processed {a_i} (i = 1, 2, …, 275) as a reference, data are added to or deleted from the tilt sensor monitoring data set {a'_j} (j = 1, 2, …, 288) of observation point 3: since 288 > 275, the last (288 - 275) = 13 elements of the data sequence are deleted, and the resulting data set is denoted {a'_i} (i = 1, 2, …, 275).
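The selection and length alignment in this embodiment can be replayed with the reported numbers. The snippet below is only an illustration: it assumes the six listed coefficients belong to observation points 1, 2, 3, 4, 6 and 7 in that order, and the reference series is a placeholder list, not the measured data.

```python
# Pearson coefficients versus observation point 5, keyed by observation point.
coefficients = {1: 0.3421, 2: -0.2916, 3: 0.4488, 4: 0.2139, 6: -0.0166, 7: 0.1125}
best_point = max(coefficients, key=coefficients.get)   # largest coefficient wins

n, m = 275, 288                   # lengths of {a_i} and {a'_j}
reference = [0.0] * m             # placeholder values standing in for point 3 data
surplus = m - n                   # number of trailing elements to delete
aligned = reference[:n]           # {a'_i}, i = 1, 2, ..., 275
```

Selecting by the largest signed coefficient, rather than the largest absolute value, follows the wording of step (2.3).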

In the monitoring data set to be processed {a_i} (i = 1, 2, …, 275), the data missing values are used as test set y {test_y}, 18 data in total, and the positions of the data missing values are recorded; the remaining values are used as training set y {train_y}, 257 data in total.

The positions of the data missing values in the monitoring data set {a_i} of the sensor to be processed are found; the data values at the corresponding positions of the sensor monitoring data set {a'_i} are used as test set x {test_x}, and the data values at the remaining positions of {a'_i} are used as training set x {train_x}.

The algorithm classifier is trained on training set x {train_x} and training set y {train_y} using the K-nearest-neighbor algorithm.

The target set {pred-y} is predicted from test set x {test_x} using the trained algorithm classifier; the data in test set y {test_y} are replaced with the data in the target set {pred-y}, and, according to the recorded positions of the data missing values, each data missing value in the monitoring data set to be processed {a_i} is filled with the data at the corresponding position of the target set {pred-y}. The sensor monitoring data set after filling is denoted {y_i} (i = 1, 2, …, 275); Fig. 4 shows the comparison before and after data completion.

The element-wise average of the sensor monitoring data set {y_i} and the sensor monitoring data set {a'_i} is calculated and recorded as the mean data set {μ_i} (i = 1, 2, …, 275);

The difference between the sensor monitoring data set {y_i} and the mean data set {μ_i} is calculated and recorded as the data difference set {dy_i} (i = 1, 2, …, 275).

The mean μ″ and standard deviation σ″ of the data difference set {dy_i} (i = 1, 2, …, n) are calculated:

μ″ = 3.3259;

σ″ = 17.7849;

The data in the data difference set {dy_i} (i = 1, 2, …, n) outside the range (μ″ - 3σ″, μ″ + 3σ″) form the abnormal drift data set {dy-abnormal_r} (r = 1, 2, …, 18). In the data set {y_i} (i = 1, 2, …, 257), the data corresponding to the abnormal drift data set {dy-abnormal_r} are recorded as NaN′, to obtain the processed monitoring data set {x_i} (i = 1, 2, …, 257); Fig. 5 shows the comparison before and after data drift processing.
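For reference, the 3σ interval implied by the values reported above follows by direct arithmetic on the stated μ″ and σ″ (no additional data are assumed):

```latex
\begin{aligned}
3\sigma'' &= 3 \times 17.7849 = 53.3547,\\
\mu'' - 3\sigma'' &= 3.3259 - 53.3547 = -50.0288,\\
\mu'' + 3\sigma'' &= 3.3259 + 53.3547 = 56.6806.
\end{aligned}
```

Under these values, an element of {dy_i} is treated as abnormal drift when it falls outside (-50.0288, 56.6806).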

The raw temperature data set of the tilt sensor of observation point 5 to be processed is obtained and recorded as {t_i} (i = 1, 2, …, 257);

The short-period components of the tilt sensor monitoring data and the temperature data of observation point 5 to be processed are removed by one-dimensional wavelet decomposition, to obtain the smoothed monitoring data set {x'_i} (i = 1, 2, …, 257) and the smoothed temperature data set {t'_i} (i = 1, 2, …, 257).

The monitoring data difference set {dv_i} is calculated as:

$$\{dv_i\} = \{x'_i\} - \{x_i\}, \quad i = 1, 2, \dots, 257$$

The temperature data difference set {tv_i} is calculated as:

$$\{tv_i\} = \{t'_i\} - \{t_i\}, \quad i = 1, 2, \dots, 257$$

The mean μ and standard deviation σ of the monitoring data difference set {dv_i} are calculated:

μ＝-0.0032；

σ＝23923；

The data in the monitoring data difference set {dv_i} outside the range (μ - 3σ, μ + 3σ) are found; the monitoring abnormal jump point data position set is {dv-location_k} = {4, 22, 36, 197, 211, 260}.

The mean μ′ and standard deviation σ′ of the temperature data difference set {tv_i} are calculated:

μ′ = 0.0023;

σ′ = 0.219;

The data in the temperature data difference set {tv_i} (i = 1, 2, …, n) outside the range (μ′ - 3σ′, μ′ + 3σ′) are found; the temperature abnormal jump point data position set is {tv-location_l} = {4, 22, 36, 260, 262, 264}. The elements {4, 22, 36, 260} of the monitoring abnormal jump point data position set {dv-location_k} are the same as elements of the temperature abnormal jump point data position set {tv-location_l}, so the elements {197, 211} of {dv-location_k} are the data jump point positions. The data at the corresponding positions of the data set {x_i} are set to NaN″; Fig. 6 shows the comparison before and after data jump point processing.
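The comparison of the two position sets in this embodiment is a plain set difference, which can be checked with the reported values. The variable names below are illustrative only.

```python
# Position sets reported for observation point 5.
dv_location = {4, 22, 36, 197, 211, 260}   # monitoring abnormal jump point positions
tv_location = {4, 22, 36, 260, 262, 264}   # temperature abnormal jump point positions

shared = sorted(dv_location & tv_location)        # coincide with temperature jumps: not processed
jump_points = sorted(dv_location - tv_location)   # monitoring-only jumps: set to NaN''
```

Positions 262 and 264 appear only in the temperature set and therefore play no role in cleaning the monitoring data.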

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

Claims

1. A big data cleaning method based on building structure safety monitoring is characterized by comprising the following steps:

(2) calculating the data missing rate, and judging whether the data missing rate is more than 20%;

if the data missing rate is more than 20%, proceeding to step (3);

(3) identifying abnormal drift data, and cleaning the abnormal drift data;

(4) identifying data jump points, and cleaning the mutation data.

2. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the sensor layout is an engineering position corresponding to a sensor data number.

3. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the sensor monitoring data is the monitoring data corresponding to the sensor type.

4. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the sensor temperature data is temperature data measured by a built-in temperature collector of the sensor.

5. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the sensor types include strain gauge, stress gauge, crack gauge, displacement gauge, static level gauge, fixed inclinometer, water level gauge, soil pressure gauge, pore water pressure gauge, anemometer or inclinometer.

6. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the formula for calculating the data missing rate is as follows:

$$\text{data missing rate} = \frac{\operatorname{num}(\mathrm{NaN})}{n} \times 100\% \qquad \text{(equation 1)}$$

wherein NaN is a data missing value;

num(NaN) is the number of data missing values NaN;

and n is the number of the monitoring data of the sensor to be processed.

7. The big data cleaning method based on building structure safety monitoring as claimed in claim 1, wherein the step of performing missing value completion on the monitoring data of the sensor to be processed in the step (2) is as follows:

(2.3) traversing the monitoring data of the other sensors of the same type in the engineering project, calculating the Pearson correlation coefficient between each of them and the monitoring data set {a_i} (i = 1, 2, …, n) of the sensor to be processed, and recording the sensor monitoring data set with the largest Pearson correlation coefficient as {a'_j} (j = 1, 2, …, m), where m is the number of monitoring data of the sensor having the largest correlation coefficient with the monitoring data set to be processed;

(2.4) taking the number n of monitoring data of the sensor to be processed as a reference, adding data to or deleting data from the sensor monitoring data set {a'_j} (j = 1, 2, …, m) with the largest correlation coefficient;

(2.5) finding the positions of the data missing values in the monitoring data set {a_i} of the sensor to be processed; the data values at the corresponding positions of the sensor monitoring data set {a'_i} obtained after adding or deleting data are used as test set x {test_x}, and the data values at the remaining positions of {a'_i} are used as training set x {train_x};

8. The big data cleaning method based on building structure safety monitoring as claimed in claim 7, wherein the concrete steps of step (3) are:

(3.1) calculating the element-wise average of the sensor monitoring data set {y_i} (i = 1, 2, …, n) and the sensor monitoring data set {a'_i} (i = 1, 2, …, n), recorded as the mean data set {μ_i} (i = 1, 2, …, n);

(3.2) calculating the difference between the sensor monitoring data set {y_i} (i = 1, 2, …, n) and the mean data set {μ_i} (i = 1, 2, …, n), recorded as the data difference set {dy_i} (i = 1, 2, …, n);

9. The big data cleaning method based on building structure safety monitoring as claimed in claim 8, wherein the concrete steps of step (4) are:

(4.2) removing the short-period components of the monitoring data set {x_i} (i = 1, 2, …, n) and of the raw temperature data of the sensor to be processed by one-dimensional wavelet decomposition, to obtain the smoothed monitoring data set {x'_i} (i = 1, 2, …, n) and the smoothed temperature data set {t'_i} (i = 1, 2, …, n);

the monitoring data difference set {dv_i} is:

$$\{dv_i\} = \{x'_i\} - \{x_i\}, \quad i = 1, 2, \dots, n \qquad \text{(equation 2)}$$

the temperature data difference set {tv_i} is:

$$\{tv_i\} = \{t'_i\} - \{t_i\}, \quad i = 1, 2, \dots, n \qquad \text{(equation 3)}$$

(4.4) calculating the mean μ and standard deviation σ of the monitoring data difference set {dv_i} (i = 1, 2, …, n), finding the data in {dv_i} (i = 1, 2, …, n) outside the range (μ - 3σ, μ + 3σ), and recording the positions of these data in {dv_i} (i = 1, 2, …, n) as the monitoring abnormal jump point data position set {dv-location_k} (k = 1, 2, …, p), where p is the number of monitoring abnormal jump point data;

the same elements are not processed; the elements that are in the monitoring abnormal jump point data position set {dv-location_k} but not in the temperature abnormal jump point data position set {tv-location_l} are marked as jump point data positions; the data values at the corresponding positions of the data set {x_i} (i = 1, 2, …, n) are recorded as NaN″, to obtain the processed data set {z_i} (i = 1, 2, …, n).