CN113361596B

CN113361596B - Sensor data augmentation method, system and storage medium

Info

Publication number: CN113361596B
Application number: CN202110623634.2A
Authority: CN
Inventors: 饶元; 王文; 江朝晖; 朱军; 张武; 高宁
Original assignee: Anhui Agricultural University AHAU
Current assignee: Anhui Agricultural University AHAU
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2022-10-04
Anticipated expiration: 2041-06-04
Also published as: CN113361596A

Abstract

The invention discloses a sensor data augmentation method, system and storage medium, belonging to the technical field of sensor data processing. The method augments drift-free data samples by randomly cropping drift-free data. It selects several function models that match sensor drift characteristics as trend terms and constructs a non-stationary random walk process with a trend term to simulate the sensor data drift process. Drift simulation is realized by setting a sensor drift probability threshold and determining the maximum drift range from the data characteristics. Drift-containing data samples are augmented by adding the simulated drift to the augmented drift-free data samples. By fusing the data of adjacent sensors in the sensing field, the invention takes both the temporal and spatial characteristics of sensor drift data into account, ensures the correctness of the drift simulation and the diversity of the augmented data features, and overcomes the weak generalization ability of sensor drift calibration models caused by insufficient training samples.

Description

A sensor data augmentation method, system and storage medium

Technical field

The present invention relates to the technical field of sensor data processing, and more particularly to a sensor data augmentation method, system and storage medium, and to a sensor data drift calibration method.

Background

In the agricultural Internet of Things, the drift that accumulates in sensor data over time is a key factor limiting the quality of the collected data. Because of sensor drift, neither intelligent control of IoT devices nor effective data analysis can be guaranteed. Sensors are usually deployed at large scale and for long periods, so removing and recalibrating them individually is often impractical. Calibrating sensors when no drift-free ground-truth signal is available is therefore increasingly important.

Sensor calibration based on deep learning has attracted the inventors' attention. The convolutional neural network (CNN), one class of deep learning model, has been applied in other fields with good results, and research on applying CNNs to time series is also emerging. For example, in EEG signal denoising, a CNN-based method can establish an accurate mapping from the noisy signal to the EEG signal, achieve real-time denoising, and improve the efficiency and quality of denoising. However, this method is limited to denoising a single time series and is not suitable for drift calibration of multiple sensors.

In addition, Bao et al. (2018, Structural Health Monitoring, 18(2), 401-421) converted time-series signals into image vectors by plotting them piecewise as grayscale images, and then fed a training set of randomly selected, manually labeled image vectors into a deep neural network or a group of deep neural networks. These networks, trained with stacked autoencoders and greedy layer-wise training, can detect multiple anomaly patterns in time-series data, including data drift, fairly accurately. However, the method only detects and classifies anomaly patterns in time-series data and does not address the calibration of sensor drift. Tian et al. (2020, IEEE Access, 8, 121385-121397) proposed a deep learning model that combines unsupervised and supervised techniques to compensate for electronic nose drift, but the method is relatively complex, does not study how the training set should be constructed, and has limited applicable scenarios.

Based on the above analysis, existing methods cannot yet effectively meet the requirements of sensor drift calibration. Deep learning can automatically learn features from large amounts of data, so applying it to sensor drift calibration has great potential, and using it to automatically extract drift features from sensing data and calibrate sensor drift is meaningful work. However, training deep learning models such as neural networks requires a large number of data samples. How to construct sufficient drift-containing data samples, and how to choose a suitable size for the training samples fed in at each iteration, are problems that urgently need to be solved. How to ensure the accuracy of drift feature extraction and the precision of calibration is a further difficulty.

Summary of the invention

1. The problem to be solved

In the prior art, training deep learning models such as neural networks requires a large number of data samples, and it is unclear how to construct sufficient drift-containing data samples and how to choose a suitable size for iteratively input training samples. To address this, the present invention provides a sensor data augmentation method. The method augments drift-free data samples by randomly cropping drift-free data. It selects several function models that match sensor drift characteristics as trend terms and constructs a non-stationary random walk process with a trend term to simulate the sensor data drift process. By setting a probability threshold for sensor drift and determining the maximum drift range from the data characteristics, it reproduces data drift characteristics and simulates the drift amount. Drift-containing data samples are then augmented by adding the drift amount to the augmented drift-free data samples. By fusing the data of adjacent sensors in the sensing field, the invention takes both the temporal and spatial characteristics of sensor drift data into account, ensures the correctness of the drift simulation and the diversity of the augmented data features, and overcomes the weak generalization ability of deep-learning-based sensor drift calibration methods caused by insufficient training samples.

In addition, a Squeeze-and-Excitation (SE) module is embedded in the residual block of a Residual Network (ResNet) to build a drift calibration method based on the Squeeze-and-Excitation Residual Network (SE-ResNet). This method calibrates sensor drift data and effectively improves the efficiency and quality of sensor drift calibration.

2. Technical solutions

To solve the above problems, the present invention adopts the following technical solutions.

A first aspect of the present invention provides a sensor data augmentation method, the method comprising:

SA: deploying calibrated sensors to construct a target sensing field;

SB: collecting first sensing data from the calibrated sensors in the target sensing field; cropping the first sensing data into a plurality of data matrices by random cropping; and, from a single cropped data matrix, selecting some of its rows according to the positions in the sensing field of the sensors corresponding to those rows, so as to construct an adjacent-sensor data matrix, which is determined as the first augmented data;

SC: for the data matrix cropped from the first sensing data, simulating the drift process of the sensor data with a non-stationary random walk process with a trend term to obtain second sensing data; and selecting from the second sensing data a drift matrix with the same position and size as the first augmented data, which is determined as the second augmented data.

In some embodiments, the first sensing data is determined from L consecutive sampling points collected by the N calibrated sensors in the target sensing field and is written as a drift-free data matrix X of size N×L, where the N rows of X correspond to the data collected by the N sensors in the sensing field;

A plurality of data matrices of size N×l are randomly cropped from the drift-free data matrix X, and any one of them is denoted X^l; the number of columns l of X^l is smaller than the number of columns L of X. From a single cropped matrix X^l, n rows are selected according to the positions in the sensing field of the sensors corresponding to the rows, to construct an adjacent-sensor data matrix X^b of size n×l, which is recorded as the first augmented data. This can be expressed as:

$$X^{l} = X\left(1{:}N,\ t_0{:}t_0 + l - 1\right)$$

$$X^{b} = X^{l}\left(\{r_1, \dots, r_n\},\ 1{:}l\right)$$

where t_0 is the initial cropping position among the sampling points; l is the number of columns of the cropped data matrix, that is, the number of consecutive sampling points of the sensor corresponding to each row; 1:N denotes taking rows 1 to N of the matrix; t_0 : t_0 + l - 1 denotes taking columns t_0 to t_0 + l - 1 of the matrix; r_1, ..., r_n are the indices of the selected rows; and n is the number of rows selected from the cropped data matrix X^l.
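The random cropping step described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `random_crop`, the 0-based start index, and the toy matrix are assumptions made for the example.

```python
import random

def random_crop(X, l, rng=random):
    """Crop an N x l window from an N x L matrix at a random column offset.

    X is a list of N rows, each holding L samples from one sensor.
    Returns the 0-based start column and the cropped matrix.
    """
    L = len(X[0])
    if not 0 < l < L:
        raise ValueError("crop length l must satisfy 0 < l < L")
    start = rng.randrange(0, L - l + 1)  # initial cropping position
    return start, [row[start:start + l] for row in X]

# Toy drift-free matrix (hypothetical values): N = 4 sensors, L = 12 samples.
X = [[10 * i + t for t in range(12)] for i in range(4)]
start, X_l = random_crop(X, l=5, rng=random.Random(0))
```

Every row is cut at the same column offset, so the cropped matrix keeps the sensors aligned in time. Calling `random_crop` repeatedly yields many overlapping windows from one recording, which is where the augmentation comes from.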

A non-stationary random walk process with a trend term is used to construct a drift simulation matrix D^l of size N×l. Following the same row selection used to build X^b from X^l, the n rows at the same row positions are selected from D^l to build a drift matrix D^b of size n×l, which serves as the second augmented data. This can be expressed as:

$$D^{b} = \left[\, d_{i,t} \,\right]_{n \times l}$$

where d_{i,t} is the drift amount of the i-th selected sensor at time t.

In some embodiments, the number n of adjacent sensors is selected as follows:

The number n of adjacent sensors is determined adaptively according to the data augmentation application scenario. One row of the cropped data matrix X^l is chosen at random as the reference. The other sensors in the sensing field are ranked by their Euclidean distance to the reference sensor corresponding to that row, and the n-1 nearest sensors are selected in order of increasing distance. The rows of X^l corresponding to the resulting n sensors are the n rows to be selected.
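The neighbor selection rule above can be sketched as follows. This is a minimal illustration under stated assumptions: sensor positions are taken to be 2-D coordinates, and the function name `select_neighbor_rows`, the coordinates, and the toy matrix are hypothetical.

```python
import math
import random

def select_neighbor_rows(positions, n, rng=random):
    """Pick a random reference sensor and its n-1 nearest neighbors.

    positions holds one (x, y) coordinate per matrix row.
    Returns row indices: the reference first, then the neighbors
    in order of increasing Euclidean distance to the reference.
    """
    N = len(positions)
    if not 1 <= n <= N:
        raise ValueError("n must be between 1 and the number of sensors")
    ref = rng.randrange(N)
    others = sorted(
        (i for i in range(N) if i != ref),
        key=lambda i: math.dist(positions[ref], positions[i]),
    )
    return [ref] + others[: n - 1]

# Hypothetical layout of 5 sensors and a matching 5 x 4 cropped matrix.
positions = [(0.0, 0.0), (1.0, 0.5), (2.5, 0.0), (6.0, 1.0), (9.0, 3.0)]
X_l = [[float(10 * i + t) for t in range(4)] for i in range(len(positions))]
rows = select_neighbor_rows(positions, n=3, rng=random.Random(1))
X_b = [X_l[i] for i in rows]  # adjacent-sensor data matrix of size n x l
```

Keeping the returned row indices matters because the same indices are needed later to pick the matching rows of the drift matrix.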

In some embodiments, the length l of the cropped data matrix is determined by the sensor sampling interval and the characteristic period of the data distribution, and is calculated as:

$$l = \lambda \cdot \frac{T}{\Delta t}$$

where T is the characteristic period over which the sensor data stream is regularly distributed, Δt is the sensor sampling interval, and λ is a positive rational number; in general, a suitable positive integer is chosen according to the required training sample size.

In some embodiments, a non-stationary random walk process with a trend term is used to construct a sensor drift simulation matrix D^l of size N×l. Following the same row selection used to build X^b from X^l, the n rows at the same row positions are selected from D^l to build a drift matrix D^b of size n×l, which is recorded as the second augmented data;

A sensor drift probability threshold is set, and the non-stationary random walk process with a trend term is used to simulate the drift process of the sensor data. After simulation, the data of each sensor can be expressed as:

$$y_{i,t} = \begin{cases} x_{i,t} + d_{i,t}, & \mathrm{rand}(0,1) \le \alpha \\ x_{i,t}, & \mathrm{rand}(0,1) > \alpha \end{cases}$$

where y_{i,t} is the drift-containing data of sensor i at time t after simulation, x_{i,t} is the drift-free data of sensor i at time t, d_{i,t} is the drift amount generated for sensor i at time t by the simulation, rand(0,1) is a random floating-point number between 0 and 1, and α is a floating-point number with α∈(0,1) that serves as the probability threshold for sensor drift. When rand(0,1)>α, sensor i does not drift and its drift amount is zero; otherwise, it drifts;
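The probability-threshold rule can be sketched as follows. The text does not say whether the random draw is made once per sensor or once per sampling point; this sketch assumes one draw per sensor per sample, and the function name `apply_drift` and the toy values are hypothetical.

```python
import random

def apply_drift(X_b, D_b, alpha, rng=random):
    """Add simulated drift to each sensor row with probability alpha.

    X_b and D_b are n x l matrices (lists of rows) of drift-free data
    and simulated drift. Assumption: one random draw decides whether
    a whole sensor row drifts in this sample.
    """
    Y_b = []
    for x_row, d_row in zip(X_b, D_b):
        if rng.random() <= alpha:   # sensor drifts
            Y_b.append([x + d for x, d in zip(x_row, d_row)])
        else:                       # rand(0,1) > alpha: zero drift
            Y_b.append(list(x_row))
    return Y_b

# Hypothetical 3 x 4 sample: drift-free readings and a simulated drift.
X_b = [[20.0, 20.1, 20.2, 20.3], [19.5, 19.6, 19.7, 19.8], [21.0, 21.0, 21.1, 21.1]]
D_b = [[0.0, 0.1, 0.2, 0.3], [0.0, 0.2, 0.4, 0.6], [0.0, 0.05, 0.1, 0.15]]
Y_b = apply_drift(X_b, D_b, alpha=0.5, rng=random.Random(0))
```

Because each sensor is gated independently, a single sample can mix drifting and non-drifting sensors, which matches the stated aim of reflecting the uncertainty of drift.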

During drift simulation, each sensor's data adaptively selects one of a linear function, an exponential function, a square root function and a sine function as the trend term of the non-stationary random walk process, so as to generate the corresponding drift trend. The drift amount is simulated as follows:

In the linear drift trend, the drift amount of sensor i at time t can be expressed as:

In the exponential drift trend, the drift amount of sensor i at time t can be expressed as:

In the square root drift trend, the drift amount of sensor i at time t can be expressed as:

In the sinusoidal drift trend, the drift amount of sensor i at time t can be expressed as:

where e is the maximum drift amount in each trend term, u_{i,t} is the random walk term, and r is the angular velocity parameter.
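The trend formulas themselves are not reproduced in this text, so the sketch below uses illustrative trend shapes, each scaled by the maximum drift e, plus a cumulative random walk. The specific shapes, the Gaussian choice for the random walk term, and the names `trend` and `simulate_drift` are assumptions for illustration, not the patented expressions.

```python
import math
import random

def trend(kind, t, l, e, r=0.0):
    """Illustrative trend term at step t of l, scaled by max drift e."""
    s = t / l  # normalized time in (0, 1]
    if kind == "linear":
        return e * s
    if kind == "exponential":
        return e * (math.exp(s) - 1.0) / (math.e - 1.0)
    if kind == "sqrt":
        return e * math.sqrt(s)
    if kind == "sine":
        return e * math.sin(r * t)  # r sets the angular velocity
    raise ValueError(f"unknown trend kind: {kind}")

def simulate_drift(kind, l, e, sigma, r=0.0, rng=random):
    """Drift of one sensor over l steps: trend plus accumulated noise.

    Assumption: the random walk term is zero-mean Gaussian with
    standard deviation sigma, accumulated step by step.
    """
    walk = 0.0
    drift = []
    for t in range(1, l + 1):
        walk += rng.gauss(0.0, sigma)
        drift.append(trend(kind, t, l, e, r) + walk)
    return drift

# One row of a drift simulation matrix, with hypothetical parameters.
d_row = simulate_drift("sqrt", l=8, e=1.5, sigma=0.02, rng=random.Random(0))
```

The accumulated noise is what makes the process non-stationary: its variance grows with t, so two runs with the same trend still produce different drift curves.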

In some embodiments, the simulation parameters of each trend term are selected as follows:

The value of the maximum drift amount e in each trend term is determined jointly by the characteristics of the collected data and the number of columns of the cropped data matrix, and is positively correlated with the number of columns of the data matrix, specifically:

where s is the standard deviation of the normalized data within the characteristic period T, and Δt is the sensor data sampling interval. The random walk term u_{i,t} ~ iid(0, σ²), where

The angular velocity parameter r is used to adjust the sine period, and

A second aspect of the present invention provides a sensor drift calibration method, the method comprising:

acquiring sensor data to be calibrated, inputting the sensor data to be calibrated into a trained squeeze-and-excitation residual network model, and outputting the corresponding drift-calibrated data.

The squeeze-and-excitation residual network model is obtained by training on multiple groups of data samples, and these groups of data samples form a data set constructed from the first augmented data and the second augmented data.
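A squeeze-and-excitation module is usually described as a per-channel global average pool, two small fully connected layers, and a channel-wise rescaling. The sketch below shows only that generic operation in plain Python. The function name `se_block`, the weights, the channel count, and the reduction ratio are placeholders, and nothing here reproduces the network configuration used in the invention.

```python
import math

def se_block(feature_maps, w1, w2):
    """Generic squeeze-and-excitation over C channels (biases omitted).

    feature_maps: C lists, one flattened feature map per channel.
    w1: (C // ratio) x C weights of the first fully connected layer.
    w2: C x (C // ratio) weights of the second fully connected layer.
    """
    # Squeeze: global average of each channel.
    z = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: FC + ReLU, then FC + sigmoid, giving one gate per channel.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in w2]
    # Scale: reweight every channel by its gate.
    return [[g * v for v in ch] for g, ch in zip(s, feature_maps)]

# Hypothetical input: C = 4 channels, reduction ratio 2, made-up weights.
feature_maps = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
w1 = [[0.1, -0.2, 0.05, 0.0], [0.0, 0.1, -0.1, 0.2]]
w2 = [[0.3, -0.1], [0.0, 0.2], [-0.2, 0.1], [0.1, 0.1]]
scaled = se_block(feature_maps, w1, w2)
```

In an SE residual block, the rescaled output is added back to the block input through the skip connection, so the gates modulate the residual branch and leave the identity path untouched.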

In some embodiments, the step of constructing the data set from the first augmented data and the second augmented data comprises:

taking the first augmented data as the drift-free data sample X^b, and the second augmented data as the drift simulation data sample D^b;

adding the drift-free data sample and the drift simulation data sample as matrices to obtain a drift-containing data sample Y^b; forming a drift-free data set X^M from a plurality of drift-free data samples X^b, and a drift-containing data set Y^M from a plurality of drift-containing data samples Y^b; and using the drift-containing data set Y^M and the drift-free data set X^M as the data set for training the squeeze-and-excitation residual network model.

A third aspect of the present invention provides a sensor data augmentation system, comprising:

a sensing field construction module, configured to deploy calibrated sensors and construct a target sensing field;

a drift-free data sample construction module, configured to collect first sensing data from the calibrated sensors in the target sensing field, crop the first sensing data into a plurality of data matrices by random cropping, and, from a single cropped data matrix, select some of its rows according to the positions in the sensing field of the sensors corresponding to those rows to construct an adjacent-sensor data matrix as the first augmented data, where the first augmented data is a drift-free data sample;

a drift simulation data sample construction module, configured to simulate, for the data matrix cropped from the first sensing data, the drift process of the sensor data with a non-stationary random walk process with a trend term to obtain second sensing data, and to select from the second sensing data a drift matrix with the same position and size as the first augmented data as the second augmented data, where the second augmented data is a drift simulation data sample.

A fourth aspect of the present invention provides a readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the above method.

3. Beneficial effects

Compared with the prior art, the present invention has the following technical advantages:

(1) The data augmentation method of the present invention constructs the sensor data drift process with a non-stationary random walk process with a trend term, and augments sensor data by random cropping and selection of adjacent sensor data, which increases the feature diversity of the augmented data. The method fuses the temporal and spatial characteristics of sensor drift data and overcomes the weak generalization ability of sensor drift calibration models caused by insufficient training samples.

(2) The data augmentation method sets a probability threshold for sensor drift to reflect the uncertainty of drift, and sets the range of the maximum drift amount of the trend term according to the characteristics of the sensing data to control the drift amplitude, which improves controllability. Four function models that match drift characteristics are used as trend terms to simulate sensor drift, and every drift simulation parameter is chosen from the data characteristics, which ensures that the drift sample construction method is applicable to different application scenarios and data types.

(3) The data augmentation method fuses the data of adjacent sensors in the sensing field into the same training sample, and determines the size of the cropped data matrix, which is the iterative input to the sensor drift calibration network model, from the sampling interval and the characteristic period of the sensor data. This avoids the loss of data features caused by iterative input of data samples, keeps the neural network input at a moderate size, and effectively reduces computational cost.

(4) The sensor data drift calibration method of the present invention builds the sensor drift calibration model with a squeeze-and-excitation residual network. Its squeeze-and-excitation residual modules exploit the correlation between the data of adjacent sensors and extract the temporal and spatial features of the data to calibrate sensor drift. The proposed data augmentation method fuses the temporal and spatial characteristics of sensor drift data and supplies the data needed to train the deep neural network.

(5) The sensor data drift calibration method supports accurate extraction of drift features from the data of multiple sensors and ensures reliable simultaneous feature extraction and drift compensation for multiple sensors in the sensing field. The squeeze-and-excitation residual network both exploits the correlation between the data of adjacent sensors and increases network depth, which effectively improves the quality of sensor data drift calibration.

(6) The data augmentation method is highly extensible. The data of multiple sensors in the sensing field, and the data of a single sensor over different time periods, can both serve as the base data for augmentation, from which the required data set is obtained with this method.

Description of the drawings

FIG. 1 is a flowchart of a data augmentation method provided by an embodiment of the present invention;

FIG. 2 is a block diagram of a data augmentation system provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of the data set expansion method provided by the present invention;

FIG. 4 is a flowchart of data set construction provided by an embodiment of the present invention;

FIG. 5 is a flowchart of the sensor drift calibration method provided by the present invention;

FIG. 6 is a schematic diagram of the SE-ResNet residual module provided by the present invention;

FIG. 7 is a bar chart of the drift calibration errors of SE-ResNet and ResNet under different drift trends, provided by an embodiment of the present invention;

Detailed description

Embodiments of the present application are described in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not all of them, and the present application is not limited by the embodiments described here.

Example 1

As shown in FIG. 1, an embodiment of the present application discloses a sensor data augmentation method, whose overall process is as follows:

SA: deploying calibrated sensors to construct a target sensing field.

Specifically, for soil environment monitoring, calibrated sensors are deployed to construct the target sensing field. In this example, the target sensing field contains 20 soil temperature and humidity sensors, the sensor sampling interval is 5 minutes, and the characteristic period over which the sensing data stream is regularly distributed is 24 hours. After the sensors are calibrated, the soil temperature data for the 90 days from February 1 to April 30, 2020 are taken as the initial data collected by the sensors. Here, the periodic characteristic means that the distribution of the sensing data varies periodically over time.

The initial sensing data collected from the calibrated sensors in the sensing field can be regarded as drift-free. This data serves as the basis for the drift data samples and calibration data needed to build the sensor drift calibration network model.

SB: collecting first sensing data from the calibrated sensors in the target sensing field; cropping the first sensing data into a plurality of data matrices by random cropping; and, from a single cropped data matrix, selecting some of its rows according to the positions in the sensing field of the sensors corresponding to those rows, so as to construct an adjacent-sensor data matrix, which is determined as the first augmented data.

Specifically, this comprises the following steps. SB1: modeling the first sensing data as a drift-free data matrix.

The soil temperature data collected by the calibrated soil temperature sensors in the target sensing field over the 90 days from February 1 to April 30, 2020 are taken as the first sensing data. The number of sensors N is 20, the sensor sampling interval is 5 minutes, and the characteristic period is 24 hours. The data are written as a drift-free data matrix X of size 20×25920, where the 20 rows of X correspond to the data collected by the 20 sensors in the sensing field;
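The stated width of X can be checked from the sampling parameters, assuming every sensor records one value per 5-minute interval with no missing samples:

$$L = 90\ \text{days} \times \frac{24 \times 60\ \text{min/day}}{5\ \text{min}} = 90 \times 288 = 25920$$

So X has 20 rows, one per sensor, and 25920 columns, one per sampling point.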

SB2: cropping the drift-free data matrix X into a plurality of data matrices, where the number of columns l of each cropped data matrix X^l is smaller than the number of columns L of the drift-free data matrix X.

Specifically, to avoid losing data features, the drift-free data matrix must be cropped into smaller data matrices according to the data collection characteristics of the specific monitoring scenario. A data matrix of size N×l randomly cropped from the drift-free data matrix X is denoted X^l.

SB3: from a single cropped data matrix, selecting some of its rows according to the positions in the sensing field of the sensors corresponding to those rows, so as to construct an adjacent-sensor data matrix, which is determined as the first augmented data.

From the data matrix X^l, n rows are selected to construct an adjacent-sensor data matrix X^b of size n×l, which is recorded as the first augmented data, as follows:

$$X^{l} = X\left(1{:}N,\ t_0{:}t_0 + l - 1\right) \tag{1}$$

$$X^{b} = X^{l}\left(\{r_1, \dots, r_n\},\ 1{:}l\right) \tag{2}$$

where t_0 is the initial cropping position among the sampling points; l is the number of columns of the cropped data matrix, that is, the number of consecutive sampling points of the sensor corresponding to each row; 1:N denotes taking rows 1 to N of the matrix; t_0 : t_0 + l - 1 denotes taking columns t_0 to t_0 + l - 1 of the matrix; r_1, ..., r_n are the indices of the selected rows; and n is the number of rows selected from the cropped data matrix X^l.

As a variation, the number n of sensors in this example is selected as follows:

The number n of adjacent sensors is determined from the receptive field of the neural network. One row of the cropped data matrix X^l is chosen at random as the reference. The other sensors in the sensing field are ranked by their Euclidean distance to the reference sensor corresponding to that row, and the n-1 nearest sensors are selected in order of increasing distance. The rows of X^l corresponding to the resulting n sensors are the n rows to be selected. In this example n = 10, as determined by the receptive field of the neural network. Note that the number of sensors selected determines the size of each data sample, and that the receptive field of the neural network is determined by the size and stride of the convolution kernel, which slides over the data sample to extract features.

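The number of columns l of the cropped data matrix is determined by the sensor's data acquisition interval and the characteristic period of the data distribution, and is calculated as:

l = λ·T/Δt (3)

where T is the characteristic period of the regular distribution of the sensor data stream, Δt is the time interval at which the sensor acquires data, and λ is a positive rational number, generally chosen as a suitable positive integer according to the required training sample size.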

In this example λ is set to 8, and formula (3) gives l = 2304 columns, so the data matrix X^l is a two-dimensional matrix of shape 20×2304. Selecting the data of 10 neighboring sensors via formula (2) yields a drift-free data sample X^b of shape 10×2304.
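As a worked check of these numbers, the sketch below computes the crop width and takes one random crop. The 24-hour characteristic period and 5-minute sampling interval are assumptions inferred from 25920 samples spanning 90 days; the source does not state them explicitly here. The function names are illustrative only.

```python
import random

def crop_columns(period_minutes, interval_minutes, lam):
    """Number of columns per crop, l = lam * T / dt."""
    l = lam * period_minutes / interval_minutes
    if l != int(l):
        raise ValueError("lam * T / dt must be a whole number of samples")
    return int(l)

def random_crop(X, l, rng=random):
    """Return (start, X_l): all rows of X, l consecutive columns from a random start."""
    L = len(X[0])
    if l > L:
        raise ValueError("crop width exceeds the number of available columns")
    start = rng.randrange(L - l + 1)
    return start, [row[start:start + l] for row in X]

# Assumed: daily period (1440 min) sampled every 5 min, lam = 8 as in the example.
l = crop_columns(period_minutes=24 * 60, interval_minutes=5, lam=8)  # 2304

# Placeholder 20 x 25920 measurement matrix (each entry is just its column index).
X = [[float(t) for t in range(25920)] for _ in range(20)]
start, X_l = random_crop(X, l, rng=random.Random(1))
```

Calling `random_crop` repeatedly with different starts is what produces many overlapping 20×2304 crops from a single 20×25920 recording.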

SC: A non-stationary random walk process with a trend term is used to construct a drift simulation matrix D^l of size N×l. Following the same row positions used to select n rows from X^l when constructing X^b, n rows are selected from D^l to construct a drift matrix D^b of size n×l, which serves as the second augmented data. Here the second sensing data is the drift simulation matrix, and the second augmented data is the drift simulation data sample, which can be expressed as:

where d is the drift amount of each sensor at different times.

Specifically, when constructing the sensor drift process, drift in different sensors is generally mutually independent. This example sets a sensor drift probability threshold and uses a non-stationary random walk process with a trend term to simulate the drift process of the sensor data. After simulation, the data of each sensor can be expressed as:

y_i,t = x_i,t + d_i,t, if rand(0,1) ≤ α
y_i,t = x_i,t, if rand(0,1) > α

where y_i,t is the drift-containing data of sensor i at time t after simulation, x_i,t is the drift-free data of sensor i at time t, and d_i,t is the simulated drift of sensor i at time t. rand(0,1) is a random floating-point number between 0 and 1, and α ∈ (0,1) is a floating-point number giving the probability threshold for sensor drift: when rand(0,1) > α, sensor i does not drift and the drift amount is zero; otherwise, drift occurs.
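The thresholding rule can be illustrated with a short sketch. It assumes one random draw per sensor per cropped window (so a sensor either drifts for the whole window or not at all), which is one reading of the text; `add_gated_drift` is a hypothetical helper name.

```python
import random

def add_gated_drift(X_b, D_b, alpha, rng=random):
    """Add drift row by row, gated by the drift probability threshold alpha.

    X_b: drift-free sample (list of rows); D_b: drift matrix of the same shape.
    Returns (Y_b, flags), where flags[i] tells whether sensor i drifted.
    Assumption: a single draw per sensor decides drift for the whole window.
    """
    Y_b, flags = [], []
    for x_row, d_row in zip(X_b, D_b):
        drifts = rng.random() <= alpha      # rand(0,1) > alpha means no drift
        flags.append(drifts)
        if drifts:
            Y_b.append([x + d for x, d in zip(x_row, d_row)])
        else:
            Y_b.append(list(x_row))         # drift amount is zero
    return Y_b, flags

X_b = [[20.0, 20.1, 20.2] for _ in range(10)]   # toy drift-free readings
D_b = [[0.0, 0.5, 1.0] for _ in range(10)]      # toy drift amounts
Y_b, flags = add_gated_drift(X_b, D_b, alpha=0.5, rng=random.Random(0))
```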

During drift simulation, each sensor's data adaptively selects one of a linear function, an exponential function, a square root function, and a sine function as the trend term of the non-stationary random walk process, in order to generate the corresponding drift trend. The drift is simulated as follows:

where e is the maximum drift in each trend term, u_i,t is the random-walk term, and r_i is the angular velocity parameter.

The angular velocity parameter r is used to adjust the period of the sine function.

In this example, to ensure a sufficient number of drift samples, the drift probability threshold of each sensor within a single time window of l sampling points is set to 0.5, so that each of the 10 sensors in a data matrix drifts with probability 50%. The probability of each of the four types of drift simulation can be adjusted to the application scenario; in this example, the four sensor drift trends are used to construct four separate datasets. By calculation, the maximum drift in each trend term is e ~ U(2.97, 5.94), the random-walk term is u_i,t ~ iid(0, σ²) with σ ~ U(0.03, 0.06), and the angular velocity parameter is r ~ U(2, 4).
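Because the drift formulas themselves did not survive in this text, the following is only a sketch of what a random walk with a trend term might look like under the parameter ranges above. The four trend shapes (each normalized to reach at most e over the window), the Gaussian choice for u_i,t, and the additive combination of trend and accumulated noise are all assumptions, not the patent's equations.

```python
import math
import random

# Assumed trend shapes on normalized time s in [0, 1]; r is only used by the sine trend.
TRENDS = {
    "linear": lambda s, r: s,
    "exponential": lambda s, r: (math.exp(s) - 1.0) / (math.e - 1.0),
    "sqrt": lambda s, r: math.sqrt(s),
    "sine": lambda s, r: math.sin(r * math.pi * s),
}

def simulate_drift(l, kind, rng=random):
    """Simulate one sensor's drift over l sampling points.

    Parameter ranges follow the example: e ~ U(2.97, 5.94),
    sigma ~ U(0.03, 0.06), r ~ U(2, 4). The functional form is assumed.
    """
    e = rng.uniform(2.97, 5.94)        # maximum drift of the trend term
    sigma = rng.uniform(0.03, 0.06)    # std of each random-walk increment
    r = rng.uniform(2.0, 4.0)          # angular velocity parameter
    trend = TRENDS[kind]
    walk, drift = 0.0, []
    for t in range(l):
        s = t / (l - 1) if l > 1 else 0.0
        walk += rng.gauss(0.0, sigma)  # accumulated zero-mean increments (Gaussian assumed)
        drift.append(e * trend(s, r) + walk)
    return drift

d_row = simulate_drift(2304, "linear", rng=random.Random(0))
```

Stacking one such row per sensor gives an N×l drift matrix from which the same n rows chosen for X^b can be taken.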

In one possible implementation, a dataset is constructed from the first augmented data and the second augmented data as follows:

SD: Add the drift-free data sample X^b and the drift simulation data sample D^b as matrices to obtain the drift-containing data sample Y^b.

Specifically, the drift-containing data sample obtained by matrix addition of the drift-free data sample X^b and the drift simulation data sample D^b is:

Y^b = X^b + D^b (11)

SE: Select multiple drift-free data samples X^b and multiple drift-containing data samples Y^b to form the drift-free dataset X^M and the drift-containing dataset Y^M, respectively. Each of X^M and Y^M is divided into one part used as the network training set and another part used as the network test set, with no overlap between the training set and the test set.

Specifically, a 20×25920 measurement matrix is obtained from the sensing field, and 8000 data matrices X^l of size 20×2304 are randomly cropped from it according to formula (1). Ten neighboring sensors are selected according to formula (2), giving 8000 drift-free data samples X^b of size 10×2304 as the drift-free dataset; 8000 drift-containing data samples Y^b of size 10×2304 are obtained in the same way as the drift-containing dataset Y^M. In summary, when constructing sensor data samples, this example determines the size of the training samples input at each iteration from the characteristic period and sampling interval of the data in the IoT data stream, and simulates the sensor drift process with a non-stationary random walk process with a trend term. This provides large-scale, high-quality base data for sensor drift calibration, addresses the shortage of training samples faced when deep learning methods are used for sensor calibration, and helps improve the quality of data acquisition.

Example 2

As shown in FIG. 5, this embodiment builds on Embodiment 1 and discloses a sensor drift calibration method. The overall process is as follows:

Acquire the sensor data to be calibrated, input it into the trained squeeze-excitation residual network model, and output the corresponding drift calibration data.

Specifically, referring to step SF, the squeeze-excitation residual network model is trained as follows. A squeeze-excitation module is added to a one-dimensional convolutional residual network to make full use of the data correlation between neighboring sensors and thereby achieve sensor drift calibration, as shown in FIG. 6. The specific steps are as follows:

SF1: One one-dimensional convolutional layer is connected to the network input, and its output serves as the input of the residual module. One one-dimensional convolutional layer connected to the output of the residual module adjusts the output to the same size as the input. The residual module input connects to three one-dimensional convolutional layers, each comprising a convolution operation, batch normalization, and an activation function. The output of the last of these layers feeds two network branches: one branch connects to the squeeze-excitation module, and the other is multiplied by the output of the squeeze-excitation module. The squeeze-excitation module mainly comprises a global pooling layer and two fully connected layers; the first fully connected layer uses ReLU as its activation function, and the second uses the Sigmoid function.

In this example, the network input sample is a data matrix with 10 rows, so it is fed into the sensor drift calibration network as 10 channels. The first convolutional layer has 32 convolution kernels of size 1×5. The residual module input connects to three one-dimensional convolutional layers, each with 32 convolution kernels of size 1×3. In the SE-ResNet model, the first fully connected layer of the squeeze-excitation module halves the number of channels; after ReLU activation, the next fully connected layer restores the number of channels to 32. The last one-dimensional convolutional layer sets the number of channels to 10, matching the initial input size.
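The channel reweighting performed by the squeeze-excitation module can be sketched in plain Python for a single 32-channel feature map. This shows only the forward pass of the module, not the full network; global average pooling is assumed for the "global pooling layer", and the random weights and helper names are placeholders.

```python
import math
import random

def dense(W, b, v):
    """Fully connected layer: W is a list of weight rows, b the biases, v the input."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def se_block(feat, W1, b1, W2, b2):
    """Squeeze-excitation over feat, a list of C channels of equal length.

    Squeeze: global average pooling per channel (assumed pooling type).
    Excite:  FC (C -> C/2) + ReLU, then FC (C/2 -> C) + Sigmoid.
    Scale:   multiply every channel by its gate.
    """
    squeezed = [sum(ch) / len(ch) for ch in feat]
    hidden = [max(0.0, h) for h in dense(W1, b1, squeezed)]
    gates = [1.0 / (1.0 + math.exp(-z)) for z in dense(W2, b2, hidden)]
    return [[g * x for x in ch] for g, ch in zip(gates, feat)], gates

rng = random.Random(0)
C, T = 32, 8                                   # 32 channels as in the example; short length for the demo
feat = [[rng.uniform(-1, 1) for _ in range(T)] for _ in range(C)]
W1 = [[rng.uniform(-0.1, 0.1) for _ in range(C)] for _ in range(C // 2)]
W2 = [[rng.uniform(-0.1, 0.1) for _ in range(C // 2)] for _ in range(C)]
out, gates = se_block(feat, W1, [0.0] * (C // 2), W2, [0.0] * C)
```

Each gate lies strictly between 0 and 1, so the module can only attenuate or pass a channel, which is how it emphasizes the more informative feature channels.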

SF2: The four drift trends are used to construct four drift-containing datasets and the corresponding drift-free datasets. The first 6400 drift-containing data samples in the drift-containing dataset Y^M (the first 80%) are used as the training set of the neural network, and the remaining 1600 data samples (the last 20%) are used as the test set, with no overlap between the training set and the test set.
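A minimal sketch of this ordered 80/20 split, keeping each drift-containing sample paired with its drift-free target, might look as follows. The tuple placeholders stand in for real 10×2304 samples, and `split_pairs` is an illustrative name.

```python
def split_pairs(Y_M, X_M, train_fraction=0.8):
    """Split paired datasets into a leading training part and a trailing test part."""
    if len(Y_M) != len(X_M):
        raise ValueError("Y_M and X_M must contain the same number of samples")
    cut = round(len(Y_M) * train_fraction)
    return (Y_M[:cut], X_M[:cut]), (Y_M[cut:], X_M[cut:])

# Placeholder samples: index k identifies the k-th (Y^b, X^b) pair.
Y_M = [("Y", k) for k in range(8000)]
X_M = [("X", k) for k in range(8000)]
(train_Y, train_X), (test_Y, test_X) = split_pairs(Y_M, X_M)
```

Slicing by position guarantees the two parts are disjoint as lists of samples; note that crops taken from overlapping time windows can still share raw readings.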

SF3: The drift-containing data samples Y^b in the training set split from the drift-containing dataset Y^M are iteratively input into the sensor drift calibration network for training, with the corresponding drift-free data samples X^b as the ground-truth output of the neural network. The network loss is the mean squared error, which achieves data drift compensation. The loss of the designed calibration module is:

Loss = (1/m) Σ_{i=1..m} (X̂_i − X_i)²

where m is the number of training samples, X̂_i is the output calibration data, and X_i is the corresponding drift-free data sample.
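The mean squared error over a batch of matrix-shaped samples can be written out as below. Averaging over the elements of each sample as well as over the m samples is an assumption about normalization, which the text does not specify.

```python
def mse_loss(outputs, targets):
    """Mean squared error between calibrated outputs and drift-free targets.

    outputs, targets: lists of m samples, each a list of rows of floats.
    Squared errors are averaged within each sample, then over the m samples.
    """
    if not outputs or len(outputs) != len(targets):
        raise ValueError("outputs and targets must be non-empty and of equal length")
    total = 0.0
    for out, tgt in zip(outputs, targets):
        sq = [(o - t) ** 2
              for o_row, t_row in zip(out, tgt)
              for o, t in zip(o_row, t_row)]
        total += sum(sq) / len(sq)
    return total / len(outputs)

pred = [[[1.0, 2.0], [3.0, 4.0]]]     # one toy 2 x 2 calibrated sample
truth = [[[1.0, 2.0], [3.0, 6.0]]]    # its drift-free counterpart
loss = mse_loss(pred, truth)          # (0 + 0 + 0 + 4) / 4 = 1.0
```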

This example uses the Adam optimizer for iterative training, with a batch size of 200, 5000 iterations, and a learning rate of 0.001, to obtain the trained squeeze-excitation residual network model.
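For readers unfamiliar with Adam, the update rule can be sketched on a one-parameter toy problem using the learning rate and iteration count quoted above. The values beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are the commonly used defaults and are an assumption here, as is the toy loss; the source gives only the learning rate, batch size, and iteration count.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step index."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: minimize theta**2 starting from theta = 1.0 for 5000 iterations.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2.0 * theta                            # gradient of the toy loss
    theta, m, v = adam_step(theta, grad, m, v, t)
```

In the actual training loop each step would use the gradient of the loss over a batch of 200 samples instead of this scalar gradient.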

SG: Calibrate the drifting sensor data. The drift-containing data samples of the test set are input, as the data to be calibrated, into the trained squeeze-excitation residual network model, which outputs the corresponding drift calibration data.

Note that a convolutional neural network uses the feature-learning ability of its convolutional layers to extract the temporal and spatial features of drift-containing data from historical sensing data, and thereby calibrates sensor drift. However, the sensing data collected from a particular sensor network is limited, and data containing drift is scarcer still, so it is difficult to meet the neural network's demand for training samples. The sensor data augmentation method proposed by the present invention effectively addresses this shortage of training samples. It models the feature learning and drift calibration steps as separate network modules and trains them jointly on the augmented sensing dataset, which preserves the quality of the extracted drift features and reduces the drift calibration error.

This example uses 90 days of soil temperature sensing data, from February 1 to April 30, 2020, as the first sensing data. The sample data augmentation method proposed by the present invention provides the sample data required for neural network training, and the sample data obtained by the method shows no obvious difference from drift data in real scenarios. The neural network model was trained with sample data of different drift trends. The experimental results show that the present invention can effectively address data drift in IoT data streams. With root mean square error as the evaluation criterion, the above 20 datasets in total were each tested 20 times and the results averaged, as shown in FIG. 7; the overall drift calibration error is only about 1/2 that of the conventional calibration method based on a residual neural network.

Example 3

As shown in FIG. 2, this embodiment provides a sensor data augmentation system, comprising:

Sensing field construction module 20, which is used to deploy calibrated sensors and construct the target sensing field. The sensing field construction module further includes a neighboring-sensor data determination module, which adaptively determines the number of neighboring sensors n according to the data augmentation application scenario, randomly selects one matrix row from the cropped data matrix X^l as the reference, and selects n-1 neighboring sensors in order of increasing Euclidean distance between the other sensors in the sensing field and the reference sensor corresponding to that row. The n rows of X^l corresponding to the n sensors finally selected are the n rows of neighboring-sensor data to be selected.

Drift-free data sample construction module 30, which is used to collect the first sensing data of the calibrated sensors in the target sensing field, crop the first sensing data into a plurality of data matrices by random cropping, and, from a single cropped data matrix, select part of its rows according to the positions in the sensing field of the sensors corresponding to its rows to construct a neighboring-sensor data matrix, thereby determining the first augmented data.

Specifically, the first sensing data is modeled as a drift-free data matrix X whose rows correspond respectively to the data collected by the sensors in the sensing field. The drift-free data matrix X is cropped into a plurality of data matrices X^l, where the number of columns l of X^l is smaller than the number of columns L of the drift-free data matrix. From a single cropped data matrix, part of the rows are selected according to the positions in the sensing field of the sensors corresponding to the matrix rows to construct a neighboring-sensor data matrix, which determines the first augmented data; the first augmented data is the drift-free data sample. The drift-free data sample construction module further includes a cropped-matrix column-count calculation module, which determines the number of columns l of the cropped data matrix from the sensor's data acquisition interval and the characteristic period of the data distribution, according to formula (3).

Drift simulation data sample construction module 40, which is used to simulate, based on the data matrix cropped from the first sensing data, the drift process of the sensor data by a non-stationary random walk process with a trend term to obtain the second sensing data, and to select from the second sensing data a drift matrix with the same position and size as the first augmented data as the second augmented data; the second augmented data is the drift simulation data sample.

Example 4

This embodiment provides an exemplary computer program product and computer-readable storage medium.

In addition to the methods and devices described above, embodiments of the present application may also be a computer program product comprising computer program instructions that, when run by a processor, cause the processor to perform the steps of the decision method according to the various embodiments of the present application described in the "Exemplary Methods" section above in this specification.

The computer program product may be written with program code for performing the operations of the embodiments of the present application in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.

In addition, embodiments of the present application may also be a computer-readable storage medium storing computer program instructions that, when run by a processor, cause the processor to perform the steps of the decision method according to the various embodiments of the present application described in the "Exemplary Methods" section above in this specification.

The computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

The basic principles of the present application have been described above with reference to specific embodiments. It should be noted, however, that the advantages, benefits, and effects mentioned in the present application are merely examples and not limitations, and must not be regarded as required of every embodiment of the present application. In addition, the specific details disclosed above are provided only for illustration and ease of understanding, not as limitations; they do not restrict the present application to being implemented with those specific details.

The block diagrams of devices, apparatuses, equipment, and systems in the present application are merely illustrative examples and are not intended to require or imply that connection, arrangement, or configuration must follow the manner shown in the block diagrams. As those skilled in the art will recognize, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including", "comprising", and "having" are open-ended terms meaning "including but not limited to" and may be used interchangeably with that phrase. The words "or" and "and" as used herein mean "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. The phrase "such as" as used herein means "such as but not limited to" and may be used interchangeably with it.

It should also be noted that, in the apparatuses, devices, and methods of the present application, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present application.

The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present application. The present application is therefore not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The above description has been given for purposes of illustration and description. It is not intended to limit the embodiments of the present application to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims

1. A method of sensor data augmentation, the method comprising:

SA: deploying calibrated sensors and constructing a target sensing field;

SB: acquiring first sensing data of the calibrated sensors in the target sensing field, cropping the first sensing data into a plurality of data matrices by random cropping, selecting, from a single cropped data matrix, part of its rows according to the positions in the sensing field of the sensors corresponding to the rows of the matrix to construct a neighboring-sensor data matrix, and determining first augmented data;

SC: according to the data matrix cropped from the first sensing data, simulating the drift process of the sensor data by a non-stationary random walk process with a trend term to obtain second sensing data, selecting from the second sensing data a drift matrix with the same position and size as the first augmented data, and determining it as second augmented data;

wherein determining the first augmented data comprises:

determining the first sensing data from L consecutive sampling points acquired by N calibrated sensors in the target sensing field, and recording the first sensing data as a drift-free data matrix X of size N×L, wherein the N rows of the drift-free data matrix X correspond respectively to the data acquired by the N sensors in the sensing field;

randomly cropping a plurality of data matrices of size N×l from the drift-free data matrix X, and denoting any one of the cropped data matrices as X^l, wherein the number of columns l of the data matrix X^l is smaller than the number of columns L of the drift-free data matrix X; and, from a single cropped data matrix X^l, selecting n rows of data according to the positions in the sensing field of the sensors corresponding to the matrix rows to construct a neighboring-sensor data matrix X^b of size n×l, recorded as the first augmented data.

2. The method of claim 1, wherein the specific formula for determining the first augmented data is:

wherein the start index is the initial cropping position of the sampling point; l is the number of columns of the cropped data matrix, that is, the number of consecutive sampling points of the sensor corresponding to each matrix row; 1:N denotes taking rows 1 through N of the matrix, and the column range denotes taking the l consecutive columns that begin at the initial cropping position; n is the number of rows selected from the cropped data matrix X^l.

3. The sensor data augmentation method of claim 2, wherein the n-row data selection method comprises:

adaptively determining the number of neighboring sensors n according to the data augmentation application scenario; randomly selecting one matrix row from the cropped data matrix X^l as the reference; and selecting n-1 neighboring sensors in order of increasing Euclidean distance between the other sensors in the sensing field and the reference sensor corresponding to that matrix row, the matrix rows of X^l corresponding to the n sensors finally selected being the n rows of data to be selected.

4. The method of claim 2, wherein the number of columns l of the cropped data matrix is determined by the characteristic period of the data distribution and the time interval at which the sensor acquires data, and is calculated as:

l = λ·T/Δt

wherein T is the characteristic period of the regular distribution of the sensor data stream, Δt is the time interval at which the sensor acquires data, and λ is a positive rational number, generally chosen as a suitable positive integer according to the required training sample size.

5. The method according to claim 2, wherein the second sensing data is a drift simulation matrix D^l of size N×l constructed by a non-stationary random walk process with a trend term; following the way n rows of data are selected from X^l to construct X^b, the n rows of data at the same row positions are selected from D^l to construct a drift matrix D^b of size n×l, recorded as the second augmented data;

a sensor drift probability threshold is set, and the drift process of the sensor data is simulated by a non-stationary random walk process with a trend term, the simulated sensor data being expressed as:

y_i,t = x_i,t + d_i,t, if rand(0,1) ≤ α
y_i,t = x_i,t, if rand(0,1) > α

wherein y_i,t is the drift-containing data of sensor i at time t after simulation, x_i,t is the drift-free data of sensor i at time t, and d_i,t is the simulated drift of sensor i at time t; rand(0,1) is a random floating-point number between 0 and 1, and α ∈ (0,1) is a floating-point number giving the probability threshold for sensor drift; when rand(0,1) > α, sensor i does not drift and the drift amount is zero; otherwise, drift occurs;

and when simulating the drift of each sensor's data, one of a linear function, an exponential function, a square root function, and a sine function is adaptively selected as the trend term of the non-stationary random walk process, to generate the corresponding drift trend.

6. A sensor drift calibration method, comprising:

acquiring sensor data to be calibrated, inputting the sensor data to be calibrated into a trained squeeze-excitation residual network model, and outputting corresponding drift calibration data;

wherein the squeeze-excitation residual network model is obtained by training on a plurality of groups of data samples, the plurality of groups of data samples being a dataset constructed from first augmented data and second augmented data;

the step of constructing the dataset from the first augmented data and the second augmented data comprises:

taking the first augmented data as a drift-free data sample X^b and the second augmented data as a drift simulation data sample D^b;

performing matrix addition of the drift-free data sample X^b and the drift simulation data sample D^b to obtain a drift-containing data sample Y^b; forming a drift-free dataset X^M from a plurality of drift-free data samples X^b; forming a drift-containing dataset Y^M from a plurality of drift-containing data samples Y^b; and using the drift-containing dataset Y^M and the drift-free dataset X^M as the training dataset for the squeeze-excitation residual network model.

7. A sensor data augmentation system, comprising:

a sensing field construction module, used to deploy calibrated sensors and construct a target sensing field;

a drift-free data sample construction module, used to collect first sensing data of the calibrated sensors in the target sensing field, crop the first sensing data into a plurality of data matrices by random cropping, select, from a single cropped data matrix, part of its rows according to the positions in the sensing field of the sensors corresponding to the rows of the matrix to construct a neighboring-sensor data matrix, and determine first augmented data, wherein the first augmented data is a drift-free data sample;

a drift simulation data sample construction module, used to simulate, according to the data matrix cropped from the first sensing data, the drift process of the sensor data by a non-stationary random walk process with a trend term to obtain second sensing data, select from the second sensing data a drift matrix with the same position and size as the first augmented data, and determine it as second augmented data, wherein the second augmented data is a drift simulation data sample;

wherein determining the first augmented data comprises:

determining the first sensing data from L consecutive sampling points acquired by N calibrated sensors in the target sensing field, and recording the first sensing data as a drift-free data matrix X of size N×L, wherein the N rows of the drift-free data matrix X correspond respectively to the data acquired by the N sensors in the sensing field;

randomly cropping a plurality of data matrices of size N×l from the drift-free data matrix X, and denoting any one of the cropped data matrices as X^l, wherein the number of columns l of the data matrix X^l is smaller than the number of columns L of the drift-free data matrix X; and, from a single cropped data matrix X^l, selecting n rows of data according to the positions in the sensing field of the sensors corresponding to the matrix rows to construct a neighboring-sensor data matrix X^b of size n×l, recorded as the first augmented data.

8. A readable storage medium, characterized in that the storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-6.