CN111324600A - Data cleaning method and device - Google Patents
- Publication number: CN111324600A
- Application number: CN202010079670.2A
- Authority: CN (China)
- Prior art keywords: data, detection data, range, detection, terminal device
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Description
Technical Field
The present invention relates to the field of information technology, and in particular to a data cleaning method and device.
Background
During data collection, sensors typically collect data and send it to an edge end such as an edge gateway or edge controller, and the edge end then forwards the collected data to a server for users to download and use. Because of the environment surrounding the sensor, the collected data usually contains a large amount of invalid data, so the data must be cleaned before it is used.
In the prior art, data cleaning is usually performed at the edge end. Specifically, a data cleaning algorithm is burned into the edge end in advance, and the edge end applies it to the data collected by the sensor.
However, because edge-end resources are limited, the edge end often cannot by itself supply the computing power needed to update the cleaning algorithm. As a result, the threshold range used by the cleaning algorithm is usually fixed and cannot be adjusted dynamically to actual conditions, so existing edge-end data cleaning methods tend to clean poorly.
Summary of the Invention
The present invention provides a data cleaning method and device to solve the technical problem of poor data cleaning in the prior art.
A first aspect of the present invention provides a data cleaning method applied to a terminal device, the method comprising:
acquiring detection data of a detection device in a first collection period;
performing data cleaning on the detection data according to a threshold range of the detection data, wherein the data cleaning discards invalid values in the detection data and the threshold range of the detection data is indicated by a server;
if available data exists after the data cleaning, reporting the available data to the server in a first reporting period.
In an optional implementation, after the data cleaning is performed on the detection data, the method further comprises:
if no available data exists after the data cleaning, performing linear interpolation on historical available data to predict the available data for the first collection period;
reporting the predicted available data to the server in the first reporting period.
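The fallback for an empty collection period can be sketched in Python as follows. The source only says "linear interpolation on historical available data"; the choice to extend the line through the last two historical values by one period, and the function name `predict_by_linear_interpolation`, are assumptions made for illustration.

```python
from typing import Sequence


def predict_by_linear_interpolation(history: Sequence[float]) -> float:
    """Estimate the value for the current collection period from history.

    Assumption: the straight line through the last two available values
    is extended by one period. The patent text does not fix this detail.
    """
    if not history:
        raise ValueError("no historical available data to interpolate from")
    if len(history) == 1:
        # A single point defines no slope, so repeat it.
        return history[-1]
    previous, latest = history[-2], history[-1]
    return latest + (latest - previous)


# The last two points rise by 0.5, so the next value is predicted 0.5 higher.
print(predict_by_linear_interpolation([20.0, 20.5, 21.0]))  # 21.5
```

The predicted value would then be reported in place of the missing available data, so the server still receives one value per reporting period.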
In an optional implementation, after the available data is reported to the server in the first reporting period, the method further comprises:
receiving a first range of the detection data sent by the server, the first range being determined from the available data of the first collection period;
updating the threshold range of the detection data according to the first range.
A second aspect of the present invention provides a data cleaning method applied to a server, the method comprising:
receiving available data reported by a terminal device in a first reporting period, wherein the available data is obtained by the terminal device performing data cleaning, according to a threshold range of detection data, on the detection data of a detection device in a first collection period, the data cleaning discards invalid values in the detection data, and the threshold range of the detection data is indicated by the server;
in a first update period, generating a first range of the detection data according to the available data reported by the terminal device in the first reporting period;
sending the first range of the detection data to the terminal device.
In an optional implementation, before the first update value of the threshold range of the detection data (the first range) is sent to the terminal device, the method further comprises:
receiving available data reported by the terminal device in a second reporting period, the second reporting period being later than the first reporting period;
wherein sending the first range of the detection data to the terminal device comprises:
if the available data reported in the second reporting period is not within the first range, sending the first range to the terminal device.
In an optional implementation, before the first range of the detection data is sent to the terminal device, the method further comprises:
obtaining the length of time for which the threshold range of the detection data has not been updated;
wherein sending the first range of the detection data to the terminal device comprises:
if the unupdated duration exceeds a duration threshold, sending the first range of the detection data to the terminal device.
A third aspect of the present invention provides a data cleaning device comprising a receiving module, a processing module and a sending module;
the receiving module is configured to acquire detection data of a detection device in a first collection period;
the processing module is configured to perform data cleaning on the detection data according to a threshold range of the detection data, wherein the data cleaning discards invalid values in the detection data and the threshold range of the detection data is indicated by a server;
the sending module is configured to report available data to the server in a first reporting period if available data exists after the data cleaning.
In an optional implementation, the processing module is further configured to, if no available data exists after the data cleaning, perform linear interpolation on historical available data to predict the available data for the first collection period;
the sending module is further configured to report the predicted available data to the server in the first reporting period.
In an optional implementation, the receiving module is further configured to receive a first range of the detection data sent by the server, the first range being determined from the available data of the first collection period;
the processing module is further configured to update the threshold range of the detection data according to the first range.
A fourth aspect of the present invention provides a data cleaning device comprising a receiving module, a processing module and a sending module;
the receiving module is configured to receive available data reported by a terminal device in a first reporting period, wherein the available data is obtained by the terminal device performing data cleaning, according to a threshold range of detection data, on the detection data of a detection device in a first collection period, the data cleaning discards invalid values in the detection data, and the threshold range of the detection data is indicated by the server;
the processing module is configured to generate, in a first update period, a first range of the detection data according to the available data reported by the terminal device in the first reporting period;
the sending module is configured to send the first range of the detection data to the terminal device.
In an optional implementation, the receiving module is further configured to receive available data reported by the terminal device in a second reporting period, the second reporting period being later than the first reporting period;
the sending module is further configured to send the first range to the terminal device if the available data reported in the second reporting period is not within the first range.
In an optional implementation, the receiving module is further configured to obtain the length of time for which the threshold range of the detection data has not been updated;
the sending module is further configured to send the first range of the detection data to the terminal device if the unupdated duration exceeds a duration threshold.
A fifth aspect of the embodiments of the present invention provides a terminal device comprising a memory, a processor and a computer program, wherein the computer program is stored in the memory and the processor runs the computer program to execute the data cleaning method of the first aspect and any of its optional implementations.
A sixth aspect of the embodiments of the present invention provides a server comprising a memory, a processor and a computer program, wherein the computer program is stored in the memory and the processor runs the computer program to execute the data cleaning method of the second aspect and any of its optional implementations.
A seventh aspect of the present invention provides a readable storage medium storing a computer program, the computer program being used to execute the data cleaning method of the first aspect and any of its optional implementations.
An eighth aspect of the present invention provides a readable storage medium storing a computer program, the computer program being used to execute the data cleaning method of the second aspect and any of its optional implementations.
In the data cleaning method and device provided by the present invention, the terminal device acquires the detection data of the detection device in the first collection period and cleans it according to the threshold range of the detection data; the data cleaning discards invalid values in the detection data, and the threshold range is indicated by the server. If available data exists after the data cleaning, the terminal device reports it to the server in the first reporting period. In the first update period, the server generates a first range of the detection data from the available data reported in the first reporting period and sends the first range to the terminal device. In this way the terminal device can update the range of the detection data in real time as instructed by the server, which improves the data cleaning effect.
Brief Description of the Drawings
To explain the technical solutions of the present invention or of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a scenario of a data cleaning method provided by an embodiment of the present application;
FIG. 2 is a signaling interaction diagram of a data cleaning method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a DAE-LSTM model provided by an embodiment of the present application;
FIG. 4 is a signaling interaction diagram of another data cleaning method provided by an embodiment of the present application;
FIG. 5 is a signaling interaction diagram of still another data cleaning method provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a network architecture of a data cleaning method provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a four-level data cleaning mechanism provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a data cleaning device provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of another data cleaning device provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of still another data cleaning device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
As noted in the background, sensors send collected data to an edge end such as an edge gateway or edge controller, which forwards it to a server for users to download and use, and the data usually contains a large amount of invalid data that must be cleaned first. In the prior art a cleaning algorithm is burned into the edge end in advance. Because edge-end resources are limited, the edge end often cannot supply the computing power needed to update the algorithm, so the threshold range it uses is usually fixed, cannot be adjusted dynamically to actual conditions, and the cleaning effect tends to be poor.
In view of these problems, the present invention provides a data cleaning method and device in which the terminal device performs the cleaning of the detection data and the server determines the threshold range of the detection data. The terminal device uploads the detection data to the server, and the server sends the threshold range to the terminal device in real time, so that the terminal device updates the range of the detection data in real time as instructed by the server, which improves the data cleaning effect.
FIG. 1 is a schematic diagram of a scenario of a data cleaning method provided by an embodiment of the present application. As shown in FIG. 1, the scenario includes a detection device 101, a terminal device 102 and a server 103, and the terminal device 102 is connected to both the detection device 101 and the server 103. The detection device 101 collects detection data and sends it to the terminal device 102. The terminal device 102 cleans the detection data and sends the cleaned data to the server 103. The server 103 determines a new threshold range of the detection data from the cleaned detection data it receives.
The detection device 101 may be a sensor, a controller or a drive device. It collects sensing data such as temperature and position, and may also collect control signals or drive signals.
The terminal device 102 may also be called an edge end and may be, for example, an edge gateway or an edge controller. For example, because an embedded programmable logic controller (ePLC) is easy to program, highly reliable and widely used across industrial fields, the terminal device 102 may be an ePLC.
The server 103 may be a single server or a server in a cloud service platform. The server 103 receives the detection data sent by the terminal device 102 and determines the threshold range of the detection data from it.
It should be noted that the technical solution of the present application can be applied to the above scenario but is not limited to it, and can also be applied to other scenarios that require data cleaning.
The technical solutions of the embodiments of the present application are described in detail below through specific embodiments, taking as an example a terminal device and a server in which the relevant execution code is integrated or installed. The following specific embodiments may be combined with each other, and identical or similar concepts or processes may not be repeated in some embodiments.
FIG. 2 is a signaling interaction diagram of a data cleaning method provided by an embodiment of the present application. This embodiment concerns the specific process by which the terminal device and the server clean the detection data. As shown in FIG. 2, the method includes:
S201. The terminal device acquires the detection data of the detection device in the first collection period.
The embodiment of the present application does not limit the first collection period, which may be set according to the detection frequency of the detection device. For example, ten seconds may be used as one collection period, so that the terminal device acquires the detection data of the detection device every ten seconds.
In some embodiments, the terminal device may also configure a corresponding acquisition protocol and data dictionary table according to the application environment of the detection device. The application environment may be, for example, temperature, humidity or the voltage of monitoring equipment. The acquisition protocol may be, for example, a serial communication protocol (Modbus) or a Remote Terminal Unit (RTU) protocol. The data dictionary is a collection of information describing data, that is, the set of definitions of all data elements used in the system. The terminal device collects the detection data of the detection device through the acquisition protocol and determines the type of the collected detection data through the data dictionary table.
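A data dictionary table of this kind might be sketched as a simple lookup keyed by register address. Everything in the block below (the register addresses, field names, divisors and the helper `describe_reading`) is hypothetical and only illustrates how a raw value could be given a type and unit; the source does not specify the table layout.

```python
# Hypothetical data dictionary: register address -> definition of the data element.
DATA_DICTIONARY = {
    0x0001: {"name": "temperature", "unit": "degC", "divisor": 10},
    0x0002: {"name": "humidity", "unit": "%RH", "divisor": 10},
    0x0003: {"name": "supply_voltage", "unit": "V", "divisor": 100},
}


def describe_reading(register: int, raw_value: int) -> dict:
    """Look up the data element for a register and scale the raw integer value."""
    entry = DATA_DICTIONARY[register]
    return {
        "name": entry["name"],
        "unit": entry["unit"],
        "value": raw_value / entry["divisor"],
    }


# A raw register value of 215 at address 0x0001 is interpreted as 21.5 degC.
print(describe_reading(0x0001, 215))
```

Keeping the definitions in a table rather than in code means a new sensor type only needs a new entry, which fits the idea of configuring the terminal device per application environment.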
S202. The terminal device performs data cleaning on the detection data according to the threshold range of the detection data. The data cleaning discards invalid values in the detection data, and the threshold range of the detection data is indicated by the server.
In this step, after the terminal device acquires the detection data of the detection device in the first collection period and before it uploads the detection data to the server, the terminal device may clean the detection data according to the threshold range of the detection data and discard the invalid values.
Normal detection data usually rises or falls within a certain range, so detection data that rises or falls by too much can be regarded as invalid. Data cleaning can therefore be based on the threshold range of the detection data: whether a piece of detection data needs to be cleaned is determined by checking whether it lies within the threshold range. For example, if the detection data is outside the threshold range, it is determined to be an invalid value and is discarded; if it is within the threshold range, it is determined to be valid and is stored in the terminal device.
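The threshold check in S202 can be sketched in a few lines of Python. The function name `clean` and the treatment of both bounds as inclusive are assumptions; the source only states that values outside the threshold range are discarded and values inside it are kept.

```python
from typing import Iterable, List, Tuple


def clean(readings: Iterable[float], threshold_range: Tuple[float, float]) -> List[float]:
    """Keep readings inside the server-indicated range and discard the rest.

    Assumption: both bounds are inclusive.
    """
    low, high = threshold_range
    return [value for value in readings if low <= value <= high]


# 85.0 and -40.0 fall outside [15.0, 30.0] and are discarded as invalid values.
available = clean([21.4, 85.0, 21.6, -40.0], (15.0, 30.0))
print(available)  # [21.4, 21.6]
```

Because the range is passed in as an argument, the terminal device can apply a newly received first range on the next collection period without changing the cleaning code.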
In some embodiments, the terminal device may also check the consistency of the available data, or check whether the available data has missing values.
S203. If available data exists after the data cleaning, the terminal device reports the available data to the server in the first reporting period.
In this step, after the terminal device has cleaned the detection data, if available data remains, the terminal device reports it to the server in the first reporting period.
For example, with every interval t taken as one upload cycle, the data cleaning program in the terminal device checks whether the latest detection data was discarded. If the discard flag is False, the available data remaining after cleaning is uploaded to the server. The embodiment of the present application does not limit how the available data is uploaded; for example, it may be uploaded over a 3G or 4G network.
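One upload cycle with a discard flag might look like the sketch below. The `Reading` record, the `run_upload_cycle` helper and the use of a plain callback in place of a real 3G/4G upload are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Reading:
    value: float
    discarded: bool  # set to True by the cleaning step for invalid values


def run_upload_cycle(latest: Reading, upload: Callable[[float], None]) -> bool:
    """Upload the latest reading unless cleaning discarded it.

    Returns True if the reading was uploaded in this cycle.
    """
    if latest.discarded:
        return False
    upload(latest.value)
    return True


# A list stands in for the server; only the non-discarded reading arrives.
sent: List[float] = []
run_upload_cycle(Reading(21.5, discarded=False), sent.append)
run_upload_cycle(Reading(85.0, discarded=True), sent.append)
print(sent)  # [21.5]
```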
In the embodiment of the present application, the uploaded available data has already been preliminarily cleaned by the terminal device. This avoids uploading the detection data of massive numbers of heterogeneous sensors directly to the server for processing, which reduces both network load and cost.
S204. In the first update period, the server generates the first range of the detection data according to the available data reported by the terminal device in the first reporting period.
In some embodiments, the server generates a new threshold range of the detection data once every update period. The embodiment of the present application does not limit how the threshold range is generated. In one optional implementation, it is generated by a denoising autoencoder long short-term memory (DAE-LSTM) model: the available data is input to the DAE-LSTM model, and the model outputs the threshold range of the detection data. The DAE-LSTM model is trained on historical available data and historical threshold ranges.
FIG. 3 is a schematic structural diagram of a DAE-LSTM model provided by an embodiment of the present application. As shown in FIG. 3, the input of the DAE-LSTM model is the currently processed available data, and the output is the predicted threshold range for the next moment, which can be configured to allow a fluctuation of 5% up and down.
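The 5% allowance on the predicted threshold range could be applied as in the sketch below. The source does not say what the 5% is measured against; this sketch assumes it is 5% of the width of the predicted range, added to each side, and the function name `widen_range` is hypothetical.

```python
from typing import Tuple


def widen_range(low: float, high: float, margin: float = 0.05) -> Tuple[float, float]:
    """Widen a predicted threshold range by a fraction of its width on each side.

    Assumption: the margin is relative to the range width, not to each bound.
    """
    if high < low:
        raise ValueError("high must not be smaller than low")
    pad = (high - low) * margin
    return (low - pad, high + pad)


# A predicted range of 20 to 30 has width 10, so each bound moves by 0.5.
print(widen_range(20.0, 30.0))
```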
The model consists of a fully connected network (factorization machine supported neural network, FNN) and a long short-term memory (LSTM) network, and a denoising autoencoder (DAE) is used to initialize the parameters of the fully connected layers. The available data $X_t$ is input to the FNN to obtain the FNN output $r_t$; $r_t$ is then input to the LSTM, which outputs $h_t$. Because the DAE-LSTM model is a time-series network, its input should be a time series.
The input format of the model is given by formula (1), where $N_x$ is the number of input sequences.
For training, the loss function of the LSTM part is given by formula (2), where $\eta_2$ is an adjustment coefficient.
In the DAE-LSTM model, Dense(x, f) denotes a fully connected layer with x neurons that uses f as its activation function. The rectified linear unit (ReLU) is given by formula (3):
$f(x) = \max(x, 0)$ (3)
The hyperbolic tangent function (Tanh) is given by formula (4):
$f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ (4)
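The two activation functions and the Dense(x, f) notation can be illustrated with plain Python. The `tanh` below uses the standard exponential definition of the hyperbolic tangent, and the `dense` helper with its explicit weight and bias arguments is a hypothetical stand-in, not the layer implementation used by the patent's model.

```python
import math
from typing import Callable, List, Sequence


def relu(x: float) -> float:
    """Rectified linear unit: f(x) = max(x, 0)."""
    return max(x, 0.0)


def tanh(x: float) -> float:
    """Hyperbolic tangent from its exponential definition."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def dense(
    inputs: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    activation: Callable[[float], float],
) -> List[float]:
    """Fully connected layer: one output per weight row, passed through f."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# A layer with two neurons and ReLU activation; the first pre-activation is negative.
print(dense([1.0, -2.0], [[1.0, 1.0], [0.5, 0.0]], [0.0, 0.0], relu))  # [0.0, 0.5]
```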
S205. The server sends the first range of the detection data to the terminal device.
In this step, once the first range of the detection data has been generated, the server may send it to the terminal device directly, or may send it only when certain conditions are met.
In some optional implementations, before sending the first range (the first update value of the threshold range of the detection data) to the terminal device, the server may receive the available data reported by the terminal device in a second reporting period, where the second reporting period is later than the first reporting period. If the available data reported in the second reporting period is not within the first range, the server sends the first range to the terminal device.
In some embodiments, a fourth-level cleaning mechanism guards against an improper threshold range remaining in place without being updated for a long time: before sending the first range of the detection data to the terminal device, the server may obtain the length of time for which the threshold range has not been updated, and if this un-updated duration exceeds a duration threshold, the server sends the first range of the detection data to the terminal device.
In some embodiments, before the first range of the detection data is sent to the terminal device, a user may access the server and, by setting parameters, intervene in whether the first range is sent to the terminal device.
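The conditions described for step S205 can be combined into a single server-side decision, sketched below. The text presents them as separate optional implementations, so combining them with a logical OR, letting a user setting take precedence, and the names `SendPolicy`, `max_stale_seconds`, and `should_send_range` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SendPolicy:
    max_stale_seconds: float              # duration threshold (hypothetical name)
    user_override: Optional[bool] = None  # None means the user has not intervened


def should_send_range(first_range: Tuple[float, float],
                      latest_value: float,
                      stale_seconds: float,
                      policy: SendPolicy) -> bool:
    """Decide whether the server sends the first range to the terminal device."""
    # A user-set parameter, if present, decides the outcome outright.
    if policy.user_override is not None:
        return policy.user_override
    lower, upper = first_range
    # Condition 1: data from the later reporting period falls outside the first range.
    out_of_range = not (lower <= latest_value <= upper)
    # Condition 2: the threshold range has gone un-updated for too long.
    stale = stale_seconds > policy.max_stale_seconds
    return out_of_range or stale
```

For example, with `SendPolicy(max_stale_seconds=3600)`, a latest value inside the first range and an un-updated duration of ten minutes would leave the terminal device's threshold range unchanged.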
In the data cleaning method provided by the embodiments of the present application, the terminal device acquires the detection data of the detection device in the first collection period and cleans the detection data according to the threshold range of the detection data, where data cleaning discards invalid values in the detection data and the threshold range is indicated by the server. If available data exists after data cleaning, the terminal device reports the available data to the server in the first reporting period. In the first update period, the server generates a first range of the detection data according to the available data reported by the terminal device in the first reporting period, and sends the first range to the terminal device. In this way, the terminal device can update the range of the detection data in real time according to the server's indication, which improves the data cleaning effect.
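On the terminal device side, the cleaning and reporting steps summarized above might look like the following sketch. Treating the range bounds as inclusive, treating `None` as a failed reading, and the names `clean` and `report_if_available` are assumptions; the `upload` callback stands in for whatever transport the terminal device actually uses.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def clean(samples: Sequence[Optional[float]],
          threshold_range: Tuple[float, float]) -> List[float]:
    """Keep only readings that lie within the server-indicated threshold range."""
    lower, upper = threshold_range
    return [s for s in samples if s is not None and lower <= s <= upper]


def report_if_available(samples: Sequence[Optional[float]],
                        threshold_range: Tuple[float, float],
                        upload: Callable[[List[float]], None]) -> List[float]:
    """Clean one collection period and upload the result only if data survives."""
    usable = clean(samples, threshold_range)
    if usable:
        upload(usable)
    return usable


# Example: one collection period with a spike and a missing reading.
sent: List[List[float]] = []
usable = report_if_available([21.5, None, 22.0, 480.0], (0.0, 50.0), sent.append)
```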
On the basis of the foregoing embodiments, when the terminal device cleans the detection data and discards invalid values, it may end up discarding all of the detection data in the first collection period. In that case the terminal device needs to predict the detection data for the first collection period and upload the prediction. How the detection data for the first collection period is predicted is described below. FIG. 4 is a signaling interaction diagram of another data cleaning method provided by an embodiment of the present application. As shown in FIG. 4, the method further includes:
S301. The terminal device acquires the detection data of the detection device in the first collection period.
S302. According to the threshold range of the detection data, the terminal device cleans the detection data, where data cleaning discards invalid values in the detection data and the threshold range of the detection data is indicated by the server.
For the technical terms, technical effects, technical features, and optional implementations of steps S301-S302, refer to steps S201-S202 shown in FIG. 2; repeated content is not described again here.
S303. If no available data exists after data cleaning, the terminal device performs linear interpolation on historical available data to predict the available data for the first collection period.
In some embodiments, one feasible linear method is to compute a linear interpolation of the two preceding available data items in the local database and use the result as the available data for the first collection period. The linear interpolation is given by formula (5):
where t_n is the n-th collection period and x_n is the detection data collected in the n-th collection period.
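Step S303 can be illustrated with a short sketch. Because formula (5) is not reproduced in the text, the code assumes the usual straight-line form through the two preceding points (t_{n-2}, x_{n-2}) and (t_{n-1}, x_{n-1}), evaluated at t_n; the function name `predict_linear` and this exact form are assumptions.

```python
def predict_linear(t_prev2: float, x_prev2: float,
                   t_prev1: float, x_prev1: float,
                   t_now: float) -> float:
    """Predict x at t_now from the straight line through the two preceding points."""
    if t_prev1 == t_prev2:
        raise ValueError("the two historical points must have distinct times")
    slope = (x_prev1 - x_prev2) / (t_prev1 - t_prev2)
    return x_prev1 + slope * (t_now - t_prev1)


# Example: readings of 10.0 and 12.0 in periods 1 and 2 predict period 3.
predicted = predict_linear(1.0, 10.0, 2.0, 12.0, 3.0)
```

Since t_n lies beyond both historical points, this is strictly an extrapolation along the fitted line, which is why a run of consecutive predictions would drift if the underlying signal is not linear.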
S304. The terminal device reports the predicted available data to the server in the first reporting period.
In the data cleaning method provided by the embodiments of the present application, if no available data exists after data cleaning, the terminal device performs linear interpolation on historical available data to predict the available data for the first collection period, and reports the predicted available data to the server in the first reporting period. In this way, the completeness of the detection data can be improved.
How the terminal device updates the threshold range of the detection data is described below. FIG. 5 is a signaling interaction diagram of yet another data cleaning method provided by an embodiment of the present application. As shown in FIG. 5, the method further includes:
S401. The terminal device acquires the detection data of the detection device in the first collection period.
S402. According to the threshold range of the detection data, the terminal device cleans the detection data, where data cleaning discards invalid values in the detection data and the threshold range of the detection data is indicated by the server.
S403. If available data exists after data cleaning, the terminal device reports the available data to the server in the first reporting period.
S404. In the first update period, the server generates a first range of the detection data according to the available data reported by the terminal device in the first reporting period.
S405. The server sends the first range of the detection data to the terminal device.
S406. According to the first range, the terminal device updates the threshold range of the detection data.
In the data cleaning method provided by the embodiments of the present application, the terminal device acquires the detection data of the detection device in the first collection period and cleans the detection data according to the threshold range of the detection data, where data cleaning discards invalid values in the detection data and the threshold range is indicated by the server. If available data exists after data cleaning, the terminal device reports the available data to the server in the first reporting period. In the first update period, the server generates a first range of the detection data according to the available data reported by the terminal device in the first reporting period, and sends the first range to the terminal device. In this way, the terminal device can update the range of the detection data in real time according to the server's indication, which improves the data cleaning effect.
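Steps S401 to S406 can be traced end to end with a small simulation. Everything here is a sketch under stated assumptions: the `Terminal` class and its method names are hypothetical, and `generate_first_range` is a deliberately simple stand-in (mean plus or minus k population standard deviations) used only so the loop runs; it is not the DAE-LSTM model described in the application.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Range = Tuple[float, float]


class Terminal:
    """Terminal device that cleans readings against its current threshold range."""

    def __init__(self, threshold_range: Range) -> None:
        self.threshold_range = threshold_range

    def clean(self, samples: Sequence[float]) -> List[float]:
        lower, upper = self.threshold_range
        return [s for s in samples if lower <= s <= upper]

    def update_threshold_range(self, first_range: Range) -> None:
        # S406: adopt the first range sent by the server.
        self.threshold_range = first_range


def generate_first_range(usable: Sequence[float], k: float = 3.0) -> Range:
    """Stand-in range generator (assumption): mean +/- k standard deviations."""
    center, spread = mean(usable), pstdev(usable)
    return center - k * spread, center + k * spread


terminal = Terminal((0.0, 50.0))
usable = terminal.clean([10.0, 11.0, 12.0, 100.0])  # S401-S402
first_range = generate_first_range(usable)           # S403-S404 (server side)
terminal.update_threshold_range(first_range)         # S405-S406
next_usable = terminal.clean([9.0, 14.0, 12.0])      # next collection period
```

After the update the range tightens from (0, 50) to roughly (8.55, 13.45), so the reading 14.0 that would have passed the original range is now discarded.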
FIG. 6 is a schematic diagram of a network architecture for a data cleaning method provided by an embodiment of the present application. As shown in FIG. 6, the server in the foregoing embodiments may be an intelligent decision server, which generates the threshold range of the detection data from the available data and stores the available data in a database. When uploading available data, the terminal device does not send information to the intelligent decision server directly; instead, the information is relayed through a message queuing telemetry transport (MQTT) server. Likewise, when a user needs to interact with the intelligent decision server, the interaction is forwarded through a web server.
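Since the upload passes through an MQTT server, the terminal device has to package the available data as a topic and a payload. The sketch below shows one way this could be done using only the standard library. The topic layout `devices/<id>/usable-data`, the JSON field names, and the helper names are hypothetical; the application does not define a message format, and the actual publish call through an MQTT client is not shown.

```python
import json
from typing import List, Tuple


def build_report(device_id: str, reporting_period: int,
                 values: List[float]) -> Tuple[str, bytes]:
    """Build a (topic, payload) pair for one reporting period."""
    topic = f"devices/{device_id}/usable-data"  # hypothetical topic layout
    payload = json.dumps({
        "device_id": device_id,
        "reporting_period": reporting_period,
        "values": values,
    }).encode("utf-8")
    return topic, payload


def parse_report(payload: bytes) -> dict:
    """Decode a payload on the intelligent decision server side."""
    return json.loads(payload.decode("utf-8"))


topic, payload = build_report("sensor-01", 1, [21.5, 22.0])
decoded = parse_report(payload)
```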
FIG. 7 is a schematic diagram of a four-level data cleaning mechanism provided by an embodiment of the present application. As shown in FIG. 7, in the first-level cleaning mechanism, the terminal device collects detection data at a collection period with time interval T1 and cleans the detection data; data that passes cleaning is stored in local memory, and data that fails cleaning is discarded.
Second-level cleaning mechanism: at a reporting period with time interval T2, the terminal device uploads the available data retained after cleaning.
Third-level cleaning mechanism: at an update period with time interval T3, the server inputs the latest available data into the threshold generation model and obtains the current first range of the detection data output by the model. The server then obtains the most recent available data item: if it is within the current first range, the first range is not sent to the terminal device to update the threshold range; if it is not within the current first range, the first range is sent to the terminal device to update the threshold range.
Fourth-level cleaning mechanism: at a forced-update period with time interval T4, if the threshold range has gone without an update for longer than the time threshold, the threshold range of the terminal device is forcibly updated.
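The four levels run on four different timers, which can be sketched as a simple tick-based schedule. The function name `due_levels`, the use of integer ticks, and the example interval values are assumptions; the text does not state concrete values for T1 to T4 or how they relate to one another.

```python
from typing import List


def due_levels(tick: int, t1: int, t2: int, t3: int, t4: int) -> List[int]:
    """Return the cleaning levels whose period elapses at this tick.

    Level 1: collect and clean (T1); level 2: report (T2);
    level 3: regenerate and conditionally send the range (T3);
    level 4: forced-update check (T4).
    """
    periods = {1: t1, 2: t2, 3: t3, 4: t4}
    if any(p <= 0 for p in periods.values()):
        raise ValueError("all periods must be positive")
    if tick <= 0:
        return []
    return [level for level, period in periods.items() if tick % period == 0]


# Example with hypothetical intervals T1=1, T2=10, T3=60, T4=600 (in ticks).
levels_at_60 = due_levels(60, 1, 10, 60, 600)
levels_at_600 = due_levels(600, 1, 10, 60, 600)
```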
Those of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
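FIG. 8 is a schematic structural diagram of a data cleaning apparatus provided by an embodiment of the present application. The data cleaning apparatus may be implemented in software, hardware, or a combination of the two, and may be the terminal device described above.
As shown in FIG. 8, the data cleaning apparatus 500 includes a receiving module 501, a processing module 502, and a sending module 503.
The receiving module 501 is configured to acquire the detection data of the detection device in the first collection period.
The processing module 502 is configured to clean the detection data according to the threshold range of the detection data, where data cleaning discards invalid values in the detection data and the threshold range of the detection data is indicated by the server.
The sending module 503 is configured to report the available data to the server in the first reporting period if available data exists after data cleaning.
In an optional implementation, the processing module 502 is further configured to, if no available data exists after data cleaning, perform linear interpolation on historical available data to predict the available data for the first collection period; the sending module 503 is further configured to report the predicted available data to the server in the first reporting period.
In an optional implementation, the receiving module 501 is further configured to receive the first range of the detection data sent by the server, where the first range is determined according to the available data of the first collection period; the processing module 502 is further configured to update the threshold range of the detection data according to the first range.
The data cleaning apparatus provided by the embodiments of the present application can perform the data cleaning method on the terminal device side in the foregoing method embodiments. Its implementation principle and technical effect are similar and are not repeated here.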
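FIG. 9 is a schematic structural diagram of another data cleaning apparatus provided by an embodiment of the present application. The data cleaning apparatus may be implemented in software, hardware, or a combination of the two, and may be the server described above.
As shown in FIG. 9, the data cleaning apparatus 600 includes a receiving module 601, a processing module 602, and a sending module 603.
The receiving module 601 is configured to receive the available data reported by the terminal device in the first reporting period, where the available data is obtained by the terminal device cleaning, according to the threshold range of the detection data, the detection data of the detection device in the first collection period; data cleaning discards invalid values in the detection data, and the threshold range of the detection data is indicated by the server.
The processing module 602 is configured to generate, in the first update period, a first range of the detection data according to the available data reported by the terminal device in the first reporting period.
The sending module 603 is configured to send the first range of the detection data to the terminal device.
In an optional implementation, the receiving module 601 is further configured to receive the available data reported by the terminal device in a second reporting period, where the second reporting period is later than the first reporting period; the sending module 603 is further configured to send the first range to the terminal device if the available data reported in the second reporting period is not within the first range.
In an optional implementation, the receiving module 601 is further configured to obtain the un-updated duration of the threshold range of the detection data; the sending module 603 is further configured to send the first range of the detection data to the terminal device if the un-updated duration exceeds the duration threshold.
The data cleaning apparatus provided by the embodiments of the present application can perform the data cleaning method on the server side in the foregoing method embodiments. Its implementation principle and technical effect are similar and are not repeated here.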
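FIG. 10 is a schematic structural diagram of yet another data cleaning apparatus provided by an embodiment of the present application. As shown in FIG. 10, the data cleaning apparatus may include at least one processor 701 and a memory 702. FIG. 10 shows an electronic device with one processor as an example.
The memory 702 is configured to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions.
The memory 702 may include a high-speed RAM, and may also include a non-volatile memory, for example at least one magnetic disk memory.
The processor 701 is configured to execute the computer-executable instructions stored in the memory 702 to implement the data cleaning method on the terminal device side described above;
or, the processor 701 is configured to execute the computer-executable instructions stored in the memory 702 to implement the data cleaning method on the server side described above.
The processor 701 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
Optionally, in a specific implementation, if the communication interface, the memory 702, and the processor 701 are implemented independently, they may be connected to one another through a bus and communicate with one another over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the communication interface, the memory 702, and the processor 701 are integrated on a single chip, they may communicate with one another through an internal interface.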
The present application also provides a program product that includes execution instructions stored in a readable storage medium. At least one processor of the data cleaning apparatus can read the execution instructions from the readable storage medium, and execution of the instructions by the at least one processor causes the data cleaning apparatus to implement the data cleaning methods provided in the foregoing implementations.
The present invention also provides a computer-readable storage medium, which may include any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc. Specifically, the computer-readable storage medium stores program instructions, and the program instructions are used for the methods in the foregoing embodiments.
Finally, it should be noted that the foregoing embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010079670.2A CN111324600A (en) | 2020-02-04 | 2020-02-04 | Data cleaning method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111324600A true CN111324600A (en) | 2020-06-23 |
Family
ID=71165156
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010079670.2A Pending CN111324600A (en) | 2020-02-04 | 2020-02-04 | Data cleaning method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111324600A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200623 |