CN111324600A - Data cleaning method and device - Google Patents


Info

Publication number
CN111324600A
Authority
CN
China
Prior art keywords
data
detection data
range
detection
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010079670.2A
Other languages
Chinese (zh)
Inventor
邬惠峰
孙丹枫
陈佰平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202010079670.2A
Publication of CN111324600A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a data cleaning method and device. In the method, a terminal device obtains detection data of a detection device in a first acquisition period and performs data cleaning on the detection data according to a threshold range of the detection data, where the data cleaning discards invalid values in the detection data and the threshold range is indicated by a server. If available data remains after the data cleaning, the terminal device reports the available data to the server in a first reporting period. In a first updating period, the server generates a first range of the detection data according to the available data reported by the terminal device in the first reporting period, and sends the first range to the terminal device. In this way, the terminal device can update the range of the detection data in real time according to the indication of the server, which improves the data cleaning effect.

Description

Data cleaning method and device
Technical Field
The invention relates to the technical field of information, in particular to a data cleaning method and device.
Background
In data acquisition, a sensor generally collects data and sends it to an edge terminal such as an edge gateway or an edge controller, and the edge terminal then sends the collected data to a server for users to download. Because of the influence of the sensor's surrounding environment, the collected data usually contains a large amount of invalid data. Data cleaning is therefore required before the data collected by the sensor is used.
In the prior art, data cleaning is usually performed at the edge end. Specifically, a data cleaning algorithm can be burned into the edge terminal in advance, and the edge terminal cleans the data collected by the sensor with that algorithm.
However, because resources at the edge end are limited, the edge end usually cannot provide the computing power needed to update the cleaning algorithm by itself. As a result, in the prior art the threshold range used by the data cleaning algorithm is typically fixed and cannot be adjusted dynamically according to the actual situation, so the cleaning effect of existing edge-end data cleaning methods is often poor.
Disclosure of Invention
The invention provides a data cleaning method and device, and aims to solve the technical problem that the data cleaning effect is poor in the prior art.
The first aspect of the present invention provides a data cleaning method, which is applied to a terminal device, and the method includes:
acquiring detection data of detection equipment in a first acquisition period;
performing data cleaning on the detection data according to the threshold range of the detection data, wherein the data cleaning is used for discarding invalid values in the detection data, and the threshold range of the detection data is indicated by a server;
and if available data exist after the data are cleaned, reporting the available data to the server in a first reporting period.
In an optional embodiment, after the performing data cleansing on the detection data, the method further includes:
if the available data does not exist after the data is cleaned, performing linear interpolation on the historical available data to predict the available data of the first acquisition period;
and reporting the predicted available data to a server in the first reporting period.
In an optional implementation manner, after reporting the available data to the server in the first reporting period, the method further includes:
receiving a first range of the detection data sent by the server, wherein the first range is determined according to available data of the first acquisition period;
and updating the threshold range of the detection data according to the first range.
A second aspect of the present invention provides a data cleansing method applied to a server, the method including:
receiving available data reported by a terminal device in a first reporting period, wherein the available data is obtained by the terminal device after the terminal device performs data cleaning on detection data of the detection device in a first acquisition period according to a threshold range of the detection data, the data cleaning is used for discarding invalid values in the detection data, and the threshold range of the detection data is indicated by a server;
in a first updating period, generating a first range of the detection data according to the available data reported by the terminal equipment in the first reporting period;
and sending the first range of the detection data to the terminal equipment.
In an optional implementation manner, before the sending the first range of the detection data to the terminal device, the method further includes:
receiving available data reported by the terminal equipment in a second reporting period, wherein the second reporting period is later than the first reporting period;
the sending the first range of the detection data to the terminal device includes:
and if the available data reported by the second reporting period is not in the first range, sending the first range to the terminal equipment.
In an optional implementation manner, before the sending the first range of the detection data to the terminal device, the method further includes:
acquiring the non-updated duration of the threshold range of the detection data;
the sending the first range of the detection data to the terminal device includes:
and if the non-updated duration exceeds a duration threshold, sending the first range of the detection data to the terminal equipment.
A third aspect of the present invention provides a data cleansing apparatus comprising: the device comprises a receiving module, a processing module and a sending module;
the receiving module is used for acquiring detection data of the detection equipment in a first acquisition period;
the processing module is used for performing data cleaning on the detection data according to the threshold range of the detection data, the data cleaning is used for discarding invalid values in the detection data, and the threshold range of the detection data is indicated by a server;
the sending module is used for reporting the available data to a server in a first reporting period if the available data exists after the data is cleaned.
In an optional embodiment, the processing module is further configured to perform linear interpolation on historical available data to predict available data of the first acquisition cycle if no available data exists after data cleaning;
the sending module is further configured to report the predicted available data to a server in the first reporting period.
In an optional implementation manner, the receiving module is further configured to receive a first range of the detection data sent by the server, where the first range is determined according to available data of the first acquisition cycle;
the processing module is further configured to update the threshold range of the detection data according to the first range.
A fourth aspect of the present invention provides a data cleansing apparatus comprising: the device comprises a receiving module, a processing module and a sending module;
the receiving module is used for receiving available data reported by the terminal equipment in a first reporting period, wherein the available data is obtained by the terminal equipment after data cleaning is carried out on detection data of the detection equipment in a first acquisition period according to a threshold range of the detection data, the data cleaning is used for discarding invalid values in the detection data, and the threshold range of the detection data is indicated by the server;
the processing module is configured to generate a first range of the detection data according to the available data reported by the terminal device in the first reporting period in a first updating period;
and the sending module is used for sending the first range of the detection data to the terminal equipment.
In an optional implementation manner, the receiving module is further configured to receive available data that is reported by the terminal device in a second reporting period, where the second reporting period is later than the first reporting period;
the sending module is further configured to send the first range to the terminal device if the available data reported in the second reporting period is not within the first range.
In an optional implementation manner, the receiving module is further configured to obtain an un-updated duration of a threshold range of the detection data;
the sending module is further configured to send the first range of the detection data to the terminal device if the non-updated duration exceeds a duration threshold.
In a fifth aspect of the embodiments of the present invention, there is provided a terminal device, including: a memory, a processor and a computer program, the computer program being stored in the memory, the processor running the computer program to perform the data cleaning method of the first aspect of the invention and its various optional implementations.
In a sixth aspect of the embodiments of the present invention, there is provided a server, including: a memory, a processor and a computer program, the computer program being stored in the memory, the processor running the computer program to perform the data cleaning method of the second aspect of the invention and its various optional implementations.
A seventh aspect of the present invention provides a storage medium having stored thereon a computer program for executing the first aspect and the various optional data cleansing methods of the first aspect.
An eighth aspect of the present invention provides a storage medium having stored thereon a computer program for executing the second aspect and the various optional data cleansing methods of the second aspect.
According to the data cleaning method and device provided by the invention, the terminal equipment acquires the detection data of the detection equipment in the first acquisition period, and performs data cleaning on the detection data according to the threshold range of the detection data, the data cleaning is used for discarding invalid values in the detection data, the threshold range of the detection data is indicated by the server, and if available data exists after the data cleaning, the terminal equipment reports the available data to the server in the first reporting period. In the first updating period, the server generates a first range of the detection data according to the available data reported by the terminal equipment in the first reporting period, and sends the first range of the detection data to the terminal equipment. By the mode, the terminal equipment can update the range of the detection data in real time according to the indication of the server, and the data cleaning effect is improved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the following briefly introduces the drawings needed to be used in the description of the embodiments or the prior art, and obviously, the drawings in the following description are some embodiments of the present invention, and those skilled in the art can obtain other drawings according to the drawings without inventive labor.
Fig. 1 is a schematic view of a scene of a data cleansing method according to an embodiment of the present application;
Fig. 2 is a signaling interaction diagram of a data cleansing method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a DAE-LSTM model according to an embodiment of the present application;
Fig. 4 is a signaling interaction diagram of another data cleansing method according to an embodiment of the present application;
Fig. 5 is a signaling interaction diagram of another data cleansing method according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a network architecture of a data cleansing method according to an embodiment of the present application;
Fig. 7 is a diagram illustrating a four-level cleansing mechanism for data according to an embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of a data cleansing apparatus according to an embodiment of the present disclosure;
Fig. 9 is a schematic structural diagram of another data cleansing apparatus according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of another data cleaning apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In view of the above problems, the present invention provides a data cleaning method and apparatus, in which a cleaning process of detection data is completed by a terminal device, a determination of a threshold range of the detection data is completed by a server, the terminal device uploads the detection data to the server, and the server sends the threshold range to the terminal device in real time, so that the terminal device updates the range of the detection data in real time according to an instruction of the server, thereby improving a data cleaning effect.
Fig. 1 is a scene schematic diagram of a data cleansing method according to an embodiment of the present application. As shown in fig. 1, the system comprises a detection device 101, a terminal device 102 and a server 103, wherein the terminal device 102 is connected with the detection device 101 and the server 103 respectively. The detection device 101 is configured to collect detection data and send the detection data to the terminal device 102. The terminal apparatus 102 performs data cleansing on the detection data and transmits the cleansed data to the server 103. The server 103 determines a new threshold range of the detection data from the received cleaned detection data.
The detection device 101 may be a sensor, a controller or a driving device, and the detection device 101 is used to collect sensing data such as temperature and position, and may also be used to collect a control signal or a driving signal.
The terminal device 102 may also be referred to as an edge end, and the terminal device 102 may be an edge gateway, an edge controller, or the like. Illustratively, since an embedded programmable logic controller (ePLC) is easy to program, highly reliable, and widely used in various industrial fields, an ePLC may be used as the terminal device 102.
The server 103 may be a standalone server or a server in a cloud service platform. The server 103 may receive the detection data sent by the terminal device 102, and determine the threshold range of the detection data according to the received detection data.
It should be noted that the technical solution of the present application may be applied to the above application scenarios, but is not limited to this, and may also be applied to other scenarios requiring data cleansing.
The following takes a terminal device and a server integrated or installed with relevant executable codes as an example, and the technical solution of the embodiment of the present application is described in detail with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a signaling interaction diagram of a data cleansing method according to an embodiment of the present application. The embodiment relates to a specific process of how the terminal device and the server clean the detection data. As shown in fig. 2, the method includes:
S201, the terminal device obtains detection data of the detection device in a first acquisition period.
The first acquisition period is not limited in the embodiment of the application and may be set according to the detection frequency of the detection device. For example, the terminal device may use ten seconds as the acquisition period and acquire the detection data of the detection device every ten seconds.
In some embodiments, the terminal device may further configure a corresponding acquisition protocol and data dictionary table according to the application environment of the detection device. The application environment may be, for example, an environment in which the temperature, humidity, or voltage of a device is monitored; the acquisition protocol may be, for example, the Modbus serial communication protocol or a Remote Terminal Unit (RTU) protocol; and the data dictionary is a set of information describing data, that is, a set defining all data elements used in the system. The terminal device collects the detection data of the detection device through the acquisition protocol, and determines the type of the collected detection data through the data dictionary table.
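As a rough illustration of how a data dictionary table could be used to determine the type of a collected value, the following sketch maps a register address to a description of the data element. The addresses, field names, and units are hypothetical and are not taken from the application; they only show the lookup idea.

```python
# Hypothetical data dictionary: register address -> description of the data element.
# The addresses, names, and units below are illustrative assumptions.
DATA_DICTIONARY = {
    0x0001: {"name": "temperature", "unit": "degC"},
    0x0002: {"name": "humidity", "unit": "%RH"},
    0x0003: {"name": "voltage", "unit": "V"},
}


def describe(address: int, raw_value: float) -> dict:
    """Attach the data element's name and unit to a raw collected value."""
    entry = DATA_DICTIONARY.get(address)
    if entry is None:
        raise KeyError(f"address {address:#06x} is not in the data dictionary")
    return {"name": entry["name"], "unit": entry["unit"], "value": raw_value}


print(describe(0x0001, 21.5))
```

A value collected from an address that is absent from the table raises an error rather than being silently typed, which is one possible design choice for such a lookup.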
S202, according to the threshold range of the detection data, the terminal device performs data cleaning on the detection data, the data cleaning is used for discarding invalid values in the detection data, and the threshold range of the detection data is indicated by the server.
In this step, after the terminal device obtains the detection data of the detection device in the first acquisition period, before the terminal device uploads the detection data to the server, the terminal device may also perform data cleaning on the detection data according to the threshold range of the detection data, and discard an invalid value in the detection data.
Since normal detection data generally increases or decreases within a certain range, detection data that increases or decreases excessively can be regarded as invalid. Data cleaning can therefore be performed using the threshold range of the detection data, by judging whether each piece of detection data lies within the threshold range. For example, if the detection data is not within the threshold range, the detection data is determined to be invalid and is discarded; if the detection data is within the threshold range, the detection data is determined to be valid and is stored in the terminal device.
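The threshold check described here can be sketched in a few lines. This is a minimal illustration, not the terminal device's actual program: the function name `clean` and the choice to treat both bounds as inclusive are assumptions.

```python
from typing import List, Tuple


def clean(readings: List[float], threshold: Tuple[float, float]) -> List[float]:
    """Keep readings inside [low, high]; values outside the range are discarded.

    Treating both bounds as inclusive is an assumption of this sketch.
    """
    low, high = threshold
    return [x for x in readings if low <= x <= high]


readings = [21.5, 22.0, 85.0, 21.8, -40.0]
available = clean(readings, (0.0, 50.0))
print(available)  # [21.5, 22.0, 21.8]
```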
In some embodiments, the terminal device may also detect the consistency of the available data, or check whether the available data has missing values.
S203, if available data exists after the data cleaning, the terminal device reports the available data to the server in the first reporting period.
In this step, after the terminal device has cleaned the detection data, it reports any available data that remains to the server in the first reporting period.
For example, taking every interval t as one upload period, the data cleaning program in the terminal device checks whether the latest detection data has been discarded. If the drop flag is False, the available data remaining after cleaning is uploaded to the server. The embodiment of the application does not limit how the available data is uploaded; for example, the available data may be uploaded through a 3G network or a 4G network.
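The drop-flag check can be illustrated as follows. This is a hedged sketch: `report_cycle` and the `upload` callback are hypothetical names standing in for whatever transport the terminal device uses, and the flag is assumed to be True only when no reading survives cleaning.

```python
from typing import Callable, List, Tuple


def report_cycle(
    readings: List[float],
    threshold: Tuple[float, float],
    upload: Callable[[List[float]], None],
) -> bool:
    """Clean one period of readings and upload them if any survive.

    Returns the drop flag: True if every reading was discarded.
    """
    low, high = threshold
    available = [x for x in readings if low <= x <= high]
    dropped = len(available) == 0
    if not dropped:
        upload(available)  # stand-in for sending over e.g. a 3G or 4G link
    return dropped


sent: List[List[float]] = []
print(report_cycle([1.0, 99.0], (0.0, 10.0), sent.append))  # False
print(sent)  # [[1.0]]
```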
In the embodiment of the application, the uploaded available data is data after the initial cleaning of the terminal device, so that the detection data of massive heterogeneous sensors can be prevented from being directly uploaded to a server for processing, the load of a network is reduced, and meanwhile, the cost is also reduced.
S204, in the first updating period, the server generates a first range of the detection data according to the available data reported by the terminal device in the first reporting period.
In some embodiments, the server generates a new threshold range of the detection data once every updating period. The embodiment of the present application does not limit how the threshold range of the detection data is generated. In an optional implementation, the threshold range may be generated by a denoising autoencoder long short-term memory (DAE-LSTM) model: the available data is input to the DAE-LSTM model, and the threshold range of the detection data output by the model is obtained. The DAE-LSTM model is trained on historical available data and historical threshold ranges.
Fig. 3 is a schematic structural diagram of a DAE-LSTM model provided in an embodiment of the present application. As shown in Fig. 3, the input of the DAE-LSTM model is the currently processed available data and the output is the predicted threshold range for the next time step; depending on configuration, a fluctuation of 5% may be allowed.
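Since the application does not limit how the range is generated, the server-side step can be illustrated with a much simpler stand-in than the DAE-LSTM model. The sketch below is not the DAE-LSTM approach: it takes the minimum and maximum of the reported data and widens them by a margin. The function name `generate_range` and the interpretation of the 5% fluctuation as a fraction of the observed span are assumptions.

```python
from typing import List, Tuple


def generate_range(available: List[float], margin: float = 0.05) -> Tuple[float, float]:
    """Derive a (low, high) range from reported data.

    Stand-in for the model-based prediction: min/max widened by
    `margin` times the observed span (an assumption of this sketch).
    """
    if not available:
        raise ValueError("no available data reported in this period")
    low, high = min(available), max(available)
    pad = margin * (high - low)
    return (low - pad, high + pad)


print(generate_range([10.0, 20.0, 15.0]))
```

A trained model would replace the body of `generate_range`; the interface (reported data in, range out) is the part that corresponds to step S204.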
The model consists of a fully connected network (FNN) and a long short-term memory network (LSTM), and a denoising autoencoder (DAE) is used to initialize the parameters of the fully connected layers. The available data X_t is input to the FNN to obtain the FNN output r_t; r_t is then input to the LSTM, which finally outputs h_t. The DAE-LSTM model is a time-series-based network, so a time series should be input into the network.
The input format of the model is given by equation (1), where N_x is the number of input sequences:
[Equation (1): formula image not reproduced in this text]
With respect to the training process, the loss function of the LSTM portion is given by equation (2):
[Equation (2): formula image not reproduced in this text]
where η_2 is an adjustment coefficient.
In the DAE-LSTM model, Dense(x, f) denotes a fully connected layer that contains x neurons and uses f as its activation function. The rectified linear unit (ReLU) is given by equation (3):
f(x)=max(x,0) (3)
The hyperbolic tangent function (Tanh) is given by equation (4):
f(x)=(e^x-e^(-x))/(e^x+e^(-x)) (4)
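The two activation functions named above can be written out directly. The ReLU follows equation (3); for Tanh the sketch uses the standard definition of the hyperbolic tangent and checks it against Python's built-in `math.tanh`. The function names are illustrative only.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: f(x) = max(x, 0)."""
    return max(x, 0.0)


def tanh(x: float) -> float:
    """Hyperbolic tangent from its standard definition.

    Suitable for moderate |x|; math.exp overflows for very large inputs.
    """
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


print(relu(-2.0), relu(3.0))
print(math.isclose(tanh(0.5), math.tanh(0.5)))
```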
S205, the server sends the first range of the detection data to the terminal device.
In this step, if the first range of the detection data is generated, the server may directly send the first range of the detection data to the terminal device, or may send the first range of the detection data to the terminal device when a certain condition is satisfied.
In some optional embodiments, before sending the first range of the detection data to the terminal device, the server may receive the available data reported by the terminal device in a second reporting period, where the second reporting period is later than the first reporting period. If the available data reported in the second reporting period is not within the first range, the server sends the first range to the terminal device.
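This condition can be sketched as a simple predicate on the server side. The text does not say whether "not within the first range" means any value or all values; the sketch assumes that a single out-of-range value is enough, and the function name is hypothetical.

```python
from typing import List, Tuple


def should_send_on_drift(second_period_data: List[float],
                         first_range: Tuple[float, float]) -> bool:
    """Return True if any value reported in the second period lies outside the first range.

    Using "any" rather than "all" is an assumption of this sketch.
    """
    low, high = first_range
    return any(not (low <= x <= high) for x in second_period_data)


print(should_send_on_drift([21.0, 22.5], (20.0, 25.0)))  # False
print(should_send_on_drift([21.0, 26.0], (20.0, 25.0)))  # True
```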
In some embodiments, a fourth-level cleaning mechanism guards against a threshold range that has not been updated for a long time and may no longer be appropriate. Before sending the first range of the detection data to the terminal device, the server may obtain the non-updated duration of the threshold range of the detection data; if the non-updated duration exceeds a duration threshold, the server sends the first range of the detection data to the terminal device.
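The staleness check can likewise be sketched as a predicate. The timestamps are plain seconds and the function name and parameter names are hypothetical; the application does not specify how the non-updated duration is measured.

```python
def should_send_on_staleness(last_update_ts: float,
                             now_ts: float,
                             duration_threshold: float) -> bool:
    """Return True if the threshold range has gone un-updated for longer than the limit.

    All arguments are in seconds; the units are an assumption of this sketch.
    """
    non_updated_duration = now_ts - last_update_ts
    return non_updated_duration > duration_threshold


print(should_send_on_staleness(0.0, 3600.0, 1800.0))  # True
print(should_send_on_staleness(0.0, 600.0, 1800.0))   # False
```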
In some embodiments, before the first range of the detection data is sent to the terminal device, a user may access the server and, by setting a parameter, control whether the first range of the detection data is sent to the terminal device.
According to the data cleaning method provided by the embodiment of the application, the terminal equipment acquires the detection data of the detection equipment in the first acquisition period, and performs data cleaning on the detection data according to the threshold range of the detection data, the data cleaning is used for discarding invalid values in the detection data, the threshold range of the detection data is indicated by the server, and if available data exist after the data cleaning, the terminal equipment reports the available data to the server in the first reporting period. In the first updating period, the server generates a first range of the detection data according to the available data reported by the terminal equipment in the first reporting period, and sends the first range of the detection data to the terminal equipment. By the mode, the terminal equipment can update the range of the detection data in real time according to the indication of the server, and the data cleaning effect is improved.
On the basis of the above embodiment, when the terminal device performs data cleaning on the detection data and discards invalid values, all the detection data of the first acquisition period may be discarded. In that case, the terminal device needs to predict the detection data of the first acquisition period in order to upload it. How the detection data of the first acquisition period is predicted is described below. Fig. 4 is a signaling interaction diagram of another data cleansing method provided in an embodiment of the present application. As shown in Fig. 4, the method further includes:
S301, the terminal device obtains detection data of the detection device in a first acquisition period.
S302, according to the threshold range of the detection data, the terminal device performs data cleaning on the detection data, wherein the data cleaning is used for discarding invalid values in the detection data, and the threshold range of the detection data is indicated by the server.
The technical terms, technical effects, technical features, and alternative embodiments of steps S301 to S302 can be understood with reference to steps S201 to S202 shown in fig. 2, and repeated contents will not be described herein.
S303, if no available data exists after the data cleaning, the terminal device performs linear interpolation on the historical available data to predict the available data of the first acquisition period.
In some embodiments, one straightforward approach is to compute a linear interpolation of the previous two available data values in the local database and use the result as the available data of the first acquisition period. The linear interpolation is given by equation (5):
[Equation (5): formula image not reproduced in this text]
where t_n is the nth acquisition period and x_n is the detection data acquired in the nth acquisition period.
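Because the formula for equation (5) is not reproduced in this text, the following sketch assumes the usual two-point linear form, x_n = x_(n-1) + (x_(n-1) - x_(n-2)) * (t_n - t_(n-1)) / (t_(n-1) - t_(n-2)), applied to the previous two available values. The function and parameter names are hypothetical, and the application's exact formula may differ.

```python
def predict_linear(t_prev2: float, x_prev2: float,
                   t_prev1: float, x_prev1: float,
                   t_now: float) -> float:
    """Predict x at t_now from the line through the two previous available points.

    Assumed two-point linear form; not necessarily identical to equation (5).
    """
    if t_prev1 == t_prev2:
        raise ValueError("the two historical points must have distinct times")
    slope = (x_prev1 - x_prev2) / (t_prev1 - t_prev2)
    return x_prev1 + slope * (t_now - t_prev1)


# Two historical readings ten seconds apart, predicting the next period.
print(predict_linear(0.0, 20.0, 10.0, 21.0, 20.0))
```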
S304, the terminal equipment reports the predicted available data to the server in the first reporting period.
According to the data cleaning method provided by the embodiment of the application, if no available data exists after the data is cleaned, the terminal equipment performs linear interpolation on the historical available data, predicts the available data in the first acquisition period, and reports the predicted available data to the server in the first reporting period. In this way, the integrity of the detection data can be improved.
How the terminal device updates the threshold range of the detection data is explained below. Fig. 5 is a signaling interaction diagram of another data cleansing method provided in an embodiment of the present application, and as shown in fig. 5, the method further includes:
S401, the terminal device acquires detection data of the detection device in a first acquisition period.
S402, according to the threshold range of the detection data, the terminal device performs data cleaning on the detection data, wherein the data cleaning is used for discarding invalid values in the detection data, and the threshold range of the detection data is indicated by the server.
S403, if available data exist after the data cleaning, the terminal device reports the available data to the server in the first reporting period.
S404, in the first updating period, the server generates a first range of the detection data according to the available data reported by the terminal device in the first reporting period.
S405, the server sends the first range of the detection data to the terminal device.
S406, according to the first range, the terminal device updates the threshold range of the detection data.
According to the data cleaning method provided by this embodiment of the application, the terminal device acquires the detection data of the detection device in the first acquisition period and cleans the detection data according to the threshold range of the detection data, where the data cleaning is used for discarding invalid values in the detection data and the threshold range of the detection data is indicated by the server. If available data exist after the data cleaning, the terminal device reports the available data to the server in the first reporting period. In the first updating period, the server generates a first range of the detection data according to the available data reported by the terminal device in the first reporting period, and sends the first range of the detection data to the terminal device. In this way, the terminal device can update the threshold range of the detection data in real time according to the indication of the server, which improves the data cleaning effect.
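Steps S402 and S404 can be sketched as two small functions. This section does not say how the server derives the first range from the available data, so the mean plus or minus k standard deviations rule used here is purely an assumption for illustration, as are the names `clean` and `generate_first_range` and the choice to treat the threshold bounds as inclusive.

```python
from statistics import mean, pstdev


def clean(samples, low, high):
    """Terminal side (S402): keep values inside the threshold range.

    Values outside [low, high] are treated as invalid and discarded.
    Inclusive bounds are an assumption.
    """
    return [x for x in samples if low <= x <= high]


def generate_first_range(available, k=3.0):
    """Server side (S404): derive a first range from reported data.

    Assumed rule: mean +/- k population standard deviations.
    """
    if not available:
        raise ValueError("no available data reported")
    centre = mean(available)
    spread = pstdev(available)
    return (centre - k * spread, centre + k * spread)


# Usage: clean one period, then derive the range the server would send back.
available = clean([21.5, 22.0, 400.0, 21.8, -50.0], low=0.0, high=100.0)
first_range = generate_first_range(available)
```

The terminal device would then replace its stored `low` and `high` with `first_range` (S406), so the next acquisition period is cleaned against the updated bounds.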
Fig. 6 is a schematic diagram of a network architecture of a data cleaning method according to an embodiment of the present application. As shown in Fig. 6, the server in the foregoing embodiments may be an intelligent decision server, which is configured to generate the threshold range of the detection data according to the available data and to store the available data in a database. When uploading the available data, the terminal device does not send information directly to the intelligent decision server; instead, the information is forwarded through a Message Queuing Telemetry Transport (MQTT) server. In addition, when a user needs to interact with the intelligent decision server, the information is likewise forwarded through a website (web) server.
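The forwarding path in Fig. 6 can be sketched without committing to a particular MQTT client library. In the sketch below, `publish` stands for whatever function the chosen client exposes for sending a message to the MQTT server; the topic layout `devices/<id>/available` and the JSON field names are hypothetical, since the application does not specify a message format.

```python
import json


def build_report(device_id, period_index, available):
    """Build a (topic, payload) pair for one reporting period.

    The topic layout and the field names are hypothetical.
    """
    topic = f"devices/{device_id}/available"
    payload = json.dumps({"period": period_index, "values": list(available)})
    return topic, payload


def report(publish, device_id, period_index, available):
    """Hand the report to the MQTT server via the supplied publish callable.

    The terminal device never addresses the intelligent decision server
    directly; the decision server would subscribe to the same topic.
    """
    topic, payload = build_report(device_id, period_index, available)
    publish(topic, payload)


# Usage with a stand-in for a real client's publish function.
sent = []
report(lambda topic, payload: sent.append((topic, payload)), "dev1", 3, [1.5, 2.0])
```

Passing `publish` in as a parameter keeps the sketch independent of any broker and makes the reporting step easy to exercise in isolation.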
Fig. 7 is a schematic diagram of a four-level data cleaning mechanism according to an embodiment of the present application. As shown in Fig. 7, in the first-level cleaning mechanism, the terminal device acquires detection data with an acquisition period of time interval T1, cleans the detection data, stores the data that pass the cleaning in local memory, and discards the data that do not pass.
In the second-level cleaning mechanism, the terminal device uploads the available data retained after cleaning with a reporting period of time interval T2.
In the third-level cleaning mechanism, with an updating period of time interval T3, the latest available data is input into the threshold generation model, and a first range of the current detection data output by the threshold generation model is obtained. The next available data is then acquired: if it lies within the current first range, the first range is not sent to the terminal device to update the threshold range; if it does not, the first range is sent to the terminal device to update the threshold range.
In the fourth-level cleaning mechanism, with a forced updating period of time interval T4, the threshold range is forcibly updated when the duration for which it has not been updated exceeds the duration threshold.
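The decision made by the third and fourth levels, namely whether a newly generated first range is pushed to the terminal device, can be sketched as a single predicate. The function name `should_send_range`, its parameters, and the use of seconds as the time unit are illustrative assumptions.

```python
def should_send_range(latest_value, first_range, not_updated_for, duration_threshold):
    """Decide whether the first range is sent to the terminal device.

    Third level: send when the latest available value falls outside the
    current first range.
    Fourth level: send anyway when the threshold range has gone without
    an update for longer than the duration threshold.
    """
    low, high = first_range
    if not (low <= latest_value <= high):
        return True  # third level: the data has left the current range
    return not_updated_for > duration_threshold  # fourth level: forced update


# Usage: a value inside the range, but the last update was 120 s ago.
decision = should_send_range(5.0, (0.0, 10.0), not_updated_for=120.0,
                             duration_threshold=60.0)
```

Sending the range only when one of the two conditions holds keeps downlink traffic low while the data stays stable, and the forced update bounds how stale the terminal device's threshold range can become.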
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Fig. 8 is a schematic structural diagram of a data cleaning apparatus according to an embodiment of the present application. The data cleaning apparatus can be implemented by software, hardware, or a combination of the two, and may be the aforementioned terminal device.
As shown in fig. 8, the data cleansing apparatus 500 includes: a receiving module 501, a processing module 502 and a sending module 503;
the receiving module 501 is configured to obtain detection data of the detection device in a first acquisition period.
The processing module 502 is configured to perform data cleansing on the detection data according to a threshold range of the detection data, where the data cleansing is configured to discard invalid values in the detection data, and the threshold range of the detection data is indicated by the server.
A sending module 503, configured to report the available data to the server in a first reporting period if the available data exists after the data is cleaned.
In an optional embodiment, the processing module 502 is further configured to perform linear interpolation on the historical available data to predict the available data of the first acquisition cycle if there is no available data after the data washing; the sending module 503 is further configured to report the predicted available data to the server in the first reporting period.
In an optional implementation manner, the receiving module 501 is further configured to receive a first range of the detection data sent by the server, where the first range is determined according to the available data of the first acquisition cycle; the processing module 502 is further configured to update the threshold range of the detection data according to the first range.
The data cleaning device provided by the embodiment of the application can execute the data cleaning method on the terminal equipment side in the method embodiment, the implementation principle and the technical effect are similar, and the detailed description is omitted.
Fig. 9 is a schematic structural diagram of another data cleaning apparatus according to an embodiment of the present application. The data cleaning apparatus may be implemented by software, hardware, or a combination of the two, and may be the aforementioned server.
As shown in fig. 9, the data cleansing apparatus 600 includes: a receiving module 601, a processing module 602 and a sending module 603;
the receiving module 601 is configured to receive available data reported by the terminal device in the first reporting period, where the available data is obtained by the terminal device after performing data cleaning on the detection data of the detection device in the first acquisition period according to a threshold range of the detection data, the data cleaning is used to discard an invalid value in the detection data, and the threshold range of the detection data is indicated by the server.
The processing module 602 is configured to generate a first range of the detection data according to the available data reported by the terminal device in the first reporting period in the first updating period.
A sending module 603, configured to send the first range of the detection data to the terminal device.
In an optional implementation manner, the receiving module 601 is further configured to receive available data that is reported by the terminal device in a second reporting period, where the second reporting period is later than the first reporting period; the sending module 603 is further configured to send the first range to the terminal device if the available data reported in the second reporting period is not within the first range.
In an optional implementation manner, the receiving module 601 is further configured to obtain an un-updated duration of the threshold range of the detection data; the sending module 603 is further configured to send the first range of the detection data to the terminal device if the time length that is not updated exceeds the time length threshold.
The data cleaning device provided by the embodiment of the application can execute the data cleaning method on the server side in the embodiment of the method, and the implementation principle and the technical effect are similar, and are not repeated herein.
Fig. 10 is a schematic structural diagram of another data cleaning apparatus according to an embodiment of the present application. As shown in Fig. 10, the data cleaning apparatus may include at least one processor 701 and a memory 702. Fig. 10 shows an electronic device with one processor as an example.
The memory 702 is configured to store programs. Specifically, a program may include program code, and the program code includes computer operating instructions.
The memory 702 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The processor 701 is configured to execute computer-executable instructions stored in the memory 702 to implement the data cleaning method on the terminal device side;
or, the processor 701 is configured to execute computer-executable instructions stored in the memory 702 to implement the data cleaning method on the server side.
The processor 701 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
Optionally, in a specific implementation, if the communication interface, the memory 702 and the processor 701 are implemented independently, they may be connected to each other through a bus and communicate with each other over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on, but this classification does not mean that there is only one bus or only one type of bus.
Alternatively, in a specific implementation, if the communication interface, the memory 702 and the processor 701 are integrated into a chip, the communication interface, the memory 702 and the processor 701 may complete communication through an internal interface.
The present application also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the data cleansing apparatus may read the execution instructions from the readable storage medium, and the execution of the execution instructions by the at least one processor causes the data cleansing apparatus to implement the data cleansing method provided by the various embodiments described above.
The present invention also provides a computer-readable storage medium, which may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and in particular, the computer-readable storage medium stores program instructions, and the program instructions are used in the method in the foregoing embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A data cleaning method is applied to terminal equipment and is characterized by comprising the following steps:
acquiring detection data of detection equipment in a first acquisition period;
performing data cleaning on the detection data according to the threshold range of the detection data, wherein the data cleaning is used for discarding invalid values in the detection data, and the threshold range of the detection data is indicated by a server;
if available data exists after data cleaning, reporting the available data to a server in a first reporting period so that the server generates a first range of the detection data according to the available data, wherein the first range is used for updating a threshold value of a threshold value range of the detection data.
2. The method of claim 1, further comprising, after the data cleansing of the detection data:
if the available data does not exist after the data is cleaned, performing linear interpolation on the historical available data to predict the available data of the first acquisition period;
and reporting the predicted available data to a server in the first reporting period.
3. The method of claim 2, wherein after the reporting the available data to the server in the first reporting period, further comprising:
receiving a first range of the detection data sent by the server, wherein the first range is determined according to available data of the first acquisition period;
and updating the threshold range of the detection data according to the first range.
4. A data cleaning method is applied to a server, and is characterized by comprising the following steps:
receiving available data reported by a terminal device in a first reporting period, wherein the available data is obtained by the terminal device after the terminal device performs data cleaning on detection data of the detection device in a first acquisition period according to a threshold range of the detection data, the data cleaning is used for discarding invalid values in the detection data, and the threshold range of the detection data is indicated by a server;
in a first updating period, generating a first range of the detection data according to the available data reported by the terminal equipment in the first reporting period;
and sending the first range of the detection data to the terminal equipment.
5. The method of claim 4, wherein before the sending the first updated value of the threshold range of detection data to the terminal device, further comprising:
receiving available data reported by the terminal equipment in a second reporting period, wherein the second reporting period is later than the first reporting period;
the sending the first range of the detection data to the terminal device includes:
and if the available data reported by the second reporting period is not in the first range, sending the first range to the terminal equipment.
6. The method of claim 4, wherein prior to said sending the first range of detection data to the terminal device, further comprising:
acquiring the non-updated duration of the threshold range of the detection data;
the sending the first range of the detection data to the terminal device includes:
and if the non-updated duration exceeds a duration threshold, sending the first range of the detection data to the terminal equipment.
7. A data cleansing apparatus, said apparatus comprising: the device comprises a receiving module, a processing module and a sending module;
the receiving module is used for acquiring detection data of the detection equipment in a first acquisition period;
the processing module is used for performing data cleaning on the detection data according to the threshold range of the detection data, the data cleaning is used for discarding invalid values in the detection data, and the threshold range of the detection data is indicated by a server;
the sending module is used for reporting the available data to a server in a first reporting period if the available data exists after the data is cleaned.
8. A data cleansing apparatus, said apparatus comprising: the device comprises a receiving module, a processing module and a sending module;
the receiving module is used for receiving available data reported by the terminal equipment in a first reporting period, wherein the available data is obtained by the terminal equipment after data cleaning is carried out on detection data of the detection equipment in a first acquisition period according to a threshold range of the detection data, the data cleaning is used for discarding invalid values in the detection data, and the threshold range of the detection data is indicated by the server;
the processing module is configured to generate a first range of the detection data according to the available data reported by the terminal device in the first reporting period in a first updating period;
and the sending module is used for sending the first range of the detection data to the terminal equipment.
9. A storage medium having a computer program stored thereon, comprising: the program when executed by a processor implementing the method of any one of claims 1 to 3.
10. A storage medium having a computer program stored thereon, comprising: the program, when executed by a processor, implements the method of any of claims 4-6.
CN202010079670.2A 2020-02-04 2020-02-04 Data cleaning method and device Pending CN111324600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010079670.2A CN111324600A (en) 2020-02-04 2020-02-04 Data cleaning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010079670.2A CN111324600A (en) 2020-02-04 2020-02-04 Data cleaning method and device

Publications (1)

Publication Number Publication Date
CN111324600A true CN111324600A (en) 2020-06-23

Family

ID=71165156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010079670.2A Pending CN111324600A (en) 2020-02-04 2020-02-04 Data cleaning method and device

Country Status (1)

Country Link
CN (1) CN111324600A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106933992A (en) * 2017-02-24 2017-07-07 北京华安普惠高新技术有限公司 Distributed data purging system and method based on data analysis
US20170359194A1 (en) * 2014-12-19 2017-12-14 Orange Method for transmitting data from a sensor
CN108347360A (en) * 2018-02-07 2018-07-31 网宿科技股份有限公司 The acquisition method and network server of network testing data
CN108882223A (en) * 2018-05-30 2018-11-23 努比亚技术有限公司 Using data reporting method, mobile terminal and computer readable storage medium
CN109756761A (en) * 2019-01-16 2019-05-14 四川长虹电器股份有限公司 Behavior big data based on smart television obtains system and method
CN109889604A (en) * 2019-03-19 2019-06-14 深圳市中电数通智慧安全科技股份有限公司 A kind of internet-of-things terminal parameter management method, device and server
CN109978079A (en) * 2019-04-10 2019-07-05 东北电力大学 A kind of data cleaning method of improved storehouse noise reduction self-encoding encoder
CN110290466A (en) * 2019-06-14 2019-09-27 中国移动通信集团黑龙江有限公司 Floor method of discrimination, device, equipment and computer storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曾凯,高亮,王新颖: "《大数据治理及数据仓库模型设计》", 31 July 2017, 电子科技大学出版社 *
符裕红: "《SPSS生物统计实例详解》", 30 November 2018 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114490596A (en) * 2021-12-08 2022-05-13 大唐水电科学技术研究院有限公司 Method for cleaning transformer oil chromatographic data based on machine learning and neural network
CN114490596B (en) * 2021-12-08 2024-05-10 大唐水电科学技术研究院有限公司 Method for cleaning transformer oil chromatographic data based on machine learning and neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200623
