WO2022107931A1

WO2022107931A1 - Data storage method for preventing data redundancy, and data platform using same

Info

Publication number: WO2022107931A1
Application number: PCT/KR2020/016483
Authority: WO
Inventors: 김성윤; 최성찬; 정승명
Original assignee: 한국전자기술연구원
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2022-05-27
Also published as: KR102460910B1; KR20220069413A

Abstract

Provided are a data storage method for preventing data redundancy, and a data platform using same. The data storage method according to an embodiment of the present invention comprises: checking whether redundant data, which has a time field containing the same time as a time field of received data, is in a history table; determining whether the reception time of the received data is more recent than that of the redundant data if the redundant data is present in the history table; and updating the data stored in the history table to the received data if the data reception time of the received data is determined to be more recent. Accordingly, unnecessary use of data storage, in which data is stored, and resources required for data processing/analysis can be avoided in the data platform by preventing data history from being stored redundantly when storing data.

Description

Data storage method for data duplication prevention and data platform to which it is applied

The present invention relates to data processing technology, and more particularly, to a data storage method for preventing redundant processing of data histories, and a data platform to which the same is applied.

The data platform is a data storage server that collects and stores data generated from sensors through IoT/M2M technology. The data collected on the data platform is used to monitor and control the system, and is also used to obtain meaningful information through big data analysis.

A large amount of data is inevitably stored in a data platform. However, the significance of the stored data does not necessarily grow with the amount accumulated. This is especially true when the stored data is duplicated.

Duplicate data not only wastes storage space, but also causes unnecessary resource usage in processing/analysis.

The present invention has been devised to solve the above problems, and an object of the present invention is to provide a data storage method that prevents data histories from being stored redundantly, thereby avoiding unnecessary use of the storage in which data is stored and of the resources required for data processing/analysis in a data platform, and a data platform to which the same is applied.

According to an embodiment of the present invention for achieving the above object, a data storage method includes the steps of: receiving data; a first checking step of checking whether duplicate data having a time field containing the same time as the time field of the received data exists in the history table; a second checking step of checking, if duplicate data exists in the history table, whether the reception time of the received data is more recent than that of the duplicate data; and updating the data stored in the history table with the received data if the second checking step confirms that the received data is more recent.

In the first checking step, a plurality of time fields included in the received data may be compared with a plurality of time fields included in the data stored in the history table.

In the first checking step, only a time field set by a user among the plurality of time fields may be compared.

In addition, the data storage method according to an embodiment of the present invention may further include: if there is no duplicate data in the history table, storing the received data in the history table.

The data storage method according to an embodiment of the present invention further includes the step of checking whether a duplicate prevention flag is set, and the first checking step may be performed when the duplicate prevention flag is set.

The data storage method according to an embodiment of the present invention may further include, if the duplicate prevention flag is not set, storing the received data in a history table.

The data storage method according to an embodiment of the present invention may further include: if the reception time of data is newer than the reception time of data stored in the latest table, updating the data stored in the latest table with the received data.

On the other hand, according to another embodiment of the present invention, a data platform includes: a communication unit for receiving data; a storage unit storing a history table; and a processor that checks whether duplicate data having a time field containing the same time as the time field of the received data exists in the history table, checks, if duplicate data exists in the history table, whether the reception time of the received data is more recent than that of the duplicate data, and, if it is more recent, updates the data stored in the history table with the received data.

As described above, according to the embodiments of the present invention, unnecessary use of the storage in which data is stored and of the resources required for data processing/analysis in the data platform can be avoided by preventing data histories from being stored redundantly during data storage.

FIG. 1 is a diagram provided for the description of a data platform to which the present invention is applicable;

FIG. 2 is a flowchart provided to explain a data storage method according to an embodiment of the present invention;

FIGS. 3 to 6 are diagrams provided for the explanation of the duplicate data determination process;

FIG. 7 is a diagram provided for the description of a data storage system to which the present invention is applicable; and

FIG. 8 is a block diagram illustrating a hardware structure of a data platform.

Hereinafter, the present invention will be described in more detail with reference to the drawings.

FIG. 1 is a diagram provided for explanation of a data platform to which the present invention is applicable. The illustrated data platform 100 is a server/system in which data is stored, and it stores/manages data by maintaining a latest table 110 and a history table 120 for each ID.

The latest table 110 is a table in which the latest data is stored. In the latest table 110, only one piece of data having the latest reception time is stored.

The history table 120 stores all of the received data. That is, the latest data as well as past data are stored in the history table 120. The range of data stored in the history table 120 may be limited.

For example, it is possible to prevent too much data from accumulating in the history table 120 by designating the beginning and end of the reception time or setting an expiration time.
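The expiration-time limit mentioned above can be sketched as follows. This is only a minimal illustration, assuming each history record is a dict whose reception time is held under a hypothetical key `received_at`; the function name `prune_history` and the 30-day retention period are likewise assumptions and do not appear in the source.

```python
from datetime import datetime, timedelta, timezone


def prune_history(history, now, max_age):
    """Return only the records received within max_age of now.

    `history` is a list of dicts; `received_at` is a hypothetical key that
    holds the reception time as a timezone-aware datetime.
    """
    cutoff = now - max_age
    return [rec for rec in history if rec["received_at"] >= cutoff]


now = datetime(2018, 11, 15, 12, 0, tzinfo=timezone.utc)
history = [
    {"id": "sensor-1", "received_at": now - timedelta(days=40)},  # expired
    {"id": "sensor-1", "received_at": now - timedelta(days=2)},   # retained
]
kept = prune_history(history, now, max_age=timedelta(days=30))
print(len(kept))
```

Designating the beginning and end of the reception time would work the same way, with two cutoffs instead of one.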

Meanwhile, since the same data may be stored repeatedly in the history table 120, the data platform 100 checks whether data is duplicated when storing data in the history table 120, and prevents redundant storage when it determines that the data is duplicated.

A data duplication prevention process by the data platform 100 shown in FIG. 1 will be described in detail below with reference to FIG. 2. FIG. 2 is a flowchart provided to explain a data storage method according to an embodiment of the present invention.

As shown, when data is received (S210), the data platform 100 compares the reception time of the received data with the reception time of the data stored in the latest table 110 (S220).

When it is confirmed that the reception time of the received data is more recent than the reception time of the data stored in the latest table 110 (S220-Y), the data platform 100 updates the data stored in the latest table 110 with the data received in step S210 (S230).

If it is confirmed that the reception time of the received data is not more recent than the reception time of the data stored in the latest table 110, that is, if both are the same or the reception time of the data stored in the latest table 110 is more recent (S220-N), the data update (S230) in the latest table 110 is not performed.
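Steps S220 and S230 can be sketched as below. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: the latest table is modeled as a dict that maps an ID to a single record, the reception time is read from `modifiedAt` as a `datetime`, and an empty table entry is treated as "update", a case the description does not address. The name `update_latest` is hypothetical.

```python
from datetime import datetime


def update_latest(latest_table, entity_id, received):
    """Steps S220-S230: keep only the record with the most recent reception time.

    Returns True if the latest table was updated, False otherwise.
    """
    current = latest_table.get(entity_id)
    # Assumption: when no record exists yet for this ID, the received one is stored.
    if current is None or received["modifiedAt"] > current["modifiedAt"]:
        latest_table[entity_id] = received  # S220-Y, then S230
        return True
    return False  # S220-N: same or older reception time, no update


latest = {}
older = {"modifiedAt": datetime(2018, 11, 15, 20, 10, 0), "value": 1}
newer = {"modifiedAt": datetime(2018, 11, 15, 20, 10, 1), "value": 2}
print(update_latest(latest, "pond-1", older))  # True: no record stored yet
print(update_latest(latest, "pond-1", newer))  # True: more recent
print(update_latest(latest, "pond-1", older))  # False: not more recent
```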

Next, it is checked whether a data duplication prevention flag is set for the received data (S240). The duplicate prevention flag is a flag indicating whether to allow redundant storage of data in the history table 120 .

Therefore, if the duplicate prevention flag is not set, redundant storage of data is allowed in the history table 120, so the data platform 100 stores the data received in step S210 in the history table 120 (S280).

On the other hand, if the duplicate prevention flag is set, redundant storage of data in the history table 120 is not allowed, so the data platform 100 checks whether data that duplicates the data received in step S210 is stored in the history table 120 (S250).

In step S250, the data redundancy check is performed by comparing the time information included in the time field included in the data.

For example, as shown in FIG. 3, when the time information "2018-11-15T20:10:00,000+09:00" included in "observedAt" of "waterLevel", which is a time field included in the received data (left), is the same as the time information "2018-11-15T20:10:00,000+09:00" included in "observedAt" of "waterLevel", which is a time field included in the history data (right) stored in the history table 120, it is determined that duplicate data exists.

Although only one piece of history data is illustrated in FIG. 3, it should be noted that there are actually a plurality of pieces of history data, and only one is illustrated for convenience of understanding and explanation; the same applies hereinafter.

On the other hand, as shown in FIG. 4, when the time information "2018-11-15T20:10:01,000+09:00" included in "observedAt" of the time field "waterLevel" included in the received data (left) is not the same as the time information "2018-11-15T20:10:00,000+09:00" included in "observedAt" of the time field "waterLevel" included in the history data (right) stored in the history table 120, it is determined that no duplicate data exists.
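The single-field comparison of FIGS. 3 and 4 can be sketched as follows. The nested record layout and the `value` entries are assumptions made for illustration, as are the helper names `parse_time` and `is_duplicate`. The sketch also chooses to compare parsed instants rather than raw strings, which the description does not specify.

```python
from datetime import datetime


def parse_time(text):
    """Parse a timestamp such as "2018-11-15T20:10:00,000+09:00".

    The comma decimal separator is replaced with a period first so that the
    string is accepted by datetime.fromisoformat.
    """
    return datetime.fromisoformat(text.replace(",", "."))


def is_duplicate(received, stored, attr="waterLevel", field="observedAt"):
    """Return True when both records carry the same time in the time field."""
    return parse_time(received[attr][field]) == parse_time(stored[attr][field])


# Illustrative records; the "value" entries are made up for this sketch.
stored = {"waterLevel": {"value": 12.3, "observedAt": "2018-11-15T20:10:00,000+09:00"}}
same = {"waterLevel": {"value": 12.5, "observedAt": "2018-11-15T20:10:00,000+09:00"}}
later = {"waterLevel": {"value": 12.5, "observedAt": "2018-11-15T20:10:01,000+09:00"}}

print(is_duplicate(same, stored))   # True: the FIG. 3 situation
print(is_duplicate(later, stored))  # False: the FIG. 4 situation
```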

In the above example, data redundancy check is based on one time field, but it may be based on multiple time fields.

For example, as shown in FIG. 5, when the time information "2018-11-15T20:10:00,000+09:00" and "2018-11-15T20:10:00,000+09:00" recorded in "observedAt" of "waterLevel" and "observedAt" of "pondage", which are time fields included in the received data (left), is respectively the same as "2018-11-15T20:10:00,000+09:00" and "2018-11-15T20:10:00,000+09:00", which are the time information recorded in "observedAt" of "waterLevel" and "observedAt" of "pondage" included in the history data (right) stored in the history table 120, it may be determined that the two are duplicates.

And, as shown in FIG. 6, if the time information of even one of the time fields included in the received data (left) differs from that of the history data, it is determined that the two are not duplicates.

Meanwhile, a time field, which is a criterion for determining whether data is duplicated, can be set. For example, if it is set to check whether there is overlap only with respect to "observedAt" of the time field "waterLevel", both the case shown in FIG. 5 and the case shown in FIG. 6 are determined to be duplicates.
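The effect of choosing which time fields are compared can be sketched as below. This is a hedged illustration: the list-of-pairs configuration format, the helper names, and the concrete differing value for "pondage" are assumptions, and the raw time strings are compared directly on the assumption that all records use the same textual format.

```python
def time_values(record, time_fields):
    """Collect the raw time strings for the given (attribute, field) pairs."""
    return tuple(record[attr][field] for attr, field in time_fields)


def is_duplicate(received, stored, time_fields):
    """Duplicate only if every compared time field holds the same time."""
    return time_values(received, time_fields) == time_values(stored, time_fields)


ALL_FIELDS = [("waterLevel", "observedAt"), ("pondage", "observedAt")]
WATER_LEVEL_ONLY = [("waterLevel", "observedAt")]

stored = {
    "waterLevel": {"observedAt": "2018-11-15T20:10:00,000+09:00"},
    "pondage": {"observedAt": "2018-11-15T20:10:00,000+09:00"},
}
# Assumed FIG. 6-like situation: only the "pondage" time differs.
received = {
    "waterLevel": {"observedAt": "2018-11-15T20:10:00,000+09:00"},
    "pondage": {"observedAt": "2018-11-15T20:10:01,000+09:00"},
}

print(is_duplicate(received, stored, ALL_FIELDS))        # False
print(is_duplicate(received, stored, WATER_LEVEL_ONLY))  # True
```

With all time fields compared, one differing field is enough to treat the record as new; restricting the comparison to "observedAt" of "waterLevel" makes the same pair a duplicate.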

If it is determined that the data is duplicated in step S250 (S250-Y), data is stored in the history table 120 based on the data reception time.

Specifically, when it is confirmed that the reception time of the data received in step S210 is more recent than the reception time of the duplicate data stored in the history table 120 (S260-Y), the data platform 100 updates the data stored in the history table 120 with the data received in step S210 (S270).

For example, in the data duplication situation shown in FIG. 3, "modifiedAt":"2018-11-15T20:10:01,000+09:00", which is the reception time of the received data, is more recent than "modifiedAt":"2018-11-15T20:10:00,000+09:00", which is the reception time of the existing data stored in the history table 120, so the data stored in the history table 120 is updated with the received data.

On the other hand, when it is confirmed that the reception time of the received data is not more recent than the reception time of the data stored in the history table 120, that is, when both are the same or the reception time of the data stored in the history table 120 is more recent (S260-N), the data platform 100 does not store the data received in step S210 in the history table 120.
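The history-table branch of the flowchart (S240 to S280) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation of the embodiment: the history table for one ID is a plain list, the duplicate prevention flag is passed in as a boolean, `modifiedAt` holds the reception time as a `datetime`, and the function name `store_history` and its return strings are hypothetical.

```python
from datetime import datetime


def store_history(history, received, dedup_flag, time_fields):
    """Steps S240-S280 for one received record; returns the action taken."""
    if not dedup_flag:                                   # S240: flag not set
        history.append(received)                         # S280
        return "stored"

    def key(rec):
        # Time values of the fields configured for the duplicate check.
        return tuple(rec[attr][field] for attr, field in time_fields)

    for i, old in enumerate(history):                    # S250
        if key(old) == key(received):
            if received["modifiedAt"] > old["modifiedAt"]:   # S260
                history[i] = received                    # S270
                return "updated"
            return "skipped"                             # S260-N
    history.append(received)                             # S250-N, then S280
    return "stored"


fields = [("waterLevel", "observedAt")]
observed = "2018-11-15T20:10:00,000+09:00"
first = {"modifiedAt": datetime(2018, 11, 15, 20, 10, 0),
         "waterLevel": {"observedAt": observed}}
second = {"modifiedAt": datetime(2018, 11, 15, 20, 10, 1),
          "waterLevel": {"observedAt": observed}}

history = []
print(store_history(history, first, True, fields))   # stored
print(store_history(history, second, True, fields))  # updated
print(store_history(history, first, True, fields))   # skipped
print(len(history))                                  # 1
```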

FIG. 7 is a diagram provided for explanation of a data storage system to which the present invention is applicable. In the illustrated data storage system, in addition to the data platform 100, a user terminal 310 and a manager terminal 320 are further illustrated.

The user terminal 310 is a terminal that requests creation/retrieval/update/deletion of data stored in the data platform 100.

The manager terminal 320 is a terminal for managing data stored in the data platform 100, and is a terminal for setting the aforementioned duplication prevention flag and setting a time field for determining whether data is duplicated.

On the other hand, when the duplicate prevention flag is first set by the manager terminal 320 and the time field for determining whether data is duplicated is set, the data platform 100 must perform a procedure of aggregating the duplicate data among the data previously stored in the history table 120, that is, a procedure of deleting duplicates so that only one of the duplicate data remains.
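A possible form of this clean-up pass over previously stored history is sketched below. It assumes that a single record is left per group of duplicates and that the one kept is the record with the most recent reception time (`modifiedAt`), mirroring steps S260 and S270; the source does not state which record survives. The function name `consolidate` is hypothetical.

```python
from datetime import datetime


def consolidate(history, time_fields):
    """Keep one record per distinct combination of time-field values."""
    kept = {}
    for rec in history:
        key = tuple(rec[attr][field] for attr, field in time_fields)
        # Assumption: prefer the record with the most recent reception time.
        if key not in kept or rec["modifiedAt"] > kept[key]["modifiedAt"]:
            kept[key] = rec
    return list(kept.values())


t0 = "2018-11-15T20:10:00,000+09:00"
t1 = "2018-11-15T20:10:01,000+09:00"
history = [
    {"modifiedAt": datetime(2018, 11, 15, 20, 10, 0), "waterLevel": {"observedAt": t0}},
    {"modifiedAt": datetime(2018, 11, 15, 20, 10, 1), "waterLevel": {"observedAt": t0}},
    {"modifiedAt": datetime(2018, 11, 15, 20, 10, 2), "waterLevel": {"observedAt": t1}},
]
cleaned = consolidate(history, [("waterLevel", "observedAt")])
print(len(cleaned))  # 2
```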

Depending on the implementation, it is also possible for the user terminal 310 to include the function of the manager terminal 320 .

FIG. 8 is a block diagram illustrating a hardware structure of the data platform 100. As shown, the data platform 100 is a server system including a communication unit 101, a processor 102, and a storage unit 103.

The communication unit 101 is a communication interface means for communicating with the user terminal 310 and the manager terminal 320 and accessing an external network. The communication unit 101 receives data to be stored.

The processor 102 includes at least one Application Entity (AE) and a Common Service Entity (CSE). The processor 102 performs the processes necessary for data storage/management.

The storage unit 103 is a storage space in which the latest table 110 and the history table 120 described above are built.

On the other hand, it goes without saying that the technical idea of the present invention can be applied to a computer-readable recording medium containing a computer program for performing the functions of the apparatus and method according to the present embodiment. In addition, the technical ideas according to various embodiments of the present invention may be implemented in the form of computer-readable codes recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device readable by the computer and capable of storing data. For example, the computer-readable recording medium may be a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, or the like. In addition, the computer-readable code or program stored in the computer-readable recording medium may be transmitted through a network connected between computers.

In addition, although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described above. Various modifications can be made by those of ordinary skill in the art to which the present invention belongs without departing from the gist of the present invention as claimed in the claims, and these modifications should not be understood separately from the technical spirit or perspective of the present invention.

Claims

1. A data storage method comprising:

receiving data;

a first checking step of checking whether duplicate data having a time field containing the same time as the time field of the received data exists in the history table;

a second checking step of checking, if duplicate data exists in the history table, whether the reception time of the received data is more recent than that of the duplicate data; and

updating the data stored in the history table with the received data if the second checking step confirms that the received data is more recent.
2. The method according to claim 1,

wherein the first checking step comprises comparing a plurality of time fields of the received data with a plurality of time fields of data stored in the history table.
3. The method according to claim 2,

wherein, in the first checking step, only a time field set by a user among the plurality of time fields is compared.
4. The method according to claim 1,

further comprising storing the received data in the history table if there is no duplicate data in the history table.
5. The method according to claim 1,

further comprising checking whether a duplicate prevention flag is set,

wherein the first checking step is performed when the duplicate prevention flag is set.
6. The method according to claim 1,

further comprising storing the received data in the history table if the duplicate prevention flag is not set.
7. The method according to claim 1,

further comprising updating the data stored in the latest table with the received data if the reception time of the received data is more recent than the reception time of the data stored in the latest table.
8. A data platform comprising:

a communication unit for receiving data;

a storage unit in which a history table is stored; and

a processor that checks whether duplicate data having a time field containing the same time as the time field of the received data exists in the history table, checks, if duplicate data exists in the history table, whether the reception time of the received data is more recent than that of the duplicate data, and, if it is more recent, updates the data stored in the history table with the received data.