KR102033383B1

KR102033383B1 - Method and system for managing data geographically distributed

Info

Publication number: KR102033383B1
Application number: KR1020160019221A
Authority: KR
Inventors: 차우; 원희선
Original assignee: 한국전자통신연구원
Priority date: 2016-02-18
Filing date: 2016-02-18
Publication date: 2019-10-17
Also published as: KR20170097448A

Abstract

According to an aspect of the present invention, a data management method includes: a user requesting a data upload from a management unit of a local data center; the management unit confirming the write permission of the user and the capacity of a node of the local data center; uploading the data to the node if, as a result of the check, the user has write permission and the capacity of the node is greater than the size of the data; the management unit of the local data center updating the data map and the metadata database of the local data center according to the upload result; the management unit of the local data center transmitting the data map and metadata update information of the local data center to a management unit of a central data center; and the management unit of the central data center applying the update information to the data map and metadata database of the central data center and transmitting the updated metadata database to the management units of all local data centers connected to the central data center to maintain synchronization.

Description

METHOD AND SYSTEM FOR MANAGING DATA GEOGRAPHICALLY DISTRIBUTED

The present invention relates to a data management method, and more particularly, to a method and system for efficiently managing big data in a geographically dispersed data environment.

Recently, as interest in big data has increased, many improvements have been made in the big data field by experts in various fields such as theorists, system builders, scientists, and application developers.

As the amount of data to be processed grows exponentially and the demand for data is diversified, more and more data centers are being deployed in various regions.

Since these data centers differ in purpose, structure, and software specifications, the connections between them are not organized. As a result, the data in the data centers cannot be fully utilized, and efficiency falls, especially when sharing or accessing data.

This problem becomes even worse when all the data distributed across the data centers must be used.

To solve these problems, the prior art employs a mediation server that receives users' requests and coordinates their work.

However, a mediation server has the following limitations when processing a large amount of data.

First, there is a scalability problem: if a large number of user requests arrive at the same time, the system as a whole may be unable to handle them.

Second, it is difficult to provide a data view for close, integrated management across data centers.

Finally, it cannot provide complex analysis that spans all of the data centers.

The present invention has been made against the technical background described above, and an object thereof is to provide a system and method that enable data processing, storage, and access for various users across geographically dispersed data centers.

The object of the present invention is not limited to the above-mentioned object, and other objects that are not mentioned will be clearly understood by those skilled in the art from the following description.

A data management method according to a first aspect of the present invention for achieving the above object includes: a user requesting a data upload from a management unit of a local data center; the management unit confirming the write permission of the user and the capacity of a node of the local data center; uploading the data to the node if, as a result of the check, the user has write permission and the capacity of the node is greater than the size of the data; the management unit of the local data center updating the data map and the metadata database of the local data center according to the upload result; the management unit of the local data center transmitting the data map and metadata update information of the local data center to a management unit of a central data center; and the management unit of the central data center applying the update information to the data map and metadata database of the central data center and transmitting the updated metadata database to the management units of all local data centers connected to the central data center to maintain synchronization.
In addition, a data management system in a distributed data environment according to a second aspect of the present invention includes: a plurality of slave data centers, each including a local data map in which data and node information are stored, a slave manager that processes a user's request and manages the data stored in the local data map, and a metadata database storing information for determining whether a node or a user has access to the data; and an integrated data center including a central data map comprising the plurality of local data maps, a master manager that processes requests of the slave managers and manages metadata of the slave data centers based on the central data map, and a metadata database storing information for determining whether a local data center, node, or user has access to the data.
The data stored in the local data map may include at least one of an ID, a location of a slave data center, a location of a node, a format of a node, and size of a node.
When the user's request is a data upload request, the slave manager, on receiving the data upload request from the user, may confirm the user's write permission and the capacity of a node of the local data center, and upload the data to the node based on the result of the confirmation.
The slave manager may upload the data to the node if the user has a write right and the capacity of the node is greater than the capacity of the data.
According to the upload result, the slave manager may update the local data map and the metadata database and transmit update information of the local data map and the metadata database to the master manager, and the master manager may apply the update information to the central data map and the metadata database and transmit the updated metadata database to the slave managers of all slave data centers connected to the integrated data center to maintain synchronization.
When the user's request is a data access request, the slave manager, on receiving the data access request from the user, may confirm whether the user has access authority by using access information stored in the metadata database; if the user has access authority, check whether the data requested by the user exists based on the local data map; and transmit the location information of the data to the user based on the result of the check.
If the check shows that the data does not exist, the slave manager may request the location information of the data requested by the user from the master manager; once the master manager retrieves the location information from the central data map, the slave manager may receive the retrieved location information and transmit it to the user.
When the user's request is a data processing request, the slave manager, on receiving the data processing request from the user, may transmit the data processing request to the master manager based on the access information stored in the metadata database and on whether the capacity of a node of the local data center is exceeded; the master manager may search for a node corresponding to the data processing request and allocate a job to process the data; and the slave manager may transmit the data processing result to the user.
When the user's request is a data copy request, the slave manager, on receiving the data copy request from the user, may transmit the data copy request to the master manager based on the access information stored in the metadata database and on whether the capacity of a node of the local data center is exceeded; the master manager may check the location of the source data and the available resources in the central data map, determine the slave data center in the source role and the slave data center in the destination role, and transmit a data copy request to the determined slave data centers; and the slave manager of the determined slave data center may, once the copy of the data is completed, update the local data map and the metadata database and transmit the updated information to the master manager.
The master manager may update the central data map and the metadata database upon receiving the updated information, and transmit the updated metadata database to the slave managers of all slave data centers connected to the integrated data center to maintain synchronization.

According to the present invention, it is possible to provide a system and method for integrated management of big data by centrally managing regionally distributed data centers and providing a central data center that scales to the various requirements of users.

FIG. 1 is a structural diagram of an entire system according to an embodiment of the present invention.
FIG. 2 is a flowchart of a data upload method according to an embodiment of the present invention.
FIG. 3 is a flowchart of a data download method according to an embodiment of the present invention.
FIG. 4 is a flowchart of a data processing method according to an embodiment of the present invention.
FIG. 5 is a flowchart of a data copying method according to an embodiment of the present invention.
FIG. 6 is a structural diagram of a computer system in which a data management method is executed according to another embodiment of the present invention.

Advantages and features of the present invention, and methods for achieving them, will become apparent with reference to the embodiments described below in detail together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms. The present embodiments are provided only to complete the disclosure of the present invention and to fully convey the scope of the invention to those of ordinary skill in the art to which the present invention pertains, and the present invention is defined only by the scope of the claims. Meanwhile, the terminology used herein is for the purpose of describing the embodiments and is not intended to limit the invention. In this specification, the singular also includes the plural unless specifically stated otherwise. As used herein, “comprises” and/or “comprising” does not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. FIG. 1 shows the overall structure of a system according to the invention.

The system consists of an integrated data center 100 and slave data centers 110 to 130 located in each region.

There may be a plurality of devices in the slave data center 110, but there is one slave manager 114. The slave manager 114 processes user requests and manages information in the data center, which is managed based on a local data map in which data and node information are stored.

The data includes the identity (ID), the location of the local data center, the location of the node and other information (format, size, etc.).

Nodes refer to devices in a data center and include information such as ID, data center location, IP address, status, resources (CPU, RAM, etc.).
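
The data and node entries described above can be pictured as simple records held in a local data map. The following is a minimal sketch in Python, not the patent's implementation: the class names `DataRecord`, `NodeRecord`, and `LocalDataMap`, and all field names, are hypothetical and chosen only to mirror the attributes listed in the text.

```python
from dataclasses import dataclass, field


@dataclass
class DataRecord:
    """One data entry: ID, data center location, node location, format, size."""
    data_id: str
    data_center: str
    node_id: str
    fmt: str = ""
    size: int = 0


@dataclass
class NodeRecord:
    """One node (device) entry: ID, data center, IP address, status, resources."""
    node_id: str
    data_center: str
    ip_address: str
    status: str = "up"
    resources: dict = field(default_factory=dict)  # e.g. {"cpu": 4, "ram_gb": 16}


@dataclass
class LocalDataMap:
    """Hypothetical in-memory local data map holding data and node information."""
    data: dict = field(default_factory=dict)   # data_id -> DataRecord
    nodes: dict = field(default_factory=dict)  # node_id -> NodeRecord

    def add_node(self, node: NodeRecord) -> None:
        self.nodes[node.node_id] = node

    def add_data(self, record: DataRecord) -> None:
        # Assumption: data may only be registered on a node the map knows about.
        if record.node_id not in self.nodes:
            raise KeyError(f"unknown node: {record.node_id}")
        self.data[record.data_id] = record
```

Keeping data entries and node entries in separate tables keyed by ID is one possible layout; the text only states which attributes are stored, not how they are organized.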

The master manager 104 processes the requests of the slave managers of the slave data centers 110 to 130 and manages metadata of all slave data centers based on the central data map 102. The central data map 102 consists of a combination of the regional data maps 112 of all regional data centers 110 to 130.
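
As an illustration of how a central data map could be formed as a combination of regional data maps, the sketch below merges per-data-center maps into one lookup table. The function name `build_central_map` and the dictionary shapes are assumptions made for this example; the patent does not prescribe a data layout.

```python
def build_central_map(local_maps):
    """Combine local data maps into one central map.

    local_maps: {data_center_id: {data_id: node_id}}  (assumed shape)
    returns:    {data_id: [(data_center_id, node_id), ...]}
    """
    central = {}
    for dc_id, local_map in local_maps.items():
        for data_id, node_id in local_map.items():
            # A list is used so that the same data ID can appear in several
            # data centers, for example after a copy operation.
            central.setdefault(data_id, []).append((dc_id, node_id))
    return central
```

For example, `build_central_map({"dc110": {"d1": "n1"}, "dc120": {"d2": "n8"}})` yields one entry per data ID, each pointing at the data center and node that hold it.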

The master manager 104 and the slave manager 114 both use a meta database (meta DB 106, 116). The meta DB 106, 116 is used to determine whether a data center, node, user, or user group has access to data. In addition, information such as user and user group information and data usage for resource management is also stored in the meta DBs 106 and 116.

The connection between the central data center 100 and the regional data centers 110 to 130 is activated when a user performs a data operation such as reading, writing, or copying data.
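
A minimal sketch of the kind of access check the meta DB supports is shown below, assuming a simple access-control table keyed by data ID. The class name `MetaDB`, its methods, and the permission strings are hypothetical; the text states only that access information for users and user groups, plus usage information, is stored.

```python
class MetaDB:
    """Hypothetical in-memory stand-in for the meta DB 106, 116."""

    def __init__(self):
        self.groups = {}  # user_id -> set of group IDs the user belongs to
        self.acl = {}     # data_id -> {principal: set of permissions}
        self.usage = {}   # user_id -> bytes used (for resource management)

    def grant(self, data_id, principal, permission):
        # principal may be a user ID or a user group ID
        self.acl.setdefault(data_id, {}).setdefault(principal, set()).add(permission)

    def has_access(self, user_id, data_id, permission):
        # A user has access if the user, or any group the user is in, holds
        # the requested permission on the data.
        principals = {user_id} | self.groups.get(user_id, set())
        entries = self.acl.get(data_id, {})
        return any(permission in entries.get(p, set()) for p in principals)
```

In this sketch a permission granted to a group applies to every member of that group, which is one plausible reading of "user or user group" access.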

FIG. 2 is a flowchart illustrating a method for writing data by a user according to the present invention.

The user first transmits a data upload request to the slave manager 114 (S210).

The user transmits a user ID, a data ID, and an access list to the slave manager 114 as part of the upload request. On receiving the request, the slave manager 114 queries the access information in the meta DB 116 to confirm whether the user has authority and whether capacity has not been exceeded (S220).

If there is a problem of authority or capacity, the user repeats step S210 of requesting the slave manager 114 again. If there is no problem of authority or capacity, the slave manager 114 allows the data to be uploaded to a node in the local data center 110 (S230).

After uploading the data of the user, the slave manager 114 updates the information of the local data map 112 and the data access information of the meta DB 116 accordingly (S240).

After the update is completed, the slave manager 114 transmits the information of the local data map 112 and the meta DB 116, including the node information of the local data center 110, to the master manager 104 (S250).

The master manager 104 updates the central data map 102 and the meta DB 106 in the central data center 100, and requests all other slave data centers 120 and 130 to synchronize with the meta DB 106 of the central data center 100 (S260). This ensures that the meta DBs of all data centers are kept identical through synchronization.
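
The upload steps S210 to S260 can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: the classes `MasterManager` and `SlaveManager`, their methods, and the use of a plain set of permitted writers in place of a meta DB query are all hypothetical, and network transport is replaced by direct method calls.

```python
class MasterManager:
    def __init__(self):
        self.central_map = {}  # data_id -> (data_center_id, node_id)
        self.meta_db = {}      # data_id -> set of user IDs with access
        self.slaves = []

    def register(self, slave):
        self.slaves.append(slave)
        slave.master = self

    def apply_update(self, dc_id, data_id, node_id, access_list):
        # S260: update the central data map and meta DB, then push the
        # meta DB to every connected slave so that all copies stay identical.
        self.central_map[data_id] = (dc_id, node_id)
        self.meta_db[data_id] = set(access_list)
        for slave in self.slaves:
            slave.meta_db = {k: set(v) for k, v in self.meta_db.items()}


class SlaveManager:
    def __init__(self, dc_id, node_capacity, writers):
        self.dc_id = dc_id
        self.free = dict(node_capacity)  # node_id -> free capacity
        self.writers = set(writers)      # assumed stand-in for write permissions
        self.local_map = {}              # data_id -> node_id
        self.meta_db = {}
        self.master = None

    def upload(self, user_id, data_id, size, access_list):
        # S220: check write permission and node capacity.
        if user_id not in self.writers:
            return False
        node_id = next((n for n, free in self.free.items() if free > size), None)
        if node_id is None:
            return False
        # S230: place the data on the node (storage itself is not modeled).
        self.free[node_id] -= size
        # S240: update the local data map and local meta DB.
        self.local_map[data_id] = node_id
        self.meta_db[data_id] = set(access_list)
        # S250: report the update to the master manager.
        self.master.apply_update(self.dc_id, data_id, node_id, access_list)
        return True
```

The capacity test uses a strict comparison, following the description that the node's capacity must be greater than the size of the data. Returning `False` corresponds to the branch in which the user must submit the request again.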

FIG. 3 is a flowchart illustrating a method for reading data by a user according to the present invention.

The user transmits a data access request including the user ID and the data ID to the slave manager 114 to access the data (S310).

Upon receiving the request, the slave manager 114 checks whether the user has access authority by using the access information stored in the meta DB 116 (S320).

If the user does not have access authority, the process returns to the step in which the user transmits an access request (S310). If the user has access authority, the slave manager 114 checks, through the local data map 112, whether the requested data exists in the corresponding data center 110 (S330).

If the data exists in the data center 110, the slave manager 114 informs the user of the location of the data (S360), and the user can read the data by accessing it with reference to the data location (S370).

On the other hand, if the data requested by the user does not exist in the corresponding data center 110, the slave manager 114 requests the data location from the master manager 104 (S340), and the master manager 104 searches the central data map 102 for the location information and transmits the result to the slave manager 114 (S350).

When receiving the location information including the data center ID and the node ID where the data is located, the user can access the desired data using the location information (S370).
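
The lookup path in steps S320 to S360 amounts to "check access, try the local data map, then fall back to the central data map". A minimal sketch follows; the function name `locate` and the argument shapes are assumptions for illustration, and the request to the master manager is modeled as a direct dictionary lookup.

```python
def locate(user_id, data_id, acl, local_map, central_map, local_dc):
    """Return (data_center_id, node_id) for the data, or None if it is unknown.

    acl:         data_id -> set of user IDs allowed to read   (assumed shape)
    local_map:   data_id -> node_id within local_dc
    central_map: data_id -> (data_center_id, node_id)
    """
    # S320: access check against the meta DB information.
    if user_id not in acl.get(data_id, set()):
        raise PermissionError(f"{user_id} may not access {data_id}")
    # S330 / S360: the data is in the local data center.
    if data_id in local_map:
        return (local_dc, local_map[data_id])
    # S340 / S350: ask the master manager, which searches the central data map.
    return central_map.get(data_id)
```

The returned pair corresponds to the location information, made up of a data center ID and a node ID, that the user then uses to access the data in S370.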

FIG. 4 is a flowchart illustrating a data processing method according to the present invention.

The user transmits a data processing request including a user ID, a job file, an input data ID, an output data ID, and the like to the slave manager 114 of the local data center 110 (S410).

Upon receiving the request, the slave manager 114 checks the access information and the capacity information in the meta DB 116 to determine whether there is a violation, such as a lack of access authority or exceeded capacity (S420).

If there is no access authority or the capacity is exceeded, the process returns to waiting for a data processing request (S410). If there is no such violation, the slave manager 114 transmits the data processing request to the master manager 104 (S430).

The master manager 104 searches for an optimal node and allocates a task in consideration of the location of input data and the availability of resources for performing a process (S440).

After the job is assigned and the data processing is completed, the master manager 104 notifies the user of the job result through the slave manager 114 (S450).
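
The text does not define what makes a node "optimal" in step S440. As one hedged illustration, the sketch below prefers a running node in the same data center as the input data and, among those, the node with the most free CPU. The function name `pick_node`, the node dictionary keys, and the ranking rule are all assumptions, not the patent's method.

```python
def pick_node(nodes, input_dc, cpu_needed):
    """Pick a node ID for a job, or return None if no node qualifies.

    nodes: list of dicts with keys "id", "dc", "status", "free_cpu" (assumed).
    """
    # Resource availability: only running nodes with enough free CPU qualify.
    candidates = [
        n for n in nodes
        if n["status"] == "up" and n["free_cpu"] >= cpu_needed
    ]
    if not candidates:
        return None
    # Data locality first (False sorts before True), then most free CPU.
    best = min(candidates, key=lambda n: (n["dc"] != input_dc, -n["free_cpu"]))
    return best["id"]
```

Ranking locality ahead of free resources avoids moving input data between regions when a local node can run the job; a real scheduler might weigh these factors differently.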

FIG. 5 shows a flowchart of a data copying method according to the present invention.

The user transmits a data copy request including the user ID, the data ID, and the data center ID to the slave manager 114 (S510).

The slave manager 114 uses the access information in the meta DB 116 to check whether the user lacks access authority or has exceeded capacity (S520).

If there is a problem, the process returns to the previous step (S510) and waits for the user's request. If there is no problem with access authority or capacity, the slave manager 114 transmits a data copy request to the master manager 104 of the central data center 100 (S530).

The master manager 104 determines the source data center and the destination data center by checking the location of the source data and the available resources in the central data map 102 (S540), and requests the corresponding data centers to copy the data (S550).

After copying data, the slave manager of the target data center updates the local data map and the meta DB (S560), and transmits the updated information to the master manager 104 (S570).

The master manager 104 applies the received information to the central data map 102 and the meta DB 106, and transmits the updated meta DB 106 information to all regional data centers 110 to 130 so that, through synchronization, the meta DBs are maintained in the same state (S580).
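
The source and destination selection in S540 and the bookkeeping in S560 to S580 can be sketched as below. The function names `plan_copy` and `complete_copy`, the dictionary shapes, and the rule of choosing the data center with the most free space when no target is given are assumptions for illustration only.

```python
def plan_copy(central_map, free_space, data_id, size, target_dc=None):
    """Choose (source_dc, destination_dc) for a copy, or None if impossible.

    central_map: data_id -> list of data center IDs holding a copy (assumed)
    free_space:  data center ID -> free capacity
    """
    holders = central_map.get(data_id, [])
    if not holders:
        return None
    source = holders[0]
    if target_dc is not None:
        candidates = [target_dc]
    else:
        # Assumed policy: try data centers with the most free space first.
        candidates = sorted(free_space, key=free_space.get, reverse=True)
    for dc in candidates:
        if dc not in holders and free_space.get(dc, 0) > size:
            return (source, dc)
    return None


def complete_copy(central_map, free_space, data_id, size, destination):
    # S560 to S580 in miniature: record the new copy and the space it uses.
    central_map[data_id].append(destination)
    free_space[destination] -= size
```

A short usage example: with `central_map = {"d1": ["dc110"]}` and `free_space = {"dc110": 50, "dc120": 100, "dc130": 30}`, calling `plan_copy(central_map, free_space, "d1", 40)` selects `dc110` as the source and `dc120` as the destination.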

This data management method not only manages data effectively across geographically dispersed data centers, but also enables users to access data efficiently regardless of which region's data center holds it.

Meanwhile, the data management method according to an embodiment of the present invention may be implemented in a computer system or recorded on a recording medium. As shown in FIG. 6, the computer system may include at least one processor 721, a memory 723, a user input device 726, a data communication bus 722, a user output device 727, and a storage 728. Each of the components described above communicates data via the data communication bus 722.

The computer system can further include a network interface 729 coupled to the network. The processor 721 may be a central processing unit (CPU) or a semiconductor device that processes instructions stored in the memory 723 and / or the storage 728.

The memory 723 and the storage 728 may include various types of volatile or nonvolatile storage media. For example, the memory 723 may include a ROM 724 and a RAM 725.

Therefore, the data management method according to the embodiment of the present invention can be implemented as a computer-executable method. When the data management method according to an embodiment of the present invention is performed in a computer device, computer-readable instructions may perform the data management method according to the present invention.

Meanwhile, the data management method according to the present invention described above can be implemented as computer-readable code on a computer-readable recording medium. Computer-readable recording media include all kinds of recording media storing data that can be read by a computer system. Examples include a read only memory (ROM), a random access memory (RAM), a magnetic tape, a magnetic disk, a flash memory, and an optical data storage device. The computer-readable recording medium can also be distributed over computer systems connected through a computer network, and stored and executed as readable code in a distributed fashion.

The configuration of the present invention has been described above in detail with reference to the accompanying drawings, but this is merely an example, and those of ordinary skill in the art to which the present invention pertains can of course make various modifications and changes within the scope of the technical idea of the present invention. Therefore, the protection scope of the present invention should not be limited to the above-described embodiments but should be defined by the following claims.

Claims

A data management system in a distributed data environment, comprising:
a plurality of slave data centers, each including a local data map in which data and node information are stored, a slave manager that processes a user's request and manages the data stored in the local data map, and a metadata database that stores information for determining whether the user has access to the data; and
an integrated data center including a central data map composed of a combination of the plurality of local data maps, a master manager that processes requests of the slave managers and manages metadata of the slave data centers based on the central data map, and a metadata database that stores information for determining whether the user has access to the data.

The data management system of claim 1,
wherein the data stored in the local data map includes at least one of an ID, a location of a slave data center, a location of a node, a format of a node, and a size of a node.

The data management system of claim 1,
wherein the user's request is a data upload request, and
when the slave manager receives the data upload request from the user, the slave manager verifies the write permission of the user and the capacity of a node of the slave data center, and uploads the data to the node based on the result of the check.

The data management system of claim 3,
wherein the slave manager uploads the data to the node if the user has write permission and the capacity of the node is greater than the size of the data.

The data management system of claim 3,
wherein the slave manager updates the local data map and the metadata database according to the upload result, and transmits the update information of the local data map and the metadata database to the master manager, and
the master manager applies the update information to the central data map and the metadata database and transmits the updated metadata database to the slave managers of all slave data centers connected to the integrated data center to maintain synchronization.

The data management system of claim 1,
wherein the user's request is a data access request,
when the slave manager receives the data access request from the user, the slave manager verifies whether the user has access authority by using access information stored in the metadata database, and
if the user has access authority, the slave manager checks whether the data requested by the user exists based on the local data map and transmits the location information of the data to the user based on the result of the check.

The data management system of claim 6,
wherein the slave manager requests the location information of the data requested by the user from the master manager if the check shows that the data does not exist, and
once the master manager retrieves the location information from the central data map, the slave manager receives the retrieved location information and transmits the location information of the retrieved data to the user.

The data management system of claim 1,
wherein the user's request is a data processing request,
when the slave manager receives the data processing request from the user, the slave manager transmits the data processing request to the master manager based on the access information stored in the metadata database and on whether the capacity of a node of the metadata database included in the slave data center is exceeded, and
once the master manager searches for a node corresponding to the data processing request and allocates a job to process the data, the slave manager transmits the data processing result to the user.

The data management system of claim 1,
wherein the user's request is a data copy request,
when the slave manager receives the data copy request from the user, the slave manager transmits the data copy request to the master manager based on the access information stored in the metadata database and on whether the capacity of a node of the metadata database included in the slave data center is exceeded,
the master manager checks the location of the source data and the available resources in the central data map, determines the slave data center in the source role and the slave data center in the destination role, and transmits a data copy request to the determined slave data centers, and
the slave manager of the determined slave data center updates the local data map and the metadata database once the data copy is completed, and transmits the updated information to the master manager.

The data management system of claim 9,
wherein the master manager updates the central data map and the metadata database upon receiving the updated information, and transmits the updated metadata database to the slave managers of all slave data centers connected to the integrated data center to maintain synchronization.