KR20200109547A - Method and network attached storage apparatus for sharing files between computers

Info

Publication number: KR20200109547A
Application number: KR1020190028734A
Authority: KR
Inventors: 김한규
Original assignee: 김한규
Priority date: 2019-03-13
Filing date: 2019-03-13
Publication date: 2020-09-23

Abstract

The present invention relates to a method by which the computers of a big-data distributed processing system such as Hadoop share network-attached disks, so that files are stored and shared without corruption even though each computer stores files through its own local file system, with no cooperation between the computers' file systems. More specifically, each computer is allocated disk space to which it alone may write, each computer stores files only in the space allocated to it, and the other computers are granted permission to read that space. With the file sharing method and network-attached storage device of the present invention, files are shared without network communication between the computers, which increases the effective main memory and effective CPU time of each computer and thereby improves the data processing performance of the entire distributed system.

Description

Method and network attached storage apparatus for sharing files between computers

The present invention relates to a method for sharing files and a storage device for storing shared files, and more particularly to a method of sharing files between computers using network-attached disks in a large-scale distributed computing system that processes big data, such as a Hadoop system, and to a network-attached storage device for storing the files.

Unlike an ordinary disk mounted on a computer's internal bus, a network-attached disk is a storage device that is attached to a network and presented to computers as a disk device. The technology began with storage provided over a Fibre Channel network, known as a SAN (Storage Area Network); later, technologies that attach disks to Ethernet, the most common general-purpose network, and provide disk space to computers were developed and are now widely used.

Network-attached disks that attach to Ethernet include iSCSI disks, NetDisk disks, AoE disks, and FCoE disks. iSCSI stands for Internet Small Computer Systems Interface. Led by IBM and Cisco in 1998 and adopted as a standard in 2000, it is a standard protocol, based on the Internet protocols, for presenting a disk device to a computer over a network and exchanging data with it. Patent Publication No. 2002-0059139 discloses a disk that attaches directly to a port of a general-purpose network such as Ethernet; based on that patent, Ximeta, Inc. of the United States developed NetDisk and released it in 2002. FCoE, proposed in 2003 and formalized as a protocol in 2007, stands for Fibre Channel over Ethernet: it carries the Fibre Channel protocol, which transfers data at high speed between high-performance computers and storage devices, over Ethernet rather than Fibre Channel links in order to connect storage devices to computers. AoE, announced in 2004, stands for ATA over Ethernet; it is a protocol that connects devices using the AT/ATAPI (AT Attachment Packet Interface) standard, such as ordinary hard disks and SSDs (solid state drives), to Ethernet and presents them to computers. Unlike iSCSI, which connects SCSI (Small Computer System Interface) disk devices to Ethernet, AoE connects relatively inexpensive AT/ATAPI disks. Patent Registration No. 10-0724028 makes it possible to provide a home server function, as a hard disk device, to AV (audio/video) equipment that carries a hard disk. Patent Registration No. 10-1509183 relates to a directly network-attached storage device that includes a failover disk, attached directly to the network, to implement disk failover, making it more economical than conventional RAID (Redundant Array of Independent Disks).

These existing network-attached disk technologies connect disk devices to a network to provide data storage space to computers, but they offer no facility for multiple computers to share a network-attached disk, store files on it, and thereby share files with one another.

Korean Patent Publication No. 10-2002-0059139
Korean Patent Registration No. 10-0724028
Korean Patent Registration No. 10-1509183

Although there is a strong need to share files among the many computers of a distributed data processing system such as Hadoop, network-attached disks are not used in Hadoop systems because of the limitation that such a disk cannot be shared as the file storage of each computer's local file system. In big-data distributed processing in particular, the overhead of servers transferring files over the network in order to share them is a major cause of degraded system performance. An object of the present invention is therefore to provide a technique in which servers share network-attached disks and share files directly on those disks instead of transferring them over the network.

In the existing method by which the computers of a distributed system such as Hadoop share files, each computer reads a file stored on its internal disk into its main memory and sends it over the network to other computers; a receiving computer loads the file into its main memory and then stores it on its internal disk. In this process the heavy network communication needed to share files incurs overhead, and a substantial part of main memory is occupied by file transfer, which reduces the main memory available for processing data. When data to be processed cannot be held in main memory for lack of space and is written to disk instead, that is, when swapping occurs, system performance degrades sharply. Another object of the present invention is therefore to improve performance by increasing the amount of effective main memory and reducing the file exchanges that cause network communication.

A further object of the present invention is to raise the performance of the entire system by having many computers share network-attached disks and store files on them, so that the computers directly access the stored files instead of transmitting the files to be shared over the network.

As specific means for achieving the objects of the present invention, there is provided a method of sharing files among a plurality of computers by sharing a network-attached disk on which the files are stored.

Files are created on the network-attached disk through a local file system, and the created files are assigned to the computers mutually exclusively so that each computer alone may store data in its files. Each computer stores data by writing to the files assigned to it, and the computers may read the files not assigned to them, so that files are shared among the computers.

Some of the disk sectors of the network-attached disk are assigned to the computers mutually exclusively so that each computer alone may store data in its sectors. Each computer stores data by writing files to the sectors assigned to it, and the computers may read files stored in sectors not assigned to them, so that files are shared among the computers.

A partition of the network-attached disk is mounted by one of the computers so that it alone can both write and read files, and the other computers mount that partition read-only, so that the computers share files.

The entire network-attached disk is mounted by one of the computers so that it alone can both write and read files, and the other computers mount that disk read-only, so that the computers share files.

The data storage medium of the shared network-attached disk may be a hard disk, an SSD (solid state drive), flash memory, a RAID (Redundant Array of Inexpensive Disks), or a JBOD (Just a Bunch of Disks).

The invention also provides a network-attached storage device that connects to a network and provides disks to computers, equipped with block transmission/reception logic that can send HDFS (Hadoop Distributed File System) blocks stored on the device to other storage devices over the network and receive HDFS blocks from other storage devices over the network.

The invention further provides a network-attached storage device that connects to a network and provides disks to computers, equipped with allocation logic that assigns suitable files when a computer requests files in which it alone may store data, and check logic that checks whether a computer is attempting to write data to a file for which it lacks the exclusive right to store data.

The invention further provides a network-attached storage device that connects to a network and provides disks to computers, equipped with allocation logic that assigns suitable disk sectors when a computer requests sectors in which it alone may store data, and check logic that checks whether a computer is attempting to write data to a sector for which it lacks the exclusive right to store data.

The invention further provides a network-attached storage device that connects to a network and provides disks to computers, equipped with allocation logic that assigns a suitable disk partition when a computer requests a partition in which it alone may store data, and check logic that checks whether a computer is attempting to write data to a partition for which it lacks the exclusive right to store data. By these means the objects of the present invention can be achieved.
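The allocation logic and check logic described above can be pictured with a small model. The following is only a minimal sketch for the disk-sector case, assuming a simple first-fit allocator; the class name `SectorAllocator`, its methods, and the node names are hypothetical and do not come from the specification, which does not say how the logic is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class SectorAllocator:
    """Toy model of the allocation logic and check logic for disk sectors."""
    total_sectors: int
    next_free: int = 0                          # first sector not yet assigned
    owners: dict = field(default_factory=dict)  # (start, end) -> computer id

    def allocate(self, computer: str, count: int) -> range:
        """Allocation logic: grant `computer` an exclusive run of sectors."""
        if count <= 0 or self.next_free + count > self.total_sectors:
            raise ValueError("request cannot be satisfied")
        start, end = self.next_free, self.next_free + count
        self.owners[(start, end)] = computer
        self.next_free = end
        return range(start, end)

    def owner_of(self, sector: int):
        """Return the computer that owns `sector`, or None if unassigned."""
        for (start, end), computer in self.owners.items():
            if start <= sector < end:
                return computer
        return None

    def may_write(self, computer: str, sector: int) -> bool:
        """Check logic: only the owner of a sector may write to it."""
        return self.owner_of(sector) == computer

    def may_read(self, computer: str, sector: int) -> bool:
        """Any computer may read a sector that has been assigned to someone."""
        return self.owner_of(sector) is not None


disk = SectorAllocator(total_sectors=1000)
sectors_m = disk.allocate("node-M", 100)   # node-M owns sectors 0..99
sectors_n = disk.allocate("node-N", 50)    # node-N owns sectors 100..149
```

In this model `disk.may_write("node-N", 10)` is false while `disk.may_read("node-N", 10)` is true, which mirrors the rule that a computer writes only to its own sectors but may read the sectors of others.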

As described above, with the file sharing method and network-attached storage device according to the present invention, files are shared without occupying the computers' main memory to transfer file data between the computers of an existing distributed system such as Hadoop, and without the network communication previously needed for file sharing, which improves the data processing performance of the entire distributed system.

FIG. 1 is a schematic diagram showing the configuration of a Hadoop system.
FIG. 2 is a conceptual diagram showing the process of sending and receiving intermediate result files between a mapper and a reducer of a Hadoop system.
FIG. 3 is a conceptual diagram, used to describe the first embodiment, showing a plurality of computers sharing files by sharing network-attached disks.
FIG. 4 is a system configuration diagram, used to describe the first embodiment, showing computers sharing network-attached disks.
FIG. 5 is a conceptual diagram of the first embodiment, a method by which computers share files using only their own local file systems, without cooperation between file systems.
FIG. 6 is a diagram, used to describe the second embodiment, of a disk partition represented in LBA (Logical Block Addressing) form.
FIG. 7 is a conceptual diagram of the second embodiment, a method of sharing file data between computers using a network-attached disk as a simple block storage device.
FIG. 8 is a diagram, used to describe the third embodiment, of the process of storing data in a data block in conventional HDFS (Hadoop Distributed File System).
FIG. 9 is a diagram of the third embodiment, in which HDFS blocks are replicated using the network-attached storage devices of the present invention without occupying the main memory of the HDFS data nodes.
FIG. 10 is a functional block diagram of the network-attached storage device of the present invention according to the fourth embodiment.
FIG. 11 is a diagram of the fifth embodiment, a method of assigning entire disk partitions to computers so that each has exclusive read/write rights.
FIG. 12 is a functional block diagram of the network-attached storage device of the present invention according to the sixth embodiment, which assigns entire disk partitions.

To achieve the above objects, the basic technical features of the present invention are a method by which many computers share network-attached storage devices and each creates and shares files through its own local file system, without cooperation with other servers and without corrupting the file data stored on the storage devices, and the design of such a network-attached storage device.

Hereinafter, preferred embodiments of the file sharing method and network-attached storage device according to the present invention are described with reference to FIGS. 1 to 12. A disk mentioned in the embodiments of this specification refers not only to a hard disk drive (HDD) or a solid state drive (SSD) but also to any nonvolatile block storage device, such as a USB drive or an SD (Secure Digital) card.

Although the embodiments in this specification are described mainly using a Hadoop system as an example, it is evident that the file sharing method and storage device of the present invention can also be used to share file data among the computers of any general distributed system in which many computers are connected to a network.

FIG. 1 is a schematic diagram showing the configuration of a Hadoop system in which data nodes (1-1, 1-2, 1-3, 1-4), computers that may number in the thousands, are connected by a network (2) to process large-scale data. On each data node (1-1, 1-2, 1-3, 1-4), an individual computer that processes data in the Hadoop system, one or more mappers (20-1, 20-2, 20-3, 20-4, 20-5) and reducers (21-1, 21-2, 21-3, 21-4, 21-5, 21-6), which are functions that process data, run so that large-scale data is divided among and processed by many data nodes. A mapper is a function that processes data and produces intermediate results. The data to be processed is distributed over the data nodes and stored in data blocks (16-1, 16-2, 16-3); the mappers divide the whole data set among themselves, each processing the data blocks stored on its own data node, so that the whole data set is processed in parallel.

Each mapper generates intermediate data as files (11-1, 11-2, 11-3, 11-4, 11-5, 11-6) and stores them as intermediate result files (12-1, 12-2, 12-3, 12-4, 12-5, 12-6) on the disk of its own data node through that data node's local file system (10-1, 10-2, 10-3, 10-4). Each mapper usually generates several intermediate result files. Before being stored on the local disk as files (12-1, 12-2, 12-3, 12-4, 12-5, 12-6), the intermediate result files (11-1, 11-2, 11-3, 11-4, 11-5, 11-6) generated by the mappers are first created in the disk buffer of each data node's operating system. A file temporarily created in the disk buffer and the file stored on disk are one and the same file with identical data. To distinguish whether a file resides in the disk buffer in the data node's main memory (3-1, 3-2, 3-3, 3-4) or on the data node's disk, all drawings of the invention number intermediate result files in the disk buffer in the 11 series and intermediate result files stored on disk in the 12 series.

The several intermediate result files generated by a mapper are sent to several reducers (21-1, 21-2, 21-3, 21-4, 21-5, 21-6). A reducer is a function that collects and processes intermediate result files. Like mappers, reducers run distributed over many data nodes, and each data node may host many reducers working on a single job. Mappers send their intermediate result files (12-1, 12-2, 12-3, 12-4, 12-5, 12-6) to several reducers, and each reducer collects and processes the intermediate result files it receives from several mappers. Each reducer thus receives, from several mappers, intermediate result files that were generated by different mappers and must finally be collected and processed together. The result files (14-1, 14-2, 14-3) produced by the reducers are stored as files (15-1, 15-2, 15-3) on the local disk through each data node's local file system, and the complete set of these result files (15-1, 15-2, 15-3) constitutes the final result data.
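The mapper and reducer roles described above can be illustrated with a toy word count. This is only a sketch under assumptions: the word-count task, the two-reducer setup, and the `partition` rule are chosen for illustration and are not taken from the specification or from Hadoop's actual API.

```python
from collections import Counter, defaultdict

NUM_REDUCERS = 2  # assumption: two reducers, chosen only for this example


def partition(key: str) -> int:
    """Pick the reducer for a key (deterministic toy rule, not Hadoop's)."""
    return sum(map(ord, key)) % NUM_REDUCERS


def mapper(block: str) -> dict:
    """Process one data block; return one intermediate 'file' per reducer."""
    files = defaultdict(Counter)
    for word in block.split():
        files[partition(word)][word] += 1
    return files


def reducer(intermediate_files: list) -> Counter:
    """Collect the intermediate files sent by several mappers and merge them."""
    total = Counter()
    for f in intermediate_files:
        total.update(f)
    return total


blocks = ["a b a", "b c", "a c c"]        # one data block per mapper
mapped = [mapper(b) for b in blocks]      # each mapper emits several files
results = [
    reducer([m[r] for m in mapped if r in m])   # reducer r gathers its files
    for r in range(NUM_REDUCERS)
]
final = sum(results, Counter())           # union of all reducer result files
```

Each mapper produces up to `NUM_REDUCERS` intermediate files, and every reducer needs files from several mappers, which is why the number of mapper-to-reducer file transfers grows quickly with cluster size.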

Many mappers and reducers are distributed across the data nodes connected to the network in this way so that a large data set can be spread over many data nodes and processed in parallel, which speeds up the data processing job.

FIG. 2 shows the usual process by which an intermediate result file generated by a mapper is sent to a reducer using HTTP (Hypertext Transfer Protocol). For example, intermediate result file p (11-7), created in main memory (3-1) by mapper j (20-1) of data node M (1-1), is stored on disk (6-1) as file p (12-7). It is then sent by the HTTP interface program (22-1), through the network stack (4-1) of data node M's operating system and the network interface hardware, the NIC (network interface card or network interface chip) (5-1), over the network (2) to data node N (1-2), where reducer h (21-1) is running and waiting for the intermediate result file. The HTTP interface program (22-1) cannot send file p (12-7) directly from disk (6-1) onto the network (2); the file must first be loaded into main memory (3-1) as intermediate result file p (11-8). The transmitted file p (11-8) passes through the NIC (5-2) of data node N (1-2), the operating system's network stack (4-2), the HTTP interface program (22-2), and the local file system (10-2); it is loaded into main memory (3-2) as intermediate result file p (11-9) and then stored on the local disk (6-2) as file p (12-8). Reducer h (21-1) then accesses file p (12-8) on disk (6-2), loads it into main memory (3-2) as intermediate result file p (11-10), and processes its data. In FIG. 2, file p (11-7) generated by mapper j (20-1), the files p (12-7, 12-8) stored on the disks (6-1, 6-2) of the two data nodes (1-1, 1-2), and the files p (11-8, 11-9, 11-10) loaded into main memory all hold identical data. To distinguish where they reside, files in main memory are numbered in the 11 series and files on disk in the 12 series, as in FIG. 1.

In this way each data node sends the intermediate result files generated by its mappers to reducers on various other data nodes. Sending and receiving these files occupies the data node's main memory (3-1, 3-2) and incurs network communication overhead, so main memory and CPU cannot be devoted to useful data processing work, which lowers overall data processing performance.

The present invention relates to a method and storage device for improving the performance of a large-scale data processing system such as Hadoop. More specifically, intermediate result files are stored on storage devices attached to the network so that the data nodes, that is, the computers, share them. With the file sharing method of the present invention, the main memory formerly occupied by disk buffers and the network stack while intermediate result files were sent and received over the network is no longer consumed, and communication overhead is reduced, which raises data processing performance.

FIG. 3 is a conceptual diagram of the system configuration of the present invention, in which network-attached disks are shared by many data nodes, that is, computers. To avoid the main memory occupancy caused by transmitting intermediate result files over the network, the present invention uses a network-attached disk (7-1) to share files between computers. As noted above, disks that attach to a network include iSCSI (Internet Small Computer Systems Interface) disks, NetDisk, and AoE (ATA over Ethernet) disks. In the present invention the network-attached disk (7-1) is attached, as in FIG. 3, to the network (2) to which the data nodes (1-1, 1-2) are connected, and each data node (1-1, 1-2) uses it as a local disk.

Mapper j (20-7) in FIG. 3 stores intermediate result file h (11-1), generated through its local file system (10-1), on the network-attached disk (7-1) as file h (12-1) by way of the network stack (4-1) and NIC (5-1). File h (11-1) denotes the file in the operating system's disk buffer on data node M (1-1), where mapper j (20-7) is running, and file h (12-1), with the same data, denotes the file stored on the network-attached disk (7-1). Reducer k (21-7), running on another data node N (1-2) connected via the network (2), loads the intermediate result file h (12-1) stored on the network-attached disk (7-1) through its NIC (5-2), network stack (4-2), and local file system (10-2) into its own disk buffer in its main memory as intermediate result file h (11-2) and processes the data. Unlike the conventional method of FIG. 2, in the file sharing method of FIG. 3 a data node does not send a file to another data node over the network in order to share it; instead, the data nodes directly access the file stored on the network-attached disk.
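The flow of FIG. 3 can be mimicked on a single machine. The sketch below is a simulation under assumptions: a temporary directory stands in for the network-attached disk (7-1), and the `SharedDisk` class with its per-node areas is a hypothetical illustration of the exclusive-write, shared-read rule rather than the mechanism the specification defines.

```python
import tempfile
from pathlib import Path


class SharedDisk:
    """Stand-in for a network-attached disk visible to every data node."""

    def __init__(self, root: Path):
        self.root = root

    def write(self, node: str, owner: str, name: str, data: bytes) -> Path:
        """A node may write only into the area assigned to it."""
        if node != owner:
            raise PermissionError(f"{node} may not write into {owner}'s area")
        area = self.root / owner
        area.mkdir(exist_ok=True)
        path = area / name
        path.write_bytes(data)
        return path

    def read(self, node: str, owner: str, name: str) -> bytes:
        """Every node may read every area, so `node` is not checked here."""
        return (self.root / owner / name).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    disk = SharedDisk(Path(tmp))
    # Mapper j on data node M stores intermediate result file h once.
    disk.write("M", "M", "h", b"intermediate result")
    # Reducer k on data node N reads it in place; M sends nothing to N.
    received = disk.read("N", "M", "h")
    try:
        disk.write("N", "M", "h", b"overwrite")   # N does not own M's area
        rejected = False
    except PermissionError:
        rejected = True
```

The file exists in exactly one place on storage, and node M's main memory plays no part when node N reads it, which is the saving the description attributes to the shared-disk approach.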

FIG. 4 is a system configuration diagram in which a plurality of network-attached disks (7-1, 7-2, 7-3, 7-4, 7-5) are shared by a plurality of data nodes (1-1, 1-2, 1-3), that is, computers. Every data node connected to the network (2) recognizes the network-attached disks (7-1, 7-2, 7-3, 7-4, 7-5) as its own local disks through the device driver software module (13-1, 13-2, 13-3) installed on that data node to control network-attached disks. As such device drivers (13-1, 13-2, 13-3), Linux has provided software initiators for iSCSI devices since the early 2000s, and other operating systems, including Windows and VMware, likewise provide initiator software, a device driver that lets the system recognize and use a network-attached disk as its own local disk. For NetDisk, device drivers for Linux and Windows systems have been available since the early 2000s. A computer with a disk installed inside the system writes and reads files through its local file system, which controls the internal disk through the device driver software for internal disks. A computer system that uses a network-attached disk likewise reads and writes files through its local file system; the difference is that the device driver software for the network-attached disk reaches the disk over the network and controls it there.

However, simply connecting network-attached disks to the network as in FIG. 4 does not allow several data nodes to independently read and write shared files on a network-attached disk. Each data node reads and writes files through its own local file system without regard to the other data nodes, so no data node sharing the disk knows which file another data node stores in which sector. As a result, different data nodes may overwrite the same sector of the shared disk with different data, destroying the integrity of the file data. In other words, because the local file systems do not cooperate, each data node keeps its own, diverging copy of the metadata that records which file occupies which sectors, and a consistent file system cannot be maintained on the shared network-attached disk. For this reason, network-attached disks such as iSCSI, AoE, FCoE, and NetDisk cannot by themselves be used by the computers on a network to store files in a shared manner.

FIG. 5 illustrates the file sharing method of the present invention, in which files are shared among computers (1-11, 1-12, 1-13) while each computer (1-11, 1-12, 1-15) uses only its own local file system, with no cooperation among the file systems (10-11, 10-12, 10-15) of the different computers. The method illustrated in FIG. 5 uses conventional network-attached disks (7-1, 7-2, 7-3) and a file allocation server (1-15). A disk partition is a contiguous set of sectors of a storage device such as a hard disk or SSD (Solid State Drive). A single physical hard disk or SSD can typically be divided into several partitions; the operating system recognizes each partition as an independent storage device, and only one file system is placed on each partition. The file sharing method of every embodiment of the present invention applies equally to whole disks and to disk partitions, so the descriptions of the embodiments refer to disks and disk partitions interchangeably.

Each computer (1-11, 1-12) and the file allocation server (1-15) of FIG. 5 mount these network-attached disks or partitions (7-1, 7-2, 7-3) on their local file systems (10-11, 10-12, 10-15) with both read and write permission. Mounting a disk or disk partition is the procedure of logically attaching it to a directory in the file system hierarchy; after mounting, files can be read from and written to the mounted disk through the local file system. Every operating system provides a mount command or equivalent software tools. On Linux, for example, a network-attached disk is mounted on the /networkdisk1 directory of the local file system (10-11) of computer 1 (1-11) with the mount command, as follows. The example assumes that one of the network-attached disks of FIG. 5 is named /dev/sda and mounts the disk of that name. Other disks can of course be mounted the same way; the name of a network-attached disk, such as /dev/sda, is supplied by its device driver, such as the initiator.
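The mount step can be sketched in C with the Linux mount(2) system call, which is what the mount command (for example `mount /dev/sda /networkdisk1`) ultimately invokes. This is only an illustration: the function name `mount_network_disk` is hypothetical, and the file system type passed in (such as "ext4") is an assumption, since the text does not name the file system placed on the disk.

```c
#include <stdio.h>
#include <sys/mount.h>

/* Attach a network-attached disk to a directory with read and write access.
 * The function name and the fs_type parameter are illustrative assumptions;
 * the text only names the device (/dev/sda) and directory (/networkdisk1).
 * Returns 0 on success, -1 on failure. */
int mount_network_disk(const char *device, const char *mount_point,
                       const char *fs_type)
{
    /* A flags value of 0 requests an ordinary read-write mount. */
    if (mount(device, mount_point, fs_type, 0, NULL) != 0) {
        perror("mount");
        return -1;
    }
    return 0;
}
```

A call such as `mount_network_disk("/dev/sda", "/networkdisk1", "ext4")` would normally require administrator privileges.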

In FIG. 5, dotted lines (100 to 108) emphasize that each network-attached disk or disk partition (7-1, 7-2, 7-3) is not mounted with write permission on one particular computer only, but is mounted with read and write permission on several computers at once. In the file sharing method of the present invention, before the computers (1-11, 1-12) access the network-attached disks or partitions (7-1, 7-2, 7-3) and begin file input/output, the file allocation server (1-15) creates files (12-10, 12-11, 12-12, 12-13, 12-14, 12-15, 12-16, 12-17) on them in advance. The files created in advance need not be the same size. On Linux, a file of a desired size can be created before any actual data is stored by using the fallocate command. fallocate, short for "file allocate", is a system call and command provided by Linux; it creates a file that occupies a given amount of disk space without actually storing data. The following is an example that uses the fallocate command to create a 10 MB file named file1.txt.
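As a sketch of the same preallocation done from a program, the C function below uses posix_fallocate, the portable counterpart of the Linux fallocate system call (on the command line the usual form is `fallocate -l 10M file1.txt`). The function name `preallocate_file` is hypothetical, and "10 MB" is taken here to mean 10 * 1024 * 1024 bytes, which is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` if needed and reserve `size` bytes of disk space for it
 * without writing any payload data. Illustrative helper, not from the source.
 * Returns 0 on success, -1 on failure. */
int preallocate_file(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    /* posix_fallocate returns an error number instead of setting errno. */
    int err = posix_fallocate(fd, 0, size);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
        close(fd);
        return -1;
    }
    return close(fd);
}
```

For the example in the text, the call would be `preallocate_file("file1.txt", (off_t)10 * 1024 * 1024)`.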

The fallocate command reserves enough disk space to hold file data of the specified size while storing no actual data. In the fallocate example above, the file file1.txt is created with 10 MB of disk sectors reserved in advance, although no valid data has yet been stored in it. The file allocation server (1-15) repeats the fallocate command as many times as needed to create the desired number of files (12-10, 12-11, 12-12, 12-13, 12-14, 12-15, 12-16, 12-17) in advance and stores them on the network-attached disks (7-1, 7-2, 7-3).

Thereafter, when a computer (1-11, 1-12) asks the file allocation server (1-15) to assign it a file for which it will hold write permission, the file allocation server (1-15) selects one of the files it has created (12-10 to 12-17) and replies to the requesting computer with the name of the disk or disk partition on which the selected file resides and the name of the file, including its directory path, so that the requesting computer can store data in that file. Each computer stores data in the files assigned to it in this way, and the other computers read those files. Thus each computer can share files created by other computers through its own local file system, without cooperating with the file systems of the other computers. A computer that wants to create a file to share with other computers never creates the file on its own; it must obtain a file from the file allocation server (1-15) of the present invention and store its data only in that assigned file.
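The assignment rule described above, one writer per pre-created file, can be pictured with a small in-memory table. The sketch below is an assumption about how a file allocation server might keep track of assignments; the structure `pool_entry`, the functions `assign_file` and `may_write`, and the integer computer identifiers are all hypothetical and do not come from the specification.

```c
#include <stddef.h>
#include <string.h>

#define UNASSIGNED (-1)

/* One pre-created file and the computer holding write permission on it. */
struct pool_entry {
    const char *path;  /* directory path and file name on the shared disk */
    int owner;         /* hypothetical computer id, or UNASSIGNED */
};

/* Hand the first unassigned file to `computer_id`.
 * Returns its path, or NULL when every file is already assigned. */
const char *assign_file(struct pool_entry pool[], size_t n, int computer_id)
{
    for (size_t i = 0; i < n; i++) {
        if (pool[i].owner == UNASSIGNED) {
            pool[i].owner = computer_id;
            return pool[i].path;
        }
    }
    return NULL;
}

/* Only the owner may write to a file; every computer may read it.
 * Returns 1 if `computer_id` owns `path`, otherwise 0. */
int may_write(const struct pool_entry pool[], size_t n,
              const char *path, int computer_id)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(pool[i].path, path) == 0)
            return pool[i].owner == computer_id;
    }
    return 0;
}
```

Because a file leaves the unassigned state exactly once, no two computers ever receive the same file, which is the property that lets the local file systems remain uncoordinated.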

When a computer that has been assigned a file by the file allocation server (1-15) stores data in it, the computer opens the file in the usual way using the assigned file name and writes the data. Below is a simple example of a program that opens the assigned file by name and stores data in it. The program assumes that the array buf holds 4000 bytes of data and that the file file1.txt is located in the /shared directory.
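The program listing itself is not reproduced in this text. The following C sketch is a reconstruction from the accompanying description, not the original listing; the comments (1) to (7) mark the steps the description refers to. The function name `store_in_assigned_file`, its parameters, and the error handling are assumptions added for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Store `len` bytes from `buf` in the assigned file, then force the file
 * data and the containing directory to disk.
 * Returns 0 on success, -1 on failure. */
int store_in_assigned_file(const char *dir, const char *file,
                           const void *buf, size_t len)
{
    int rc = -1;
    int dfd = -1;

    int fd = open(file, O_RDWR);                 /* (1) open with read/write */
    if (fd < 0) {
        perror("open file");
        return -1;
    }
    if (write(fd, buf, len) != (ssize_t)len) {   /* (2) write the data */
        perror("write");
        goto out;
    }
    if (fsync(fd) != 0) {                        /* (3) flush file data */
        perror("fsync file");
        goto out;
    }
    dfd = open(dir, O_RDONLY);                   /* (4) open the directory */
    if (dfd < 0) {
        perror("open directory");
        goto out;
    }
    if (fsync(dfd) != 0) {                       /* (5) flush the directory */
        perror("fsync directory");
        goto out;
    }
    rc = 0;
out:
    close(fd);                                   /* (6) close the file */
    if (dfd >= 0)
        close(dfd);                              /* (7) close the directory */
    return rc;
}
```

With the names used in the text, the call would be `store_in_assigned_file("/shared", "/shared/file1.txt", buf, 4000)`.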

In line (1) of the program, fd is the file descriptor returned by the operating system when the assigned file file1.txt is opened with read and write permission. To open a file with both read and write permission, the O_RDWR flag must be passed as the second argument of the open system call. Line (2) calls the write system call to write data to the file just opened. Line (3) calls the fsync system call, which synchronizes the file's data: it forces the data just written to be stored physically on the disk partition rather than remaining only in the operating system's disk buffer, so that another computer reading the file obtains the actual data. Not only the file's data but also its metadata must be reflected on the disk partition, and for this purpose the /shared directory that contains the file is synchronized as well, as in line (5). Because the operating system also treats a directory as a file, line (4) calls the open system call on the directory to be synchronized and obtains the directory's file descriptor, which line (5) passes as the argument of the fsync system call. After the file and directory have been synchronized, lines (6) and (7) call the close system call to finish using them. In short, once writing is complete, the file data and related metadata must be synchronized so that they are reflected on the disk rather than remaining only in the disk buffer of the local file system; only then can other computers that want to read the file access it and read the actual data.

For other computers to read and share a file created in this way, the directory path and name of the file are passed to the computers that want to read it, and each such computer opens and reads the file as in the program example above. In the method of the present invention, however, the file was created, with its size fixed by the fallocate command, before the computer that stores the data actually wrote anything, so the amount of data actually stored usually differs from the size set with fallocate. The computers that want to read the file are therefore told the size of the actually stored data in addition to the file name, so that their programs read only from the beginning of the file up to that size and do not read the remainder of the file. In the program example above, 4000 bytes of data were stored in file1.txt, but file1.txt was created with 10 MB of disk space reserved in advance, so its size is 10 MB. Since the valid data occupies only the first 4000 bytes, a computer that wants to read the valid data of file1.txt must be given, in addition to the file name, the valid data size of 4000 so that it reads only 4000 bytes rather than 10 MB. Below is an example of a program that reads from file1.txt the 4000 bytes of data written by the program example above. In the program below, data_buf is a 4000-byte array used to hold the data read from file1.txt.
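The reading program is likewise not reproduced in this text. A minimal C sketch of it, reconstructed from the description, is given below. The function name `read_valid_data` is hypothetical, and opening the file with O_RDONLY is a choice made here because the reader only needs read access; the original listing may have used a different flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Read exactly `valid_len` bytes from the start of `file` into `data_buf`,
 * ignoring the unused, preallocated tail of the file.
 * Returns 0 on success, -1 on failure or if the file is too short. */
int read_valid_data(const char *file, void *data_buf, size_t valid_len)
{
    int fd = open(file, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    size_t done = 0;
    while (done < valid_len) {
        /* read may return fewer bytes than requested, so loop. */
        ssize_t n = read(fd, (char *)data_buf + done, valid_len - done);
        if (n < 0) {
            perror("read");
            close(fd);
            return -1;
        }
        if (n == 0)
            break;  /* file is shorter than the announced valid size */
        done += (size_t)n;
    }
    close(fd);
    return done == valid_len ? 0 : -1;
}
```

For the example in the text, the reader would call `read_valid_data("/shared/file1.txt", data_buf, 4000)` after being told both the file name and the valid size of 4000.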

If more data must be stored than the original size of one assigned file, the computer stores all of the data across two or more assigned files and then merges those files into one. For example, to store 1.5 MB of data when two files of 1 MB each have been assigned, 1 MB is stored in one of the assigned files and the remaining 0.5 MB in the other, and the two files are then merged. On Linux, files can be merged with the cat command. The following is an example that uses the Linux cat command to append one file to the end of another so as to form a single file: file2 is appended to file1, and the merged file is named file1. That is, if a large amount of data first fills file1 and the remainder is stored in file2, this merging method combines the previously assigned files so that the data is held in a single file.
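The effect of appending file2 to file1, which a shell would express as `cat file2 >> file1`, can be sketched in C as follows. The function name `append_file` and the 4096-byte chunk size are assumptions made for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Append the whole content of `src` to the end of `dst`.
 * Returns 0 on success, -1 on failure. */
int append_file(const char *dst, const char *src)
{
    char chunk[4096];
    int rc = -1;

    int in = open(src, O_RDONLY);
    if (in < 0) {
        perror("open src");
        return -1;
    }
    /* O_APPEND makes every write land at the current end of dst. */
    int out = open(dst, O_WRONLY | O_APPEND);
    if (out < 0) {
        perror("open dst");
        close(in);
        return -1;
    }
    for (;;) {
        ssize_t n = read(in, chunk, sizeof chunk);
        if (n < 0) {
            perror("read");
            break;
        }
        if (n == 0) {       /* end of src reached */
            rc = 0;
            break;
        }
        ssize_t off = 0;
        while (off < n) {   /* write may be partial, so loop */
            ssize_t w = write(out, chunk + off, (size_t)(n - off));
            if (w < 0) {
                perror("write");
                goto done;
            }
            off += w;
        }
    }
done:
    close(in);
    close(out);
    return rc;
}
```

Note that this copies all of src, so it matches the text's example only when file1 is completely filled and file2 holds nothing beyond its valid data.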

In the method of Embodiment 1 of the present invention, files may be assigned one at a time whenever a file is needed, or several files may be assigned by the allocation server (1-15) at once. Instead of creating files in advance and assigning them later, the file allocation server (1-15) may also create a file dynamically with the fallocate command each time it receives a file assignment request from a computer (1-11, 1-12) and then assign it. Whether files are created dynamically or in advance, after creating a file with the fallocate command and before assigning it to a computer, the file allocation server (1-15) calls the fsync system call described above on the created files so that the files and their metadata are synchronized to the network-attached disk.

As described above, on Linux the synchronization can be performed with the fsync system call; the sync command can also be used to ensure that the data and metadata of a created file are reflected on the disk. The following is an example of using the sync command on Linux to reflect data and metadata on the disk. In the example, file1.txt is the name of a file created by the allocation server (1-15).

To summarize the file sharing method of the present invention shown in FIG. 5: from among files that have been created to occupy disk sectors but do not yet hold actual data, the allocation server (1-15) assigns different files to each computer. Each computer writes data only to the files assigned to it, and no other computer may write to those files. In this way the data integrity of the files is maintained even though each computer writes, reads, and shares files solely through its own local file system.

Embodiment 1 of the present invention is a method of storing files on a network-attached disk through the local file system of each computer and sharing them with other computers. File data can also be shared among several computers with its integrity preserved, in a manner similar to Embodiment 1, without using a local file system, by using the disk or disk partition simply as a block storage device.

FIG. 6 shows the disk sectors that make up a disk partition in the linear notation of LBA (Logical Block Addressing). Physically, a hard disk is organized into cylinders, heads, and sectors, but a disk is usually addressed with LBA, a notation that refers to sectors as a consecutive sequence starting from sector 0. That is, a disk or disk partition is represented as a continuous run of fixed-size sectors, usually 512 bytes each. Every disk device contains logic that converts an LBA sector address into the corresponding cylinder, head, and sector numbers of the physical disk, so a desired sector is accessed by its LBA address.

As shown in FIG. 6, a position on the disk can be designated by a byte address counted from the beginning of the disk partition; a disk position expressed as such a byte address is called an offset. For example, an offset of 2000 designates the 2001st byte from the beginning of the disk partition, because byte addresses start at 0 rather than 1. Since one disk sector is usually 512 bytes, an offset of 2000 designates the 465th byte of the fourth sector from the beginning of the partition: the 2001st byte lies past three 512-byte sectors, at the 465th byte of the fourth sector. Note that 2001 = 512 * 3 + 465.
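The offset arithmetic above can be checked with a few lines of C. The sketch assumes 512-byte sectors, as the text does; the structure `sector_pos` and the function `locate_offset` are hypothetical names. Both results are 0-based, so offset 2000 yields sector index 3 (the fourth sector) and byte index 464 (the 465th byte).

```c
#include <stdint.h>

#define SECTOR_SIZE 512u  /* usual sector size assumed in the text */

/* Position of a byte inside a partition, both fields counted from 0. */
struct sector_pos {
    uint64_t sector;  /* 0-based sector index (LBA within the partition) */
    uint32_t byte;    /* 0-based byte index inside that sector */
};

/* Convert a byte offset from the start of the partition into a
 * sector index and a byte index within that sector. */
struct sector_pos locate_offset(uint64_t offset)
{
    struct sector_pos pos;
    pos.sector = offset / SECTOR_SIZE;
    pos.byte = (uint32_t)(offset % SECTOR_SIZE);
    return pos;
}
```

Adding 1 to each field converts the 0-based result into the ordinal wording used in the text.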

FIG. 7 illustrates a method in which no file system is placed on the network-attached disk; the disk is used simply as a block storage device, and data is shared among several computers without going through a local file system. The concept of sharing is similar to that described for FIG. 5 of Embodiment 1. The only difference is that, instead of storing and sharing data in files created by a local file system, the entire network-attached disk or partition (7-1) is used as a single device file.

Each of the bundles (30-1, 30-2, 30-3, 30-4) of FIG. 7 consists of consecutive disk sectors. "Bundle" is not a term in general use; it is defined in this specification to denote a series of disk sectors for the purpose of describing Embodiment 2 of the present invention. The whole disk partition can be regarded as the complete set of bundles. For example, the disk partition (7-1) of FIG. 7 consists of consecutive bundles from the first bundle 0 (30-1) to the last bundle n (30-4).

Each individual bundle defined in this way is assigned so that only one particular computer has write permission on it, while the remaining computers are restricted to reading. Only the computer with write permission can store the data files it creates in the disk sectors of that bundle. When exclusive write permission on each bundle is granted to different computers in this mutually exclusive manner, each computer stores its data files in the bundles on which it alone holds write permission, and the other computers can still read those bundles. File data is thus shared without two or more computers ever overwriting the same sector of the disk partition (7-1).

For example, the bundle allocation server (1-25) of FIG. 7 is equipped with a software module (23) that allocates bundles. When a computer (1-21, 1-22) requests (201, 202) a bundle in which it can store data exclusively, the bundle allocation server (1-25) allocates a bundle by sending (203, 204) to the requesting computer the name of the storage device on which the bundle resides, that is, the disk name or disk partition name, together with the starting byte offset and the number of disk sectors in the bundle. Suppose the disk or disk partition (7-1) of FIG. 7 is named /dev/sda1 and bundle 0 (30-1) consists of 50 sectors starting at the first byte address of the partition (7-1). If the bundle allocation server (1-25) assigns bundle 0 (30-1) so that only computer 1 (1-21) has exclusive write permission, the assigned-bundle information (24-1) of computer 1 (1-21) in FIG. 7 contains (/dev/sda1, 0, 50). Only computer 1 (1-21) can store file data in the exclusively assigned bundle 0 (30-1), while another computer k (1-22) can read the stored file data; in this way computer k (1-22) shares, for reading, the file data stored by computer 1 (1-21).
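The bundle information triple (device name, starting byte offset, sector count) maps naturally onto a small C structure. The sketch below also adds a bounds check that a writer could apply before touching the device; the check, the structure name `bundle`, and the function `write_fits_in_bundle` are illustrative assumptions and are not described in the specification. Sectors are assumed to be 512 bytes.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u  /* usual sector size assumed in the text */

/* Bundle information as sent by the bundle allocation server. */
struct bundle {
    const char *device;     /* disk or partition name, e.g. "/dev/sda1" */
    uint64_t start_offset;  /* byte offset of the bundle's first sector */
    uint64_t sector_count;  /* number of consecutive sectors in the bundle */
};

/* Return 1 when the byte range [offset, offset + len) lies entirely
 * inside the bundle, otherwise 0. */
int write_fits_in_bundle(const struct bundle *b, uint64_t offset, uint64_t len)
{
    uint64_t end = b->start_offset + b->sector_count * SECTOR_SIZE;
    if (offset < b->start_offset || offset > end)
        return 0;
    return len <= end - offset;
}
```

For bundle 0 of the example, (/dev/sda1, 0, 50), the bundle covers bytes 0 through 25599, so a 4000-byte write at offset 5120 fits.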

A computer that has been assigned a bundle stores data in the bundle, that is, in consecutive sectors of the disk device, without going through a local file system. As in the program of Embodiment 1, this can be implemented with the ordinary open, write, and read system calls. The following is an example program that stores data in an assigned bundle. The example assumes that the partition (7-1) is named /dev/sda1 and that bundle 0 (30-1) consists of 50 consecutive sectors starting at byte 0 of the disk partition (7-1).
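The example program is not reproduced in this text. The C sketch below reconstructs it from the accompanying description, with comments (1) to (5) marking the steps the description refers to. The function name `bundle_write_then_read`, its parameters, and the error handling are assumptions; it is not the original listing.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `len` bytes from `buf` at byte `offset` of the device file, then
 * seek back and read the same bytes into `data_buf`.
 * Returns 0 on success, -1 on failure. */
int bundle_write_then_read(const char *device, off_t offset,
                           const void *buf, void *data_buf, size_t len)
{
    int rc = -1;

    int fd = open(device, O_RDWR);                    /* (1) open device file */
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {   /* (2) go to offset */
        perror("lseek before write");
        goto out;
    }
    if (write(fd, buf, len) != (ssize_t)len) {        /* (3) store the data */
        perror("write");
        goto out;
    }
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {   /* (4) return to offset */
        perror("lseek before read");
        goto out;
    }
    if (read(fd, data_buf, len) != (ssize_t)len) {    /* (5) read it back */
        perror("read");
        goto out;
    }
    rc = 0;
out:
    close(fd);
    return rc;
}
```

With the values used in the text, the call would be `bundle_write_then_read("/dev/sda1", 512 * 10, buf, data_buf, 4000)`. For experimentation, an ordinary file of 50 * 512 bytes can stand in for the partition, since the same system calls apply to both.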

Line (1) of the program calls the open system call, passing as arguments the partition name and the O_RDWR flag, which requests read and write permission. fd is a file descriptor, as in the example program of Embodiment 1. The difference is that the example program of Embodiment 1 passes the name of a file created through the file system, whereas the example program of Embodiment 2 passes a partition name, that is, the name of a storage device. The operating system treats not only files created through a file system but also the storage device itself as a kind of file, namely a device file, and allows system calls such as write and read to be used on a device file just as on an ordinary file. In other words, Embodiment 2 shares file data between the computers (1-21, 1-22) by storing data on the partition (7-1) as a plain storage device, with no local file system placed on it.

The lseek() system call on program line (2) sets the offset of the open file: the byte address passed as its second argument becomes the position at which file input/output starts. In the example program the offset is 512 * 10, that is, 5120. SEEK_SET is the whence value specifying that the offset is not a relative position but an absolute byte address counted from the beginning of the disk partition (7-1). Because the first byte of the first sector has address 0, not 1, byte address 5120 is the first byte of the 11th sector of the disk partition. Line (3) therefore stores the 4000 bytes of data in the array buf starting at the 11th sector of the disk partition. Since 4000 = 512 * 7 + 416, storing 4000 bytes in 512-byte sectors requires 8 disk sectors, so the data occupies 8 sectors, from the 11th through the 18th, and the last (18th) sector holds 416 bytes of data.
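The sector arithmetic above can be checked with a short calculation. The helper name sectors_spanned is hypothetical; it reports 1-based sector ordinals to match the wording of the text ("11th sector"), while byte addresses stay 0-based.

```python
SECTOR_SIZE = 512  # bytes per sector, as in the text


def sectors_spanned(offset, length, sector_size=SECTOR_SIZE):
    """Return (first, last, bytes_in_last) for the extent [offset, offset + length).

    `first` and `last` are 1-based sector ordinals; `bytes_in_last` is the
    number of bytes of the extent that fall in the last sector.
    """
    end = offset + length - 1                 # address of the last byte written
    first = offset // sector_size + 1
    last = end // sector_size + 1
    if first == last:                         # extent fits inside one sector
        bytes_in_last = length
    else:
        bytes_in_last = end % sector_size + 1
    return first, last, bytes_in_last


print(sectors_spanned(512 * 10, 4000))  # (11, 18, 416)
```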

Program line (5) reads back the 4000 bytes of file data stored by line (3). To read the stored data starting from byte address 5120, the offset at which line (3) began writing, program line (4) first calls the lseek system call again to return the offset to the position where the 4000 bytes were stored. Program line (5) then calls the read system call to read the 4000 bytes that line (3) stored starting at byte 5120 of the disk partition.

Lines (4) and (5) of the example program above can also serve as the program that another computer k (1-22) runs when it wants to read a file stored by computer 1 (1-21). Continuing the example, after computer 1 (1-21) stores 4000 bytes of data starting at byte 5120 of the disk partition, it sends the partition name, the start offset, and the size of the file data in bytes, in this example (/dev/sda1, 5120, 4000), to computer k (1-22). Computer k (1-22) can then read the data of the file stored by computer 1 (1-21) by running an example program such as the following.
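The reader-side listing is likewise not reproduced in this text. A minimal sketch of what computer k might run, given the triple (partition name, start offset, size), is shown here in Python. The function name read_shared_extent and the temporary file standing in for /dev/sda1 are assumptions.

```python
import os
import tempfile


def read_shared_extent(partition_name, start_offset, size):
    """Read `size` bytes starting at `start_offset` using read-only access."""
    fd = os.open(partition_name, os.O_RDONLY)      # (1) read permission only
    try:
        os.lseek(fd, start_offset, os.SEEK_SET)    # (2) go to the shared offset
        chunks, remaining = [], size
        while remaining > 0:                       # (3) read until `size` bytes arrive
            chunk = os.read(fd, remaining)
            if not chunk:                          # end of device reached early
                break
            chunks.append(chunk)
            remaining -= len(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)


if __name__ == "__main__":
    # Stand-in for /dev/sda1 (assumption) so the sketch runs without a raw device.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(bytes(5120) + b"A" * 4000)
    # Computer k would receive this triple from computer 1.
    name, offset, size = f.name, 5120, 4000
    assert read_shared_extent(name, offset, size) == b"A" * 4000
    os.remove(name)
```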

Computer k (1-22), which has only read permission for bundle 0 (30-1), must pass O_RDONLY as the second argument of the open system call, as on line (1) of the example program above; this specifies that the file is accessed with read permission only.

The example above stores a file in the 4000 bytes starting at byte address 5120 of disk partition /dev/sda1. These 4000 bytes must of course lie inside bundle 0 (30-1), for which computer 1 (1-21) has been assigned exclusive write permission. In the example, bundle 0 (30-1) consists of the first 50 sectors of disk partition /dev/sda1, that is, the contiguous sectors spanning byte 0 through byte (512 * 50 - 1) = 25,599, so the file occupies the 4000 bytes from byte address 5120 through byte address 9119 of bundle 0 (30-1). If the file is too large to fit in a single bundle for which write permission was granted, it is stored in two or more bundles in a similar way. When the bundles used are not contiguous within the partition, the computer fills the first bundle, calls the lseek system call to move the file input/output offset to the starting byte of the next bundle, and then continues writing the remaining data. In all other respects the file is stored in the same way as in the example program above.
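The bounds check and the multi-bundle case can be sketched as follows. This is an illustration under assumptions, not the patent's listing: bundles are given as (start byte offset, number of sectors) pairs, writing is assumed to begin at the start of each bundle, and the names extent_in_bundle, plan_chunks, and write_across_bundles are hypothetical.

```python
import os
import tempfile

SECTOR_SIZE = 512


def extent_in_bundle(offset, length, bundle_start, n_sectors):
    """True if bytes [offset, offset + length) lie entirely inside the bundle."""
    bundle_end = bundle_start + n_sectors * SECTOR_SIZE
    return bundle_start <= offset and offset + length <= bundle_end


def plan_chunks(total_len, bundles):
    """Split `total_len` bytes over bundles; return (partition_off, data_off, length)."""
    plan, done = [], 0
    for start, n_sectors in bundles:
        if done == total_len:
            break
        take = min(n_sectors * SECTOR_SIZE, total_len - done)
        plan.append((start, done, take))
        done += take
    if done < total_len:
        raise ValueError("assigned bundles are too small for the data")
    return plan


def write_across_bundles(device_path, data, bundles):
    """Write `data` bundle by bundle, seeking to each bundle's start first."""
    fd = os.open(device_path, os.O_RDWR)
    try:
        for part_off, data_off, length in plan_chunks(len(data), bundles):
            os.lseek(fd, part_off, os.SEEK_SET)    # jump to the next bundle
            os.write(fd, data[data_off:data_off + length])
    finally:
        os.close(fd)


if __name__ == "__main__":
    assert extent_in_bundle(5120, 4000, 0, 50)     # the example extent fits bundle 0
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.truncate(8192)
    write_across_bundles(f.name, b"x" * 1500, [(0, 2), (4096, 2)])
    os.remove(f.name)
```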

The file sharing method of FIG. 7 prevents an arbitrary computer from corrupting files by writing to arbitrary sectors. The allocation server (1-25) of the present invention gives a computer that requests a file write exclusive use of disk sectors for which no other computer has been granted write permission, so that disk sectors are distributed among the computers in a mutually exclusive manner. Each computer stores files in the disk sectors for which it holds exclusive write permission, and the other computers share them for reading. In the method of FIG. 7, bundles need not all be the same size and may be of any suitable size; moreover, the bundles assigned to a computer need not be contiguous with one another.

Embodiments 1 and 2 showed methods of sharing file data in a distributed system such as Hadoop. These methods can be applied similarly to HDFS (Hadoop Distributed File System), the distributed file system defined by Hadoop. FIG. 8 illustrates how data is stored in HDFS data blocks in the typical case where the replication factor is 3. In HDFS, the name node (8) manages the HDFS blocks (25-1, 25-2, 25-3) distributed over several data nodes (1-1, 1-2, 1-3) and specifies which data is stored in which block of which data node. An HDFS block is typically 128 MB, and the replication factor is usually 3. That is, when a primary block (25-1) is stored on one data node, two replica blocks (25-2, 25-3) with the same content are stored on two other data nodes, so that if the primary block (25-1) fails, the data can still be processed by accessing the replica blocks (25-2, 25-3).

To store data (26) as an HDFS file, the client (9) of FIG. 8 asks the name node (8) to create a file for the data, and the name node (8) tells the client which blocks of which data nodes (1-1, 1-2, 1-3) the data is to be divided among and stored in.

FIG. 8 shows an example in which the client stores one of its primary blocks in a block of a data node. Data node 1 (1-1), where the primary block (25-1) is stored, transmits (211) the data of the primary block (25-1) to data node 2 (1-2), designated by the name node (8), so that data node 2 (1-2) stores replica block 1 (25-2). Because data node 1 (1-1) loads the data of the primary block (25-1) into the network stack in its main memory and then transmits it through its network interface, transmitting the block data occupies its main memory and consumes its CPU time. Data node 2 (1-2), which stores the first replica block 1 (25-2), likewise transmits (212) the data of replica block 1 (25-2) to data node 3 (1-3), which stores the second replica block 2 (25-3), and in doing so it too occupies its own main memory and consumes CPU time.

An HDFS block exists as a file in the local file system of each data node. When the client (9) requests the creation of an HDFS file, the name node (8) tells the client the unique identifier of each data node that will store data, together with the file names on each data node in which the blocks will be stored. The file names given by the name node (8) refer to the HDFS blocks (25-1, 25-2, 25-3) of the corresponding data nodes (1-1, 1-2, 1-3).

FIG. 9 illustrates how, when data blocks are replicated using the storage devices ND1 (40-1), ND2 (40-2), and ND3 (40-3) of the present invention, replication takes place between the storage devices without occupying the main memory of the data nodes. The HDFS name node (8) and the data nodes (1-1, 1-2, 1-3) all mount each of the storage devices ND1 (40-1), ND2 (40-2), and ND3 (40-3) on their local file systems (10-5, 10-1, 10-2, 10-3) with read and write permission and access the storage devices as their own local disks. In FIG. 9, dotted lines indicate that the name node (8) and the data nodes (1-1, 1-2, 1-3) have each mounted the storage devices (40-1, 40-2, 40-3) with read and write permission.

Even though the data nodes write HDFS blocks to the network-attached disks independently, without coordinating with one another, the integrity of the HDFS blocks is maintained in a way similar to the file sharing method of Embodiment 1 described with reference to FIG. 5. In the method of FIG. 5, multiple computers access a network-attached disk through their local file systems but are restricted to writing data only to the files assigned to them by the file allocation server, which preserves the integrity of the files stored on the network-attached disk. For HDFS block storage, the name node (8) of FIG. 9 takes the place of the allocation server of FIG. 5 and in the same manner designates the data node that stores each block on the storage devices of the present invention.

Specifically, the HDFS name node (8) creates the HDFS blocks (25-10, 25-11, 25-12, 25-13, 25-14, 25-15, 25-16, 25-17) in advance as individual files on the storage devices ND1 (40-1), ND2 (40-2), and ND3 (40-3). One way to create the blocks, as described for FIG. 5, is to use the fallocate command on Linux to create files of the HDFS block size, 128 MB each, ahead of time. To emphasize that the HDFS blocks (25-10 to 25-17) are in fact files in the local file system of the data node where each block is stored, FIG. 9 shows a file name in parentheses for each block (25-10 to 25-17). Also in FIG. 9, dotted lines indicate that the storage devices ND1 (40-1), ND2 (40-2), and ND3 (40-3) are mounted with read and write permission on the name node (8) and all data nodes (1-1, 1-2, 1-3).
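Pre-creating a block file of fixed size can be sketched programmatically as well. The text uses the fallocate command; the sketch here uses Python's os.posix_fallocate instead, which is a related but different interface, and falls back to os.ftruncate (which only sets the length and may leave the file sparse) where preallocation is unavailable. The function name preallocate_block_file is hypothetical.

```python
import os
import tempfile

HDFS_BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the typical block size in the text


def preallocate_block_file(path, size=HDFS_BLOCK_SIZE):
    """Create `path` and reserve `size` bytes for it; return the resulting size."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        try:
            os.posix_fallocate(fd, 0, size)    # reserve disk space up front
        except (AttributeError, OSError):
            os.ftruncate(fd, size)             # fallback: set the length only
    finally:
        os.close(fd)
    return os.path.getsize(path)


if __name__ == "__main__":
    # Small size for the demonstration; a real block file would use the default.
    demo_path = os.path.join(tempfile.mkdtemp(), "blk_demo")
    assert preallocate_block_file(demo_path, 1024 * 1024) == 1024 * 1024
    os.remove(demo_path)
```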

For example, suppose the name node (8) instructs the client (9) of FIG. 9 to store the primary block in block 1 (25-10) of storage device ND 1 (40-1), the first replica block in block q (25-13) of storage device ND 2 (40-2), and the second replica block in block t (25-16) of storage device ND 3 (40-3). As in ordinary HDFS, the client (9) transmits the data to data node 1 (1-1), and data node 1 (1-1) stores the primary block in block 1 (25-10) of ND 1 (40-1).

In the present invention, however, replica blocks are not stored by the data nodes as in ordinary HDFS; replication takes place directly between the storage devices (40-1, 40-2, 40-3) without involving the data nodes. Each storage device (40-1, 40-2, 40-3) is equipped with block transmit/receive logic (50-1, 50-2, 50-3) so that the storage devices can exchange replica blocks directly with one another. For example, as storage device ND 1 (40-1) begins storing the primary block in its block 1 (25-10), it simultaneously passes the data being stored to its own block transmit/receive logic (50-1) (220), which transmits (221) the data of block 1 (25-10) to the block transmit/receive logic (50-2) of ND 2 (40-2), where the first replica block is to be stored, so that ND 2 (40-2) stores the replica in its block q (25-13). The block transmit/receive logic (50-2) of ND 2 (40-2) stores (222) the data received from the block transmit/receive logic (50-1) of ND 1 (40-1) in its block q (25-13) and at the same time transmits (223) the data to the block transmit/receive logic (50-3) of ND 3 (40-3), so that ND 3 (40-3) stores (224) the second replica block in block t (25-16).
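The forwarding chain between storage devices can be modelled with a toy in-memory sketch. Everything here is hypothetical: the class StorageDevice, its receive method, and the route representation are illustrative only, and the sketch stores and forwards each chunk one after the other, whereas the text describes storing and transmitting at the same time.

```python
class StorageDevice:
    """Toy model of a storage device with block transmit/receive logic."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}                      # block id -> bytearray of stored data

    def receive(self, chunk, route):
        """Store `chunk` locally, then forward it to the next device on `route`.

        `route` is a list of (device, block_id) pairs whose first entry is
        this device; no data node takes part in the forwarding.
        """
        block_id = route[0][1]
        self.blocks.setdefault(block_id, bytearray()).extend(chunk)
        downstream = route[1:]
        if downstream:
            next_device = downstream[0][0]
            next_device.receive(chunk, downstream)


nd1, nd2, nd3 = StorageDevice("ND1"), StorageDevice("ND2"), StorageDevice("ND3")
route = [(nd1, "block 1"), (nd2, "block q"), (nd3, "block t")]
for chunk in (b"abc", b"def"):                # primary block arrives in chunks
    nd1.receive(chunk, route)
print(bytes(nd3.blocks["block t"]))           # b'abcdef'
```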

In short, the difference from ordinary HDFS block storage is that replication takes place directly between the storage devices (40-1, 40-2, 40-3), so the data nodes do not occupy their main memory transmitting replica blocks. Using the storage device of the present invention removes the network overhead that each data node incurs in transmitting HDFS replica blocks, which speeds up useful work.

The description of FIG. 9 has the data node store the primary block in a block of the storage device. However, if the HDFS client (9), like the data nodes, is allowed to mount the storage devices (40-1, 40-2, 40-3) with read and write permission, the client (9) can store the primary block directly in the pre-created HDFS block designated by the name node (8), without transmitting it to a data node.

Also, as described for FIG. 5, instead of creating files of the HDFS block size on the storage devices in advance, the name node (8) can create a new HDFS block file dynamically with the fallocate command, as in the example of FIG. 5, each time an HDFS block write request arises.

Besides not occupying the main memory of the data nodes when replica blocks are made, using the storage devices (40-1 to 40-3) of the present invention has the additional advantage of making resource management in a Hadoop system more efficient. In general, data is processed on the data nodes where the data blocks are located, so a Hadoop system runs resource management processes that schedule data processing jobs according to the location of the blocks to be processed, and the schedule affects the data processing performance of the whole system. When the storage devices of the present invention are used, each storage device is a local disk of every data node. Any data node can therefore access any block on any storage device without regard to which data node a block resides on, which simplifies job scheduling and resource management and, as a result, improves overall data processing performance.

Embodiment 2 stores file data on existing network-attached disks using a bundle allocation server. A separate allocation server, however, has the drawback of being a single point of failure: if the allocation server fails, the entire system stops working.

Embodiment 4, shown in FIG. 10, overcomes this drawback of Embodiment 2 by using the storage device (45) of the present invention, in which each network-attached disk is equipped with bundle allocation logic (300) and integrity check logic (301).

For example, suppose computer 1 (1-31) of FIG. 10 sends (230) the storage device (45) a disk bundle allocation request (160) for the amount of space it wants with exclusive write permission, and the bundle allocation logic (300) in the storage device sends (231) computer 1 (1-31) a response assigning it bundle 0 (30-1) and bundle 1 (30-2) with exclusive write permission. The response from the storage device (45) to computer 1 (1-31) contains the name of the disk partition (70) where the assigned bundles are located, the disk byte offset at which each assigned bundle starts, and the number of contiguous disk sectors that make up the bundle. Computer 1 (1-31), having been assigned bundle 0 (30-1) and bundle 1 (30-2) with exclusive write permission, stores (232) its data file in the assigned bundles through the data write command (161-1). Writing data to an assigned bundle and reading a bundle in which data is stored are done in the same way as in the example programs of Embodiment 2.
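A minimal sketch of bundle allocation logic that hands out bundles exclusively might look like this. The class and field names (BundleAllocator, BundleGrant) and the fixed, equal bundle size are assumptions for illustration; the text allows bundles of differing sizes. The response fields follow the three items the text lists: partition name, start byte offset, and sector count.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512


@dataclass(frozen=True)
class BundleGrant:
    partition: str       # name of the disk partition holding the bundle
    start_offset: int    # byte offset at which the bundle starts
    n_sectors: int       # number of contiguous sectors in the bundle


class BundleAllocator:
    """Assigns each bundle to at most one computer (exclusive write permission)."""

    def __init__(self, partition, n_bundles, sectors_per_bundle):
        self.partition = partition
        self.sectors_per_bundle = sectors_per_bundle
        self.owner = [None] * n_bundles           # bundle index -> computer id

    def allocate(self, computer_id, n_bytes):
        """Grant enough free bundles to hold `n_bytes`; raise if space runs out."""
        bundle_bytes = self.sectors_per_bundle * SECTOR_SIZE
        needed = -(-n_bytes // bundle_bytes)      # ceiling division
        free = [i for i, who in enumerate(self.owner) if who is None]
        if len(free) < needed:
            raise RuntimeError("not enough free bundles")
        grants = []
        for i in free[:needed]:
            self.owner[i] = computer_id
            grants.append(BundleGrant(self.partition, i * bundle_bytes,
                                      self.sectors_per_bundle))
        return grants


allocator = BundleAllocator("/dev/sda1", n_bundles=4, sectors_per_bundle=50)
grants = allocator.allocate("computer 1", 30000)  # needs two 25,600-byte bundles
print([g.start_offset for g in grants])           # [0, 25600]
```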

Because Embodiment 4 uses the whole partition as a single device file instead of files of an ordinary file system, for another computer to read data stored by one computer, computer 1 (1-31), which stored the data, tells (233) the other computer k (1-32) that wants to read it the byte address in the disk partition at which the data starts and its size in bytes, as in the description of FIG. 7. The shared file information exchange modules (180-1, 180-2) of FIG. 10 are the modules through which the data nodes (1-31, 1-32) exchange the byte offset in the disk partition at which data is stored and the size of the data.

The integrity check logic (301) of the storage device (45) of FIG. 10 checks whether the computer that sent a write command has write permission for the target bundle, and thereby prevents a computer without permission from writing data to that bundle.
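The permission check can be sketched as a lookup over the granted extents. The class IntegrityChecker and its methods are hypothetical names, and the sketch makes a simplifying assumption: a write is accepted only if it lies entirely inside a single bundle owned by the sender, so a write spanning two adjacent bundles of the same owner would have to be split.

```python
class IntegrityChecker:
    """Rejects writes that fall outside the bundles owned by the sender."""

    def __init__(self):
        self.extents = []    # list of (start, end_exclusive, owner)

    def grant(self, owner, start, length):
        """Record that `owner` has exclusive write permission for an extent."""
        self.extents.append((start, start + length, owner))

    def may_write(self, computer_id, offset, length):
        """True if [offset, offset + length) lies inside one bundle of the sender."""
        end = offset + length
        return any(owner == computer_id and start <= offset and end <= stop
                   for start, stop, owner in self.extents)


checker = IntegrityChecker()
checker.grant("computer 1", 0, 512 * 50)           # bundle 0: bytes 0..25599
print(checker.may_write("computer 1", 5120, 4000))  # True
print(checker.may_write("computer k", 5120, 4000))  # False
```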

FIG. 11 illustrates a method of assigning an entire network-attached disk or partition to a data node, that is, a computer, with exclusive read and write permission. Instead of assigning individual files to a computer with exclusive write permission as in Embodiment 1, this method assigns a whole disk or disk partition.

For example, computer 1 (1-41) of FIG. 11 mounts partition /dev/sda1 (70-1) of network-attached disk 1 (7-11) and partition /dev/sdb1 (70-2) of network-attached disk 2 (7-12) on its local file system with read/write permission, and the remaining computers mount these partitions on their own local file systems with read permission only. Computer 1 (1-41) can then store files in both partitions, whereas the other computers (1-42, 1-43, 1-44) cannot store files in them but can read the files stored there.

Mounting network-attached disks or partitions on a local file system read-only is the same as mounting the network-attached disk described in Embodiment 1 of FIG. 5, except that the read-only option is specified. The following is an example of commands that mount partitions /dev/sda1 (70-1) and /dev/sdb1 (70-2) read-only on the computers (1-42, 1-43, 1-44) other than computer 1 (1-41).
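The command listing itself is not reproduced in this text. As a hedged reconstruction, the sketch below only builds and prints the command lines rather than running them. The mount points /mnt/nd1 and /mnt/nd2 and the helper name readonly_mount_argv are assumptions; `mount -o ro` is the standard way to request a read-only mount on Linux.

```python
def readonly_mount_argv(partition, mount_point):
    """Build the argument list for mounting `partition` read-only."""
    return ["mount", "-o", "ro", partition, mount_point]


# Hypothetical mount points; the partition names are those used in the text.
targets = [("/dev/sda1", "/mnt/nd1"), ("/dev/sdb1", "/mnt/nd2")]
for partition, mount_point in targets:
    print(" ".join(readonly_mount_argv(partition, mount_point)))
# mount -o ro /dev/sda1 /mnt/nd1
# mount -o ro /dev/sdb1 /mnt/nd2
```

Running the commands for real would require root privileges and existing mount point directories.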

Here, ro means read-only permission.

After the computer that creates a file has finished writing its data, it calls the fsync system call, as described for FIG. 5 of Embodiment 1, so that other computers can share the file's data. This synchronization forces the data and metadata still held in the operating system's disk buffers to be written to the disk, so that the disk partition reflects not only the data of the intermediate result file but also its metadata, including the information indicating which disk sectors hold the file's data.
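The write-then-synchronize step can be sketched with os.fsync, which wraps the fsync system call. The function name write_and_sync and the temporary file used in the demonstration are assumptions.

```python
import os
import tempfile


def write_and_sync(path, data):
    """Write `data` to `path`, then force data and metadata to the disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # flush OS buffers for this file to disk
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "intermediate_result")
    write_and_sync(demo_path, b"mapper output")
    with open(demo_path, "rb") as f:
        assert f.read() == b"mapper output"
    os.remove(demo_path)
```

Only after fsync returns should the writer announce the file to readers on other computers.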

Another way to synchronize the created intermediate result file to the disk partition is to open the file with the O_SYNC option in the first place, so that every data write immediately reflects the data and related metadata on the disk partition. On Linux, for example, if a file is opened with the O_SYNC option as in the following example program and data is then written, the data and related metadata are stored directly on the physical disk partition rather than being held temporarily in the disk buffer.
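The O_SYNC listing is not reproduced in this text either. A minimal sketch in Python, assuming a Unix-like system where os.O_SYNC is defined, might be the following; the function name open_synchronous and the file name are hypothetical.

```python
import os
import tempfile


def open_synchronous(path):
    """Open `path` for writing so that each write reaches the disk before returning."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_SYNC
    return os.open(path, flags, 0o644)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "intermediate_result")
    fd = open_synchronous(demo_path)
    try:
        os.write(fd, b"reducer input")   # returns only after the data is on disk
    finally:
        os.close(fd)
    os.remove(demo_path)
```

Compared with a single fsync at the end, O_SYNC pays the synchronization cost on every write, so it tends to suit files written in few, large writes.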

FIG. 12 is a functional block diagram of a storage device (46) of the present invention equipped with write permission allocation logic (310), with which each individual storage device assigns a partition when a computer requests one, and integrity check logic (311), which checks whether a computer writing to a partition has write permission.

For example, when computer M (1-51) sends (240) over the network (2) a request (162-1) for a partition with exclusive write permission, the partition allocation logic (310) of the storage device (46) responds (241) to the requesting computer with the name of partition 1 (70-10). Computer M (1-51), having been assigned partition 1 (70-10), then writes files to it, issuing write commands (242) to disk sectors belonging to partition 1 (70-10); these write commands are transmitted over the network (2) to the storage device (46).

The integrity check logic (311) of the storage device detects an attempt to write to a partition by a computer that lacks write permission for it and refuses the attempt. For example, if computer N (1-52) sends (243) a write command (163-2) to partition 1 (70-10), for which it has no write permission, the integrity check logic (311) of the storage device (46) detects this, does not execute the command, and notifies (244) computer N (1-52) that the write operation failed.

1-1: Data node
1-2: Data node
2: Network
4-1: Network stack
5-1: Network interface card
6-1: Disk
7-1: Network-attached disk
8: Name node
9: Client
10-1: Local file system
11-1: Intermediate result file
20-1: Mapper
21-1: Reducer

Claims

A method of sharing files among multiple computers by sharing a network-attached disk storing files.

The method of claim 1,
wherein files are created on the network-attached disk through a local file system, the created files are allocated mutually exclusively among the computers so that each computer can store data exclusively, each computer writes data only to the files allocated to it, and the computers are allowed to read files not allocated to them, whereby files are shared between the computers.

The method of claim 1,
wherein some of the disk sectors of the network-attached disk are allocated mutually exclusively among the computers so that each computer can store data independently, each computer writes files only to the sectors allocated to it, and the computers can read files stored in sectors not allocated to them, whereby files are shared between the computers.

The method of claim 1,
wherein one of the computers mounts a partition of the network-attached disk so that it can exclusively write and read files, and the other computers mount the same partition so that they can only read it, whereby the computers share files.

The method of claim 1,
wherein one of the computers mounts the entire network-attached disk so that it can exclusively write and read files, and the other computers mount the network-attached disk so that they can only read it, whereby the computers share files.

The method of claim 1,
wherein the data storage medium of the network-attached disk is one of a hard disk, an SSD (Solid State Drive), flash memory, a RAID (Redundant Array of Inexpensive Disks), or a JBOD (Just a Bunch of Disks).

A network-attached storage device that connects to a network and provides disks to computers, the device being equipped with block transmission/reception logic capable of transmitting HDFS (Hadoop Distributed File System) blocks stored in the storage device to other storage devices through the network, and of receiving HDFS blocks from other storage devices through the network.

A network-attached storage device that connects to a network and provides disks to computers, the device being equipped with allocation logic that allocates an appropriate file when a computer requests a file in which to store its data exclusively, and inspection logic that checks whether a computer is attempting to write data to a file for which it does not hold the exclusive data storage right.

A network-attached storage device that connects to a network and provides disks to computers, the device being equipped with allocation logic that allocates appropriate disk sectors when a computer requests disk sectors in which to store its data exclusively, and inspection logic that checks whether a computer is attempting to write data to sectors for which it does not hold the exclusive data storage right.

A network-attached storage device that connects to a network and provides disks to computers, the device being equipped with allocation logic that allocates an appropriate disk partition when a computer requests a disk partition in which to store its data exclusively, and inspection logic that checks whether a computer is attempting to write data to a partition for which it does not hold the exclusive data storage right.