KR20170079141A - Distributed file system and method for processing file operation the same - Google Patents

Info

Publication number
KR20170079141A
Authority
KR
South Korea
Prior art keywords
server
address
metadata
client
file
Prior art date
Application number
KR1020150189376A
Other languages
Korean (ko)
Other versions
KR102024934B1 (en)
Inventor
박정숙
김영창
김영균
김홍연
Original Assignee
한국전자통신연구원
Priority date
Filing date
Publication date
Application filed by 한국전자통신연구원
Priority to KR1020150189376A
Publication of KR20170079141A
Application granted
Publication of KR102024934B1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G06F17/30194
    • G06F17/30557
    • H04L61/2007
    • H04L67/42

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A distributed file system based on a torus network according to the present invention includes a plurality of metadata servers for storing metadata of files, a plurality of data servers for dividing and storing data, and at least one management server for managing the metadata servers and the data servers. The plurality of metadata servers and the plurality of data servers are each disposed on first to n-th planes composed of a plurality of nodes, and the plurality of nodes included in the first to n-th planes are assigned IP addresses that include a plane address, a row address, a column address, a source port, a destination port, and relative position information.

Figure P1020150189376

Description

DISTRIBUTED FILE SYSTEM AND METHOD FOR PROCESSING FILE OPERATION THE SAME

The present invention relates to a distributed file system and a file operation processing method thereof.

As smartphones, tablets, and wearable devices have become widespread in recent years, the volume of high-quality unstructured data keeps growing, and with it the capacity demanded of cloud storage. The large amount of data produced by interconnected, virtualized things communicating over the Internet is also stored in cloud storage, which is why the need for cost-effective, high-capacity cloud storage technology is growing.

On the other hand, with data production estimated to reach around 44,000 EB by 2020, developing exabyte-scale cloud storage is one of the issues to be solved. While petabyte-scale cloud storage deployments are no longer uncommon, exabyte-scale cloud storage is difficult to achieve with conventional technologies.

Besides requiring a large number of storage servers to reach exabyte scale, the switch-based fat-tree network approach previously used to build such networks is limited in terms of cost, high-availability support, and configuration complexity.

To overcome these limitations, it is possible to use a torus network that connects servers directly to one another without a switch. Computational supercomputers such as the K Computer in Japan and the Cray Titan use a torus network structure, but there is not yet a concrete example of constructing storage nodes in this way.

In this regard, Korean Patent Laid-Open Publication No. 10-2011-0142500 (title: routing system and routing method using a torus topology in an on-chip network) discloses a technique that uses deadlock recovery with a token (DRT), which can minimize the size of an additional buffer (virtual channel) while utilizing the rich wires provided by the 2D torus topology.

An embodiment of the present invention provides a distributed file system, and a method of operating the same, capable of providing exabyte-class storage: storage servers are directly connected to each other without a switch to constitute a torus topology, clients are connected to a switch, and the position of each node in the torus network is expressed using coordinate values formed by planes, rows, and columns.

It should be understood, however, that the technical scope of the present invention is not limited to the above-described technical problems, and other technical problems may exist.

According to a first aspect of the present invention, there is provided a distributed file system based on a torus network, comprising: a plurality of metadata servers for storing metadata of files; a plurality of data servers for dividing and storing data; and one or more management servers for managing the metadata servers and the data servers. In this case, the plurality of metadata servers and the plurality of data servers are each arranged on first to n-th planes composed of a plurality of nodes, and the plurality of nodes included in the first to n-th planes are assigned IP addresses that include a plane address, a row address, a column address, a source port, a destination port, and relative position information.

Further, according to a second aspect of the present invention, there is provided a file operation processing method of a distributed file system including a plurality of metadata servers, a plurality of data servers, and one or more management servers arranged on first to n-th planes constituted by a plurality of nodes, the method comprising: receiving, by the management server, a mount request including information on a volume to be accessed from a client; searching, by the management server, for the metadata server corresponding to root directory information included in the volume information; and transmitting, by the management server, the IP address of the found metadata server to the client. Here, a plurality of IP addresses including a plane address, a row address, a column address, a source port, a destination port, and relative position information are allocated to the plurality of nodes included in the first to n-th planes, and the plurality of metadata servers and data servers included in the first plane are directly connected, without a switch, to the plurality of metadata servers and data servers included in the second to n-th planes. In the transmitting step, the IP addresses of the plurality of metadata servers and data servers included in the first plane are transmitted to the client together with the IP address of the found metadata server, and the client stores the IP address of the found metadata server and the IP addresses of the metadata servers and data servers of the first plane in local storage until the volume is unmounted.

According to any one of the above-described means of the present invention, information can be transmitted and received in the shortest time, because the position of each metadata server or data server connected over the torus network is expressed as coordinate values and the shortest path is selected accordingly.

In addition, topology monitoring of the file system can be performed more efficiently.

In addition, it is possible to solve the problem of supporting an exabytes-level capacity which is difficult to solve with a conventional hierarchical fat-tree method using a switch.

In addition, storage servers can connect directly to each other without a switch to form a torus topology, and clients can connect to switches to minimize system complexity.

In addition, Exabytes of storage can be provided without many modifications to the existing distributed file system.

FIG. 1 is a block diagram of a distributed file system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of location information of each node in a three-dimensional torus structure and a location information notation of a server of a distributed file system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of a port configuration of each node in the distributed file system according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of a method of configuring an IP address for each port of a node in a distributed file system according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an example of an information structure to be managed by a management server, a metadata server, and a client in relation to coordinate values in a distributed file system according to an embodiment of the present invention.
FIG. 6 is a flowchart of a mount step in a file operation processing method of a distributed file system according to an embodiment of the present invention.
FIG. 7 is a flowchart of a file open step in a file operation processing method of a distributed file system according to an embodiment of the present invention.
FIG. 8 is a flowchart of a file reading process in a file operation processing method of a distributed file system according to an embodiment of the present invention.
FIG. 9 is a flowchart of a file writing process in a file operation processing method of a distributed file system according to an embodiment of the present invention.
FIG. 10 is a flowchart of failure occurrence and processing steps of a relay server in a file operation processing method of a distributed file system according to an embodiment of the present invention.
FIG. 11 is a flowchart of failure occurrence and processing steps of a metadata server or a data server in a file operation processing method of a distributed file system according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings, which will be readily apparent to those skilled in the art. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In order to clearly explain the present invention in the drawings, parts not related to the description are omitted.

Throughout the specification, when a part is said to "include" an element, this means that it may further include other elements rather than excluding them, unless the context clearly dictates otherwise.

Hereinafter, a distributed file system 100 according to an embodiment of the present invention will be described with reference to FIGS. 1 to 5.

FIG. 1 is a block diagram of a distributed file system 100 in accordance with an embodiment of the present invention. FIG. 2 is a diagram illustrating an example of location information of each node in a three-dimensional torus structure and a location information notation of a server of the distributed file system 100 according to an embodiment of the present invention. FIG. 3 is a diagram illustrating an example of a port configuration of each node in the distributed file system 100 according to an embodiment of the present invention.

A distributed file system 100 based on a torus network according to an exemplary embodiment of the present invention includes at least one management server 110, a plurality of metadata servers 120, a plurality of data servers 130, and a plurality of clients 140.

The plurality of metadata servers 120 store metadata of the files. At this time, the plurality of metadata servers 120 are grouped, for example in twos or threes, to provide high availability: all of the metadata servers 120 run in an active state, while each group operates in an active-standby mode.

That is, the metadata servers 120 may be managed in groups of a predetermined number of metadata servers 120. In this case, one metadata server 120 may operate in an active mode for one of the groups and in a standby mode for another group.

The plurality of data servers 130 divides and stores the data. That is, the data server 130 divides an actual file or data into chunks of a predetermined size and stores the chunks.
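The chunking described above can be sketched as follows. This is a minimal illustration only: the function name `split_into_chunks` and the chunk size used in the example are assumptions made for this sketch and are not specified in the source.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Split data into consecutive chunks of at most chunk_size bytes.

    chunk_size stands in for the "predetermined size" in the text;
    the actual value is not given in the source.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# Hypothetical example: a 10-byte file split into 4-byte chunks.
chunks = split_into_chunks(b"abcdefghij", 4)
```

The last chunk may be shorter than the others, and concatenating the chunks in order restores the original data.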

The management server 110 manages the plurality of metadata servers 120 and the plurality of data servers 130. The management server 110 monitors not only the metadata servers 120 and the data servers 130 but also the plurality of clients 140, and performs a recovery procedure when a metadata server 120 fails. The management server 110 may be located inside the torus network, or may be disposed independently outside the torus network, directly connected to the switch 150, as shown in FIG. 1. When the management server 110 is inside the torus network, it resides in the first plane and can communicate with the client 140 through the switch 150.

Meanwhile, one or more management servers 110 may be provided, but in an embodiment of the present invention, two management servers 110 are preferably provided. In addition, the management server 110 is also operated in an active-standby mode in order to provide high availability.

One or more clients 140 access the distributed file system 100 to perform file operations. When the routing function is not activated, the client 140 in the embodiment of the present invention can transmit and receive information through the switch 150 to and from the metadata server 120 or the data server 130 over the shortest path by using the coordinate values.

The plurality of metadata servers 120, the plurality of data servers 130, the management server 110, and the clients 140 may each include a communication module (not shown), a memory (not shown), and a processor (not shown).

At this time, the communication module may include both a wired communication module and a wireless communication module. The wired communication module may be implemented as a power line communication device, a telephone line communication device, cable home (MoCA), Ethernet, IEEE 1394, an integrated wired home network, or an RS-485 control device. The wireless communication module may be implemented with wireless LAN (WLAN), Bluetooth, HDR WPAN, UWB, ZigBee, Impulse Radio, 60 GHz WPAN, Binary-CDMA, wireless USB, or wireless HDMI technology.

A program is stored in the memory. Here, the memory collectively refers to non-volatile storage devices, which retain stored information even when power is not supplied, and volatile storage devices.

For example, the memory may be a NAND flash memory such as a compact flash (CF) card, a secure digital (SD) card, a memory stick, or a solid-state drive (SSD); a magnetic computer storage device such as a hard disk drive (HDD); or an optical disc drive such as a CD-ROM or DVD-ROM.

In addition, the program stored in the memory may be implemented as software or as hardware such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit), and may perform predetermined roles.

The management server 110, the plurality of metadata servers 120, the plurality of data servers 130, and the clients 140 may be connected through a network. The network refers to a connection structure in which information can be exchanged between nodes such as terminals and servers. Examples of such a network include a 3rd Generation Partnership Project (3GPP) network, a Long Term Evolution (LTE) network, a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, a wireless LAN (Local Area Network), a satellite broadcast network, an analog broadcast network, a Digital Multimedia Broadcasting (DMB) network, and WiFi.

Meanwhile, the distributed file system 100 according to an embodiment of the present invention uses many servers in order to provide exabyte-scale storage, and connecting them with a conventional fat-tree network would increase the complexity of the configuration.

Accordingly, in an embodiment of the present invention, the metadata servers 120 and the data servers 130 are configured in a three-dimensional torus topology without the switch 150 and are connected to each other by a routing function in each server. In addition, the clients 140 are connected to the switch 150 to access the distributed file system 100.

Also, in the torus structure, the access performance may vary depending on the location of the server and the number of hops. Accordingly, in one embodiment of the present invention, the concept of planes can be introduced and storage performance can be optimized by placing or relocating on a different plane depending on the demand characteristics (performance, capacity, access speed, etc.) of the workload.

To this end, in the distributed file system 100 according to the embodiment of the present invention, the plurality of metadata servers 120 and the plurality of data servers 130 are arranged on the first to the n-th planes.

At this time, the nodes included in the first plane are connected to the plurality of clients 140 in a fat-tree manner through the switch 150. That is, the plurality of metadata servers 120 and the plurality of data servers 130 included in the first plane are connected to the client 140 through the switch 150 for interfacing with the outside.

The plurality of metadata servers 120 and data servers 130 included in the first plane can be directly connected, over the torus network and without the switch 150, to the plurality of metadata servers 120 and data servers 130 included in the second to n-th planes. In other words, the nodes in the first plane are connected to the nodes in the second to n-th planes by direct network cable connections, without the switch 150, in the form of a torus network.

In the embodiment having the above structure, each node included in the first to n-th planes may have four IP addresses in the two-dimensional torus structure and six IP addresses in the three-dimensional torus structure.

At this time, the IP address may be configured to include a plane address, a row address, a column address, a source port, a destination port, and relative position information.

Referring to FIG. 2 (a), servers connected by a torus are represented by a combination of plane (p), row (x), and column (y) values. That is, information of each node can be represented by coordinate values (p, x, y). Accordingly, the coordinate values of the 3-dimensional torus structure of 4x4x4 can be expressed as shown in FIG. 2 (b).
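The coordinate notation can be illustrated with a short sketch that enumerates the nodes of a 4x4x4 torus, lists the six direct neighbors of a node, and computes a hop count between two nodes. The zero-based indices, the wraparound on every axis (including the plane axis), and the function names are assumptions made for this sketch; the source figures may number the coordinates differently.

```python
from itertools import product

DIMS = (4, 4, 4)  # (planes, rows, columns), matching the 4x4x4 example


def all_coordinates(dims=DIMS):
    """Every (p, x, y) coordinate in the torus, using zero-based indices."""
    return list(product(*(range(n) for n in dims)))


def neighbors(coord, dims=DIMS):
    """The six direct neighbors: one step forward and back on each axis, with wraparound."""
    result = []
    for axis in range(3):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (nxt[axis] + step) % dims[axis]
            result.append(tuple(nxt))
    return result


def hop_distance(a, b, dims=DIMS):
    """Minimum number of hops between two nodes, taking the shorter way around each ring."""
    total = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        total += min(d, n - d)
    return total
```

Six neighbors per node correspond to the six IP addresses per node mentioned for the three-dimensional structure; in a two-dimensional torus the same logic yields four.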

If the distributed file system 100 according to an embodiment of the present invention has a three-dimensional torus structure, a total of six IP addresses (ip1, ip2, ip3, ip4, ip5, ip6) can be used per node, as shown in FIG. 3 (a). The plurality of metadata servers 120 and data servers 130 included in the first plane, which are connected to the client 140 via the switch 150, may additionally have an IP address (Pip) for the switch 150 connected to the client 140, in addition to the IP addresses ip1, ip2, ip3, ip4, ip5, and ip6, as shown in FIG. 3 (b).

Hereinafter, an IP address configuration method and an information structure to be managed in relation to coordinate values in the distributed file system 100 according to an embodiment of the present invention will be described with reference to FIG. 4 and FIG.

FIG. 4 is a diagram illustrating an example of a method of configuring an IP address for each port of a node in the distributed file system 100 according to an embodiment of the present invention.

In the distributed file system 100 according to an embodiment of the present invention, the IP address allocated to the plurality of nodes included in the first to the n-th planes includes the plane address 410 in the first byte, the row address 420 in the second byte, and the column address 430 in the third byte, while the remaining bytes contain the source port 440, the destination port 450, and the relative position information 460, which is expressed as 1 or 2.

It is desirable that such an IP address is expressed so that the relative position of the corresponding node in the torus structure can be easily understood. And the rule of expressing the IP address can be specified to follow the number of the smaller of the plane, row, and column values.

Meanwhile, to determine the relative position of a server from the IP address itself, the source port 440, the destination port 450, and the relative position information 460 expressed as 1 or 2 can be analyzed.

Specifically, when the value of the relative position information 460 included in the IP address allocated to the plurality of nodes included in the first to the n-th planes is 1, the plane address 410, the row address 420, and the column address 430 are the coordinate values corresponding to the node's own position.

On the contrary, when the value of the relative position information 460 is 2, it can be seen that the plane address 410, the row address 420, and the column address 430 are coordinate values other than their own position values. In this case, it is possible to determine which of the plane address 410, the row address 420, and the column address 430 should be changed by analyzing the values of the source port 440 and the destination port 450.
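The address layout can be sketched as an encode/decode pair. The source states only that the first three bytes carry the plane, row, and column and that the rest carries the two ports and the relative position value; the bit-level packing below (3 bits per port, 2 bits for the relative position, in a single fourth octet of an IPv4-style address) is purely an assumption made so that the example runs, and the names `TorusAddress`, `encode`, and `decode` are hypothetical.

```python
from typing import NamedTuple


class TorusAddress(NamedTuple):
    plane: int
    row: int
    column: int
    src_port: int
    dst_port: int
    relative: int  # 1: coordinates are the node's own position, 2: they are not


def encode(addr: TorusAddress) -> str:
    """Pack the fields into a dotted-quad string (assumed layout, see lead-in)."""
    if not all(0 <= v <= 255 for v in (addr.plane, addr.row, addr.column)):
        raise ValueError("plane, row, and column must each fit in one byte")
    if not (0 <= addr.src_port <= 7 and 0 <= addr.dst_port <= 7):
        raise ValueError("ports must fit in 3 bits under this assumed packing")
    if addr.relative not in (1, 2):
        raise ValueError("relative position must be 1 or 2")
    last = (addr.src_port << 5) | (addr.dst_port << 2) | addr.relative
    return f"{addr.plane}.{addr.row}.{addr.column}.{last}"


def decode(ip: str) -> TorusAddress:
    """Recover the fields from a dotted-quad string produced by encode()."""
    parts = [int(p) for p in ip.split(".")]
    if len(parts) != 4:
        raise ValueError("expected four dot-separated bytes")
    plane, row, column, last = parts
    return TorusAddress(plane, row, column, (last >> 5) & 7, (last >> 2) & 7, last & 3)


def is_own_position(addr: TorusAddress) -> bool:
    """True when the coordinates in the address are the node's own position."""
    return addr.relative == 1


example = encode(TorusAddress(plane=1, row=2, column=3, src_port=4, dst_port=5, relative=1))
```

When `relative` is 2, a real implementation would go on to inspect the source and destination ports to decide which of the three coordinates differs; that rule is not spelled out in the source, so it is left out of the sketch.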

The structure of such an IP address can be used to express the state of the topology and the like.

FIG. 5 is a diagram illustrating an example of an information structure to be managed in the management server 110, the metadata server 120, and the client in relation to coordinate values in the distributed file system 100 according to an embodiment of the present invention.

First, FIG. 5A shows a server management table corresponding to a plurality of nodes included in the first to n-th planes stored in the management server 110. The server management table includes a host name 510 serving as an identifier of each server, a plane address, a row address, a column address 520, and a plurality of IP addresses 530.

In this case, for the plurality of metadata servers 120 and data servers 130 disposed on the first plane, the table may further include an address 540 of the switch 150 connected to the client 140.
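One way to picture the server management table is as a mapping from host name to a small record. The class name, field names, host names, and placeholder address strings below are all hypothetical, and treating plane index 0 as the first plane is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerEntry:
    host_name: str                         # identifier of the server (510)
    coord: tuple                           # (plane, row, column) address (520)
    ip_addresses: list                     # per-port IP addresses (530)
    switch_address: Optional[str] = None   # switch-side address (540), first plane only


def first_plane_servers(table, first_plane=0):
    """Entries whose plane coordinate marks them as first-plane (relay) servers."""
    return [entry for entry in table.values() if entry.coord[0] == first_plane]


# Hypothetical table with one first-plane server and one inner server.
ports = ["ip1", "ip2", "ip3", "ip4", "ip5", "ip6"]
table = {
    "mds-example": ServerEntry("mds-example", (0, 1, 1), list(ports), switch_address="pip"),
    "ds-example": ServerEntry("ds-example", (2, 1, 1), list(ports)),
}
```

Only the first-plane entry carries a switch address, which mirrors the optional field 540 in the description.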

Next, FIG. 5 (b) shows a file layout for each file stored in the metadata server 120. The file layout includes inode information 550 and one or more chunk-specific identification information 560a, ..., 560n. At this time, the identification information 560a, ..., 560n for each chunk next to the inode information may be consecutively constructed.

On the other hand, the chunk identification information 560a, ..., 560n may include IP address information. Such an IP address is not allocated in real time and arbitrarily, but an IP address selected as an optimal chunk access path may be allocated. At this time, the optimal chunk access path may be the shortest path to access the chunks.

The IP address assigned in this way can be changed only if the chunk access path fails.
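The file layout can be sketched as an inode followed by a list of per-chunk records, each carrying the IP address chosen as the access path. The class and method names, the numeric inode, and the address strings are hypothetical; the sketch only shows that the stored address stays fixed until a path failure forces a change.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkInfo:
    chunk_id: int
    ip_address: str  # address selected as the optimal (shortest) access path


@dataclass
class FileLayout:
    inode: int
    chunks: list = field(default_factory=list)

    def reassign_on_failure(self, chunk_id: int, new_ip: str) -> None:
        """Replace a chunk's access address; intended only for path failures."""
        for chunk in self.chunks:
            if chunk.chunk_id == chunk_id:
                chunk.ip_address = new_ip
                return
        raise KeyError(chunk_id)


# Hypothetical layout with two chunks.
layout = FileLayout(inode=42, chunks=[ChunkInfo(0, "2.1.3.37"), ChunkInfo(1, "3.0.2.37")])
layout.reassign_on_failure(1, "3.0.2.69")
```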

FIG. 5 (c) shows the relay server information table 570 stored in the client 140 and the IP address information 580 of the metadata server.

The client 140 includes a relay server information table 570 corresponding to each node included in the first plane. That is, the relay server information table 570 stores all information of the servers on the first plane. The servers in the first plane serve as a relay for communication between the client 140 and the storage server.

The client 140 can select one of the servers included in the relay server information table located at the shortest distance as the relay server based on the coordinate values of the metadata server 120 or the data server 130 to request the operation . That is, the client 140 can select a server of the first plane, which can be the most optimal path, as the relay server based on the coordinate value information of the storage server, in order to access any storage server.
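Relay selection by shortest distance can be sketched as a minimum over the first-plane servers. The hop-count metric with wraparound, the zero-based coordinates, and the function names are assumptions made for this sketch.

```python
def torus_distance(a, b, dims):
    """Hop count between two (plane, row, column) coordinates with wraparound."""
    total = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        total += min(d, n - d)
    return total


def select_relay(relay_coords, target, dims):
    """Pick the first-plane server closest to the target storage server."""
    return min(relay_coords, key=lambda relay: torus_distance(relay, target, dims))


# Hypothetical 4x4x4 torus where plane 0 plays the role of the first plane.
dims = (4, 4, 4)
relay_coords = [(0, 0, 0), (0, 1, 3), (0, 2, 2)]
chosen = select_relay(relay_coords, target=(2, 1, 3), dims=dims)
```

In this example the chosen relay shares its row and column with the target, so the remaining hops run only along the plane axis.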

In addition, the client 140 includes IP address information 580 of the metadata server 120 storing the root directory information of the mounted volume. Accordingly, the client 140 can select the most optimal IP address when accessing the metadata server 120, that is, an IP address accessible with the shortest distance.

When a failure occurs in the path corresponding to an IP address included in the IP address information of the metadata server 120, the next-best IP address may be assigned instead. That is, in the three-dimensional torus network structure, when a failure occurs in the path corresponding to the IP address held in the client 140's IP address information for the metadata server 120, the IP address of the failed path is excluded and, from the remaining five IP addresses, the next-best IP address is selected, that is, the IP address corresponding to the route with the second shortest distance.
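The fallback rule can be sketched as choosing the shortest path among the addresses that have not failed. The address labels and path lengths are hypothetical, and the function name `pick_address` is an assumption of this sketch.

```python
def pick_address(path_lengths, failed):
    """Return the address with the shortest path among those not marked as failed."""
    candidates = {ip: hops for ip, hops in path_lengths.items() if ip not in failed}
    if not candidates:
        raise RuntimeError("no reachable address left")
    return min(candidates, key=candidates.get)


# Hypothetical path lengths (in hops) for the six addresses of one metadata server.
path_lengths = {"ip1": 2, "ip2": 5, "ip3": 3, "ip4": 6, "ip5": 4, "ip6": 7}

best = pick_address(path_lengths, failed=set())
next_best = pick_address(path_lengths, failed={"ip1"})
```

With the shortest path excluded, the choice falls on the address with the second shortest distance among the remaining five.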

The components shown in FIG. 1 according to an embodiment of the present invention may be implemented as software or as hardware such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit), and may perform predetermined roles.

However, 'components' are not meant to be limited to software or hardware, and each component may be configured to reside on an addressable storage medium and configured to execute on one or more processors.

Thus, by way of example, a component includes software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, routines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

The components and functions provided within those components may be combined into a smaller number of components or further separated into additional components.

Meanwhile, the distributed file system 100 according to an embodiment of the present invention can perform a mount procedure, a file open procedure, a file read procedure, a file write procedure, and failure occurrence and handling procedures, which are described below with reference to FIGS. 6 to 11.

FIG. 6 is a flowchart of a mount step of a file operation processing method of the distributed file system 100 according to an embodiment of the present invention.

In the mount step according to an embodiment of the present invention, the management server 110 first receives from the client 140 a mount request including information on the volume to be accessed (S610), and searches for the metadata server 120 corresponding to the root directory information included in the volume information (S620). The management server 110 then transmits the IP address of the found metadata server 120 to the client (S630).

The management server 110 transmits to the client 140 not only the IP address of the found metadata server 120 but also the volume information and the IP addresses of the plurality of metadata servers 120 and data servers 130 included in the first plane. Accordingly, the client 140 may keep the volume information, the IP address of the found metadata server 120, and the IP addresses corresponding to the plurality of nodes included in the first plane in local storage until the volume is unmounted.
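The mount exchange can be sketched with two small classes. The class and method names, the volume name, and the address strings are hypothetical; the sketch only models the lookup of the metadata server for a volume, the reply that bundles the first-plane addresses, and the client keeping that reply until unmount.

```python
class ManagementServer:
    def __init__(self, volume_to_mds_ip, first_plane_ips):
        self.volume_to_mds_ip = volume_to_mds_ip   # volume -> metadata server holding its root directory
        self.first_plane_ips = first_plane_ips     # addresses of first-plane servers

    def handle_mount(self, volume):
        mds_ip = self.volume_to_mds_ip[volume]     # search step (S620)
        return {                                   # reply sent to the client (S630)
            "volume": volume,
            "mds_ip": mds_ip,
            "first_plane_ips": list(self.first_plane_ips),
        }


class Client:
    def __init__(self):
        self.mounted = {}                          # stands in for the client's local storage

    def mount(self, server, volume):
        self.mounted[volume] = server.handle_mount(volume)   # mount request (S610)

    def unmount(self, volume):
        self.mounted.pop(volume, None)             # cached addresses are dropped on unmount


# Hypothetical volume and addresses.
server = ManagementServer({"vol-example": "2.1.3.37"}, ["0.1.3.37", "0.2.2.37"])
client = Client()
client.mount(server, "vol-example")
cached = dict(client.mounted["vol-example"])
client.unmount("vol-example")
```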

FIG. 7 is a flowchart of a file open step in a file operation processing method of the distributed file system 100 according to an embodiment of the present invention.

In the file open procedure according to an embodiment of the present invention, the first relay server receives a file open request from the client 140 (S710). At this time, the first relay server can be selected by the client 140. That is, among the plurality of metadata servers 120 and data servers 130 included in the first plane, the client 140 can select as the first relay server the server whose row address and column address (x, y) are the same as those of the found metadata server 120 corresponding to the root directory information included in the volume information.

Next, the first relay server transmits the file open request to the found metadata server 120 (S720), receives the metadata of the file corresponding to the file open request from the metadata server 120 (S730), and transmits the received metadata to the client 140 (S740).

FIG. 8 is a flowchart of a file reading process in the file operation processing method of the distributed file system 100 according to an embodiment of the present invention.

In the file reading process according to an embodiment of the present invention, when the first relay server receives a file layout request from the client 140 (S810), the first relay server requests the file layout information from the found metadata server 120 corresponding to the root directory information included in the volume information (S820).

Next, the found metadata server 120 retrieves the file layout according to the file layout information request and transmits the file layout to the first relay server (S830), and the first relay server receives the file layout and transmits it to the client 140 (S840).

Next, when the second relay server receives from the client 140 a file read request addressed to the data server 130 (S850), the second relay server forwards the file read request to the data server 130. When the second relay server receives the requested file corresponding to the file read request from the data server 130 (S860), it transmits the requested file to the client 140 (S870).

When the client 140 receives the requested file from the second relay server, the client 140 can return the received file to the user.

On the other hand, the second relay server can be selected by the client 140. That is, among the plurality of metadata servers 120 and data servers 130 included in the first plane, the client 140 can select as the second relay server the server whose row address and column address (x, y) are the same as those of the data server 130 from which the file is to be read.
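The read path can be sketched end to end with in-memory dictionaries standing in for the servers. All names, coordinates, and data are hypothetical, and using plane index 0 for the first plane is an assumption of this sketch; the relay for each chunk is simply the first-plane coordinate with the same row and column as the data server.

```python
def relay_coord(server_coord, first_plane=0):
    """First-plane server sharing the row and column (x, y) of the target server."""
    _, x, y = server_coord
    return (first_plane, x, y)


def read_file(path, layouts, chunk_store):
    """Fetch every chunk listed in the file layout and record the relay used for each."""
    trace = []
    data = b""
    for server, chunk_id in layouts[path]:      # layout obtained from the metadata server
        relay = relay_coord(server)             # second relay server chosen by the client
        trace.append((relay, server))
        data += chunk_store[server][chunk_id]   # chunk returned through the relay
    return data, trace


# Hypothetical file made of two chunks stored on two data servers.
layouts = {"/example.txt": [((2, 1, 3), 0), ((3, 0, 2), 0)]}
chunk_store = {(2, 1, 3): {0: b"hello "}, (3, 0, 2): {0: b"torus"}}
content, trace = read_file("/example.txt", layouts, chunk_store)
```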

FIG. 9 is a flowchart of a file writing process in the file operation processing method of the distributed file system 100 according to an embodiment of the present invention.

In the file writing procedure according to an embodiment of the present invention, the found metadata server 120, which manages the root directory information of the volume, first receives a file layout information request from the client 140 through the first relay server (S910).

Next, the metadata server 120 allocates a chunk space for file writing (S920), and transmits the file layout information to the client 140 through the first relay server (S930).

Next, the data server 130 receives a file write request from the client 140 through the second relay server (S940), and performs a file write operation in response to the file write request (S950).

When the file writing procedure is completed, the data server 130 returns the result of writing the file to the second relay server, and the second relay server returns it to the client 140. Accordingly, the client 140 can return the file writing result to the user.

FIG. 10 is a flowchart of failure occurrence and processing steps of a relay server in a file operation processing method of the distributed file system 100 according to an embodiment of the present invention.

The file operation processing method of the distributed file system 100 according to an embodiment of the present invention can process a failure of the relay server when the failure occurs.

First, the client 140 requests an operation from any one of the relay servers corresponding to the nodes included in the first plane (S1010). If a response is received within a predetermined time, the normal subsequent operation is processed (S1030).

Alternatively, if the response to the operation request is not received within the predetermined time (S1020), the client 140 requests the relay server information table 570 from the management server 110 again and receives it (S1040).

Based on the received relay server information table 570, which includes the status information of the relay servers, the client 140 selects, among the normally operating relay servers, the one located at the shortest distance from the relay server that did not respond, and requests the operation again from the selected relay server (S1050).
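One way to picture the re-selection step (S1050) is shown below. The patent does not define the distance metric in this passage, so the sketch assumes hop count on a two-dimensional torus with wraparound; the table layout (a mapping from `(row, col)` to a health flag) is likewise hypothetical.

```python
def torus_distance(a, b, rows, cols):
    """Hop count between two (row, col) positions on a rows x cols torus."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    # On a torus each axis wraps around, so take the shorter direction.
    return min(dr, rows - dr) + min(dc, cols - dc)


def pick_fallback_relay(failed, relay_table, rows, cols):
    """Pick the nearest normally operating relay to the one that failed.

    relay_table maps (row, col) -> True when the relay is operating normally.
    Ties are broken by position so the choice is deterministic.
    """
    candidates = [pos for pos, ok in relay_table.items() if ok and pos != failed]
    if not candidates:
        return None
    return min(candidates,
               key=lambda pos: (torus_distance(failed, pos, rows, cols), pos))


# 4 x 4 first plane: the relay at (0, 0) timed out, and (0, 1) is also down.
table = {(0, 0): False, (0, 1): False, (0, 3): True, (2, 2): True}
print(pick_fallback_relay((0, 0), table, rows=4, cols=4))
```

Because of the wraparound links, the relay at column 3 is only one hop from column 0, so it is preferred over the relay at (2, 2).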

FIG. 11 is a flowchart of a fault occurrence and processing step of the metadata server 120 or the data server 130 among the file operation processing methods of the distributed file system 100 according to an embodiment of the present invention.

The file operation processing method of the distributed file system 100 according to an embodiment of the present invention can also handle a failure of IP access to the metadata server 120 or the data server 130.

First, the relay server corresponding to each node included in the first plane attempts to connect to the IP address of the data server 130 or the metadata server 120 in response to an operation request from the client 140 (S1110).

Next, if the relay server receives a response to the operation request from the data server 130 or the metadata server 120 within a predetermined time, the normal subsequent operation is processed (S1130).

Alternatively, if the relay server does not receive a response to the operation request from the data server 130 or the metadata server 120 within the predetermined time (S1120), the relay server attempts to connect to the remaining IP addresses of the data server 130 or the metadata server 120, excluding the IP address it has already tried (S1140). If the connection to all the IP addresses fails (S1150), the connection failure result is returned to the client 140 (S1170).

On the other hand, if one of the connection attempts made by the relay server succeeds, the relay server returns the successfully connected IP address to the client 140 (S1160). The client 140 stores the successfully connected IP address and can use the stored IP address at the next connection attempt (S1165).
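The retry logic of steps S1110 to S1170 amounts to walking a list of candidate addresses. The sketch below is an assumption-laden illustration: the function names, the cache dictionary, the sample addresses, and port 9000 are all hypothetical, and the connectivity check is injected so the logic can be exercised without a network.

```python
import socket


def tcp_try_connect(ip, port=9000, timeout=1.0):
    """Real-network check; the port number is a placeholder, not from the patent."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def connect_with_fallback(ip_addresses, try_connect, preferred=None):
    """Try the preferred IP first, then the remaining ones (S1110, S1140).

    Returns the first IP that connects (S1160) or None if all fail (S1150, S1170).
    """
    ordered = list(ip_addresses)
    if preferred in ordered:
        ordered.remove(preferred)
        ordered.insert(0, preferred)
    for ip in ordered:
        if try_connect(ip):
            return ip
    return None


# Simulated reachability instead of real sockets.
reachable = {"10.0.0.3"}
server_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
cache = {}  # client-side memory of the last working IP per server (S1165)

ip = connect_with_fallback(server_ips, lambda addr: addr in reachable)
if ip is not None:
    cache["ds-p3-r1c0"] = ip
print(ip)
```

On the next request the client would pass `preferred=cache["ds-p3-r1c0"]`, so the address that worked last time is tried first. Swapping the lambda for `tcp_try_connect` gives a real TCP probe.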

Meanwhile, in the above description, steps S610 to S1170 may be divided into additional steps or combined into fewer steps, according to an embodiment of the present invention. Also, some of the steps may be omitted as necessary, and the order of the steps may be changed. In addition, even where not repeated here, the contents already described with respect to the distributed file system 100 in FIGS. 1 to 5 also apply to the file operation processing method of FIG. 6 and the subsequent figures.

According to any one of the embodiments of the present invention described above, the positional information of the metadata server 120 or the data server 130 connected through the torus network is expressed as coordinate values, so that information can be transmitted and received in the shortest time.

In addition, topology monitoring of the distributed file system 100 can be performed more efficiently.

In addition, it is possible to support exabyte-class capacity, which is difficult to achieve with the conventional hierarchical, switch-based fat-tree method.

In addition, the storage servers can directly connect to each other without a switch to configure the torus topology, and the clients 140 can be connected to the switch 150 to minimize the complexity of the system.

It is also possible to provide exabytes of storage without much modification of the distributed file system 100 that has been used previously.

The file operation processing method in the distributed file system 100 according to an embodiment of the present invention can also be implemented in the form of a computer program stored in a medium and executed by a computer, or in the form of a recording medium including instructions executable by a computer. Computer-readable media can be any available media that can be accessed by a computer and include both volatile and nonvolatile media, removable and non-removable media. In addition, computer-readable media may include both computer storage media and communication media. Computer storage media include both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or another transport mechanism, and include any information delivery media.

While the methods and systems of the present invention have been described in connection with specific embodiments, some or all of those elements or operations may be implemented using a computer system having a general purpose hardware architecture.

The foregoing description of the present invention is for illustrative purposes only, and those of ordinary skill in the art will readily understand that various changes and modifications may be made without departing from the spirit or essential characteristics of the present invention. It is therefore to be understood that the above-described embodiments are illustrative in all aspects and not restrictive. For example, each component described as a single entity may be implemented in a distributed form, and components described as being distributed may also be implemented in a combined form.

The scope of the present invention is defined by the appended claims rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents are to be construed as being included within the scope of the present invention.

100: Distributed file system
110: Management server
120: Metadata server
130: Data server
140: Client
150: Switch
510: Host name
520: Plane, row, column address
530: IP address
540: Switch address
550: Inode information
560a to 560n: Identification information per chunk
570: Relay server information table
580: Metadata server IP address information

Claims (20)

1. A torus network based distributed file system, comprising:
a plurality of metadata servers for storing metadata of files;
a plurality of data servers for dividing and storing data; and
one or more management servers for managing the metadata servers and the data servers,
wherein the plurality of metadata servers and the plurality of data servers are respectively arranged on first to n-th planes constituted by a plurality of nodes, and
wherein a plurality of IP addresses including a plane address, a row address, a column address, a source port, a destination port, and relative position information are allocated to the plurality of nodes included in the first to the n-th planes.
2. The distributed file system according to claim 1,
wherein the first plane is connected to a plurality of clients through a switch, and
wherein the plurality of metadata servers and the plurality of data servers included in the first plane are directly connected to the plurality of metadata servers and the plurality of data servers included in the second to the n-th planes.
3. The distributed file system according to claim 2,
wherein the plurality of metadata servers and the plurality of data servers included in the first plane are assigned IP addresses that further include an address of the switch connected to the clients.
4. The distributed file system according to claim 1,
wherein, when the value of the relative position information included in the IP addresses allocated to the plurality of nodes included in the first to the n-th planes is 1, the plane address, the row address, and the column address correspond to the node's own position values, and
wherein, when the value of the relative position information is 2, the plane address, the row address, and the column address differ from the node's own position values.
5. The distributed file system according to claim 1,
wherein the management server includes a server management table corresponding to the plurality of nodes included in the first to the n-th planes,
wherein the server management table includes a host name of each server, the plane address, the row address, the column address, and the plurality of IP addresses, and
wherein, for the plurality of metadata servers and data servers disposed in the first plane, the server management table further includes the address of the switch connected to the client.
6. The distributed file system according to claim 1,
wherein the metadata server includes a file layout for each file,
wherein the file layout includes inode information and one or more pieces of chunk-by-chunk identification information, and
wherein the IP address included in the chunk-by-chunk identification information is an IP address selected as an optimal chunk access path, and the assigned IP address is changed when a failure occurs in the chunk access path.
7. The distributed file system according to claim 1,
wherein the client includes a relay server information table corresponding to each node included in the first plane, and IP address information of a metadata server that holds root directory information of a mounted volume,
wherein the client selects, as a relay server, the server among those included in the relay server information table that is located at the shortest distance from the metadata server or the data server to which an operation is to be requested, and
wherein the IP address included in the IP address information of the metadata server is the IP address, among the plurality of IP addresses, through which the metadata server is accessible at the shortest distance.
8. The distributed file system according to claim 7,
wherein, when a failure occurs in the path corresponding to the IP address included in the IP address information of the metadata server, a second IP address among the plurality of IP addresses is assigned in its place.
9. The distributed file system according to claim 1,
wherein, upon receiving from the client a mount request including volume information to be accessed, the management server searches for the metadata server corresponding to the root directory information included in the volume information and transmits the IP address of the retrieved metadata server to the client.
10. The distributed file system according to claim 9,
wherein the management server transmits the IP addresses of the plurality of metadata servers and the plurality of data servers included in the first plane to the client, and
wherein the client stores, in local storage until the volume is unmounted, the IP address of the retrieved metadata server and the IP addresses of the plurality of metadata servers and the plurality of data servers included in the first plane.
11. The distributed file system according to claim 10,
wherein the client selects, as a first relay server, the server among the plurality of metadata servers and the plurality of data servers included in the first plane that has the same row address and column address as the retrieved metadata server, and
wherein, when the client requests the first relay server to open a file, the first relay server transmits the file open request to the retrieved metadata server, receives the metadata of the file corresponding to the file open request as retrieved by the metadata server, and transmits the metadata to the client.
12. The distributed file system according to claim 11,
wherein the first relay server requests the file layout information from the retrieved metadata server upon receiving a file layout information request from the client,
wherein the retrieved metadata server retrieves the file layout information according to the file layout information request and transmits the retrieved file layout information to the client through the first relay server, and
wherein the client selects, as a second relay server, the server among the plurality of metadata servers and the plurality of data servers included in the first plane that has the same row address and column address as the data server in which the file to be read by the client is stored, and, when the client transmits a file read request to the data server through the selected second relay server, receives the requested file corresponding to the file read request from the data server through the second relay server.
13. The distributed file system according to claim 11,
wherein, upon receiving a file layout information request from the client through the first relay server, the retrieved metadata server allocates a chunk space for file writing and transmits the file layout information to the client through the first relay server, and
wherein the data server performs a file write operation upon receiving a file write request from the client through the second relay server.
14. The distributed file system according to claim 1,
wherein, when the client does not receive a response to an operation request from any one of the relay servers corresponding to the respective nodes included in the first plane within a preset time, the client requests and receives the relay server information table from the management server, and
wherein, based on the received relay server information table, the client requests the operation from the normally operating relay server that is located at the shortest distance from the relay server from which the response was not received.
15. The distributed file system according to claim 1,
wherein the relay server corresponding to each node included in the first plane attempts to connect to the IP address of the data server or the metadata server in response to an operation request from the client, and
wherein, when a response to the operation request is not received from the data server or the metadata server within a predetermined time, the relay server attempts to connect to the remaining IP addresses of the data server or the metadata server, excluding the IP address to which the connection was already attempted, and, when the connection to all the IP addresses fails, returns a connection failure result to the client.
16. The distributed file system according to claim 15,
wherein, when there is an IP address to which the connection attempt succeeds, the relay server returns that IP address to the client, and
wherein the client stores the successfully connected IP address and uses the stored IP address at the next connection attempt.
17. A file operation processing method of a distributed file system including a plurality of metadata servers, a plurality of data servers, and at least one management server disposed on first to n-th planes composed of a plurality of nodes, the method comprising:
receiving, by the management server, a mount request including volume information to be accessed from a client;
searching, by the management server, for the metadata server corresponding to the root directory information included in the volume information; and
transmitting, by the management server, the IP address of the retrieved metadata server to the client,
wherein the plurality of nodes included in the first to the n-th planes are assigned a plurality of IP addresses including a plane address, a row address, a column address, a source port, a destination port, and relative position information,
wherein the plurality of metadata servers and the plurality of data servers included in the first plane are directly connected, without a switch, to the plurality of metadata servers and the plurality of data servers included in the second to the n-th planes,
wherein the management server transmits the IP addresses of the plurality of metadata servers and the plurality of data servers included in the first plane to the client, and
wherein the client stores, in local storage until the volume is unmounted, the IP address of the retrieved metadata server and the IP addresses of the plurality of metadata servers and the plurality of data servers included in the first plane.
18. The method of claim 17, further comprising:
a first relay server selected by the client receiving a file open request from the client;
the first relay server transmitting the file open request to the retrieved metadata server;
the first relay server receiving metadata of a file corresponding to the file open request as retrieved by the metadata server; and
the first relay server transmitting the received metadata of the file to the client,
wherein the first relay server is selected by the client as the server, among the plurality of metadata servers and the plurality of data servers included in the first plane, that has the same row address and column address as the retrieved metadata server.
19. The method of claim 18, further comprising:
the first relay server receiving a file layout information request from the client;
the first relay server requesting the file layout information from the retrieved metadata server;
the first relay server receiving the file layout information retrieved by the retrieved metadata server;
the first relay server transmitting the file layout information to the client;
a second relay server selected by the client receiving, from the client, a file read request addressed to a data server storing a file to be read by the client;
the second relay server receiving a requested file corresponding to the file read request from the data server; and
the second relay server transmitting the requested file to the client,
wherein the second relay server is selected by the client as the server, among the plurality of metadata servers and the plurality of data servers included in the first plane, that has the same row address and column address as the data server in which the file to be read is stored.
20. The method of claim 18, further comprising:
the retrieved metadata server receiving a file layout information request from the client through the first relay server;
the retrieved metadata server allocating a chunk space for writing a file;
the retrieved metadata server transmitting the file layout information to the client through the first relay server;
the data server receiving a file write request from the client through the second relay server; and
the data server performing a file write operation in response to the file write request.
KR1020150189376A 2015-12-30 2015-12-30 Distributed file system and method for processing file operation the same KR102024934B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150189376A KR102024934B1 (en) 2015-12-30 2015-12-30 Distributed file system and method for processing file operation the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150189376A KR102024934B1 (en) 2015-12-30 2015-12-30 Distributed file system and method for processing file operation the same

Publications (2)

Publication Number Publication Date
KR20170079141A true KR20170079141A (en) 2017-07-10
KR102024934B1 KR102024934B1 (en) 2019-11-04

Family

ID=59355642

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150189376A KR102024934B1 (en) 2015-12-30 2015-12-30 Distributed file system and method for processing file operation the same

Country Status (1)

Country Link
KR (1) KR102024934B1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190058992A (en) * 2017-11-22 2019-05-30 한국전자통신연구원 Server for distributed file system based on torus network and method using the same
KR20200074610A (en) * 2018-12-17 2020-06-25 한국전자통신연구원 Apparatus and method for optimizing volume performance of distributed file system based on torus network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030077033A (en) * 2001-02-24 2003-09-29 인터내셔널 비지네스 머신즈 코포레이션 A novel massively parallel supercomputer
KR20070086231A (en) * 2004-11-17 2007-08-27 레이던 컴퍼니 Scheduling in a high-performance computing (hpc) system
WO2015194937A1 (en) * 2014-06-19 2015-12-23 Mimos Berhad System and method for distributed secure data storage in torus network topology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Paolo Costa, "Bridging the Gap Between Applications and Networks in Data Centers", ACM SIGOPS, Jan 2013* *

Also Published As

Publication number Publication date
KR102024934B1 (en) 2019-11-04

Similar Documents

Publication Publication Date Title
US11354039B2 (en) Tenant-level sharding of disks with tenant-specific storage modules to enable policies per tenant in a distributed storage system
US9900397B1 (en) System and method for scale-out node-local data caching using network-attached non-volatile memories
US9892129B2 (en) Distributed file system and operating method of the same
US9811546B1 (en) Storing data and metadata in respective virtual shards on sharded storage systems
US7725603B1 (en) Automatic network cluster path management
US10466935B2 (en) Methods for sharing NVM SSD across a cluster group and devices thereof
US10191916B1 (en) Storage system comprising cluster file system storage nodes and software-defined storage pool in cloud infrastructure
US20180203866A1 (en) Distributed object storage
US9158714B2 (en) Method and system for multi-layer differential load balancing in tightly coupled clusters
JP2021510215A (en) I / O request processing method and device
WO2012000348A1 (en) Method and apparatus for providing highly-scalable network storage for well-gridded objects
US10157003B1 (en) Storage system with distributed tiered parallel file system comprising software-defined unified memory cluster
US9961145B1 (en) Multi-tier storage system having front-end storage tier implemented utilizing software-defined storage functionality
US9674312B2 (en) Dynamic protocol selection
KR102024934B1 (en) Distributed file system and method for processing file operation the same
US9641611B2 (en) Logical interface encoding
US20240171634A1 (en) Scalable autonomous storage networks
US10447585B2 (en) Programmable and low latency switch fabric for scale-out router
US9942326B1 (en) In-memory database with memory clustering utilizing software-defined storage functionality
KR102001572B1 (en) Distributed file system and method for managing data the same
CN112636949A (en) Communication method and device for electromagnetic transient real-time parallel simulation data
US10764330B2 (en) LAN/SAN network security management
KR102025801B1 (en) Distributed file system and method for protecting data thereof
KR102610984B1 (en) Distributed file system using torus network and method for operating of the distributed file system using torus network
CN107832005B (en) Distributed data access system and method

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant