CN115687250A

CN115687250A - Storage method, equipment, system and computer storage medium

Info

Publication number: CN115687250A
Application number: CN202110823074.5A
Authority: CN
Inventors: 许家桐; 谢昌龙; 席金玉; 王福成
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Priority date: 2021-07-21
Filing date: 2021-07-21
Publication date: 2023-02-03

Abstract

The invention discloses a storage method applied to an access device. The method comprises: when a storage request for a first file is received, assigning an identification code to the first file; generating an index number of the first file from the existing parent directory of the first file and the identification code; storing the association between the name of the first file and the index number of the first file; and sending the first file and the index number of the first file to an index device. Embodiments of the invention also disclose a device, a system and a computer storage medium. The invention improves the reliability of file storage and thereby the efficiency of file storage.

Description

Storage method, equipment, system and computer storage medium

Technical Field

The present invention relates to a file storage technology of a computer, and in particular, to a storage method, device, system, and computer storage medium.

Background

A Distributed File System (DFS) may be used for storage and management of data. At present, a common distributed file system usually realizes distributed storage of data across servers by additionally adding modules on the basis of a single file system, and manages the data by sharing the same namespace.

Among stand-alone file systems, the Zettabyte File System (ZFS), a dynamic file system, combines the file system with device management and the like, and can therefore span the physical locations of data, unlike other stand-alone file systems.

However, in the distributed file system, ZFS cannot break through the limitation of its own stand-alone file system, and there are still many deficiencies in service availability. For example, it is often difficult to provide service when a ZFS server is down or fails.

Disclosure of Invention

In view of this, the present invention provides a storage method, device, system and computer storage medium, so as to solve the technical problem that it is generally difficult to provide services when a ZFS server is down or fails in the prior art.

The technical scheme of the invention is realized as follows:

In a first aspect, an embodiment of the present invention provides a storage method, where the method is applied to an access device and includes:

in the event that a storage request for a first file is received, assigning an identification code to the first file;

generating an index number of the first file by using the existing parent directory of the first file and the identification code;

storing the association relationship between the name of the first file and the index number of the first file;

sending the first file and the index number of the first file to an index device;

wherein the index device is configured to, after determining an identifier of the storage node of the first file in a storage device, store the correspondence between the index number of the first file and the identifier of the storage node of the first file in a first device and a second device of the index device.

In a second aspect, an embodiment of the present invention provides a storage method, where the method is applied to an index device, and includes:

receiving a first file and an index number of the first file sent by an access device, where the index number of the first file is generated by the access device from the existing parent directory of the first file and an identification code assigned to the first file by the access device;

determining an identifier of a storage node, in a storage device, that is used for storing the first file;

and storing the correspondence between the index number of the first file and the identifier of the storage node of the first file in a first device and a second device of the index device.

In a third aspect, an embodiment of the present invention provides an access device, including:

an allocation module, configured to assign an identification code to a first file when a storage request for the first file is received;

a generating module, configured to generate an index number of the first file from the existing parent directory of the first file and the identification code;

a first storage module, configured to store the association between the name of the first file and the index number of the first file;

a sending module, configured to send the first file and the index number of the first file to an index device.

In a fourth aspect, an embodiment of the present invention provides an index device, including:

a receiving module, configured to receive a first file and an index number of the first file sent by an access device, where the index number of the first file is generated by the access device from the existing parent directory of the first file and an identification code assigned to the first file by the access device;

a determining module, configured to determine an identifier of a storage node, in a storage device, that is used for storing the first file;

and a second storage module, configured to store the correspondence between the index number of the first file and the identifier of the storage node of the first file in a first device and a second device of the index device.

In a fifth aspect, an embodiment of the present invention further provides an access device, where the access device includes a processor and a storage medium storing instructions executable by the processor; the storage medium relies on the processor to perform operations through a communication bus, and when the instructions are executed by the processor, the storage method of one or more of the above embodiments is performed.

In a sixth aspect, an embodiment of the present invention further provides an index device, where the index device includes a processor and a storage medium storing instructions executable by the processor; the storage medium relies on the processor to perform operations through a communication bus, and when the instructions are executed by the processor, the storage method of one or more of the above embodiments is performed.

In a seventh aspect, an embodiment of the present invention provides a computer storage medium storing executable instructions, and when the executable instructions are executed by one or more processors, the processors execute the storage method described in one or more embodiments above.

The invention provides a storage method, a device, a system and a computer storage medium. In the method, when a storage request for a first file is received, the access device assigns an identification code to the first file, generates an index number of the first file from the existing parent directory of the first file and the identification code, stores the association between the name of the first file and the index number of the first file, and sends the first file and the index number of the first file to an index device. The index device is configured to, after determining the identifier of the storage node of the first file in the storage device, store the correspondence between the index number of the first file and the identifier of the storage node of the first file in a first device and a second device of the index device.

That is, in the present invention, the access device assigns an identification code to the first file when it receives the storage request and generates the index number of the first file from the existing parent directory of the first file and the identification code. The association between the name of each file and its index number can therefore be stored in the access device, which facilitates reading and modifying files. The first file and its index number are sent to the index device, so that the index device can determine the identifier of the storage node for storing the first file and store the correspondence between the index number and that identifier in both the first device and the second device of the index device. When the first device fails, the second device can thus reset the route, which ensures normal storage and reading of files, improves the reliability of file storage, and in turn improves the efficiency of file storage.

Drawings

FIG. 1 is a schematic structural diagram of an alternative file system in an embodiment of the present invention;

FIG. 2 is a schematic flow interaction diagram of an alternative storage method in an embodiment of the present invention;

FIG. 3 is a block diagram of an example of an alternative file system in an embodiment of the present invention;

FIG. 4 is a block diagram illustrating an example of an alternative file system in accordance with an embodiment of the present invention;

FIG. 5 is a schematic diagram of an alternative directory in an embodiment of the present invention;

FIG. 6 is a flow chart of an alternative Raft algorithm in an embodiment of the present invention;

FIG. 7 is a flow chart illustrating an alternative storage method according to an embodiment of the present invention;

FIG. 8 is a schematic flow chart of an alternative storage method in an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of an alternative access device in an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of an alternative indexing device in an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of another alternative access device in an embodiment of the present invention;

FIG. 12 is a schematic structural diagram of an alternative indexing device in an embodiment of the present invention;

FIG. 13 is a schematic structural diagram of an alternative file system in an embodiment of the present invention.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

Example one

An embodiment of the present invention provides a storage method applied to a file system. FIG. 1 is a schematic structural diagram of an optional file system in an embodiment of the present invention. As shown in FIG. 1, the file system may include an access device 11, an index device 12 and a storage device 13, where the index device 12 includes two types of devices, namely a first device 121 and a second device 122.

The access device 11 has a communication connection with the index device 12; the access device 11 may be a user client or a gateway server and plays the role of a client in the file system. The first device 121 and the second device 122 of the index device 12 are both computing servers: the first device 121 may be a ZFS server, and the second device 122 may be an Index server, which is configured to manage the routes of the ZFS servers. The storage device 13 is typically a storage server for storing files.

Based on the file system shown in fig. 1, an embodiment of the present invention provides a storage method, and fig. 2 is a schematic view illustrating an alternative storage process interaction in an embodiment of the present invention, where as shown in fig. 2, the storage method may include:

S201: the access device 11, on receiving a storage request for a first file, assigns an identification code to the first file;

At present, mainstream distributed file systems perform well when reading and writing large files; their performance bottlenecks generally lie in reading and writing massive numbers of small files and in index access, and the small-file read/write problem can itself be reduced to a problem of index design. Indexes in mainstream distributed file systems broadly follow two directions: centralized indexes, whose scalability is a bottleneck, and decentralized indexes, whose baseline performance is insufficient. In general, mainstream distributed file systems can only push the problem to the upper application layer or aggregate small files at the protocol access layer, which relieves the bottleneck only under some workload models.

Meanwhile, these distributed systems are burdened by traditional consistency algorithms (e.g., the PAXOS algorithm or the Multi-PAXOS algorithm): an efficient system cannot be designed while also implementing PAXOS semantics, and the actual safety of the resulting system cannot be proved. Cases in which mainstream distributed file systems suffer split-brain and lose data are rare in a production environment, and self-repair of faults to ensure eventual consistency is the normal state. The PAXOS algorithm is completely safe and provable in principle, but its main drawback is that it is difficult to understand and implement; distributed file storage often incurs substantial extra overhead, and in the end safety still cannot be strictly guaranteed, so eventual consistency has to be maintained separately.

To improve file storage efficiency, in the embodiment of the present invention the access device 11 assigns an identification code to the first file after receiving the storage request for the first file. Specifically, when receiving the first file to be stored, the access device 11 may generate an identification code for the first file according to a preset rule; for example, the identification code may follow the serial number of the stored files, or files to be stored may be numbered with a letter plus a number. This is not specifically limited in this embodiment of the present invention.

S202: the access device 11 generates an index number of the first file by using the existing parent directory and the identification code of the first file;

Specifically, after the access device 11 assigns the identification code to the first file, it may generate the index number of the first file from the parent directory of the first file and the identification code of the first file. For example, if the identification code assigned to the first file is 345 and the parent directory of the first file is the F disk, the index number of the first file may be F/345.
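As a non-limiting illustration, steps S201 to S203 may be sketched in Python as follows. The class name `AccessDevice`, the sequential numbering rule and the `/` separator are assumptions made for illustration only; the embodiment does not prescribe them.

```python
from itertools import count


class AccessDevice:
    """Hypothetical access device that assigns codes and builds index numbers."""

    def __init__(self):
        self._next_code = count(1)   # assumed preset rule: sequential codes
        self.name_to_index = {}      # file name -> index number (S203)

    def assign_identification_code(self):
        # S201: assign an identification code to the file to be stored.
        return next(self._next_code)

    def make_index_number(self, parent_dir, code):
        # S202: combine parent directory and code, e.g. "F" and 345 -> "F/345".
        return f"{parent_dir.rstrip('/')}/{code}"

    def register(self, name, parent_dir):
        code = self.assign_identification_code()
        index_number = self.make_index_number(parent_dir, code)
        self.name_to_index[name] = index_number   # S203: store the association
        return index_number
```

With this sketch, `make_index_number("F", 345)` yields `"F/345"`, matching the example in the text.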

S203: the access device 11 stores the association relationship between the name of the first file and the index number of the first file;

After the index number of the first file is generated, the name of the first file and the index number of the first file can be stored in association, so that when the file is later read or modified, its index number can be found from its name and the file can be read from or modified in the storage device.

S204: the access device 11 sends the first file and the index number of the first file to the index device;

To determine the device corresponding to the first file from among the first devices of the index device, in an optional embodiment, the method may further include:

the access device 11 sends an acquisition request for the routing information of the index device to the second device 122;

the second device 122 determines the routing information of the first device managed by the second device 122 as the routing information of the index device, and sends the routing information to the access device 11;

the access device 11 determines the received routing information of the index device as the determined routing information of the index device.

The access device 11 determines a device corresponding to the first file from the first device of the index device according to the determined routing information of the index device and the existing parent directory of the first file;

Specifically, the access device 11 sends an acquisition request for the routing information of the index device to the second device 122. It should be noted that the second device 122 here denotes all of the second devices in the index device; each second device is connected to, and manages, at least two first devices. After receiving the acquisition request, the second device 122 sends the routing information of each first device it manages to the access device 11 as the routing information of the index device, and the access device 11 takes the received routing information as the determined routing information of the index device.

Therefore, the routing information of the index device is determined by sending the acquisition request, so that the determined routing information of the index device is effective routing information, and normal reading and writing of the file system are ensured.

The determined routing information of the index device is stored in the access device 11, and since the first device corresponding to the parent directory is stored in the routing information of the index device, the access device 11 may determine, from the determined routing information of the index device, the first device corresponding to the parent directory of the first file, and determine the first device corresponding to the parent directory of the first file as the device corresponding to the first file.

It should be noted that the routing information of the index device stored in the access device 11 may be valid or invalid. To ensure that the determined routing information of the index device is valid, so that the storage device can be read, written and modified normally, the validity of the routing information may be checked, for example, by comparing version numbers. In an optional embodiment, the method may further include:

when the version number of the received routing information of the index device is the same as the version number of the stored routing information of the index device, the access device 11 determines the stored routing information of the index device as the determined routing information of the index device;

when the version number of the received routing information of the index device is different from the version number of the stored routing information of the index device, the access device 11 determines the received routing information of the index device as the determined routing information of the index device.

That is, the access device 11 compares the version number of the received routing information of the index device with the version number of the stored routing information of the index device. If the two are the same, the routing information stored in the access device 11 is valid, so the access device 11 takes the stored routing information as the determined routing information of the index device.

If the two are different, the routing information stored in the access device 11 is invalid, so the access device 11 takes the received routing information as the determined routing information of the index device.

Therefore, the determined routing information of the index device can be ensured to be effective routing information, and the problem caused by using invalid routing information by the access device is prevented.
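The version comparison described above may be sketched as follows. This is a minimal illustration only: the `RoutingInfo` structure, its integer `version` field and the `routes` table keyed by parent directory are assumptions, since the embodiment does not specify how routing information is represented.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingInfo:
    version: int
    # Assumed layout: parent directory -> address of a first device.
    routes: dict = field(default_factory=dict)


def choose_routing_info(stored, received):
    """Return the routing information the access device should use."""
    if stored is not None and stored.version == received.version:
        return stored      # same version: the cached copy is still valid
    return received        # different version: the cached copy is invalid
```

Comparing only a version number keeps the check cheap: the access device does not need to compare the route tables entry by entry.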

After the access device 11 determines the index number of the first file, the first file and the index number of the first file are sent to the index device, so that the index device is used for storing the corresponding relationship between the index number of the first file and the identifier of the storage node of the first file into the first device and the second device of the index device after determining the identifier of the storage node of the first file in the storage device.

It should be noted that S203 may be executed before S204, S204 may be executed before S203, or S203 and S204 may be executed simultaneously; this is not specifically limited in this embodiment of the present application.

S205: the index device 12 determines an identifier of a storage node in the storage device, which is used for storing the first file;

the storage node may include a master node and a backup node.

Specifically, after acquiring the first file and the index number of the first file, the first device of the index device 12 determines, from among the idle storage nodes in the storage device, the identifier of the storage node used for storing the first file. It should be noted that there is usually one master node, and there may be 2 or 3 backup nodes; this is not specifically limited in this embodiment of the present invention.
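One possible way for the first device to pick a master node and backup nodes from the idle storage nodes in S205 is sketched below. The function name and the "first idle nodes are chosen" policy are assumptions for illustration; the embodiment does not state how nodes are selected.

```python
def select_storage_nodes(idle_nodes, backup_count=2):
    """Pick one master node and `backup_count` backup nodes from idle nodes."""
    needed = 1 + backup_count
    if len(idle_nodes) < needed:
        raise ValueError(f"need {needed} idle nodes, found {len(idle_nodes)}")
    chosen = list(idle_nodes[:needed])   # assumed policy: take the first idle nodes
    return {"master": chosen[0], "backups": chosen[1:]}
```

The returned identifiers are what the index device would then store against the index number of the first file.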

S206: the index device 12 stores the corresponding relationship between the index number of the first file and the identifier of the storage node of the first file in the first device and the second device of the index device 12;

Specifically, the first device of the index device, that is, the device corresponding to the first file determined from among the first devices of the index device, needs to store the correspondence between the index number of the first file and the identifiers of the master node and the backup node of the first file, so that when a user needs to read the stored file, the file can be located through this correspondence.

In addition, the first device also needs to send the corresponding relationship to a second device managing the first device, so that the second device stores the corresponding relationship, thereby preventing the situation that a file cannot be read or written when a certain first device fails, and ensuring the reliability of the file system.

In order to improve consistency of reading and writing files in the file system, in an optional embodiment, the method further includes:

when the device corresponding to the first file receives a reply message of successful writing of the storage device 13 to the first file, the device corresponding to the first file sends a response message of successful writing to the first file to the access device 11;

when receiving a response message for successful writing of the first file, the access device 11 generates a prompt message; the prompt information is used for prompting the user that the first file is successfully stored.

That is to say, when the device corresponding to the first file successfully stores the first file in the storage device 13, the storage device 13 sends a reply message that the writing of the first file is successful to the device corresponding to the first file, and after receiving the reply message, the device corresponding to the first file sends a response message that the writing is successful to the access device 11, so that the access device 11 knows that the storage of the first file is successful, and after knowing that the storage is successful, the access device 11 generates a prompt message to prompt the user that the storage of the first file is successful.

When the write fails, the storage device 13 does not send a reply message indicating a successful write of the first file to the device corresponding to the first file, and that device does not send a message to the access device 11. If the access device 11 does not receive a response message indicating a successful write within a period of time, it determines that storage of the first file has failed and prompts the user accordingly; the file system then does not store the first file. In this way, the responses exchanged between the access device 11 and the index device 12 ensure the consistency of data in the file system.
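The acknowledgement handling on the access device may be sketched as follows. The use of an in-process `queue.Queue` as the message channel, the `"write-ok"` message value and the one-second default timeout are assumptions chosen only to keep the example self-contained.

```python
import queue


def wait_for_write_ack(ack_queue, timeout_s=1.0):
    """Return a user-facing prompt based on whether an ack arrives in time."""
    try:
        ack = ack_queue.get(timeout=timeout_s)
    except queue.Empty:
        # No success response within the waiting period: treat as failure.
        return "storage failed"
    return "storage succeeded" if ack == "write-ok" else "storage failed"
```

Treating silence as failure means the access device never reports success for a file that the storage device did not confirm.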

For a file system, in addition to storing a file, the stored file may also be read from the file system, and in an optional embodiment, the method may further include:

the access device 11 receives a read request of the stored second file;

the access device 11 determines the index number of the second file according to the stored association between file names and file index numbers;

the access device 11 determines a device corresponding to the second file from the first device of the index device according to the determined routing information of the index device and the existing parent directory of the second file;

the access device 11 sends the index number of the second file to the device corresponding to the second file;

the device corresponding to the second file determines the storage node of the second file according to the stored correspondence between file index numbers and storage node identifiers;

and the device corresponding to the second file reads the second file from the storage node of the second file.

To read the stored second file, the user sends a read request for the second file to the access device 11 through the interface of the access device 11. Since the access device 11 stores the association between file names and file index numbers, the access device 11 can determine the index number of the second file.

The access device 11 then determines, from the determined routing information of the index device, the first device corresponding to the existing parent directory of the second file, takes that first device as the device corresponding to the second file, and sends the index number of the second file to it.

Because the device corresponding to the second file stores the correspondence between file index numbers and storage node identifiers, it determines the storage node of the second file according to this correspondence and reads the second file from that storage node.
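The read path described above may be sketched as follows. All class and function names are hypothetical, and in-memory dictionaries stand in for the network calls and the storage nodes.

```python
class StorageNode:
    def __init__(self):
        self.files = {}          # index number -> file content


class FirstDevice:
    """Hypothetical first device of the index device (e.g. a ZFS server)."""

    def __init__(self, nodes):
        self.nodes = nodes       # storage node identifier -> StorageNode
        self.index_to_node = {}  # index number -> storage node identifier

    def read(self, index_number):
        # Look up the storage node by index number, then read the file from it.
        node_id = self.index_to_node[index_number]
        return self.nodes[node_id].files[index_number]


def read_file(name, parent_dir, name_to_index, routes):
    index_number = name_to_index[name]   # file name -> index number
    device = routes[parent_dir]          # parent directory -> first device
    return device.read(index_number)
```

The access device only ever handles names, index numbers and routes; the mapping to storage nodes stays inside the index device.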

For a file system, in addition to storing files, stored files may also be modified; in an alternative embodiment, the method may further include:

when the data of the master node in the storage node of the stored third file is changed, the access device 11 determines the index number of the third file according to the association relationship between the name of the stored file and the index number of the file;

the access device 11 determines, based on the determined routing information of the indexing device, a device corresponding to the third file from the first device of the indexing device according to the existing parent directory of the third file;

and the access device 11 sends the modified third file and the index number of the third file to the device corresponding to the third file.

The device corresponding to the third file determines a backup node among the storage nodes of the third file according to the index number of the third file and the stored correspondence between file index numbers and storage node identifiers; the backup node of the third file is used for updating its stored data to the modified third file.

To modify the stored third file, when the data of the master node of the third file is changed, the access device 11, which stores the association between file names and file index numbers, can determine the index number of the third file.

The access device 11 then determines, from the determined routing information of the index device, the first device corresponding to the existing parent directory of the third file, takes that first device as the device corresponding to the third file, and sends the modified third file and the index number of the third file to it.

The device corresponding to the third file determines the backup node of the third file according to the correspondence, and the data in the backup node of the third file is updated with the modified third file.

To ensure normal operation of the file system when a failure occurs in the file system, in an optional embodiment, the method may further include:

when one of the first devices managed by the second device fails, the second device selects one of the first devices from the other first devices managed by the second device;

and the second device sends the correspondence, which it stores, between the index numbers of the files of the failed first device and the identifiers of the storage nodes of those files to the selected first device for storage, and updates the routing information of the index device.

That is, for a given second device, when one of the first devices it manages fails, the second device, in order to ensure normal operation of the file system, selects one first device from the other first devices it manages and sends to the selected first device, for storage, the correspondence it stored in advance between the index numbers of the files of the failed first device and the identifiers of the storage nodes of those files.

Therefore, when a first device fails, another normally operating first device can take over in time, so that the file system can cope with the failure; this improves the reliability of the file system and its read/write efficiency.
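The failover handling by the second device may be sketched as follows. The class `SecondDevice`, the deterministic choice of the replacement device and the integer route version are assumptions for illustration; the embodiment only requires that a replacement first device is selected, that the stored correspondence is sent to it, and that the routing information is updated.

```python
class SecondDevice:
    """Hypothetical second device (index server) managing several first devices."""

    def __init__(self, first_devices):
        self.healthy = set(first_devices)
        # Copy of each first device's table: index number -> storage node ids.
        self.mappings = {d: {} for d in first_devices}
        self.routes = {}            # parent directory -> first device
        self.route_version = 0

    def record(self, device, parent_dir, index_number, node_ids):
        self.mappings[device][index_number] = node_ids
        self.routes[parent_dir] = device

    def handle_failure(self, failed):
        self.healthy.discard(failed)
        if not self.healthy:
            raise RuntimeError("no healthy first device left")
        target = sorted(self.healthy)[0]     # assumed policy: deterministic pick
        # Hand the failed device's correspondence to the selected device.
        self.mappings[target].update(self.mappings.pop(failed))
        for parent_dir, device in self.routes.items():
            if device == failed:
                self.routes[parent_dir] = target
        self.route_version += 1              # lets access devices detect stale routes
        return target
```

Bumping the route version on every change is what allows an access device to notice, by version comparison, that its cached routing information is no longer valid.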

The storage method described in one or more of the above embodiments is described below by way of example.

Fig. 3 is a schematic structural diagram of an example of an optional file system in an embodiment of the present invention, and as shown in fig. 3, the file system may include: an access layer, an index layer and a storage layer;

The access layer may be a user client or a gateway server (Posix File System Client). The index layer is composed of index clusters and includes two types of computing servers, namely ZFSServer and IndexServer. The storage layer is composed of storage clusters and includes at least two storage servers (ZFS zpool).

It should be noted that the above three-layer architecture is not strictly a 1:1:1 relationship. The storage cluster is the storage resource provider of the index cluster, and the interfacing relationship depends entirely on the level of isolation required for physical or logical failure domains. For example, a plurality of index clusters may share the resources of one storage cluster, and those index clusters may provide services to the same access gateway.

A file writing process:

First, the device in the access layer allocates an Inode number (equivalent to an identification code) in the distributed file system (DZFS, Distributed ZFS) for the file to be written.

Second, the device in the access layer converts the parent directory of the file to be written (including the DZFS root directory and the directory used to divide the namespace) into a prefix_id, and converts the file's Inode number in DZFS into a file_id; it then concatenates the two ids into the index number of the file and associates the file with that index number.

Third, because the access layer caches routes, the device in the access layer can send the file directly to the ZFSServer corresponding to the file as long as the cached route is still valid; if the route has become invalid, the device obtains the ZFSServer corresponding to the file from the IndexServer again and sends the file to that ZFSServer.

Fourth, the ZFSServer selects a main node and a redundant node (equivalent to a backup node) from the idle nodes in the storage layer and writes the file to the main node and the redundant node at the same time.

Fifth, after both the main node and the redundant node have finished writing, they reply to the ZFSServer that the write succeeded.

Sixth, the ZFSServer stores the index number of the written file in correspondence with the identifier of its main node and the identifier of its redundant node.

Seventh, the ZFSServer replies to the access layer that the write is complete, and the access layer replies to the user that the write is complete.
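The seven steps above can be sketched in Python as follows. This is an illustrative model only: the classes, the in-memory dictionaries, and the "two least-filled nodes" selection rule are assumptions, and the route-refresh branch of the third step is omitted for brevity.

```python
import itertools


class StorageNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.files = {}            # index number -> data

    def write(self, index_number, data):
        self.files[index_number] = data
        return True                # step 5: reply that the write succeeded


class ZFSServer:
    def __init__(self, nodes):
        self.nodes = nodes
        self.file_index = {}       # index number -> (main node id, redundant node id)

    def write(self, index_number, data):
        # Step 4: choose a main and a redundant node (hypothetical policy:
        # the two nodes currently holding the fewest files).
        main, redundant = sorted(self.nodes, key=lambda n: len(n.files))[:2]
        ok = main.write(index_number, data) and redundant.write(index_number, data)
        if not ok:
            return False
        # Step 6: record index number -> node identifiers.
        self.file_index[index_number] = (main.node_id, redundant.node_id)
        return True                # step 7: reply to the access layer


class AccessLayer:
    def __init__(self, route_cache):
        self.route_cache = route_cache     # prefix_id -> ZFSServer (cached route)
        self.names = {}                    # (prefix_id, file name) -> index number
        self._inode = itertools.count(1)

    def write(self, prefix_id, name, data):
        file_id = next(self._inode)                    # step 1: allocate Inode number
        index_number = f"{prefix_id}/{file_id}"        # step 2: prefix_id + file_id
        self.names[(prefix_id, name)] = index_number
        server = self.route_cache[prefix_id]           # step 3: use the cached route
        return index_number if server.write(index_number, data) else None
```

Keeping the name-to-index-number association in the access layer means a later read only needs the file name and the cached route to reach the right ZFSServer.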

A file reading process:

First, the device in the access layer obtains the name of the file to be read, determines its index number, and sends the index number to the ZFSServer corresponding to the file.

Second, the ZFSServer determines the identifier of the main node and/or the identifier of the redundant node of the file according to its index number, and reads the file from the ZFS zpool.

Fig. 4 is a schematic diagram of the architecture of an example of an optional file system in an embodiment of the present invention. As shown in fig. 4, the access layer is responsible for parsing file semantics: it converts the semantics of an application's operations on files and directories into semantics that the DZFS back-end index layer and storage layer can recognize. In a public cloud environment it is also responsible for multi-tenant isolation and the like. This layer is usually stateless, and interfaces such as NFSv4 and SMB 2.0 are generally supported in the industry.

The index layer is responsible for emulating the directory and file concepts of a local file system on a distributed architecture, typically shared among multiple tenants. The benefits of this architecture are:

Separation of index and data: indexes are mostly CPU-intensive (Central Processing Unit-intensive), while data is IO-intensive (Input/Output-intensive). Because the two have different hardware requirements, each can be deployed on the most cost-effective hardware suited to it, scaled out independently, and evolved architecturally on its own. If the two are mixed, problems can arise: for example, the storage pool usage by data is low but index consumption is high, so capacity must be expanded anyway, which lowers overall resource utilization.

Reduced interference between indexing and IO: metadata operations account for a large share of all file system operations. For example, the HEAD requests that a Content Delivery Network (CDN) uses for back-to-origin checks do not need to access data in most cases, which is especially noticeable in multi-tenant scenarios. If indexes and data coexist, index-intensive applications can easily interfere with IO, degrading the performance of both.

Better support for an elastic architecture: for example, when index and data coexist, it is difficult to represent a very large file (e.g., PB level), whereas in a separated architecture operations such as data scheduling can be performed more conveniently according to the access model of the data.

The storage layer guarantees high reliability and high availability of data. It must take into account the IO access patterns required by the business system and must serve the random IO required by file storage efficiently. ZFS is already a top-tier single-machine file system that integrates disk management and provides reliability and availability, so the storage-layer design focuses more on distributed availability and consistency.

Each layer is deployed as follows:

For the access layer, there are two main points: the protocol layer performs semantic conversion of the protocol, for example the NFSv4 or SMB 2.0 protocol mentioned above; and communication among the access layer, the index layer and the storage layer is carried out through a general Remote Procedure Call (RPC) framework.

For the index layer, the directory index and the file index are implemented as two separate parts, called DirIdx and FileIdx. The two parts are also persisted separately.

DirIdx is the responsibility of the IndexServer and relies on an external database for persistence. Even in scenarios with massive numbers of small files it does not grow very large, so it can be accessed efficiently in an external distributed database (distributed DB). In addition, as explained below, the access layer does not query the IndexServer frequently.

FileIdx is the responsibility of the ZFSServer, is persisted by the ZFS on each storage node, and manages the first-level index from a file to its parent directory (including the namespace root directory).

Each file in the DZFS is assigned a unique number (equivalent to the Inode number of a conventional file store) in the namespace to which it belongs. The DZFS uses the number of the file's parent directory (including the namespace root directory) as prefix_id and the number of the file itself as file_id, and uses prefix_id/file_id as the unique index key within the entire namespace.

DirIdx only records that a directory entry exists and serves as the basis for the real-time distribution of the subfiles under that directory entry. After a client creates a directory, its DirIdx is recorded in the IndexServer's database, where the primary key is the unique ID of the directory entry and the database entry contains the real-time routing distribution of its subfiles. Put simply, newly created directories are allocated to different ZFSServers by the IndexServer; once the ZFSServer is determined during normal cluster operation, the client can communicate directly with that ZFSServer to operate on the directory and its subfiles.

Fig. 5 is a schematic diagram of an optional directory in an embodiment of the present invention. As shown in fig. 5, it includes the directories /dir0/dirx/, /dir0/diry/, /dir0/diry/dile α/, /dir1/ and /dir2/file0/; the subdirectories are distributed across multiple ZFSServer processes. Because routing uses the unique number in DirIdx as the key, a client reading or writing a file can obtain its prefix_id and, through the prefix_id, the route, which ultimately achieves data distribution.

The index layer is described in more detail:

Node discovery: the lease heartbeat between the IndexServer and the ZFSServer is initiated by the IndexServer, so the IndexServer does not need to expose its service to the ZFSServer; node discovery is provided by an external service discovery application such as zookeeper. The IndexServer therefore needs to record a list of the ZFSServers allowed to serve in the current cluster, and only ZFSServers in that list may serve.

Route maintenance: normally, the IndexServer must ensure that each leaf directory is served by only one ZFSServer, and it maintains the correctness of this routing table through, for example, a lease mechanism. A route held by the client is in one of only two states. One is route correct: normally the route has not changed, and the routing table maintained by the client is the same as the one maintained by the IndexServer. The other is route expired: the IndexServer has rescheduled the DirIdx routes, so the routing table maintained by the client is out of date.

After the client obtains the route of a subdirectory for the first time, it does not need to poll the IndexServer for the latest route, because route expiry is checked during normal communication with the ZFSServer; the load on the IndexServer is therefore low. Meanwhile, a single subdirectory route entry only identifies the route from one directory to one ZFSServer, so the cache pressure on the client is extremely low. A lease is maintained between the ZFSServer and the IndexServer so that, if a network partition occurs, the ZFSServer actively withdraws from service.
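The lazy route-refresh behaviour described above can be sketched as follows. The `RouteExpired` exception, the class names and the `lookups` counter are hypothetical; the source only states that expiry is detected during normal communication with the ZFSServer.

```python
class RouteExpired(Exception):
    """Raised by a ZFSServer that no longer serves the requested directory."""


class IndexServer:
    def __init__(self, routes):
        self.routes = dict(routes)   # prefix_id -> ZFSServer name
        self.lookups = 0             # counts how often clients had to ask

    def lookup(self, prefix_id):
        self.lookups += 1
        return self.routes[prefix_id]


class ZFSServer:
    def __init__(self, name, served):
        self.name = name
        self.served = set(served)    # prefix_ids this server currently serves

    def handle(self, prefix_id, op):
        if prefix_id not in self.served:
            raise RouteExpired(prefix_id)
        return f"{self.name} handled {op} under {prefix_id}"


class Client:
    def __init__(self, index_server, servers):
        self.index_server = index_server
        self.servers = servers       # name -> ZFSServer
        self.cache = {}              # prefix_id -> ZFSServer name

    def request(self, prefix_id, op):
        if prefix_id not in self.cache:
            self.cache[prefix_id] = self.index_server.lookup(prefix_id)
        try:
            return self.servers[self.cache[prefix_id]].handle(prefix_id, op)
        except RouteExpired:
            # Expiry is discovered here, not by polling the IndexServer.
            self.cache[prefix_id] = self.index_server.lookup(prefix_id)
            return self.servers[self.cache[prefix_id]].handle(prefix_id, op)
```

In this sketch the IndexServer is contacted once per directory on first use and once more per reschedule, which is consistent with the low IndexServer load the text describes.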

The ZFSServer is a core module on the critical IO path; it writes data to the storage nodes by interacting with the underlying storage. The ZFSServer is stateless by design and persists client files and file attributes through the underlying storage-layer module and ZFS, which ensures high data reliability.

Routing scheduling: the IndexServer performs routing scheduling only in the following cases. One case is directory splitting: if a single subdirectory is too hot and contains too many entries, the directory is split using the same semantics as the client, and the file key is changed to prefix_id/num/file_id to achieve distribution. The heat and usage are reported by the ZFSServer through the lease heartbeat. Another case is directory migration: if heat and usage are unbalanced across the ZFSServers, directory routes are migrated to idle ZFSServers (including newly added ones); this is likewise reported through the lease heartbeat. Yet another case is a ZFSServer service exception: either the ZFSServer itself reports the problem through the lease heartbeat and withdraws, or a client timeout triggers the scheduling. Other cases include network anomalies, network partitions, and the like.
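As a small illustration of the key change used for directory splitting, the sketch below builds both key forms. How `num` is derived is not stated in the source; taking `file_id` modulo a shard count is purely an assumption, as are the function and parameter names.

```python
def file_key(prefix_id: int, file_id: int, shards: int = 1) -> str:
    """Build the index key of a file.

    With shards <= 1 the directory is not split and the key is
    prefix_id/file_id. After a split the key becomes
    prefix_id/num/file_id, where num selects the sub-range.
    Assumption: num = file_id % shards (the source does not define it).
    """
    if shards <= 1:
        return f"{prefix_id}/{file_id}"
    num = file_id % shards
    return f"{prefix_id}/{num}/{file_id}"


def route_key(prefix_id: int, file_id: int, shards: int = 1) -> str:
    """Return the part of the key that routing would use to pick a ZFSServer."""
    key = file_key(prefix_id, file_id, shards)
    return key.rsplit("/", 1)[0]
```

With a split into four parts, files of one hot directory map to route keys `7/0` through `7/3`, each of which could be assigned to a different ZFSServer.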

IndexServer high availability: because the access pressure on the IndexServer is very low, a primary/standby structure is implemented through external service discovery such as zookeeper. Only the primary IndexServer can update the database; the standby IndexServer only has the right to query the database in order to maintain its own cache, which facilitates fast switchover when an exception occurs.

Additional services: tenant-facing additional file system services such as QUOTA, QOS and snapshot can generally be reduced, in a distributed file system, to settings on the attributes of directories and files.

That is, by indexing the database with the unique id as the primary key, the index layer also makes list operations (size range list) efficient. In summary, the goals of separating indexes from data and reducing interference between indexing and IO are met.

For the storage layer, the efficient random IO and high reliability required for file storage are already provided by the ZFS single-machine file system. The storage layer still needs an additional module to achieve high availability; distributed file systems generally guarantee this through redundancy, so Raft is introduced as the consistency algorithm for the redundant copies.

During indexing, DirIdx points to a single ZFSServer. When a subfile under that DirIdx is written, the ZFSServer selects a single ZFS process in the storage layer; the directory is regarded as having its primary copy on that ZFS, and that ZFS process controls and initiates copy writing and synchronization to the other nodes. After the copies are written, the primary ZFS instance returns to the ZFSServer the nodes to which the file was copied, and the ZFSServer records these copy nodes for the file index in memory. In the absence of anomalies, file reads go only through the primary copy.

The most important task of the storage layer is to keep the copies consistent through Raft. Fig. 6 is a schematic flow diagram of an optional Raft algorithm in an embodiment of the present invention. As shown in fig. 6, because ZFS is a transactional file storage system, it offers a natural entry point for integrating Raft; meanwhile, its copy-on-write (COW) storage ensures properties such as idempotency of transaction log operations, so adapting the storage-layer ZFS transaction processing for distributed consistency is comparatively simple.

Redundant synchronous write:

As shown in fig. 6:

(1) ZFS submits the transaction to the Raft node.

(2) Raft records a WAL (write-ahead log) entry to persist the operation (e.g., x = 1); once a majority of nodes have persisted the entry, it can be committed and applied to the state machine.

(3) The Raft StateMachine applies the committed operation to the transaction log storage space (including the cache) of ZFS. (For example, if the log space content is x:9, y:22, z:33, then after this operation is applied x becomes 1 and y remains 22.)

(4) (5) The transaction flow of ZFS itself ensures the atomicity of file IO with respect to the transaction log.

Asynchronous snapshot: the ZFS transaction log is stored as a snapshot; snapshot generation and retrieval are controlled by Raft and serve as Raft WAL log compaction, which prevents unbounded growth of the Raft WAL.

Therefore, the final distributed consistency is achieved through the atomicity ensured by the transaction flow of the ZFS in the whole process and the algorithm safety provided by the Raft.
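The majority-persist-then-apply idea in steps (2) and (3) can be illustrated with the toy model below. It is deliberately not a Raft implementation: there is no leader election, no terms, and no log repair, and the `Replica` class and `replicate` function are hypothetical names. It only shows that an operation is applied to the state once more than half of the replicas have persisted it.

```python
class Replica:
    def __init__(self, name, state, up=True):
        self.name = name
        self.up = up
        self.wal = []               # persisted (key, value) operations
        self.state = dict(state)    # stands in for the ZFS transaction log space

    def persist(self, entry):
        """Append the entry to the write-ahead log; fails if the replica is down."""
        if not self.up:
            return False
        self.wal.append(entry)
        return True

    def apply(self, entry):
        key, value = entry
        self.state[key] = value


def replicate(replicas, entry):
    """Persist on all reachable replicas; apply only if a majority persisted."""
    acked = [r for r in replicas if r.persist(entry)]
    if len(acked) * 2 <= len(replicas):
        return False                # no majority: the entry is not committed
    for r in acked:
        r.apply(entry)
    return True
```

A real deployment would also need to discard or repair WAL entries that were persisted on a minority but never committed; that is left out of this sketch.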

Data migration: as described under routing scheduling, any condition that triggers DirIdx scheduling in the index layer causes data migration. The ZFSServer to which a directory is newly scheduled can start data migration upon the lease and route-expiration notifications; owing to the COW characteristic of ZFS, the overhead of data migration has little impact on cluster performance.

File indexing: FileIdx, together with file information and attributes, is maintained by the ZFS zpool itself, and the metadata storage flattened by prefix_id/file_id delivers very high performance. With prefix_id/file_id as the main identifier, files in the ZFS pool no longer need a directory tree to distinguish directories: prefix_id serves as the basis for listing and prefix_id/file_id as the basis for direct indexing, which is extremely efficient.
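A flat key space of this kind can be sketched with a sorted list of (prefix_id, file_id) keys: a direct lookup uses the full key, and listing a directory is a range scan over one prefix. The `FlatIndex` class is a hypothetical in-memory stand-in; how ZFS actually stores this metadata is not described in the source.

```python
import bisect


class FlatIndex:
    """Flat metadata store keyed by (prefix_id, file_id), with no directory tree."""

    def __init__(self):
        self._keys = []     # sorted list of (prefix_id, file_id)
        self._meta = {}     # (prefix_id, file_id) -> metadata

    def put(self, prefix_id, file_id, meta):
        key = (prefix_id, file_id)
        if key not in self._meta:
            bisect.insort(self._keys, key)
        self._meta[key] = meta

    def get(self, prefix_id, file_id):
        """Direct index by prefix_id/file_id."""
        return self._meta.get((prefix_id, file_id))

    def list(self, prefix_id):
        """List all files under one parent directory via a range scan."""
        lo = bisect.bisect_left(self._keys, (prefix_id,))
        hi = bisect.bisect_left(self._keys, (prefix_id + 1,))
        return self._keys[lo:hi]
```

The range bounds rely on Python tuple ordering: `(7,)` sorts before every `(7, file_id)`, and `(8,)` sorts after all of them.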

Additional services: additional services based on listing by prefix_id make use of the characteristics of ZFS and are therefore extremely efficient.

Therefore, based on the three-layer architecture design, the DZFS system can shield single-module single-node faults.

For example, a failure of a storage-layer storage node (ZFS zpool) is first perceived by all ZFSServers that are its clients. Each ZFSServer can then modify every file index it currently maintains that involves the failed node, so that a surviving node becomes the main node and normal service continues. The ZFSServer then sends notification messages to these new main nodes, instructing them to select a new backup node for the files concerned (for example, to restore three copies from the two that remain).
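The index update performed by a ZFSServer after a storage node failure might look like the sketch below. The dictionary layout of the file index and the choice of the first surviving backup as new main node are assumptions; the returned list stands in for the notification messages, and this sketch also emits one when only a backup copy was lost.

```python
def handle_node_failure(file_index, failed_node):
    """Promote a backup for every file whose main node failed.

    file_index maps index number -> {"main": node_id, "backups": [node_id, ...]}.
    Returns (main_node, index_number) pairs describing which main nodes must
    be told to select a new backup node for which file.
    """
    notifications = []
    for index_number, entry in file_index.items():
        if entry["main"] == failed_node:
            if not entry["backups"]:
                raise RuntimeError(f"no surviving copy of {index_number}")
            # Hypothetical policy: the first listed backup becomes the main node.
            entry["main"] = entry["backups"].pop(0)
        elif failed_node in entry["backups"]:
            entry["backups"].remove(failed_node)
        else:
            continue    # this file is not affected by the failure
        notifications.append((entry["main"], index_number))
    return notifications
```

Because only the in-memory index changes before the notifications go out, reads can continue from the promoted node while the replacement copy is being created.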

A failure of an index-layer ZFSServer is sensed by the IndexServer, which redistributes the routes (DirIdx) that the failed server maintained to the ZFSServer nodes that have not failed. The receiving ZFSServer needs only a simple pass over all storage-layer nodes to collect, in one go, the indexes of all files it should now maintain, by means of prefix_id as described above.

A failure of the index-layer IndexServer is sensed by a service such as zookeeper, which notifies the standby IndexServer to switch to primary; the new primary then queries the database for the latest route distribution and continues to maintain the index-layer route distribution.

Failures of an access-layer client or gateway are handled by the user application; since the DZFS index layer supports access from multiple clients, the access layer can be built with multiple paths.

From the perspective of how the three layers connect, the access layer can assign files to different index clusters according to their namespaces, so that the failure of one namespace does not affect another. Different index clusters may also use different redundancy settings, which are maintained by the IndexServer. For example, of two index clusters that use the same physical storage cluster resources, one may be configured with two-copy redundancy and the other with erasure-code redundancy, because the redundancy setting is included in each write sent to the storage layer.

The invention provides a storage method, which comprises the following steps: when the access device receives a storage request for a first file, it allocates an identification code to the first file, generates an index number of the first file by using the existing parent directory and the identification code of the first file, stores the association relationship between the name of the first file and the index number of the first file, and sends the first file and the index number of the first file to the index device, wherein the index device is used for storing the correspondence between the index number of the first file and the identifier of the storage node of the first file into the first device and the second device of the index device after determining the identifier of the storage node in the storage device used for storing the first file. That is to say, in the present invention, when the access device obtains a storage request, it allocates an identification code to the first file and generates the index number of the first file from the existing parent directory of the first file and the identification code, so that the association relationship between the name of each file and its index number can be stored in the access device, which facilitates reading and modifying the file. The first file and its index number are sent to the index device, so that the index device can determine the identifier of the storage node for storing the first file and store the correspondence between the index number and the storage node identifier in both the first device and the second device of the index device. In this way, when the first device fails, the second device can still reset the route, which ensures normal storage and reading of files and thereby improves the reliability and the storage efficiency of file storage.

Example two

The storage method is described below with respect to each device side deployed in the file system.

First, a storage method is described with an access device side.

An embodiment of the present invention provides a storage method, where the method is applied to an access device, fig. 7 is a schematic flowchart of an optional storage method in an embodiment of the present invention, and as shown in fig. 7, the storage method may include:

S701: in the case of receiving a storage request for a first file, assigning an identification code to the first file;

S702: generating an index number of the first file by using the existing parent directory and the identification code of the first file;

S703: storing the association relationship between the name of the first file and the index number of the first file;

S704: sending the first file and the index number of the first file to the first device of the index device;

the index device is used for storing the corresponding relation between the index number of the first file and the identification of the storage node of the first file into the first device and the second device of the index device after the identification of the storage node of the first file in the storage device is determined.

In an optional embodiment, the method further includes:

sending an acquisition request aiming at the routing information of the index equipment to the second equipment;

receiving routing information of the index device sent by the second device;

when the received version number of the routing information of the index device is the same as the stored version number of the routing information of the index device, determining the stored routing information of the index device as the determined routing information of the index device;

when the version number of the received routing information of the index device is different from the stored version number of the routing information of the index device, determining the received routing information of the index device as the determined routing information of the index device;

and determining equipment corresponding to the first file from the first equipment of the indexing equipment according to the determined routing information of the indexing equipment and the existing parent directory of the first file.
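The version comparison and device lookup in this optional embodiment can be expressed compactly. The `RoutingInfo` structure and the function names are hypothetical; the source only specifies that the stored information is kept when the version numbers match and the received information is used when they differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RoutingInfo:
    version: int
    routes: Tuple[Tuple[str, str], ...]   # (parent directory, first device name)


def choose_routing(stored: Optional[RoutingInfo], received: RoutingInfo) -> RoutingInfo:
    """Keep the stored routing info if the versions match, else adopt the received one."""
    if stored is not None and stored.version == received.version:
        return stored
    return received


def device_for(routing: RoutingInfo, parent_directory: str) -> str:
    """Pick the first device that serves the file's existing parent directory."""
    return dict(routing.routes)[parent_directory]
```

For instance, if the stored information has version 3 and the second device returns version 4, `choose_routing` returns the received information and `device_for` resolves the parent directory against the new routes.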

In an optional embodiment, the method further includes:

when a response message which is sent by equipment corresponding to the first file and aims at successful writing of the first file is received, prompt information is generated; the prompt information is used for prompting the user that the first file is successfully stored.

In an optional embodiment, the method further comprises:

receiving a reading request of a stored second file;

determining the index number of the second file according to the stored association relationship between the name of a file and the index number of the file;

determining equipment corresponding to the second file from the first equipment of the indexing equipment according to the determined routing information of the indexing equipment and the existing parent directory of the second file;

sending the index number of the second file to the equipment corresponding to the second file;

wherein the index number of the second file is used by the device corresponding to the second file to determine the storage node of the second file and read the second file.

In an optional embodiment, the method further includes:

when the data of the main node among the storage nodes of a stored third file is changed, determining the index number of the third file according to the stored association relationship between the name of a file and the index number of the file;

determining equipment corresponding to the third file from the first equipment of the indexing equipment according to the determined routing information of the indexing equipment and the existing parent directory of the third file;

sending the modified third file and the index number of the third file to equipment corresponding to the third file;

Then, the storage method is described with the index device side.

An embodiment of the present invention provides a storage method, where the method is applied to an index device, fig. 8 is a schematic flow diagram of another optional storage method in the embodiment of the present invention, and as shown in fig. 8, the method may include:

S801: receiving a first file sent by the access device and an index number of the first file;

the index number of the first file is generated by the access device by using the existing parent directory of the first file and the identification code allocated to the first file by the access device;

S802: determining an identifier of a storage node in the storage device for storing the first file;

S803: storing the correspondence between the index number of the first file and the identifier of the storage node of the first file in the first device and the second device of the index device.

In an optional embodiment, the method further includes:

when a reply message of successful writing of the storage equipment to the first file is received, sending a response message of successful writing of the first file to the access equipment; and the response message is used for the access equipment to prompt the user that the first file is successfully stored.

In an optional embodiment, the method further comprises:

when the access equipment receives a reading request of a stored second file, the first equipment receives an index number of the second file;

the first equipment determines a storage node of a second file according to the corresponding relation between the index number of the stored file and the identifier of the storage node of the file;

the first device reads the second file from a storage node of the second file.

In an optional embodiment, the method further includes:

when the data of the main node in the storage node of the stored third file is modified, the first device receives the modified third file and the index number of the third file sent by the access device;

the first equipment determines a backup node in the storage nodes of the third file according to the corresponding relation between the index number of the stored file and the identification of the storage node of the file;

wherein the backup node of the third file is used to update its stored data to the modified third file.

In an optional embodiment, the method further comprises:

the second equipment receives an acquisition request of the routing information aiming at the index equipment from the access equipment;

the second device determines the routing information of the first device managed by the second device as the routing information of the index device, and sends the routing information to the access device.

In an optional embodiment, the method further comprises:

Based on the same inventive concept of the foregoing embodiments, embodiments of the present invention provide an access device, which is consistent with the access device provided in one or more of the above embodiments.

Fig. 9 is a schematic structural diagram of an optional access device in an embodiment of the present invention, and as shown in fig. 9, the access device includes:

an assigning module 91 configured to assign an identification code to the first file in a case where a storage request for the first file is received;

a generating module 92, configured to generate an index number of the first file by using an existing parent directory and an identification code of the first file;

the first storage module 93 is configured to store an association relationship between a name of the first file and an index number of the first file;

a sending module 94, configured to send the name of the first file and the index number of the first file to the indexing device;

In an optional embodiment, the access device is further configured to:

sending an acquisition request aiming at the routing information of the index device to the second device;

receiving routing information of the index device sent by the second device;

when the version number of the received routing information of the index device is different from the version number of the stored routing information of the index device, determining the received routing information of the index device as the determined routing information of the index device;

In an optional embodiment, the access device is further configured to:

when a response message aiming at successful writing of the first file and sent by equipment corresponding to the first file is received, prompt information is generated; and the prompt information is used for prompting the user that the first file is successfully stored.

In an optional embodiment, the access device is further configured to:

receiving a reading request of a stored second file;

wherein the index number of the second file is used by the device corresponding to the second file to determine the storage node of the second file and read the second file.

In an optional embodiment, the access device is further configured to:

In practical applications, the assigning module 91, the generating module 92, the first storage module 93 and the sending module 94 may be implemented by a processor located on the access device, specifically by a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.

Embodiments of the present invention provide a first device, which is the same as the first device described in one or more embodiments above.

Fig. 10 is a schematic structural diagram of an optional indexing device in an embodiment of the present invention, and as shown in fig. 10, the indexing device includes:

a receiving module 101, configured to receive a first file and an index number of the first file sent by an access device; the index number of the first file is generated by the access device by using the existing parent directory of the first file and the identification code allocated to the first file by the access device;

a determining module 102, configured to determine an identifier of a storage node in a storage device, where the storage node is used to store a first file;

the second storage module 103 is configured to store a correspondence between the index number of the first file and the identifier of the storage node of the first file in the first device and the second device of the index device.

In an optional embodiment, the first device is further configured to:

when a reply message of successful writing of the storage equipment to the first file is received, sending a response message of successful writing to the first file to the access equipment; and the response message is used for the access equipment to prompt the user that the first file is successfully stored.

In an alternative embodiment, the first device is further configured to:

when the access equipment receives a reading request of a stored second file, receiving an index number of the second file;

determining a storage node of a second file according to the corresponding relation between the index number of the stored file and the identifier of the storage node of the file;

and reading the second file from the storage node of the second file.

In an optional embodiment, the first device is further configured to:

when the data of the main node in the storage node of the stored third file is changed, receiving the modified third file and the index number of the third file sent by the access equipment;

determining a backup node in the storage nodes of the third file according to the corresponding relation between the index number of the stored file and the identification of the storage node of the file;

In an optional embodiment, the second device is further configured to:

receiving an acquisition request of routing information aiming at the index equipment from the access equipment;

and determining the routing information of the first equipment managed by the second equipment as the routing information of the index equipment, and sending the routing information to the access equipment.

In an optional embodiment, the second device is further configured to:

when one of the first devices managed by the second device fails, selecting one of the first devices from the other first devices managed by the second device;

and sending the corresponding relation between the index number of the file stored in one of the first devices and the identifier of the storage node of the file to the selected one of the first devices for storage, and updating the routing information of the index device.

In practical applications, the receiving module 101, the determining module 102 and the second storing module 103 may be implemented by a processor located on the indexing device, specifically, implemented by a CPU, an MPU, a DSP or an FPGA.

Fig. 11 is a schematic structural diagram of another optional access device in an embodiment of the present invention, and as shown in fig. 11, an embodiment of the present invention provides an access device 1100, including:

a processor 111 and a storage medium 112 storing instructions executable by the processor 111, wherein the storage medium 112 relies on the processor 111 to perform operations via a communication bus 113, and the storage method of the first embodiment is performed when the instructions are executed by the processor 111.

It should be noted that, in practical applications, the various components of the access device are coupled together by the communication bus 113. It is understood that the communication bus 113 is used to enable connection and communication between these components. In addition to a data bus, the communication bus 113 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as communication bus 113 in Fig. 11.

Fig. 12 is a schematic structural diagram of another optional indexing device in an embodiment of the present invention, and as shown in fig. 12, an embodiment of the present invention provides an indexing device 1200, including:

a processor 121 and a storage medium 122 storing instructions executable by the processor 121, wherein the storage medium 122 relies on the processor 121 to perform operations via a communication bus 123, and the storage method of the first embodiment is performed when the instructions are executed by the processor 121.

It should be noted that, in practical applications, the various components of the indexing device are coupled together by the communication bus 123. It is understood that the communication bus 123 is used to enable connection and communication between these components. In addition to a data bus, the communication bus 123 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as communication bus 123 in Fig. 12.

Fig. 13 is a schematic structural diagram of an optional file system in an embodiment of the present invention. As shown in Fig. 13, the file system 1300 includes the access device of one or more of the above embodiments, the index device of one or more of the above embodiments, and a storage device.

Embodiments of the present invention provide a computer storage medium storing executable instructions, and when the executable instructions are executed by one or more processors, the processors execute the storage method described in the first embodiment.

The computer-readable storage medium may be a Magnetic Random Access Memory (FRAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM).

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims

1. A storage method is applied to access equipment and comprises the following steps:

storing the association relationship between the name of the first file and the index number of the first file;

sending the first file and the index number of the first file to index equipment;

2. The method of claim 1, further comprising:

receiving routing information of the index device sent by the second device;

3. The method according to claim 1 or 2, characterized in that the method further comprises:

receiving a read request for a stored second file;

determining the index number of the second file according to the association relationship between the name of the stored file and the index number of the file;

determining equipment corresponding to the second file from first equipment of the indexing equipment according to the determined routing information of the indexing equipment and the existing parent directory of the second file;

4. The method according to claim 1 or 2, characterized in that the method further comprises:

when the data of a main node in a storage node of a stored third file is changed, determining the index number of the third file according to the association relationship between the name of the stored file and the index number of the file;

determining equipment corresponding to the third file from first equipment of the indexing equipment according to the determined routing information of the indexing equipment and the existing parent directory of the third file;

the device corresponding to the third file is configured to determine a backup node in the storage node of the third file according to the index number of the third file, where the backup node of the third file is configured to update the stored data to the modified third file.

5. A storage method is applied to an index device and comprises the following steps:

determining an identifier of a storage node in the storage device for storing the first file;

6. The method of claim 5, further comprising:

the first equipment determines the storage node of the second file according to the corresponding relation between the index number of the stored file and the identifier of the storage node of the file;

and the first device reads the second file from the storage node of the second file.

7. The method of claim 5, further comprising:

when the data of the main node in the storage node of the stored third file is changed, the first device receives the modified third file sent by the access device and the index number of the third file;

wherein the backup node of the third file is configured to: updating the stored data to the modified third file.

8. The method of claim 5, further comprising:

the second equipment receives an acquisition request of the routing information of the index equipment from the access equipment;

and the second equipment determines the routing information of the first equipment managed by the second equipment as the routing information of the index equipment, and sends the routing information to the access equipment.

9. The method of claim 8, further comprising:

when one of the first devices managed by the second device fails, the second device selects one first device from the other first devices managed by the second device;

10. An access device, comprising:

the generating module is used for generating an index number of the first file by utilizing the existing parent directory of the first file and the identification code;

11. An indexing device, comprising:

12. An access device, comprising:

a processor and a storage medium storing instructions executable by the processor, wherein the storage medium relies on the processor to perform operations via a communication bus, and the instructions, when executed by the processor, perform the storage method of any one of claims 1 to 4.

13. An indexing device, comprising:

a processor and a storage medium storing instructions executable by the processor, wherein the storage medium relies on the processor to perform operations via a communication bus, and the instructions, when executed by the processor, perform the storage method of any one of claims 5 to 9.

14. A computer storage medium having stored thereon executable instructions which, when executed by one or more processors, perform the storage method of any one of claims 1 to 4 or the storage method of any one of claims 5 to 9.