CN110019083B - Storage method and device of distributed file system, electronic equipment and storage medium - Google Patents

Info

Publication number
CN110019083B
CN110019083B CN201710849424.9A CN201710849424A CN110019083B CN 110019083 B CN110019083 B CN 110019083B CN 201710849424 A CN201710849424 A CN 201710849424A CN 110019083 B CN110019083 B CN 110019083B
Authority
CN
China
Prior art keywords
data
space
storage
disk
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710849424.9A
Other languages
Chinese (zh)
Other versions
CN110019083A (en)
Inventor
吴益群
吴洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201710849424.9A
Publication of CN110019083A
Application granted
Publication of CN110019083B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1847 File system types specifically adapted to static storage, e.g. adapted to flash memory or SSD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A storage method and device of a distributed file system, an electronic device, and a storage medium. The storage method of the distributed file system comprises the following steps: determining the available storage space on each of a plurality of disks according to the reserved space set aside on that disk for use by user programs, where the reserved space is determined according to configuration information and the available storage space comprises the storage space in the data disk of a disk, outside the reserved space, that is not occupied by user data; and storing the user data of the distributed file system to the corresponding disk according to the available storage space on the plurality of disks. At least one embodiment of the application can dynamically adjust the storage space of the user program in the distributed file system.

Description

Storage method and device of distributed file system, electronic equipment and storage medium
Technical Field
The present invention relates to a distributed file system, and in particular, to a storage method and apparatus for a distributed file system, an electronic device, and a storage medium.
Background
In a distributed file system, the physical machines that store the actual user data managed by the distributed file system are generally referred to as data storage nodes. In a data storage node, a disk is generally divided into a system disk and a data disk; the data disk is typically used to store user data managed by the distributed file system, and the system disk is typically used to store the operating system, the data storage node management program (a program for managing disk space and user data on the data storage node), and so on. The data disks are uniformly managed by the distributed file system, and all of them are used for storing user data written into the distributed file system by users (namely, the user data managed by the distributed file system).
In general, in addition to the data storage node management program, other user programs are also executed in the data storage node, and these user programs also have data storage requirements. These user programs and the distributed file system can be viewed as distinct applications on the application layer, so that the data storage of the user programs is not managed by the distributed file system. To meet the storage requirements of user programs running on data storage nodes, there are currently two solutions:
one is to provide the user program with an independent disk that is not managed by the distributed file system;
the other is to provide the user program with an independent disk partition that is not managed by the distributed file system.
These solutions have the following problems:
when an independent disk partition or an independent disk is used to meet the storage requirement of the user program, its size is fixed in advance and cannot be adjusted dynamically according to the actual storage requirement of the user program;
when the independent disk partition or the independent disk suffers a hardware failure, the space can be recovered only by repairing the hardware.
Disclosure of Invention
The application provides a storage method and device of a distributed file system, electronic equipment and a storage medium, which can dynamically adjust the storage space of a user program in the distributed file system.
The technical scheme is as follows.
A storage method of a distributed file system comprises the following steps:
determining the available storage space on each of a plurality of disks according to the reserved space set aside on that disk for use by user programs; the reserved space is determined according to configuration information; the available storage space comprises the storage space in the data disk of a disk, outside the reserved space, that is not occupied by user data;
and storing the user data of the distributed file system to the corresponding disks according to the available storage space on the plurality of disks.
Wherein, the determining of the available storage space on the plurality of disks according to the reserved space set aside on each disk for use by user programs may include:
determining the size of the available storage space on each of the plurality of disks; the size of the available storage space on any disk is: the size of the storage space of the data disk of the disk, minus the size of the reserved space on the disk and the size of the storage space occupied by the user data.
Wherein, the storage method may further include:
updating the parameters of the reserved space in the configuration information according to a modification of the reserved space made by a user through calling a remote call interface.
Wherein, the storage method may further include:
when the configuration information is updated, for a disk whose reserved space needs to be increased, if the size of the available storage space on the disk is smaller than the size by which the reserved space is to be increased, moving user data of a corresponding size on the disk to other disks in the data storage node or to other data storage nodes of the distributed file system, and modifying the metadata accordingly.
Wherein the user data of the corresponding size may refer to user data whose size is at least equal to the size by which the reserved space is to be increased minus the size of the available storage space.
A storage device of a distributed file system, comprising:
a determining module, configured to determine the available storage space on each of a plurality of disks according to the reserved space set aside on that disk for use by user programs; the size of the reserved space is determined according to configuration information; the available storage space comprises the storage space in the data disk of a disk, outside the reserved space, that is not occupied by user data;
and a storage module, configured to store the user data of the distributed file system to the corresponding disk according to the available storage space on the plurality of disks.
Wherein, the determining, by the determining module, of the available storage space on the plurality of disks according to the reserved space set aside on each disk for use by user programs may include:
the determining module determines the size of the available storage space on each of the plurality of disks; the size of the available storage space on any disk is: the size of the storage space of the data disk of the disk, minus the size of the reserved space on the disk and the size of the storage space occupied by the user data.
Wherein, the storage device may further include:
an updating module, configured to update the parameters of the reserved space in the configuration information according to a modification of the reserved space made by a user through calling a remote call interface.
Wherein, the storage device may further include:
a data migration module, configured to, when the configuration information is updated, for a disk whose reserved space needs to be increased, if the size of the available storage space on the disk is smaller than the size by which the reserved space is to be increased, move user data of a corresponding size on the disk to other disks in the data storage node or to other data storage nodes of the distributed file system, and modify the metadata accordingly.
Wherein the user data of the corresponding size may refer to user data whose size is at least equal to the size by which the reserved space is to be increased minus the size of the available storage space.
An electronic device for storage in a distributed file system, comprising: a memory and a processor;
the memory is configured to store a program for storage in the distributed file system; the program, when read and executed by the processor, performs the following operations:
determining the available storage space on each of a plurality of disks according to the reserved space set aside on that disk for use by user programs; the size of the reserved space is determined according to configuration information; the available storage space comprises the storage space in the data disk of a disk, outside the reserved space, that is not occupied by user data;
and storing the user data of the distributed file system to the corresponding disk according to the available storage space on the plurality of disks.
A storage medium storing a program for storage in a distributed file system; the program, when executed, performs the following operations:
determining the available storage space on each of a plurality of disks according to the reserved space set aside on that disk for use by user programs; the reserved space is determined according to configuration information; the available storage space comprises the storage space in the data disk of a disk, outside the reserved space, that is not occupied by user data;
and storing the user data of the distributed file system to the corresponding disk according to the available storage space on the plurality of disks.
The application includes the following advantages:
according to at least one embodiment of the application, the storage space of the user data and the storage space used by the user program are uniformly managed by the distributed file system, and part of the disk space can be reserved according to the configuration information for the user program to use, so that the storage space used by the user program can be adjusted according to requirements, and the data storage requirements of the user program on the data storage node can be met more flexibly.
In an implementation manner of the embodiment of the application, the parameters of the reserved space can be dynamically modified in a manner of updating the configuration information, so that dynamic changes of the storage requirements of the user programs are met.
Of course, it is not necessary for any product to achieve all of the above-described advantages at the same time for the practice of the present application.
Drawings
FIG. 1 is a flowchart of a storage method of a distributed file system according to a first embodiment;
FIG. 2 is a schematic diagram of the storage space in a data disk of a disk in an example of the first embodiment;
FIG. 3 is a schematic diagram of the storage space after user data is stored in a data disk of a disk in an example of the first embodiment;
FIG. 4a is a schematic diagram of the storage space when the reserved space is to be increased in an example of the first embodiment;
FIG. 4b is a schematic diagram of the storage space after the reserved space is increased in an example of the first embodiment;
FIG. 5 is a schematic diagram of a storage device of a distributed file system according to a second embodiment.
Detailed Description
The technical solution of the present application will be described in more detail with reference to the accompanying drawings and embodiments.
It should be noted that, if not conflicting, different features in the embodiments and implementations of the present application may be combined with each other and are within the scope of protection of the present application. Additionally, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
In one configuration, a computing device that performs storage in a data storage node or in the distributed file system may include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. The memory may include one or more modules.
Computer-readable media include both permanent and non-permanent, removable and non-removable storage media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Herein, user data may refer to data that a user writes to the distributed file system and that is managed by the distributed file system (typically by a data storage node management program); it may be stored on a disk of a data storage node of the distributed file system.
The data that a user program needs to store may include one or more of: input data of the user program, intermediate data generated during operation, operating environment data, operation result data, and the like. The code or executable file of the user program itself may be stored together with the user data or in the storage space used by the user program, as chosen by the user.
An embodiment of a storage method of a distributed file system, as shown in fig. 1, includes steps S110 to S120:
S110, determining the available storage space on each of a plurality of disks according to the reserved space set aside on that disk for use by user programs; the size of the reserved space is determined according to configuration information; the available storage space comprises the storage space in the data disk of a disk, outside the reserved space, that is not occupied by user data;
S120, storing the user data of the distributed file system to the corresponding disk according to the available storage space on the plurality of disks.
In this embodiment, the storage space of the user data and the storage space for the user program (i.e., the reserved space) may be uniformly arranged by the distributed file system, and a part of the disk space may be reserved according to the configuration information for the user program to use, so that the storage space of the user program may be adjusted according to the requirement, and the data storage requirement of the user program on the data storage node may be more flexibly satisfied.
In this embodiment, the distributed file system may reserve a part of storage space in a data disk of a disk originally used for storing user data as a reserved space for use by a user program, and although the content stored in the reserved space is not managed by the distributed file system, the reserved space itself may be managed by the distributed file system in a unified manner. That is, although the storage space of the user data and the storage space used by the user program are still isolated from each other, and the distributed file system cannot read and write the reserved space, both the storage space of the user data and the storage space used by the user program can be uniformly managed by the distributed file system.
In this embodiment, the multiple disks may be all or part of the disks in the distributed file system.
In this embodiment, the available storage space on the multiple disks may be determined at the time user data is written; alternatively, the available storage space on a disk may be updated each time user data is written to that disk, so that the available storage space on each disk is already known when user data is next written to the distributed file system.
The method of this embodiment may be, but is not limited to being, performed by a data storage node in the distributed file system; for example, an existing data storage node management program may be upgraded to implement the method.
In this embodiment, the parameters of the reserved spaces on the multiple disks may be the same or different; when the parameters of the reserved space on different disks are different, the parameters of the reserved space on different disks can be respectively specified in the configuration information. Wherein the parameters of the reserved space may include one or more of the following: the size of the reserved space, the address range, etc.
In this embodiment, when the parameter of the reserved space includes the size, the size of the reserved space on different disks may be directly specified in the configuration information, or the size of the total reserved space may also be specified, and the size of the reserved space on different disks is specified by the data storage node itself.
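As an illustration of the second option, in which only the total reserved size is configured and the data storage node divides it among its disks, a minimal sketch follows. The even split, the function name `split_reserved`, and the disk identifiers are assumptions made for illustration; the text does not prescribe how the node distributes the total.

```python
from typing import Dict, List


def split_reserved(total_reserved: int, disks: List[str]) -> Dict[str, int]:
    """Divide a total reserved size across disks as evenly as possible.

    Hypothetical policy: an even split, with any remainder handed out one
    unit at a time to the first disks in the list.
    """
    if not disks:
        raise ValueError("at least one disk is required")
    base, extra = divmod(total_reserved, len(disks))
    return {
        disk: base + (1 if index < extra else 0)
        for index, disk in enumerate(disks)
    }


# A total of 500 units over three disks: the per-disk sizes sum back to 500.
per_disk = split_reserved(500, ["disk0", "disk1", "disk2"])
```

Other policies, such as splitting in proportion to each disk's capacity, would fit the same interface.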
In this embodiment, the storage space serving as the reserved space on the disk need not be specified in advance; that is, only the size of the reserved space on the disk may be set, without designating which specific storage space is reserved. When both the user data written to the distributed file system and the data that the user program needs to store are kept on one disk, the underlying operating system can ensure that the two do not conflict, i.e., that they do not occupy the same storage space.
In this embodiment, the storage space on the disk as the reserved space may also be specified in advance, for example, which addresses are used as the reserved space.
In this embodiment, the user data of the distributed file system may be allocated to the corresponding disk for storage according to the available storage spaces on the multiple disks, and when there is no available storage space on one disk (for example, the size of the available storage space on the disk is 0), the user data may not be allocated to the disk any more, so as to achieve the purpose of reserving the reserved space for the user program to use.
In this embodiment, when allocating user data of the distributed file system to a corresponding disk for storage, the allocation may be performed according to a preset policy; the preset strategy can be set or modified according to actual requirements.
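The allocation described above can be sketched as follows. The text leaves the preset policy open, so the "most available space first" rule, the function name `pick_disk`, and the disk identifiers are assumptions for illustration only.

```python
from typing import Dict, Optional


def pick_disk(available: Dict[str, int], write_size: int) -> Optional[str]:
    """Choose a disk for a write of write_size units.

    `available` maps a disk identifier to its available storage space.
    Disks with no available space are never chosen, which is what keeps
    the reserved space free for user programs.
    """
    candidates = {
        disk: space
        for disk, space in available.items()
        if space > 0 and space >= write_size
    }
    if not candidates:
        return None  # no disk can take the data
    # Hypothetical policy: prefer the disk with the most available space.
    return max(candidates, key=candidates.get)


chosen = pick_disk({"disk0": 0, "disk1": 500, "disk2": 300}, 100)
```

Here `disk0` is skipped because its available space is 0, even though part of it may be physically empty reserved space.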
In this embodiment, the configuration information may be set or modified by a user, or may be set or modified by a system administrator or the like. The configuration information may be set or modified by calling a remote call interface, or the set or modified configuration information may be distributed by the distributed file system to the data storage node or the execution main body of step S110 for storage, so that the stored configuration information may be read when step S110 is executed.
In this embodiment, the form of the configuration information is not limited, and may be a configuration file, a parameter list, and the like.
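For instance, if the configuration information took the form of a JSON configuration file, reading the reserved size for a disk might look like the sketch below. The key names `default_reserved` and `per_disk_reserved`, and the helper `reserved_for`, are hypothetical; the text does not define a file format.

```python
import json

# Hypothetical configuration: a default reserved size plus per-disk overrides.
CONFIG_TEXT = """
{
  "default_reserved": 200,
  "per_disk_reserved": {"disk1": 300}
}
"""


def reserved_for(config: dict, disk: str) -> int:
    """Return the reserved size for a disk, falling back to the default."""
    overrides = config.get("per_disk_reserved", {})
    return overrides.get(disk, config.get("default_reserved", 0))


config = json.loads(CONFIG_TEXT)
```

With this layout, disks that share the same parameters need no entry of their own, while a disk with different parameters gets an explicit override.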
In one implementation, the method may further include:
and virtualizing the storage space which is used as the reserved space on the disk into one or more disks used by the user program.
In this implementation manner, a storage space may be first allocated on the disk as a reserved space, an address or an address range of the storage space allocated on the disk as the reserved space is recorded, and the storage space corresponding to the recorded address or address range is virtualized into one or more disks and provided to the user program for use.
With this implementation, the user knows how much space is available to the user program, just as when an independent disk or an independent disk partition is used to meet the storage requirement of the user program, and does not need to know where that storage space is actually located or whether it shares a disk with the user data.
In other implementations, the storage space allocated as the reserved space may be directly provided to the user program without virtualization.
In one implementation, the determining available storage space on the multiple disks according to the reserved spaces on the multiple disks respectively used by the user program may include:
determining the size of the available storage space on each of the plurality of disks; the size of the available storage space on any disk is: the size of the storage space of the data disk of the disk, minus the size of the reserved space on the disk and the size of the storage space occupied by the user data.
The size of the data disk storage space of the disk may refer to the size of the total storage space of the data disk of the disk managed by the data storage node management program.
The storage space occupied by the user data may refer to a storage space occupied by the user data stored on the disk.
In this implementation manner, the storage space in the data disk of the disk other than the reserved space is the storage space for user data on the disk, that is, the part of the storage space used for storing user data of the distributed file system. When user data of the distributed file system is stored on a disk, it occupies part of the storage space for user data on that disk; the part already occupied by user data may be referred to as used storage space, and the remaining part is the available storage space, that is, storage space that can still store user data.
For example, if the storage space of the data disk of a disk is 1000M and the reserved space is 200M, the storage space for user data on the disk is 800M; if 300M of user data has already been saved on the disk, the available storage space of the disk is 500M.
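The calculation in this example can be written as a small function. This is a minimal sketch; the function name `available_space` is an assumption, and clamping the result at zero is an added safeguard that the text does not specify.

```python
def available_space(data_disk_size: int, reserved_size: int, used_size: int) -> int:
    """Available space = data disk size - reserved space - space used by user data.

    All sizes use the same unit (M in the example above).
    """
    available = data_disk_size - reserved_size - used_size
    # Assumption: never report a negative size, e.g. while a reserved-space
    # increase is waiting for user data to be migrated away.
    return max(available, 0)


space_for_user_data = available_space(1000, 200, 0)   # nothing written yet
space_left = available_space(1000, 200, 300)          # after 300M of user data
```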
In other implementation manners, the size S of the available storage space of the disk may also be recorded (if the user data has not been written in the disk, S is equal to the size of the storage space of the user data), and each time the user data is written, the recorded S is used to subtract the data amount written this time, and the obtained result is used to update S, so as to obtain the size of the available storage space on the disk at present.
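A minimal sketch of this bookkeeping is shown below. The class name `DiskSpaceTracker` and the decision to reject a write larger than S are assumptions for illustration.

```python
class DiskSpaceTracker:
    """Keeps the recorded available size S for one disk."""

    def __init__(self, data_disk_size: int, reserved_size: int) -> None:
        # Before any user data is written, S equals the storage space
        # for user data: the data disk size minus the reserved space.
        self.s = data_disk_size - reserved_size

    def record_write(self, written: int) -> int:
        """Subtract the amount just written from S and return the new S."""
        if written > self.s:
            # Assumption: such a write should have gone to another disk.
            raise ValueError("write exceeds available storage space")
        self.s -= written
        return self.s


tracker = DiskSpaceTracker(data_disk_size=1000, reserved_size=200)
remaining = tracker.record_write(300)
```

Keeping S up to date avoids recomputing the subtraction from the disk totals on every write.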
In other implementation manners, the address range of the available storage space may also be determined according to the address range of the reserved space on the disk and the address range of the storage space occupied by the user data; the size of the available storage space may then be determined from that address range.
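One way to derive the available address ranges is sketched below, assuming that addresses run from 0 to the data disk size and that ranges are half-open `(start, end)` pairs. The function names `free_ranges` and `total_size` are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end)


def free_ranges(disk_size: int, reserved: List[Range], used: List[Range]) -> List[Range]:
    """Return the address ranges covered by neither reserved space nor user data."""
    free: List[Range] = []
    cursor = 0
    for start, end in sorted(reserved + used):
        if start > cursor:
            free.append((cursor, start))  # gap before this occupied range
        cursor = max(cursor, end)
    if cursor < disk_size:
        free.append((cursor, disk_size))  # tail of the disk
    return free


def total_size(ranges: List[Range]) -> int:
    """Sum the lengths of a list of half-open ranges."""
    return sum(end - start for start, end in ranges)


# User data occupies [0, 300) and the reserved space is [800, 1000).
gaps = free_ranges(1000, reserved=[(800, 1000)], used=[(0, 300)])
```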
In one implementation, the method may further include:
updating the parameters of the reserved space in the configuration information according to a modification of the reserved space made by a user through calling a remote call interface.
In this implementation, the user may modify the parameters such as the size and address range of the reserved space, and the parameters of the reserved space in the configuration information are updated accordingly.
In this implementation, when the parameter of the reserved space on one or more disks changes, the available storage space on the disk or the disks may be updated.
In this implementation manner, the parameter of the reserved space in the configuration information at the initial time may be a preset value or a default value (for example, the size of the reserved space at the initial time may be 0), and when the user needs to run the user program, the remote call interface may be called to modify the parameter of the reserved space in the configuration information.
In this implementation manner, if the existing distributed file system is updated to implement the storage method of this embodiment, the process of allocating the reserved space after updating may be regarded as a process of updating the size of the reserved space from 0 to the set size of the reserved space.
In this implementation, the user may specify to modify parameters of the reserved space on one or more disks, or may specify to modify parameters of the reserved space in the distributed file system. In the case that the parameters of the reserved space of the multiple disks are the same, the parameters of the reserved space modified by the user will be applied to the multiple disks.
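The two cases can be illustrated with a handler that a remote call interface might invoke. This is a sketch under assumptions: the function name `update_reserved`, the dictionary layout of the configuration, and the convention that omitting the disk list means "all disks" are not taken from the text.

```python
from typing import Dict, Iterable, Optional


def update_reserved(
    config: Dict[str, int],
    new_size: int,
    disks: Optional[Iterable[str]] = None,
) -> Dict[str, int]:
    """Return a copy of the per-disk reserved sizes with new_size applied.

    `config` maps a disk identifier to its reserved size. When `disks` is
    None the new size is applied to every disk; otherwise only to those named.
    """
    if new_size < 0:
        raise ValueError("reserved size cannot be negative")
    targets = list(config) if disks is None else list(disks)
    updated = dict(config)
    for disk in targets:
        if disk not in updated:
            raise KeyError(disk)
        updated[disk] = new_size
    return updated


initial = {"disk0": 0, "disk1": 0}            # reserved size starts at 0
all_disks = update_reserved(initial, 200)     # same parameters on all disks
one_disk = update_reserved(initial, 200, ["disk1"])
```

Returning a new dictionary rather than mutating the input keeps the old configuration available until the change has been applied successfully.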
In this implementation, the modification of the reserved space by the user may be directly modifying the configuration information, or may be adjusting the reserved space in the user interface, and then updating the configuration information according to the adjustment of the user. In other implementation manners, the data storage node may also set parameters of the reserved spaces on the multiple disks respectively; or the user can only set the parameters of the total reserved space on the data storage node, and the data storage node automatically allocates the reserved space on the plurality of disks according to a preset allocation scheme.
In one implementation, the method may further include:
when the configuration information is updated, for a disk whose reserved space needs to be increased, if the size of the available storage space on the disk is smaller than the amount by which the reserved space is to be increased, user data of a corresponding size is moved from the disk to other disks in the data storage node or to other data storage nodes of the distributed file system, and the metadata is modified accordingly.
In this implementation, when the available storage space is too small to cover the increase in reserved space, storage space occupied by user data may be released and used as reserved space. For example, suppose a disk has only 50M of available storage space and its reserved space must grow by 200M. In that case 150M of user data can be migrated off the disk, freeing 150M of storage space; that space, together with the 50M that was already available, forms the added part of the reserved space on the disk. The size of the available storage space of the disk is then updated.
In this implementation, the user data of the corresponding size may refer to user data whose size is at least equal to the required increase of the reserved space minus the size of the available storage space.
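The arithmetic of the example above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `data_to_migrate` is hypothetical, and both sizes are assumed to be given in the same unit.

```python
def data_to_migrate(available: int, increase: int) -> int:
    """Return the minimum amount of user data that must leave the disk.

    `available` is the current available storage space and `increase` is the
    amount by which the reserved space must grow (same unit for both).
    The name and signature are hypothetical, not taken from the patent.
    """
    if available < 0 or increase < 0:
        raise ValueError("sizes must be non-negative")
    # Nothing has to move when the available space already covers the increase.
    return max(0, increase - available)


# Example from the text: 50M available, reserved space must grow by 200M.
shortfall = data_to_migrate(available=50, increase=200)
```

When the available space already covers the increase, the helper returns 0 and no migration is needed.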
In this implementation, the user data on the disk may alternatively be copied to another disk in the data storage node or to another data storage node of the distributed file system. The storage space occupied by the copied user data is then treated as available storage space, and when data of the user program is written into that space, the copied user data may be overwritten directly.
In this implementation, if the reserved space is to be increased and the available storage space on the disk is at least as large as the increase, or if the reserved space is to be decreased, the size of the reserved space may be updated directly, and the updated size is then used to determine the available storage space on the disk.
In this implementation, if an existing distributed file system is upgraded to implement the storage method of this embodiment, allocating the reserved space after the upgrade may be regarded as increasing the size of the reserved space from 0 to the configured size.
In other implementations, a failure message may instead be returned when the available storage space is not enough to increase the reserved space.
The present embodiment is described below by way of an example. In this example, the data storage node management program executes the storage method of this example, and the parameter of the reserved space is the size of the reserved space.
In this example, the data storage node management program records and persistently stores the configuration information to the disk of the data storage node, so that the size of the reserved space can be correctly set when the data storage node is restarted.
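A minimal sketch of such persistence follows. The patent does not specify a file format or write strategy, so the JSON encoding, the atomic write-then-rename approach, and the names `save_config` and `load_config` are all assumptions made for illustration.

```python
import json
import os
import tempfile


def save_config(path: str, reserved_sizes: dict) -> None:
    """Persist {disk_id: reserved_size} so it survives a node restart.

    Hypothetical helper: writes to a temporary file in the same directory and
    then renames it, so a crash never leaves a half-written config file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(reserved_sizes, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_config(path: str) -> dict:
    """Read the reserved sizes back; a missing file means nothing is reserved."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

On restart, the management program would call `load_config` before computing any available storage space.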
In this example, the configuration information includes the size of the reserved space, which may be set or modified by a user, although it is not limited to being set in this way.
In this example, the data storage node management program may expose a Remote Procedure Call (RPC) interface, allowing the user to modify the configuration information by invoking an RPC on the data storage node management program. The configuration information may also be sent to the data storage node management program in other ways.
In this example, the configuration information obtained by different data storage nodes may be the same or different.
In this example, the possible formats of the configuration information may include the following two types:
(1) Disk ID1: reserved space size, …, Disk IDn: reserved space size.
(2) Disk ID1|…|Disk IDN, reserved space size 1|…|reserved space size N.
In practical applications, the format of the configuration information is not limited to the two listed above, and may be any format as long as the format can be identified by the main body managing the storage space on the data storage node.
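As an illustration, both listed formats could be parsed into a mapping from disk ID to reserved size. The sketch below assumes that sizes are plain integers and that `:`, `,` and `|` are the literal separators; the function name `parse_config` is hypothetical.

```python
def parse_config(text: str) -> dict:
    """Parse either illustrative format into {disk_id: reserved_size}.

    Format (1): "ID1:size,...,IDn:size"
    Format (2): "ID1|...|IDN,size1|...|sizeN"
    Separators and integer sizes are assumptions for this sketch.
    """
    text = text.strip()
    if not text:
        return {}
    if "|" in text or ":" not in text:
        # Format (2): one list of IDs, then one list of sizes.
        ids_part, sizes_part = text.split(",", 1)
        ids = [disk_id.strip() for disk_id in ids_part.split("|")]
        sizes = [int(size) for size in sizes_part.split("|")]
        if len(ids) != len(sizes):
            raise ValueError("disk IDs and sizes must pair up one to one")
        return dict(zip(ids, sizes))
    # Format (1): comma-separated "ID:size" pairs.
    result = {}
    for entry in text.split(","):
        disk_id, size = entry.split(":", 1)
        result[disk_id.strip()] = int(size)
    return result
```

Both formats yield the same mapping, so the rest of the management program does not need to know which one was used.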
In this example, the data storage node management program calculates the size of the available storage space of the plurality of disks, and stores the user data to the corresponding disk according to the available storage space of the plurality of disks and a preset policy.
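The preset policy itself is not specified in the text. Purely as a hypothetical example, a policy might place each piece of user data on the disk with the most available storage space that can still hold it; the name `choose_disk` and this selection rule are assumptions.

```python
from typing import Mapping, Optional


def choose_disk(available: Mapping[str, int], data_size: int) -> Optional[str]:
    """Pick a disk for `data_size` units of user data.

    `available` maps disk ID to available storage space, i.e. space that is
    neither reserved nor already used. Returns None if no disk can hold the
    data. The "most available space first" rule is only one possible policy.
    """
    candidates = {disk: free for disk, free in available.items() if free >= data_size}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

Because the reserved space is already excluded from `available`, user data is never placed into space set aside for the user program.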
In this example, when the data storage node management program calculates the size of the available storage space of a disk in its own data storage node, it queries the configuration information it has stored. As shown in Fig. 2, it first subtracts the size of the reserved space in the configuration information from the size of the data disk storage space of the disk to obtain the size of the storage space for user data on the disk, that is:
size of the storage space for user data = size of the data disk storage space of the disk - size of the reserved space (Equation 1)
In this example, the size of the data disk storage space of the disk is the total storage space of the data disk managed by the data storage node management program.
In this example, the storage space for the user data may be used as a storage space for storing the user data managed by the distributed file system (i.e., the user data written by the user into the distributed file system).
Then, as shown in Fig. 3, the available storage space of the disk is calculated from the size of the storage space for user data (the size of the data disk storage space of the disk minus the size of the reserved space) and the size of the used storage space of the disk (i.e., the storage space occupied on the disk by user data of the distributed file system), according to the following formula:
size of available storage space = size of the data disk storage space of the disk - size of the reserved space - size of the used storage space (Equation 2)
In this example, the available storage space of a disk (hereinafter referred to as available storage space) may refer to the remaining storage space on the disk that is available for storing user data managed by the distributed file system; the used storage space of a disk (hereinafter referred to as used storage space) may refer to the storage space occupied by the user data stored in the disk.
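Equations 1 and 2 translate directly into code. The sketch below is illustrative only; the class name `Disk` and its field names are assumptions, and all sizes are taken to be in the same unit.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Space accounting for one disk (hypothetical names, same unit throughout)."""

    data_disk_size: int  # total storage space of the data disk
    reserved_size: int   # reserved space taken from the configuration information
    used_size: int       # storage space occupied by user data

    def user_data_capacity(self) -> int:
        # Equation 1: data disk storage space minus reserved space.
        return self.data_disk_size - self.reserved_size

    def available(self) -> int:
        # Equation 2: data disk storage space minus reserved space minus used space.
        return self.user_data_capacity() - self.used_size


disk = Disk(data_disk_size=1000, reserved_size=200, used_size=750)
```

With these illustrative numbers, 800 units are open to user data, of which 50 remain available.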
In this example, the size of the used storage space may be obtained in either of the following two ways:
(1) The data storage node management program records a statistical value in memory as it stores user data.
(2) The data storage node management program scans all user data files on the corresponding disk (i.e., all files outside the reserved space in the data disk of the disk).
With the statistical value approach, the disk does not need to be accessed, so no additional load is placed on the disk.
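The two ways of obtaining the used storage space can be contrasted in a short sketch. Both `UsedSpaceCounter` and `scan_used_space` are hypothetical names, and the scan assumes that user data files live under a single directory tree.

```python
import os


class UsedSpaceCounter:
    """In-memory statistic, updated whenever user data is stored or removed."""

    def __init__(self) -> None:
        self._used = 0

    def on_write(self, size: int) -> None:
        self._used += size

    def on_delete(self, size: int) -> None:
        self._used -= size

    @property
    def used(self) -> int:
        # Reading the statistic does not touch the disk at all.
        return self._used


def scan_used_space(user_data_root: str) -> int:
    """Sum the sizes of all files under `user_data_root` by walking the disk."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(user_data_root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

The counter answers in constant time, whereas the scan costs disk I/O proportional to the number of files; the scan needs no bookkeeping, though, and could for instance be used to rebuild the counter after a restart.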
In this example, if the requirement of the storage space of the user program changes, the user may send a request for modifying the configuration information through the RPC interface, and the data storage node management program receives the RPC and updates the configuration information, and performs the following operations:
if the reserved space becomes smaller, the calculation result of the size of the available storage space is updated directly according to equation (2).
If the reserved space is enlarged, the data storage node management program checks whether the available storage space on the disk specified by the user is at least as large as the amount by which the reserved space is to be increased. If it is not, data is copied or moved: user data managed by the distributed file system, of a size at least equal to the required increase of the reserved space minus the size of the available storage space, is copied or moved to other disks in the data storage node or to other data storage nodes of the distributed file system, and the metadata recorded by the data storage node management program is modified accordingly. This releases storage space to the reserved space, and the size of the available storage space of the disk is then updated. If the available storage space is at least as large as the required increase, the calculated size of the available storage space of the disk can be updated directly according to equation (2).
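The shrink and enlarge cases can be sketched together as follows. This is a simplified illustration, not the patented implementation: `Disk`, `update_reserved`, and the `migrate` callback (assumed to move at least the requested amount of user data elsewhere, update the metadata, and return the amount moved) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Disk:
    data_disk_size: int
    reserved_size: int
    used_size: int

    def available(self) -> int:
        # Equation 2 from the text.
        return self.data_disk_size - self.reserved_size - self.used_size


def update_reserved(disk: Disk, new_reserved: int, migrate: Callable[[int], int]) -> int:
    """Apply a new reserved size and return how much user data was migrated."""
    if not 0 <= new_reserved <= disk.data_disk_size:
        raise ValueError("reserved size must fit within the data disk")
    increase = new_reserved - disk.reserved_size
    moved = 0
    if increase > disk.available():
        # Not enough available space: move out at least the shortfall.
        needed = increase - disk.available()
        moved = migrate(needed)
        if moved < needed:
            # Corresponds to reporting that the increase failed.
            raise RuntimeError("not enough user data could be migrated")
        disk.used_size -= moved
    # Shrinking, or growing within the available space, needs no migration.
    disk.reserved_size = new_reserved
    return moved


disk = Disk(data_disk_size=1000, reserved_size=200, used_size=750)
moved = update_reserved(disk, 400, migrate=lambda needed: needed)
```

Here the disk starts with 50 units available and the reserved space grows by 200, so the stand-in `migrate` callback is asked to move 150 units, matching the earlier 50M/200M example.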
Assume that a disk has no available storage space, that is, the size of the used storage space equals the size of the storage space for user data and the size of the available storage space is 0, as shown in Fig. 4a. The reserved space on the disk now needs to be increased, and the diagonally hatched area in Fig. 4a indicates the reserved space to be added.
In this case, the user data in the storage space indicated by the diagonally hatched area in Fig. 4a is copied or moved to another disk in the data storage node or to another data storage node in the distributed file system. The situation after copying or moving is shown in Fig. 4b: the reserved space has become larger, its new size being the original size of the reserved space plus the size of the increase, and the diagonally hatched area in Fig. 4b represents the added reserved space. The size of the used storage space and the size of the storage space for user data are reduced accordingly.
It should be noted that Figs. 2, 3, 4a, and 4b are intended only to illustrate clearly the relationship between the reserved space and the other storage spaces; they do not indicate that the reserved space or any other storage space must be contiguous.
In a second embodiment, a storage device of a distributed file system is shown in Fig. 5, and includes:
a determining module 21, configured to determine available storage space on multiple disks according to the reserved space on each of the multiple disks that is set aside for use by a user program; the size of the reserved space is determined according to configuration information; the available storage space comprises the storage space in a data disk of a disk, other than the reserved space, that is not occupied by user data;
and the storage module 22 is configured to store the user data of the distributed file system to the corresponding disk according to the available storage space on the multiple disks.
In this embodiment, the determining module 21 is a part of the storage device responsible for determining the available storage space, and may be software, hardware, or a combination of the two.
In this embodiment, the storage module 22 is a part of the storage device that is responsible for storing user data to a disk, and may be software, hardware, or a combination of the two.
In one implementation manner, the determining, by the determining module, the available storage space on the multiple disks according to the reserved spaces on the multiple disks respectively used by the user program may include:
the determining module determines the size of the available storage space on each of the plurality of disks; the size of the available storage space on any disk is the size of the storage space of the data disk of the disk, minus the size of the reserved space on the disk and the size of the storage space occupied by the user data.
In one implementation, the storage device may further include:
an updating module, configured to update the parameters of the reserved space in the configuration information according to a modification of the reserved space made by the user through a remote call interface.
In one implementation, the storage device may further include:
a data migration module, configured to, when the configuration information is updated and the reserved space of a disk needs to be increased, move user data of a corresponding size from the disk to other disks in the data storage node or to other data storage nodes of the distributed file system and modify the metadata accordingly, if the size of the available storage space on the disk is smaller than the amount by which the reserved space is to be increased.
In this implementation, the user data of the corresponding size may refer to user data whose size is at least equal to the required increase of the reserved space minus the size of the available storage space.
In this embodiment, the operations of the determining module and the storage module of the storage device of the distributed file system correspond to steps S110 and S120 of the first embodiment, respectively; for other implementation details of these modules, refer to the first embodiment.
In a third embodiment, an electronic device for storing in a distributed file system includes: a memory and a processor;
the memory is used for storing programs for storing in the distributed file system; the program for storage management in a distributed file system, when read and executed by the processor, performs the following operations:
determining available storage spaces on a plurality of disks according to reserved spaces on the plurality of disks respectively used by user programs; the size of the reserved space is determined according to configuration information; the available storage space comprises storage space which is not occupied by user data except the reserved space in a data disk of a magnetic disk;
and storing the user data of the distributed file system to the corresponding disk according to the available storage space on the disks.
In one implementation, the determining available storage space on the multiple disks according to the sizes of the reserved spaces on the multiple disks respectively used by the user program may include:
determining the size of the available storage space on each of the plurality of disks; the size of the available storage space on any disk is the size of the storage space of the data disk of the disk, minus the size of the reserved space on the disk and the size of the storage space occupied by the user data.
In one implementation, the program for storage management in a distributed file system, when read and executed by the processor, may further perform the following operations:
updating the parameters of the reserved space in the configuration information according to a modification of the reserved space made by the user through a remote call interface.
In one implementation, the program for storage management in a distributed file system, when read and executed by the processor, may further perform the following operations:
when the configuration information is updated, for a disk whose reserved space needs to be increased, if the size of the available storage space on the disk is smaller than the amount by which the reserved space is to be increased, user data of a corresponding size is moved from the disk to other disks in the data storage node or to other data storage nodes of the distributed file system, and the metadata is modified accordingly.
In this implementation, the user data of the corresponding size may refer to user data whose size is at least equal to the required increase of the reserved space minus the size of the available storage space.
In this embodiment, when the program for performing storage management in the distributed file system is read and executed by the processor, the operations performed correspond to steps S110 to S120 of the first embodiment; additional details of the operations performed by the program can be found in the first embodiment.
In a fourth embodiment, a storage medium stores a program for storage in a distributed file system; the program for storing when executed in a distributed file system performs the following operations:
determining available storage spaces on a plurality of disks according to reserved spaces on the plurality of disks respectively used by user programs; the reserved space is determined according to configuration information; the available storage space comprises a storage space which is not occupied by user data except the reserved space in a data disc of a magnetic disc;
and storing the user data of the distributed file system to the corresponding disk according to the available storage space on the disks.
In this embodiment, when the program for performing storage management in the distributed file system is executed, the operations performed correspond to steps S110 to S120 in the first embodiment; further details of the operations performed when the program is executed can be found in embodiment one.
It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present application is not limited to any specific form of hardware or software combination.
The present application is capable of other embodiments, and various changes and modifications can be made by one skilled in the art without departing from the spirit and scope of the application, which should be limited only by the claims appended hereto.

Claims (12)

1. A storage method of a distributed file system comprises the following steps:
determining available storage spaces on the disks of a plurality of data storage nodes in a distributed file system according to reserved spaces on the disks of the data storage nodes, which are respectively used by a user program; the reserved space is determined according to configuration information; the available storage space comprises a storage space which is not occupied by user data except the reserved space in a data disc of a magnetic disc of the data storage node; wherein the configuration information can be set or modified; the user data refers to data which is written into the distributed file system by a user, managed by the distributed file system and stored in a disk of the data storage node;
and storing the user data of the distributed file system to the disks of the corresponding data storage nodes according to the available storage space on the disks of the data storage nodes.
2. The storage method of claim 1, wherein said determining available storage space on the plurality of disks based on the reserved space on the plurality of disks for use by the user program, respectively, comprises:
determining the size of the available storage space on each of the plurality of disks; the size of the available storage space on any disk is: the size of the storage space of the data disk of the disk minus the size of the reserved space on the disk and the size of the storage space occupied by the user data.
3. The storage method of claim 1, further comprising:
and updating the parameters of the reserved space in the configuration information according to the modification of the reserved space by calling a remote calling interface by a user.
4. The storage method of claim 1, further comprising:
when the configuration information is updated, for the disk needing to increase the reserved space, if the size of the available storage space on the disk does not reach the size to be increased of the reserved space, the user data with the corresponding size in the disk is moved to other disks in the data storage nodes or other data storage nodes of the distributed file system, and the metadata is modified correspondingly.
5. The storage method of claim 4, wherein:
the user data of the respective size is user data of a size at least equal to the size of the increase of the reserved space requirement minus the size of the available storage space.
6. A storage device of a distributed file system, comprising:
the determining module is used for determining available storage spaces on the disks of the data storage nodes according to reserved spaces on the disks of the data storage nodes in the distributed file system, wherein the reserved spaces are respectively used by a user program; the size of the reserved space is determined according to configuration information; the available storage space comprises a storage space which is not occupied by user data except the reserved space in a data disk of a disk of the data storage node; wherein the configuration information can be set or modified; the user data refers to data which is written into the distributed file system by a user, managed by the distributed file system and stored in a disk of the data storage node;
and the storage module is used for storing the user data of the distributed file system to the disks of the corresponding data storage nodes according to the available storage space on the disks of the data storage nodes.
7. The storage device of claim 6, wherein the determining module determines the available storage space on the plurality of disks according to the reserved space on the plurality of disks respectively used by the user program comprises:
the determining module determines the size of the available storage space on each of the plurality of disks; the size of the available storage space on any disk is: the size of the storage space of the data disk of the disk minus the size of the reserved space on the disk and the size of the storage space occupied by the user data.
8. The storage device of claim 6, further comprising:
and the updating module is used for updating the parameters of the reserved space in the configuration information according to the modification of the reserved space by the remote calling interface called by the user.
9. The storage device of claim 6, further comprising:
and the data migration module is used for moving the user data with the corresponding size in the disk to other disks in the data storage nodes or to other data storage nodes of the distributed file system and correspondingly modifying the metadata if the size of the available storage space on the disk does not reach the size to be increased of the reserved space for the disk needing to increase the reserved space when the configuration information is updated.
10. The storage device of claim 9, wherein:
the user data of the respective size is user data of a size at least equal to the size of the increase of the reserved space requirement minus the size of the available storage space.
11. An electronic device for storage in a distributed file system, comprising: a memory and a processor; characterized in that:
the memory is used for storing programs for storing in the distributed file system; the program for storage management in a distributed file system, when read and executed by the processor, performs the following operations:
determining available storage spaces on the disks of a plurality of data storage nodes in a distributed file system according to reserved spaces on the disks of the data storage nodes, which are respectively used by a user program; the size of the reserved space is determined according to configuration information; the available storage space comprises a storage space which is not occupied by user data except the reserved space in a data disk of a disk of the data storage node; wherein the configuration information can be set or modified; the user data refers to data which is written into the distributed file system by a user, managed by the distributed file system and stored in a disk of the data storage node;
and storing the user data of the distributed file system to the disks of the corresponding data storage nodes according to the available storage space on the disks of the data storage nodes.
12. A storage medium, characterized by:
the storage medium stores a program for storing in a distributed file system; the program for storing when executed in a distributed file system performs the following operations:
determining available storage spaces on disks of a plurality of data storage nodes in a distributed file system according to reserved spaces on the disks of the data storage nodes, which are respectively used by a user program; the reserved space is determined according to configuration information; the available storage space comprises a storage space which is not occupied by user data except the reserved space in a data disc of a magnetic disc of the data storage node; wherein the configuration information can be set or modified; the user data refers to data which is written into the distributed file system by a user, managed by the distributed file system and stored in a disk of the data storage node;
and storing the user data of the distributed file system to the disks of the corresponding data storage nodes according to the available storage space on the disks of the data storage nodes.
CN201710849424.9A 2017-09-20 2017-09-20 Storage method and device of distributed file system, electronic equipment and storage medium Active CN110019083B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710849424.9A CN110019083B (en) 2017-09-20 2017-09-20 Storage method and device of distributed file system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710849424.9A CN110019083B (en) 2017-09-20 2017-09-20 Storage method and device of distributed file system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110019083A CN110019083A (en) 2019-07-16
CN110019083B true CN110019083B (en) 2023-01-24

Family

ID=67186298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710849424.9A Active CN110019083B (en) 2017-09-20 2017-09-20 Storage method and device of distributed file system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110019083B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112558859A (en) * 2019-09-26 2021-03-26 杭州海康威视数字技术股份有限公司 Hard disk, storage system and hard disk capacity marking method
CN112506547B (en) * 2020-12-16 2024-09-17 杭州和利时自动化有限公司 Configuration data downloading method, device, equipment and medium of distributed control system
CN113778332A (en) * 2021-08-16 2021-12-10 联想凌拓科技有限公司 Information determination method, first storage server and storage medium
CN113741816B (en) * 2021-08-31 2024-09-24 杭州海康威视数字技术股份有限公司 Method, apparatus, device and machine-readable storage medium for operating block device
CN114356237A (en) * 2021-12-31 2022-04-15 联想(北京)有限公司 Control method, memory and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105009091A (en) * 2012-12-26 2015-10-28 西部数据技术公司 Dynamic overprovisioning for data storage systems
CN105335441A (en) * 2014-08-12 2016-02-17 阳平 Local area network based distributed file system


Also Published As

Publication number Publication date
CN110019083A (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN110019083B (en) Storage method and device of distributed file system, electronic equipment and storage medium
US10579364B2 (en) Upgrading bundled applications in a distributed computing system
US10268408B2 (en) Flexible efficient runtime placement of data across multiple disks
US9361218B2 (en) Method of allocating referenced memory pages from a free list
KR101137172B1 (en) System, method and program to manage memory of a virtual machine
US8095772B2 (en) Large memory pages for shared libraries
US20060136667A1 (en) System, method and program to preserve a cache of a virtual machine
KR101729097B1 (en) Method for sharing reference data among application programs executed by a plurality of virtual machines and Reference data management apparatus and system therefor
US10922276B2 (en) Online file system check
CN109947787A (en) A kind of storage of data hierarchy, hierarchical query method and device
US8635425B1 (en) Upgrading computing devices
US20170371749A1 (en) Backup image restore
WO2017050064A1 (en) Memory management method and device for shared memory database
EP3974974A1 (en) Virtualization method and system for persistent memory
CN106357703B (en) Cluster switching method and device
US11409451B2 (en) Systems, methods, and storage media for using the otherwise-unutilized storage space on a storage device
CN115576716A (en) Memory management method based on multiple processes
US20140289739A1 (en) Allocating and sharing a data object among program instances
CN107832097B (en) Data loading method and device
WO2019212727A1 (en) Storage reserve in a file system
CN113434470A (en) Data distribution method and device and electronic equipment
US11176089B2 (en) Systems and methods for implementing dynamic file systems
US20190179803A1 (en) Apparatus and method for file sharing between applications
CN109508140B (en) Storage resource management method and device, electronic equipment and system
US9372700B2 (en) Network boot system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40010850

Country of ref document: HK

GR01 Patent grant