CN116955272A - File storage method and device - Google Patents

File storage method and device

Info

Publication number
CN116955272A
Authority
CN
China
Prior art keywords
file
address space
data
information
metadata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210413439.1A
Other languages
Chinese (zh)
Inventor
戴志威
吴启庆
罗日新
张峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202210413439.1A
Publication of CN116955272A
Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/113 Details of archiving

Abstract

The application discloses a file storage method and a file storage device, wherein the file comprises file data and metadata of the file data, and the method comprises the following steps: allocating a first logical address space for the file data and a second logical address space for the metadata; wherein the first logical address space corresponds to a physical address space in at least one first memory and the second logical address space corresponds to a physical address space in a second memory; the metadata comprises an Inode and extension information, and the extension information comprises organization mode Schema information of the file data. According to the application, the access streams of the file data and the metadata can be isolated, so that the bandwidth for reading the file data is improved, and the data access delay is further reduced.

Description

File storage method and device
Technical Field
The present application relates to the field of data storage technologies in big data, and in particular, to a method and an apparatus for storing a file.
Background
With the rapid development of big data and High Performance Computing (HPC) applications, computing power and efficiency have become key concerns in production environments. The storage system serves as the data foundation for upper-layer applications, so its Input/Output (I/O) access bandwidth and latency directly affect upper-layer service performance.
In general, different data storage formats (corresponding to different data storage methods) lead to different read performance when an upper-layer application accesses the storage system. For example, columnar storage formats such as Parquet and Optimized Row Columnar (ORC), as well as Avro, have significant advantages over the Comma-Separated Values (CSV) file format in terms of storage space utilization, I/O performance, network transmission efficiency, and the like.
However, as big data scenarios deepen, the data storage methods corresponding to file formats such as ORC, Parquet, and Avro can cause the storage medium to hit a performance bottleneck early under a large number of I/O access requests, and the data access latency increases sharply.
Disclosure of Invention
The embodiment of the application provides a file storage method and a file storage device, which can isolate access streams of file data and metadata so as to improve the bandwidth for reading the file data and further reduce the data access delay.
In a first aspect, the present application provides a file storage method, where the file includes file data and metadata of the file data, and the method includes: allocating a first logical address space for the file data and a second logical address space for the metadata; wherein the first logical address space corresponds to a physical address space in at least one first memory and the second logical address space corresponds to a physical address space in a second memory; the metadata comprises an Inode and extension information, and the extension information comprises organization mode Schema information of the file data.
In terms of technical effect, in the file format provided by the present application, the organization mode (Schema) information is managed and stored together with other metadata such as the inode, and is isolated from the management and maintenance of the file data. Specifically, on the memory side, the metadata is stored independently in its own memory (namely, the second memory), separate from the file data. Compared with the prior art, in which metadata and file data are stored in the same memory, the present application directly isolates the metadata access flow from the file data access flow. In scenarios with a large number of I/O accesses, this effectively prevents heavy metadata access from occupying the bandwidth used for reading file data; that is, the bandwidth for reading file data is improved and the access latency is reduced.
In a possible implementation manner, the read-write speed of the first memory is smaller than or equal to the read-write speed of the second memory.
In terms of technical effect, metadata access has a large influence on the overall data access process. Storing the metadata in a memory with higher read-write performance therefore directly improves metadata access performance and reduces metadata access latency, which improves overall data access performance.
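The placement rule above can be illustrated with a minimal sketch. It assumes each memory advertises a single read-write throughput figure; the `Memory` class, its `mb_per_s` field, and the selection policy are hypothetical and are not taken from the application. The fastest memory is chosen as the second memory, so every first memory is no faster than it:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    name: str
    mb_per_s: float  # hypothetical read-write speed figure


def split_tiers(memories: list[Memory]) -> tuple[list[Memory], Memory]:
    """Pick the fastest memory for metadata (the "second memory") and
    leave the remaining memories for file data (the "first memories")."""
    if len(memories) < 2:
        raise ValueError("need at least one first memory and one second memory")
    second = max(memories, key=lambda m: m.mb_per_s)
    first = [m for m in memories if m is not second]
    # Constraint from the text: speed(first memory) <= speed(second memory).
    assert all(m.mb_per_s <= second.mb_per_s for m in first)
    return first, second
```

Other policies (for example, reserving a dedicated metadata device regardless of speed) would equally satisfy the claim as long as the speed constraint holds.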
In a possible implementation manner, the file data comprises N data blocks, the organization mode information is characterized by a linked list tree, the linked list tree comprises N tree structures which are sequentially connected, and N is a positive integer; and the ith tree structure in the N tree structures is used for representing the storage mode of the ith data block in the N data blocks, and i is a positive integer less than or equal to N.
Compared with other data structures in the prior art, once a read has located the tree structure of the target data block, the child nodes of that tree can be traversed to determine the sub-blocks that satisfy the request. This process does not need to access the tree structures of other data blocks, which effectively improves metadata retrieval efficiency during data reading.
In a possible implementation manner, the ith tree structure includes M layers, each of the M layers includes at least one node, a first layer of the M layers is a root node, the root node is used for storing a logical address of the ith data block corresponding to the ith tree structure, and M is a positive integer; a first node contained in a j-th layer in the M layers corresponds to E nodes on a j+1-th layer in the M layers, wherein the first node is used for storing logical addresses of a first data block in the i-th data block, the first data block contains E sub-blocks, and the E nodes respectively store the logical addresses of the E sub-blocks; wherein j is a positive integer less than or equal to M-1, E is a positive integer.
N root nodes in the linked list tree are organized in a linked list form; the first node is any node in the j-th layer, and the first data block is any data block in the i-th data block.
In terms of technical effect, because the organization mode information is represented by a linked list tree, the root node of each data block can be located quickly during data reading, compared with data structures in the prior art. This improves metadata retrieval efficiency and, in turn, data reading performance.
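One way to picture the linked list tree is the following sketch. The class and function names (`Node`, `Root`, `build`, `find_root`, `sub_block_addresses`) are illustrative assumptions; the application only specifies that the N roots form a linked list and that each node stores the logical address of a block or sub-block.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """A tree node holding the logical address of a block or sub-block."""
    logical_addr: int
    children: list[Node] = field(default_factory=list)


@dataclass
class Root(Node):
    """Root of one tree; the N roots are chained into a linked list."""
    next_root: Optional[Root] = None


def build(trees: list[Root]) -> Optional[Root]:
    """Chain the roots in order and return the head of the list."""
    for prev, nxt in zip(trees, trees[1:]):
        prev.next_root = nxt
    return trees[0] if trees else None


def find_root(head: Optional[Root], i: int) -> Root:
    """Follow the root list to the i-th tree (1-based, as in the text)."""
    root = head
    for _ in range(i - 1):
        if root is None:
            break
        root = root.next_root
    if root is None or i < 1:
        raise IndexError(f"no tree for data block {i}")
    return root


def sub_block_addresses(node: Node) -> Iterator[int]:
    """Yield leaf (sub-block) addresses under one tree only."""
    if not node.children:
        yield node.logical_addr
        return
    for child in node.children:
        yield from sub_block_addresses(child)
```

Locating data block i touches only the root list, and the subsequent traversal stays inside that one tree, which is the property the text credits for faster metadata retrieval.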
In a possible embodiment, the extension information further includes compression information; the compression information is used for representing whether the file data is compressed or not and a compression algorithm used in compression.
In a possible implementation, the Inode contains the size of the file data, local id, global id, creation time, last access time, last update time, owner, access rights, and logical address.
In a possible implementation manner, the extension information further includes custom information; the custom information comprises encryption information of the file data, wherein the encryption information is used for representing whether the file data is encrypted or not and an encryption algorithm used in encryption.
In a second aspect, the present application provides a file storage device, the device comprising a processing unit, at least one first storage unit, and a second storage unit; the processing unit is configured to allocate a first logical address space for file data in a file and a second logical address space for metadata in the file; the at least one first storage unit is configured to store the file data in a physical address space corresponding to the first logical address space; the second storage unit is configured to store the metadata in a physical address space corresponding to the second logical address space; the metadata comprises an Inode and extension information, wherein the extension information comprises organization mode Schema information of the file data.
In a possible implementation manner, the read-write speed of the first storage unit is less than or equal to the read-write speed of the second storage unit.
In terms of technical effect, metadata access has a large influence on the overall data access process. Storing the metadata in a storage unit with higher read-write performance therefore directly improves metadata access performance and reduces metadata access latency, which improves overall data access performance.
In a possible implementation manner, the file data comprises N data blocks, the organization mode information is characterized by a linked list tree, the linked list tree comprises N tree structures which are sequentially connected, and N is a positive integer; and the ith tree structure in the N tree structures is used for representing the storage mode of the ith data block in the N data blocks, and i is a positive integer less than or equal to N.
In a possible implementation manner, the ith tree structure includes M layers, each of the M layers includes at least one node, a first layer of the M layers is a root node, the root node is used for storing a logical address of the ith data block corresponding to the ith tree structure, and M is a positive integer; a first node contained in a j-th layer in the M layers corresponds to E nodes on a j+1-th layer in the M layers, wherein the first node is used for storing logical addresses of a first data block in the i-th data block, the first data block contains E sub-blocks, and the E nodes respectively store the logical addresses of the E sub-blocks; wherein j is a positive integer less than or equal to M-1, E is a positive integer.
In a possible embodiment, the extension information further includes compression information; the compression information is used for representing whether the file data is compressed or not and a compression algorithm used in compression.
In a possible implementation, the Inode contains the size of the file data, local id, global id, creation time, last access time, last update time, owner, access rights, and logical address.
In a possible implementation manner, the extension information further includes custom information; the custom information comprises encryption information of the file data, wherein the encryption information is used for representing whether the file data is encrypted or not and an encryption algorithm used in encryption.
In a third aspect, the present application provides a computer device comprising a processor, at least one first memory, and a second memory; the processor is configured to allocate a first logical address space for file data in a file and a second logical address space for metadata in the file; the at least one first memory is configured to store the file data in a physical address space corresponding to the first logical address space; the second memory is configured to store the metadata in a physical address space corresponding to the second logical address space; the metadata comprises an Inode and extension information, wherein the extension information comprises organization mode Schema information of the file data.
In a fourth aspect, the present application provides a distributed system comprising a scheduling device, at least one first storage server, and a second storage server; the scheduling device is configured to allocate a first logical address space for file data in a file and a second logical address space for metadata in the file; the at least one first storage server is configured to store the file data in a physical address space corresponding to the first logical address space; the second storage server is configured to store the metadata in a physical address space corresponding to the second logical address space; the metadata comprises an Inode and extension information, wherein the extension information comprises organization mode Schema information of the file data.
In a fifth aspect, an embodiment of the present application provides a chip system, where the chip system includes at least one processor, a memory, and an interface circuit; the memory, the interface circuit, and the at least one processor are interconnected by lines, and instructions are stored in the memory; the method of any of the above first aspects is implemented when the instructions are executed by the processor.
In a sixth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program, where the method according to any one of the first aspects is implemented when the computer program is executed.
In a seventh aspect, an embodiment of the present application provides a computer program comprising instructions which, when executed, implement a method according to any one of the first aspects above.
Drawings
The drawings used in the embodiments of the present application are described below.
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 2 is a schematic diagram of another system architecture according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a file format according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for storing files according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a linked list tree structure according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a file storage device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. The term "and/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, A and B exist together, and B exists alone. In addition, in the description of the embodiments of the present application, "plural" means two or more.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The following describes the terminology involved in the present application.
(1) Metadata (Meta Data): data that describes other data, that is, information about information. Metadata may describe the elements or attributes of the data (name, size, data type, etc.), its structure (length, fields, data columns), or related information (location, relationships, owner). In the present application, the described data is the file data in a file, namely the data a user uses for computation. In the prior art (for example, the storage methods corresponding to file formats such as ORC, Parquet, and Avro), the metadata in a file includes two parts: the index node (inode) and self-describing metadata. The File System (FS) manages and maintains the self-describing metadata together with the file data, and manages and maintains the inode independently. The self-describing metadata includes organization mode (Schema) information, compression information, and the like.
(2) File system: the software mechanism in an operating system responsible for managing and storing file information. Its functions include: managing and scheduling the storage space of files; providing the logical structure, physical structure, and storage method of files; mapping a file's identifier to its actual address; implementing control and access operations on files; enabling sharing of file information; and providing reliable confidentiality, protection, and security measures for files.
Referring to fig. 1, fig. 1 is a schematic diagram of a system architecture of a computer device for describing a file storage method according to an embodiment of the present application, which is suitable for a data access scenario inside a single local device. As shown in fig. 1, the system architecture of the computer device may include an application layer 110 (user state), an operating system 120 (kernel state), and a device layer 130.
Optionally, the computer device may be a mobile phone, a computer, a tablet, a server, a wearable device, or the like, which is not limited in the present application.
Optionally, the application layer 110 may include an application program layer 111 and an application framework layer 112. The application program layer 111 may include a series of application packages, such as applications for camera, gallery, calendar, calls, maps, navigation, WLAN, Bluetooth, music, video, and short messages. The application framework layer 112 provides an application programming interface (Application Programming Interface, API) and a programming framework for the applications of the application program layer 111, and includes a number of predefined functions.
Optionally, the operating system 120 may include a file system 121, a block layer 122, and a device driver 123. The file system 121 manages and schedules the storage space of files and provides the logical structure, physical structure, and storage method of files. The block layer 122 is the interface through which the file system 121 accesses the device layer 130, and connects the file system 121 to the device driver 123. The block layer 122 may be divided into two layers, a bio layer and a request layer, and is used to encapsulate and decapsulate requests. The device driver 123 may include a display driver, a camera driver, an audio driver, a sensor driver, and the like.
Optionally, the device layer 130 may include at least one memory. Specifically, as shown in FIG. 1, the device layer 130 includes memory 1, …, memory A, where A is a positive integer.
Optionally, each of the at least one memory may be an external storage device such as a Hard Disk Drive (HDD), a Solid State Drive (SSD), a USB flash drive, or an optical disc, which is not limited in the present application.
In some possible implementations, the operation of the program may be divided into a user mode and a kernel mode. When the program is running in the user mode, the processor can only access data in a part of the memory, and does not allow access to peripheral devices such as a hard disk, a network card and the like. When the program is running in kernel mode, the processor can access all data in the memory, including peripheral devices such as hard disk, network card, etc. Meanwhile, the processor can also switch itself from one program to another. Typically, the applications in the application layer 110 run in user mode and the operating system 120 runs in kernel mode.
While the computer device is running, the application layer 110 first initiates a read-write request. The file system 121 determines the logical address, in the at least one memory, of the data corresponding to the read-write request. The block layer 122 dispatches the read-write request to the device driver 123. The device driver 123 encapsulates the read-write request and sends the encapsulated request, together with its logical address, to the device layer 130. The device layer 130 decapsulates the request, converts the logical address into the corresponding physical address in the at least one memory, and then writes data to or reads data from that physical address based on the read-write request.
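The last two steps of this path (encapsulation by the driver, then decapsulation and logical-to-physical translation in the device layer) can be sketched roughly as follows. The request dictionary, the `encapsulate` helper, the `DeviceLayer` class, and the per-address mapping table are all assumptions made for illustration; the application does not specify these structures.

```python
from __future__ import annotations

from typing import Optional


def encapsulate(op: str, logical_addr: int, data: Optional[bytes] = None) -> dict:
    """Device-driver step: wrap the request together with its logical address."""
    return {"op": op, "logical_addr": logical_addr, "data": data}


class DeviceLayer:
    """Toy device layer holding a logical-to-physical address map."""

    def __init__(self, mapping: dict[int, tuple[int, int]], num_memories: int):
        # mapping: logical address -> (memory index, physical address)
        self.mapping = mapping
        self.memories = [{} for _ in range(num_memories)]

    def handle(self, request: dict) -> Optional[bytes]:
        """Decapsulate the request, translate the address, then read or write."""
        op, logical_addr = request["op"], request["logical_addr"]
        mem_id, physical_addr = self.mapping[logical_addr]
        if op == "write":
            self.memories[mem_id][physical_addr] = request["data"]
            return None
        if op == "read":
            return self.memories[mem_id][physical_addr]
        raise ValueError(f"unknown operation: {op}")
```

For example, with `DeviceLayer({7: (0, 42)}, num_memories=1)`, a write to logical address 7 lands at physical address 42 of memory 0, and a later read of logical address 7 returns the same bytes.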
Referring to fig. 2, fig. 2 is a schematic diagram of another system architecture provided in an embodiment of the present application, which is used to describe an architecture of a distributed system for executing the file storage method according to the present application, and is suitable for a scenario of accessing mass data such as big data. As shown in fig. 2, the distributed system includes a scheduling device 210 and a storage cluster 220.
The storage cluster 220 comprises a plurality of storage servers: storage server 1, …, storage server K, where K is a positive integer greater than or equal to 2.
Each storage server includes at least one memory. For example, storage server 1 includes memory 1, …, memory A, and storage server K includes memory 1, …, memory B, where A and B are positive integers. All the memories on the storage servers together constitute the distributed storage system in the distributed architecture.
Optionally, the specific architecture of the scheduling device 210 and of each storage server in the storage cluster 220 may be the same as that of the computer device in FIG. 1, and is not described again here.
Optionally, the memory on each storage server may be an external storage device such as a Hard Disk Drive (HDD), a Solid State Drive (SSD), a USB flash drive, or an optical disc, which is not limited in the present application.
In the distributed system shown in FIG. 2, the scheduling device 210 first receives a read-write request through a network (to store a file in the distributed storage system or to read a file from it). The scheduling device 210 determines the logical address corresponding to the read-write request through its file system, and then, based on the logical address, accesses the physical address space in a memory on the corresponding storage server in the distributed storage system, so as to read or write the file.
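A rough sketch of how a scheduling device might route a logical address to a storage server is given below. It assumes the logical address space is divided into contiguous ranges, each owned by one memory on one server; the `Scheduler` class, the range table, and the server and memory names are hypothetical, since the application does not describe the routing structure.

```python
import bisect


class Scheduler:
    """Toy scheduling device: routes logical addresses to (server, memory, offset)."""

    def __init__(self, ranges):
        # ranges: iterable of (start, length, server, memory); assumed non-overlapping
        self.ranges = sorted(ranges)
        self.starts = [r[0] for r in self.ranges]

    def locate(self, logical_addr):
        """Return (server, memory, physical offset) for a logical address."""
        idx = bisect.bisect_right(self.starts, logical_addr) - 1
        if idx < 0:
            raise KeyError(logical_addr)
        start, length, server, memory = self.ranges[idx]
        offset = logical_addr - start
        if offset >= length:
            raise KeyError(logical_addr)
        return server, memory, offset
```

Using binary search over the range starts keeps lookup cost logarithmic in the number of ranges; a real system could use any equivalent mapping.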
The file storage flow on the computer device in fig. 1 and the scheduling device in fig. 2 will be described below. The file storage flows in the architectures of fig. 1 and fig. 2 are executed by the file system on the computer device and the file system on the scheduling device, respectively, that is, the file system in the embodiment of the present application may refer to the file system on the computer device of fig. 1 or the file system on the scheduling device.
After receiving the file to be stored, the file system first reads the format identifier and the processing identifier of the file.
The format identifier indicates the specific format of the received file, for example CSV, Parquet, or Avro. In particular, the file format describes how the data in a file (that is, the file data and the metadata) is organized and managed.
Optionally, the organization and management manner includes the way in which the file system allocates logical address space to the metadata and the file data: either the file system allocates a separate logical address space to each, or it allocates logical address space to both in a unified manner.
Specifically, unified allocation of logical address space means that the metadata and part of the file data are allocated to physical address spaces on the same memory; independent allocation means that the metadata and the file data are allocated to physical address spaces on different memories.
In the present application, the metadata includes an inode and extension information (also referred to as self-describing metadata). The extension information includes file organization mode (Schema) information and compression information.
The specific contents of the inode and the extension information are described in the following embodiments.
The processing identifier indicates whether the received file is to be converted into the file format shown in FIG. 3 below and then read and written using the organization and management manner corresponding to that file format.
Optionally, the processing identifier may be a hint tag, which is not limited in the present application.
Referring to fig. 3, fig. 3 is a schematic diagram of a file format according to an embodiment of the present application, which corresponds to a file storage method according to an embodiment of the present application.
In this file format, as shown in fig. 3, the file includes two parts of metadata and file data. The metadata comprises an index node inode and extension information xattr. Wherein the extension information is also referred to as self-describing metadata.
Optionally, the file data includes N data blocks: data block 1, …, data block N, where N is a positive integer. That is, the file data does not contain any metadata information.
Optionally, the inode contains the size of the file data, local id, global id, creation time, last access time, last update time, owner, access rights, and logical address.
Optionally, the extension information includes organization pattern Schema information and compression information.
Further optionally, the compression information is used to characterize whether the file data is compressed or not, and a compression algorithm used in compression.
Further, optionally, the extension information may further include custom information, where the custom information includes encryption information of the file data, where the encryption information is used to characterize whether the file data is encrypted, and an encryption algorithm used in encryption.
Specifically, when the file received by the file system is already encrypted, the file system manages and organizes the extension information and the file data independently; when the received file is not encrypted, the file system encrypts the file data and stores the corresponding encryption information in the extension information.
The organization mode information is used for representing the storage mode of each data block in the file data.
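The layout of FIG. 3 can be summarized as a set of plain records. This is only a sketch: the field names follow the lists above, but the field types, the class names, and the use of a dictionary for custom information are assumptions, not part of the application.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Inode:
    size: int               # size of the file data
    local_id: int
    global_id: int
    creation_time: float
    last_access_time: float
    last_update_time: float
    owner: str
    access_rights: int      # assumed to be POSIX-style mode bits
    logical_address: int


@dataclass
class CompressionInfo:
    compressed: bool
    algorithm: Optional[str] = None  # set only when compressed is True


@dataclass
class EncryptionInfo:
    encrypted: bool
    algorithm: Optional[str] = None  # set only when encrypted is True


@dataclass
class ExtensionInfo:
    schema: list                      # organization mode info, one entry per data block
    compression: CompressionInfo
    custom: dict = field(default_factory=dict)  # e.g. {"encryption": EncryptionInfo(...)}


@dataclass
class XFile:
    inode: Inode
    xattr: ExtensionInfo
    data_blocks: list[bytes]          # file data only; carries no metadata
```

Keeping `data_blocks` free of any descriptive fields mirrors the statement that the file data contains no metadata information.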
In the following embodiments, the file format shown in fig. 3 is collectively referred to as an X file format.
After the file system reads the format identifier and the processing identifier of the file, whether to perform format conversion on the file can be determined based on the format identifier and the processing identifier, namely, whether to convert the received file into the above-mentioned X file format is determined, and the file storage method in the embodiment of the application is adopted for storage.
In particular, the process by which the file system decides whether to perform format conversion based on the format identification and the process identification may be described in table 1.
Table 1: format conversion decision table corresponding to different format identification and processing identification
As shown in table 1, the case where the file system decides whether to perform format conversion based on the format identification and the process identification includes:
(1) When the format identifier indicates that the received file is in an ordinary file format that does not include extension information (for example, the CSV format), the received file keeps its original file format regardless of whether the processing identifier indicates format conversion.
(2) When the format identifier indicates that the received file is in a file format in which the extension information and the file data are organized and managed together (for example, ORC, Parquet, or Avro), the received file is converted into the X file format shown in FIG. 3 if the processing identifier indicates format conversion, and keeps its original file format otherwise.
(3) When the format identifier indicates that the received file is already in the X file format, no format conversion is needed regardless of whether the processing identifier indicates format conversion; that is, the received file keeps the X file format.
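The three cases above reduce to a small decision function. The enum names and the function signature below are illustrative assumptions; only the decision logic is taken from the text.

```python
from enum import Enum


class FormatKind(Enum):
    PLAIN = "plain"        # no extension information, e.g. CSV
    EMBEDDED = "embedded"  # extension info stored with file data, e.g. ORC, Parquet, Avro
    X = "x"                # the X file format of FIG. 3


def target_format(kind: FormatKind, convert_requested: bool) -> FormatKind:
    """Return the format the file is stored in after the decision."""
    # Case (2): only embedded-metadata formats are converted, and only on request.
    if kind is FormatKind.EMBEDDED and convert_requested:
        return FormatKind.X
    # Cases (1) and (3): the file keeps its current format either way.
    return kind
```

The processing identifier therefore only matters for case (2); for ordinary files and files already in the X file format it is ignored.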
Referring to FIG. 4, FIG. 4 is a flowchart of a file storage method according to an embodiment of the present application. As shown in FIG. 4, the method includes step S410 and step S420. The method describes how the file system converts the received file into the X file format and the corresponding storage process. The file in the X file format obtained after the format conversion comprises two parts: file data and metadata.
Step S410: allocate a first logical address space for the file data.
Step S420: a second logical address space is allocated for the metadata. Wherein the first logical address space corresponds to a physical address space in at least one first memory and the second logical address space corresponds to a physical address space in a second memory; the metadata comprises an Inode and extension information, and the extension information comprises organization mode Schema information of the file data.
Specifically, for the file in the X file format, the file system independently manages the metadata and the file data, that is, independently configures the corresponding logical address space.
Further, the file data is written into the physical address space corresponding to the first logical address space, and the metadata is written into the physical address space corresponding to the second logical address space.
Because of its large size, the file data is usually stored across a plurality of memories, i.e., the at least one first memory.
In terms of technical effect, in the file format provided by the application, the organization mode Schema information is managed and stored together with metadata such as the Inode, and is isolated from the management and maintenance of the file data. Specifically, on the memory side, the metadata is stored independently in its own memory (namely the second memory), separate from the file data. Compared with the prior art, in which metadata and file data are stored in the same memory, the application directly isolates the metadata access stream from the file data access stream. In scenarios with a large number of I/O accesses, this effectively prevents heavy metadata access from occupying the bandwidth needed for reading file data; that is, the bandwidth available for reading file data is improved and the access delay is reduced.
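As a non-limiting illustration, steps S410 and S420 can be sketched with two independent logical address spaces, one backed by the first memories and one by the second memory. The classes `Memory` and `AddressSpace`, the function `store_file`, and the round-robin placement of data blocks are all assumptions made for this sketch; the application does not prescribe a placement policy.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """A toy storage device: a growable byte array addressed physically."""
    name: str
    data: bytearray = field(default_factory=bytearray)

    def append(self, payload: bytes) -> int:
        """Write payload at the next free physical address and return that address."""
        physical = len(self.data)
        self.data += payload
        return physical


@dataclass
class AddressSpace:
    """A logical address space: logical address -> (memory, physical address, length)."""
    extents: dict = field(default_factory=dict)
    next_logical: int = 0

    def write(self, memory: Memory, payload: bytes) -> int:
        logical = self.next_logical
        self.extents[logical] = (memory, memory.append(payload), len(payload))
        self.next_logical += len(payload)
        return logical

    def read(self, logical: int) -> bytes:
        memory, physical, length = self.extents[logical]
        return bytes(memory.data[physical:physical + length])


def store_file(blocks, metadata, first_memories, second_memory):
    """Allocate separate address spaces for file data (S410) and metadata (S420)."""
    data_space, meta_space = AddressSpace(), AddressSpace()
    # Assumption: data blocks are spread round-robin over the first memories.
    block_addrs = [
        data_space.write(first_memories[i % len(first_memories)], block)
        for i, block in enumerate(blocks)
    ]
    # Metadata only ever touches the second memory.
    meta_addr = meta_space.write(second_memory, metadata)
    return data_space, meta_space, block_addrs, meta_addr
```

Because the two address spaces never share a backing memory in this sketch, metadata reads and file data reads are served by different devices, which is the isolation of access streams described above.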
Optionally, the at least one first memory may be at least one memory in the device layer 130 on the computer device of fig. 1, or multiple memories on the same storage server or different storage servers in the architecture of fig. 2.
Optionally, the second memory may be any memory other than the first memory in the computer device of fig. 1 or the distributed storage system of fig. 2.
Optionally, the physical address space corresponding to the second logical address space in the second memory may be contiguous or non-contiguous.
Optionally, the physical address space corresponding to the first logical address space in any one of the first memories may be contiguous or non-contiguous.
Optionally, the read-write speed of the first memory is less than or equal to the read-write speed of the second memory.
In terms of technical effect, the second memory stores the metadata corresponding to the file data; that is, when there are many data accesses, every metadata access request is served by the second memory. Improving the read-write performance (i.e., the read-write speed) of the second memory can therefore effectively improve data reading performance under a large number of data accesses.
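As a non-limiting illustration of the speed relationship above, one simple policy is to use the fastest available device as the second memory and the remaining devices as first memories. The function `assign_memories`, the device names, and the speed figures are hypothetical; the application only requires that the read-write speed of the first memory not exceed that of the second memory.

```python
def assign_memories(read_write_speed):
    """Pick the fastest device as the second memory; the rest hold file data.

    read_write_speed maps a hypothetical device name to its speed (e.g. in MB/s)
    and is assumed to contain at least two devices.
    """
    second = max(read_write_speed, key=read_write_speed.get)
    first = [name for name in read_write_speed if name != second]
    return first, second


# Example with made-up devices: two slower disks and one faster drive.
first, second = assign_memories({"hdd0": 150.0, "hdd1": 150.0, "nvme0": 3000.0})
```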
Optionally, the organization mode information contained in the metadata in the present application may be organized using the logical structure of a linked list tree. Referring specifically to fig. 5, fig. 5 is a schematic diagram of a linked list tree structure according to an embodiment of the present application.
Optionally, the file data includes N data blocks, the organization mode information is characterized by a linked list tree, the linked list tree includes N tree structures that are sequentially connected, and N is a positive integer.
Specifically, the organization mode Schema information corresponding to the file data is characterized by a data structure, namely a linked list tree. As shown in fig. 5, the linked list tree includes N tree structures (Tree) that are sequentially connected: Tree 1, Tree 2, …, Tree N. The N tree structures are in one-to-one correspondence with the N data blocks in the file data shown in fig. 3.
As shown in fig. 5, the linked list tree contains P layers of nodes, where P is the nesting depth of the linked list tree/tree structure.
Taking Tree structure Tree 1 as an example, the root Node 1 corresponds to M nodes (Node 11, …, node 1M) in the second layer; node 11 in the second layer corresponds to K nodes (Node 111, …, node 11K) in the third layer; …; the P-th layer includes nodes Node 1 … and ….
It should be understood that the specific structure of other Tree structures in the linked list Tree is the same as the Tree structure Tree 1, and will not be described here again.
Any two adjacent tree structures among the N tree structures indicate that the two data blocks corresponding to them are contiguous in data content.
Optionally, the ith tree structure in the N tree structures is used for characterizing a storage mode of the ith data block in the N data blocks, and i is a positive integer less than or equal to N.
Specifically, each tree structure contains at least one layer of nodes. The number of nodes in a layer represents the number of sub-blocks obtained when the data block is split for storage. For example, if the 5th layer of the ith tree structure includes 5 nodes, the 5 nodes respectively store the logical addresses of 5 sub-blocks contained in the ith data block; that is, the ith data block is divided into 5 sub-blocks at that layer.
Optionally, the ith tree structure includes M layers, each of the M layers includes at least one node, a first layer of the M layers is a root node, the root node is configured to store a logical address of the ith data block corresponding to the ith tree structure, and M is a positive integer.
The ith tree structure is any one of the N tree structures.
Specifically, the first layer of each tree structure is the root node, i.e., as shown in fig. 5: Node 1, Node 2, …, Node N. Each root node stores the logical address of the data block corresponding to the tree structure in which the root node is located. That is, Node 1 stores the logical address of data block 1, Node 2 stores the logical address of data block 2, …, and Node N stores the logical address of data block N.
Optionally, each root node further stores information such as a maximum value, a minimum value, an average value, a median, a key character and the like of data in a data block corresponding to the root node (i.e., a data block corresponding to the tree structure where the root node is located).
Optionally, a first node included in a j-th layer in the M layers corresponds to E nodes on a j+1-th layer in the M layers, where the first node is configured to store a logical address of a first data block in the i-th data block, the first data block includes E sub-blocks, and the E nodes store the logical addresses of the E sub-blocks respectively; wherein j is a positive integer less than or equal to M-1, E is a positive integer.
The first node is any node in a j-th layer in the i-th tree structure. The first node corresponds to a first data block included in the ith data block, that is, a logical address of the first data block is stored in the first node.
Specifically, the first node in the j-th layer corresponds to the E nodes on the j+1th layer in the tree structure, that is, in the process of storing the first data block, the first data block is divided into E sub-blocks; the E nodes are in one-to-one correspondence with the E sub-blocks, namely each node in the E nodes stores a logic address of the corresponding sub-block.
Optionally, the first node further stores information such as a maximum value, a minimum value, an average value, a median, a key character, and the like of the data included in the first data block.
The j-th layer may be any layer in the i-th tree structure.
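As a non-limiting illustration, the linked list tree described above can be sketched as a singly linked list of trees, where each node stores a logical address, optional statistics, and its child nodes. The names `Node`, `Tree`, `build_linked_list_tree`, `iter_trees` and `depth` are hypothetical, and representing the statistics as a dictionary is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    logical_address: int                         # address of the (sub-)block
    stats: dict = field(default_factory=dict)    # e.g. {"min": ..., "max": ...}
    children: List["Node"] = field(default_factory=list)  # one node per sub-block


@dataclass
class Tree:
    root: Node                      # root node: address of the whole data block
    next: Optional["Tree"] = None   # link to the tree of the next data block


def build_linked_list_tree(roots):
    """Chain one tree per data block, preserving the order of the blocks."""
    head = None
    for root in reversed(roots):
        head = Tree(root=root, next=head)
    return head


def iter_trees(head):
    """Walk Tree 1, Tree 2, ..., Tree N in order."""
    while head is not None:
        yield head
        head = head.next


def depth(node):
    """Number of layers below and including this node."""
    return 1 + max((depth(child) for child in node.children), default=0)
```

In this sketch a node in layer j with E children corresponds to a block split into E sub-blocks, each child holding the logical address of one sub-block.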
The data reading process after storage using the above-described X file format will be described below.
Specifically, the data reading flow may include three scenarios:
(1) Reading the entire file data
After the file system receives the reading request, the index node inode in the metadata stored in the second memory is accessed through an interface of the file system to acquire the logical addresses of all the data blocks in the file data, and then a reading operation is initiated to read all the data blocks in the file data.
(2) Reading one or more data blocks in file data
After receiving the reading request, the file system accesses index node and organization mode Schema information in the metadata stored in the second memory through an interface of the file system, and analyzes the organization mode information to obtain a linked list tree structure in the embodiment; then traversing the information stored in the root node of each tree structure in the linked list tree, for example, the logical address of the data block corresponding to the tree structure, the information of the maximum value, the minimum value, the average value, the median, the key character and the like of the data in the data block, so as to determine the data block meeting the reading requirement; and finally, obtaining the logic address of the data block meeting the reading requirement, and reading one or more data blocks in the file data based on the obtained logic address.
(3) Reading partial data in one or more data blocks in file data
After receiving the reading request, the file system accesses the inode and the organization mode Schema information in the metadata stored in the second memory through an interface of the file system, and analyzes the organization mode information to obtain a linked list tree structure in the embodiment; then traversing all the sub-nodes corresponding to the root node in each tree structure, and acquiring information stored in each sub-node, for example, the maximum value, the minimum value, the average value, the median, key characters and the like of data in the data block corresponding to the sub-node, so as to determine whether the data block corresponding to the sub-node meets the reading requirement (for example, the maximum value is larger than a preset value); and finally, obtaining the logical addresses of the data blocks corresponding to all the child nodes meeting the reading requirement, and reading partial data in one or more data blocks in the file data based on the obtained logical addresses.
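As a non-limiting illustration of scenarios (2) and (3), the statistics stored in the nodes can be used to select which blocks or sub-blocks to read before any file data is touched. The names `Node`, `select_blocks` and `select_sub_blocks` are hypothetical; the sketch assumes the trees have already been parsed into a list of root nodes and uses only the maximum value as the read condition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    logical_address: int
    max_value: float                 # one of the statistics kept in the node
    children: List["Node"] = field(default_factory=list)


def select_blocks(roots, threshold):
    """Scenario (2): addresses of data blocks whose root-node maximum exceeds threshold."""
    return [root.logical_address for root in roots if root.max_value > threshold]


def select_sub_blocks(roots, threshold):
    """Scenario (3): addresses of sub-blocks whose child-node maximum exceeds threshold."""
    return [
        child.logical_address
        for root in roots
        for child in root.children
        if child.max_value > threshold
    ]


# Two data blocks, each split into two sub-blocks (made-up values).
roots = [
    Node(0, 10, [Node(0, 3), Node(64, 10)]),
    Node(128, 50, [Node(128, 50), Node(192, 7)]),
]
```

Only the logical addresses returned by these functions would then be passed to the read operation, so blocks that cannot satisfy the condition are never read.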
Compared with the prior art, the file storage method in the embodiment of the application maintains and manages the extension information together with the index node, separately from the management and maintenance of the file data. This reduces the number of metadata accesses in the data reading process: reading file data requires one metadata access (two in the prior art). The number of metadata accesses is thus effectively reduced in massive-access scenarios such as big data, and the overall data reading performance is improved.
Referring to fig. 6, fig. 6 is a schematic diagram of a file storage device according to an embodiment of the present application. As shown in fig. 6, the device comprises a processing unit 601, at least one first storage unit 602 and a second storage unit 603; wherein:
the processing unit 601 is configured to allocate a first logical address space for file data in a file, and allocate a second logical address space for metadata in the file; the at least one first storage unit 602 is configured to store the file data in a physical address space corresponding to the first logical address space; the second storage unit 603 is configured to store the metadata in a physical address space corresponding to the second logical address space; the metadata comprises an Inode and extension information, wherein the extension information comprises organization mode Schema information of the file data.
In a possible implementation manner, the read-write speed of the first storage unit is less than or equal to the read-write speed of the second storage unit.
In terms of technical effect, because metadata access has a large influence on the overall data access process, storing the metadata in a memory with higher read-write performance directly improves metadata access performance and reduces metadata access latency, thereby improving overall data access performance.
In a possible implementation manner, the file data comprises N data blocks, the organization mode information is characterized by a linked list tree, the linked list tree comprises N tree structures which are sequentially connected, and N is a positive integer; and the ith tree structure in the N tree structures is used for representing the storage mode of the ith data block in the N data blocks, and i is a positive integer less than or equal to N.
In a possible implementation manner, the ith tree structure includes M layers, each of the M layers includes at least one node, a first layer of the M layers is a root node, the root node is used for storing a logical address of the ith data block corresponding to the ith tree structure, and M is a positive integer; a first node contained in a j-th layer in the M layers corresponds to E nodes on a j+1-th layer in the M layers, wherein the first node is used for storing logical addresses of a first data block in the i-th data block, the first data block contains E sub-blocks, and the E nodes respectively store the logical addresses of the E sub-blocks; wherein j is a positive integer less than or equal to M-1, E is a positive integer.
In a possible embodiment, the extension information further includes compression information; the compression information is used for representing whether the file data is compressed or not and a compression algorithm used in compression.
In a possible implementation, the Inode contains the size of the file data, local id, global id, creation time, last access time, last update time, owner, access rights, and logical address.
In a possible implementation manner, the extension information further includes custom information; the custom information comprises encryption information of the file data, wherein the encryption information is used for representing whether the file data is encrypted or not and an encryption algorithm used in encryption.
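As a non-limiting illustration, the metadata described in the implementations above can be sketched as one record that combines the Inode fields with the extension information. The class and field names, the field types, and the JSON serialization are assumptions of this sketch; the application lists the Inode contents but does not specify an encoding.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Inode:
    size: int
    local_id: int
    global_id: int
    creation_time: float
    last_access_time: float
    last_update_time: float
    owner: str
    access_rights: int
    logical_address: int


@dataclass
class ExtensionInfo:
    schema: dict                        # organization mode (Schema) information
    compression: Optional[str] = None   # None means "not compressed"
    custom: dict = field(default_factory=dict)  # e.g. encryption information


@dataclass
class Metadata:
    inode: Inode
    extension: ExtensionInfo

    def to_bytes(self) -> bytes:
        """Serialize the whole record so it can be written to the second memory."""
        return json.dumps(asdict(self)).encode("utf-8")
```

Keeping the Inode and the extension information in one record is what allows a single metadata access to return both the block addresses and the Schema information.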
Specifically, the specific execution process of the file storage device may refer to the execution flow of the file storage method corresponding to the computer device shown in fig. 1 in the foregoing embodiment, which is not repeated herein.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 7, the computer device includes a host 701, at least one first memory 702, and a second memory 703; wherein the host 701, the first memory 702 and the second memory 703 are connected by a bus 704.
The host 701 is configured to allocate a first logical address space for file data in a file, and allocate a second logical address space for metadata in the file; the at least one first memory 702 is configured to store the file data in a physical address space corresponding to the first logical address space; the second memory 703 is configured to store the metadata in a physical address space corresponding to the second logical address space; the metadata comprises an Inode and extension information, wherein the extension information comprises organization mode Schema information of the file data.
Specifically, the specific execution process of the above-mentioned computer device may refer to the execution flow of the file storage method corresponding to the computer device shown in fig. 1 in the foregoing embodiment, which is not repeated herein.
The embodiment of the application provides a distributed system, which comprises a scheduling device, at least one first storage server and a second storage server; the scheduling equipment is used for distributing a first logic address space for file data in a file and distributing a second logic address space for metadata in the file; the at least one first storage server is used for storing the file data in a physical address space corresponding to the first logical address space; the second storage server is configured to store the metadata in a physical address space corresponding to the second logical address space; the metadata comprises an Inode and extension information, wherein the extension information comprises organization mode Schema information of the file data.
Specifically, the specific execution process of the above-mentioned distributed system may refer to the execution flow of the file storage method corresponding to the distributed system shown in fig. 2 in the foregoing embodiment, which is not repeated herein.
The embodiment of the application provides a chip system, which comprises at least one processor, a memory and an interface circuit, wherein the memory, the interface circuit and the at least one processor are interconnected through lines, and instructions are stored in the memory; when executed by the at least one processor, the instructions implement some or all of the steps recited in any of the method embodiments described above.
An embodiment of the present application provides a computer storage medium storing a computer program that, when executed, causes some or all of the steps of any one of the method embodiments described above to be implemented.
An embodiment of the present application provides a computer program comprising instructions which, when executed by a processor, cause some or all of the steps of any one of the method embodiments described above to be implemented.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, such as the above-described division of units, merely a division of logic functions, and there may be additional manners of dividing in actual implementation, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, or may be in electrical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (19)

1. A file storage method, wherein the file includes file data and metadata for the file data, the method comprising:
allocating a first logical address space for the file data and a second logical address space for the metadata;
wherein the first logical address space corresponds to a physical address space in at least one first memory and the second logical address space corresponds to a physical address space in a second memory; the metadata comprises an Inode and extension information, and the extension information comprises organization mode Schema information of the file data.
2. The method according to claim 1, wherein,
the read-write speed of the first memory is smaller than or equal to the read-write speed of the second memory.
3. A method according to claim 1 or 2, characterized in that,
the file data comprises N data blocks, the organization mode information is characterized by a linked list tree, the linked list tree comprises N tree structures which are sequentially connected, and N is a positive integer;
and the ith tree structure in the N tree structures is used for representing the storage mode of the ith data block in the N data blocks, and i is a positive integer less than or equal to N.
4. The method according to claim 3, wherein,
the ith tree structure comprises M layers, each of the M layers comprises at least one node, a first layer of the M layers is a root node, the root node is used for storing a logical address of an ith data block corresponding to the ith tree structure, and M is a positive integer;
a first node contained in a j-th layer in the M layers corresponds to E nodes on a j+1-th layer in the M layers, wherein the first node is used for storing logical addresses of a first data block in the i-th data block, the first data block contains E sub-blocks, and the E nodes respectively store the logical addresses of the E sub-blocks; wherein j is a positive integer less than or equal to M-1, E is a positive integer.
5. The method according to any one of claims 1 to 4, wherein,
the extension information further includes compression information; the compression information is used for representing whether the file data is compressed or not and a compression algorithm used in compression.
6. The method according to any one of claims 1 to 5, wherein,
the Inode contains the size, local id, global id, creation time, last access time, last update time, owner, access rights, and logical address of the file data.
7. The method according to any one of claims 1 to 6, wherein,
the extension information also comprises custom information; the custom information comprises encryption information of the file data, wherein the encryption information is used for representing whether the file data is encrypted or not and an encryption algorithm used in encryption.
8. A file storage device, the device comprising a processing unit, at least one first storage unit and a second storage unit; wherein:
the processing unit is used for distributing a first logic address space for file data in a file and distributing a second logic address space for metadata in the file;
the at least one first storage unit is configured to store the file data in a physical address space corresponding to the first logical address space;
the second storage unit is configured to store the metadata in a physical address space corresponding to the second logical address space;
the metadata comprises an Inode and extension information, wherein the extension information comprises organization mode Schema information of the file data.
9. The device according to claim 8, wherein,
the read-write speed of the first storage unit is smaller than or equal to the read-write speed of the second storage unit.
10. The device according to claim 8 or 9, wherein,
the file data comprises N data blocks, the organization mode information is characterized by a linked list tree, the linked list tree comprises N tree structures which are sequentially connected, and N is a positive integer;
and the ith tree structure in the N tree structures is used for representing the storage mode of the ith data block in the N data blocks, and i is a positive integer less than or equal to N.
11. The device according to claim 10, wherein,
the ith tree structure comprises M layers, each of the M layers comprises at least one node, a first layer of the M layers is a root node, the root node is used for storing a logical address of an ith data block corresponding to the ith tree structure, and M is a positive integer;
a first node contained in a j-th layer in the M layers corresponds to E nodes on a j+1-th layer in the M layers, wherein the first node is used for storing logical addresses of a first data block in the i-th data block, the first data block contains E sub-blocks, and the E nodes respectively store the logical addresses of the E sub-blocks; wherein j is a positive integer less than or equal to M-1, E is a positive integer.
12. The device according to any one of claims 8-11, wherein,
the extension information further includes compression information; the compression information is used for representing whether the file data is compressed or not and a compression algorithm used in compression.
13. The device according to any one of claims 8-12, wherein,
the Inode contains the size, local id, global id, creation time, last access time, last update time, owner, access rights, and logical address of the file data.
14. The device according to any one of claims 8-13, wherein,
the extension information also comprises custom information; the custom information comprises encryption information of the file data, wherein the encryption information is used for representing whether the file data is encrypted or not and an encryption algorithm used in encryption.
15. A computer device, the computer device comprising a host, at least one first memory, and a second memory; wherein:
the host is configured to allocate a first logical address space for file data in a file and to allocate a second logical address space for metadata in the file;
the at least one first memory is configured to store the file data in a physical address space corresponding to the first logical address space;
the second memory is configured to store the metadata in a physical address space corresponding to the second logical address space;
the metadata comprises an Inode and extension information, wherein the extension information comprises organization mode Schema information of the file data.
16. A distributed system comprising a scheduling device, at least one first storage server and a second storage server; wherein:
the scheduling equipment is used for distributing a first logic address space for file data in a file and distributing a second logic address space for metadata in the file;
the at least one first storage server is used for storing the file data in a physical address space corresponding to the first logical address space;
the second storage server is configured to store the metadata in a physical address space corresponding to the second logical address space;
the metadata comprises an Inode and extension information, wherein the extension information comprises organization mode Schema information of the file data.
17. A chip system, comprising at least one processor, a memory and an interface circuit, wherein the memory, the interface circuit and the at least one processor are interconnected by a line, and wherein the memory has instructions stored therein; the method of any of claims 1-7 is implemented when the instructions are executed by the at least one processor.
18. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed, implements the method of any of claims 1-7.
19. A computer program, characterized in that the computer program comprises instructions which, when the computer program is executed, implement the method of any one of claims 1-7.
CN202210413439.1A 2022-04-19 2022-04-19 File storage method and device Pending CN116955272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210413439.1A CN116955272A (en) 2022-04-19 2022-04-19 File storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210413439.1A CN116955272A (en) 2022-04-19 2022-04-19 File storage method and device

Publications (1)

Publication Number Publication Date
CN116955272A true CN116955272A (en) 2023-10-27

Family

ID=88451633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210413439.1A Pending CN116955272A (en) 2022-04-19 2022-04-19 File storage method and device

Country Status (1)

Country Link
CN (1) CN116955272A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination