CN107480150B - File loading method and device - Google Patents


Info

Publication number
CN107480150B
CN107480150B (application CN201610399257.8A)
Authority
CN
China
Prior art keywords
data
reading
file
memory
read
Prior art date
Legal status
Active
Application number
CN201610399257.8A
Other languages
Chinese (zh)
Other versions
CN107480150A (en)
Inventor
黄硕
刘俊峰
姚文辉
朱家稷
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201610399257.8A priority Critical patent/CN107480150B/en
Publication of CN107480150A publication Critical patent/CN107480150A/en
Application granted granted Critical
Publication of CN107480150B publication Critical patent/CN107480150B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/18: File system types
    • G06F 16/182: Distributed file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/17: Details of further file system functions
    • G06F 16/172: Caching, prefetching or hoarding of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A file loading method and device. The file loading device establishes a memory-mapped file for a disk file, reads data from it sequentially, and repeatedly sends pre-read notifications to the operating system; each notification asks the operating system to pre-read a specified amount of data at a specified position in the disk file into the memory backing the memory-mapped file. The file loading device then stores the data read from the memory-mapped file into the final in-memory data structure according to the data's storage format. The device comprises a data loading module, a file reading module, and a memory management module. The method and device improve file loading speed while keeping memory usage during loading stable and controllable.

Description

File loading method and device
Technical Field
The present invention relates to the field of computers, and in particular, to a file loading method and apparatus.
Background
In a large-scale distributed file system, to serve massive numbers of clients with high concurrency and low latency, the metadata server (Meta Server) usually keeps all metadata in memory, and persists it by recording a metadata operation log (oplog) and periodically generating a metadata image file (checkpoint file). A metadata image file typically consists of several data areas and a header. The data areas record large volumes of metadata such as the directory path tree, the file table, and data replica locations, and large blocks of data in the data areas (e.g., over 4 KB) are compressed to save space; the header records the start and end positions of the data areas, the compression parameters, and so on.
Fast loading of the metadata image file matters greatly for the availability and operability of the distributed file system. When the metadata server restarts because of a software upgrade, a failure, or the like, it must load the most recently generated image file and replay all operation logs recorded after that file in order to recover the in-memory metadata. On the one hand, the loading step lies on the critical path of metadata-server upgrades and failure recovery, so shortening it directly shortens upgrade time and failure-recovery time; on the other hand, at large scale the image file can reach hundreds of GB, and the loading step can account for 50%-80% or more of the server's total restart time, so its impact on upgrade and recovery time is significant.
Existing file loading methods can be divided into two categories according to how they read the file:
the first is based on file interfaces: a file-reading interface (such as the C read function or the Java File API) is used to read the image-file data sequentially, process it, and store the resulting metadata into the corresponding memory structures. While the reading thread calls the file-reading interface, the file library continuously invokes the operating system's read function to load data from disk into memory and copies it into a user or library buffer. When a compressed metadata block is encountered while reading the image file, it can either be decompressed synchronously and then processed, or copied into a temporary memory buffer and handed to a dedicated thread that asynchronously decompresses it into the final metadata memory structure.
This approach has at least two factors that hurt loading speed. 1. Overhead inside the library: to maintain internal buffers, guarantee concurrency safety, and so on, a file library (e.g., the glibc file library) performs several state updates and maintenance operations on every call. When each read is small (for example, many four-byte integers sit between data blocks in the metadata file), the library's internal overhead becomes a large fraction of the total cost of one read operation, lowering effective CPU utilization and hence the data-reading speed. 2. Memory-copy overhead: after the operating system reads data from disk into kernel space, the file interface must copy it once from kernel space into the user space where the reading thread runs; only then can the thread decompress the data into the final metadata memory structure. Memory copying is fast, but at data volumes of several hundred GB it still takes noticeable time.
The other method is based on memory-mapped files: the image file is mapped into a region of the metadata server's process address space, and the region is accessed sequentially to obtain and process the file data. The operating system can additionally be told that subsequent reads will be sequential, which helps it optimize data loading (e.g., using the Linux madvise function with the MADV_SEQUENTIAL parameter). While the reading thread accesses the memory, touching an address whose data has not yet been loaded from disk raises a page fault that must go to disk (major page fault), and the operating system synchronously waits for the data to load; meanwhile, the operating system asynchronously and continuously reads ahead (read-ahead), pulling not-yet-accessed file content from disk into the page cache so that later accesses hit the page cache directly (minor page faults) instead of faulting to disk.
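As a concrete illustration of this second approach, the sketch below uses Python's mmap module as a stand-in for the C mmap/madvise calls; the function name `load_sequentially` and the 64 KB chunk size are illustrative, and the MADV_SEQUENTIAL hint is guarded because the constant exists only on platforms that provide madvise:

```python
import mmap
import os

def load_sequentially(path, chunk=64 * 1024):
    """Map a file and read it front to back, hinting sequential access.

    MADV_SEQUENTIAL (Linux; guarded so the sketch runs elsewhere too)
    tells the kernel to enlarge its read-ahead window for this mapping.
    """
    size = os.path.getsize(path)
    chunks = []
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
        try:
            if hasattr(mmap, "MADV_SEQUENTIAL"):
                mm.madvise(mmap.MADV_SEQUENTIAL)
            for off in range(0, size, chunk):
                # Touching not-yet-loaded pages here raises major page
                # faults; pages the kernel pre-read hit the page cache.
                chunks.append(mm[off:off + chunk])
        finally:
            mm.close()
    return b"".join(chunks)
```

The two problems described next (resident-memory growth and kernel-chosen read-ahead amounts) apply exactly to this kind of loop, since it neither releases consumed pages nor controls how much the kernel pre-reads.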
This method also has two problems. 1. Process memory usage can exceed its limit: once part of the memory-mapped file has been accessed, the memory pages holding that data are counted in the process's resident memory, so data that has already been consumed still occupies the metadata server's memory. Until the operating system reclaims those pages, the server's memory may, on the one hand, exceed its configured upper limit, causing the server to be killed by a resource-limiting program; on the other hand, the memory available to other processes shrinks. 2. Insufficient operating-system read-ahead: while the memory-mapped file is read sequentially, the operating system does read ahead, overlapping disk access with data processing. However, when to read ahead and how much to read each time are decided by the operating system itself. Designed for generality, it reads ahead only a modest amount at a time, which often cannot keep up with parallel processing, so page faults still occur and disk throughput cannot be fully exploited.
Similar problems exist for other application scenarios that require loading a disk file into a memory data structure according to a corresponding format.
Disclosure of Invention
In view of this, the present invention provides the following.
A file loading method is applied to a file loading device and comprises the following steps:
establishing a memory-mapped file of a disk file, sequentially reading data from the memory-mapped file, and sending pre-read notifications to an operating system multiple times, wherein each notification tells the operating system to pre-read a specified amount of data at a specified position in the disk file into the memory corresponding to the memory-mapped file;
and storing the data read from the memory mapping file into a final data structure of the memory according to the storage format of the data.
A file loading device comprises a data loading module, a file reading module and a memory management module, wherein:
the data loading module is used for continuously initiating a reading request to the file reading module and storing the obtained data into a final data structure of the memory according to the storage format of the data;
the file reading module is used for establishing a memory mapping file of a disk file, receiving a reading request of the data loading module, sequentially reading data from the memory mapping file, and triggering the memory management module to perform memory management;
the memory management module is configured to perform memory management, which includes: sending pre-read notifications to an operating system multiple times, each notification telling the operating system to pre-read a specified amount of data at a specified position in the disk file into the memory corresponding to the memory-mapped file.
The file loading method and the file loading device can make full use of the capacities of the disk and the CPU, improve the loading speed of the file, and enable the memory usage amount in the loading process to be stable and controllable.
Drawings
FIG. 1 is a flow chart of a method for loading a file according to an embodiment of the present invention;
fig. 2 is a block diagram of a file loading apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In this embodiment, the file loading method based on a memory-mapped file is improved: the file loading device actively manages pre-reading and notifies the operating system of the pre-read position and pre-read data amount, so that the operating system's pre-reading matches the device's reading of the memory-mapped file, thereby increasing the file's loading speed.
As shown in fig. 1, the file loading method of this embodiment includes:
step 110, establishing a memory-mapped file of a disk file, sequentially reading data from the memory-mapped file, and sending pre-read notifications to an operating system multiple times, wherein each notification tells the operating system to pre-read a specified amount of data at a specified position in the disk file into the memory corresponding to the memory-mapped file;
in this embodiment, sending the pre-read notification to the operating system for multiple times is implemented by setting a pre-read position and comparing the pre-read position, and specifically, when the file loading device sequentially reads data from the memory mapped file, the file loading device sends the pre-read notification to the operating system once each time the data read position reaches the set pre-read position. Because the size of the data read each time is changed, the change of the data reading position is also large or small, and the data reading position reaches the set pre-reading position, and the data reading position can be equal to the set pre-reading position or exceed the set pre-reading position.
In this embodiment, the pre-read positions are set one after another: each time the data-reading position reaches the most recently set pre-read position, one pre-read notification is sent to the operating system, and a new pre-read position is set before the end position, in the memory-mapped file, of the data the operating system will pre-read in response to that notification. This ensures the file loading device keeps reading data that is already in memory and avoids frequent page faults. The initial value of the pre-read position may be set at or before the initial data-reading position; alternatively, for the first pre-read, before reading begins the device can simply treat the current reading position as having reached the pre-read position and have the operating system pre-read from the start of the memory-mapped file, without relying on the comparison above. Setting positions one after another avoids tying up too many resources, but all pre-read positions could also be set in advance.
For convenience, the pre-read position, the data-reading position, and the end position of the pre-read data in the memory-mapped file can be represented as offsets (e.g., offsets of these positions relative to the start of the memory-mapped file). How far before the end position of the pre-read data the new pre-read position is placed is a parameter with many possible concrete settings.
In this embodiment, the newly set pre-read position is determined by the formula Pa = Pb + Ps, where Pa is the new pre-read position; Pb is the specified position, equal to the current data-reading position or to the previously set pre-read position; and Ps equals the specified data amount minus the specified pre-read lead. The new pre-read position then lies before the end position of the data the operating system pre-reads in the memory-mapped file, separated from it by the specified pre-read lead.
When the current data-reading position reaches the previously set pre-read position, the two may differ by some amount (the reading position may have overshot), and the size of that deviation depends on the size of the data units in the disk file; if the deviation is large (e.g., a compressed block was just read), the actual lead may be much smaller than the specified lead. Taking Pb equal to the current data-reading position controls the pre-read lead more precisely and suits a variety of file formats; if the lead requirement can still be met, Pb can instead be taken as the previously set pre-read position. Note that the specified data amount and specified position may be carried in each notification or follow defaults, depending on the interface function; the initial pre-read position may likewise be some default value, such as 0, or be carried in the notification. These implementation details do not limit the invention. In addition, when the operating system starts pre-reading from a specified position, some of that data may already have been pre-read; the operating system can detect and skip it.
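With the choice Pb = current data-reading position, the formula can be checked in a few lines (the function name `next_preread_pos` and the 32 MB / 32 KB sample values are illustrative):

```python
def next_preread_pos(read_pos, data_amount, ahead):
    """Pa = Pb + Ps, with Pb = the current data-reading position and
    Ps = specified data amount - specified pre-read lead."""
    return read_pos + (data_amount - ahead)
```

Each trigger therefore advances the pre-read position by (data amount - lead), which is exactly the step size this embodiment attributes to the specified data amount.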
And step 120, storing the data read from the memory mapping file into a final data structure of the memory according to the storage format of the data.
It should be noted that steps 110 and 120 are not necessarily sequential; they are numbered this way only for convenience. In fact, storing the data read from the memory-mapped file into the final in-memory data structure proceeds in parallel with the pre-reading.
If the disk file contains compressed blocks (for example, when the disk file is a metadata image file of a distributed file system), the data can be handled according to the amount read, as follows:
after each read from the memory-mapped file, judge whether the amount of data read is smaller than the minimum decompressed size of a compressed block of the disk file:
if so, store the read data into the final memory data structure according to the data's storage format;
if not, decompress the read data in a multithreaded, asynchronous manner, and then store the decompressed data into the final memory data structure according to the data's storage format.
To reclaim the memory occupied by already-read data promptly, the method of this embodiment further includes: recording the last three set pre-read positions and keeping them updated; and, each time the data-reading position reaches the most recently set pre-read position, before setting a new pre-read position, notifying the operating system to reclaim the memory region between the first two of those three positions. In this way the user can specify a maximum memory footprint M: the specified data amount is set to a value greater than AHEAD and at most M/2 + AHEAD, and preferably to a value between M/2 and M/2 + AHEAD, where AHEAD is the specified pre-read lead. With this setting, the memory actually occupied by the memory-mapped file during loading does not exceed M. In this embodiment, the specified data amount determines the step size by which the pre-read position advances.
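The bound can be sanity-checked numerically. The sketch below is an assumption-laden illustration, not the patent's method: it takes the preferred setting (specified data amount S in [M/2, M/2 + AHEAD]) and the observation that at most two pre-read windows are resident before the older one is reclaimed:

```python
def footprint_bound(m, ahead, s):
    """Peak resident data is at most two live windows of size s; with
    s <= m/2 + ahead this stays within m plus a small 2*ahead slack."""
    assert ahead < s <= m // 2 + ahead, "S must lie in (AHEAD, M/2 + AHEAD]"
    return 2 * s  # upper bound on bytes resident at once

# Example values: M = 64 MB cap, AHEAD = 32 KB lead, S = M/2.
M, AHEAD = 64 * 2**20, 32 * 2**10
peak = footprint_bound(M, AHEAD, M // 2)
```

With S = M/2 the bound is exactly M; pushing S up to M/2 + AHEAD adds only 2·AHEAD (64 KB here) of slack.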
As shown in fig. 2, the file loading apparatus of this embodiment includes a data loading module 10, a file reading module 20, and a memory management module 30, which may be implemented by software running on corresponding hardware or by hardware circuits, where:
the data loading module 10 is configured to continuously initiate a read request to the file reading module, and store the obtained data in a final data structure of the memory according to a storage format of the data;
the file reading module 20 is configured to establish a memory mapping file of a disk file, receive a reading request of the data loading module, sequentially read data from the memory mapping file, and trigger the memory management module to perform memory management;
the memory management module 30 is configured to perform memory management, which includes: sending pre-read notifications to an operating system multiple times, each notification telling the operating system to pre-read a specified amount of data at a specified position in the disk file into the memory corresponding to the memory-mapped file.
Alternatively,
the memory management module sends pre-read notifications to the operating system multiple times as follows: while data is read sequentially from the memory-mapped file, a pre-read notification is sent to the operating system each time the data-reading position reaches the currently set pre-read position.
Alternatively,
the memory management module successively sets the pre-reading position according to the following mode:
each time the data-reading position reaches the most recently set pre-read position, it sends one pre-read notification to the operating system and sets a new pre-read position before the end position, in the memory-mapped file, of the data the operating system pre-reads in response to that notification.
Alternatively,
the memory management module determines the newly set pre-read position by the formula Pa = Pb + Ps, where Pa is the new pre-read position; Pb is the specified position, equal to the current data-reading position or to the previously set pre-read position; and Ps equals the specified data amount minus the specified pre-read lead.
Alternatively,
the memory management module's memory management further includes: recording the last three set pre-read positions and keeping them updated; and, each time the data-reading position reaches the most recently set pre-read position, before setting a new pre-read position, notifying the operating system to reclaim the memory region between the first two of the last three set pre-read positions.
Alternatively,
the specified data amount is less than or equal to M/2 + AHEAD and greater than or equal to M/2, where M is the user-specified maximum memory footprint and AHEAD is the user-specified pre-read lead.
Alternatively,
the file loading device also comprises a decompression module;
each time the file reading module reads data from the memory-mapped file, it further judges whether the amount of data read is smaller than the minimum decompressed size of a compressed block of the disk file: if so, it returns the read data to the data loading module; if not, it hands the read data to the decompression module for processing;
the decompression module decompresses the received data in a multithreaded, asynchronous manner and then stores the decompressed data into the final memory data structure according to the data's storage format.
Alternatively,
the disk file loaded by the file loading device is a metadata memory mapping file of the distributed file system.
The invention is described below using an example in one application.
The example relates to a file loading device in a distributed file system, and a disk file to be loaded is a metadata memory mapping file of the distributed file system.
The file loading apparatus of this example includes a metadata loading module, a file reading module, a memory management module, and a decompression module, where:
and the metadata loading module is an initiator and a controller in the loading process and is used for continuously initiating a file reading request to the file reading module and storing the obtained metadata into a final metadata structure of the memory according to the storage format of the data so as to realize the recovery of the metadata.
And the file reading module is used for establishing a memory mapping file of the disk file, receiving a reading request of the data loading module, sequentially reading data from the memory mapping file, and triggering the memory management module to perform memory management. And the read data is returned to the metadata loading module or handed to the decompression module for processing.
The decompression module asynchronously decompresses the data submitted by the file reading module and then stores the decompressed data into the final memory data structure according to the data's storage format. When submitting data, the file reading module can pass the decompression module the memory address and length of the compressed data block, the expected decompressed length, and the metadata-structure address supplied by the metadata loading module (i.e., the decompression destination). The decompression module can access that memory address directly and decompress straight into the metadata structure, with no extra user-space memory copy; it can also run multiple decompression threads in parallel to make full use of a multi-core CPU. Thus, while reading the image-file data, when a compressed metadata block is encountered, its memory address and length can be handed directly to a dedicated thread that asynchronously decompresses it into the final in-memory metadata structure, without any additional memory buffer or copy.
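A minimal sketch of such a decompression module, with zlib and a thread pool standing in for whatever codec and threading the implementation uses; the class name, `submit_block`, and the length check are assumptions. Note one deliberate simplification: `zlib.decompress` allocates an intermediate bytes object, whereas the scheme described above decompresses directly into the destination with no extra copy.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

class Decompressor:
    """Decompress blocks on worker threads, writing the result into the
    caller-supplied destination buffer at the given offset."""

    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def submit_block(self, src, expected_len, dest, dest_off):
        def work():
            out = zlib.decompress(src)
            if len(out) != expected_len:  # expected length comes with the task
                raise ValueError("decompressed size mismatch")
            dest[dest_off:dest_off + expected_len] = out
        return self.pool.submit(work)
```

The caller passes the compressed bytes, the expected decompressed length, and the destination structure, mirroring the task description above; the returned future lets the loader continue and join later.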
The memory management module performs memory management, which includes sending pre-read notifications to the operating system multiple times; each notification tells the operating system to pre-read a specified amount of data at a specified position in the disk file into the memory corresponding to the memory-mapped file. Specifically, in this example the module records the current data-reading position and the last three set pre-read positions and keeps them updated; ordered from earliest set to latest set, and by their relation to the current reading position, the three are the position to be reclaimed next, the position of the last pre-read notification, and the position at which the next pre-read must be notified. In this way the operating system is guided, on the one hand, to reclaim no-longer-needed memory pages, keeping memory usage under control, and on the other hand, to perform ample disk read-ahead, raising disk throughput.
In this example, the memory management module uses a sliding-interval policy to keep memory usage during reading approximately below a user-specified maximum MSIZE (e.g., 64 MB): while the memory-mapped file is read sequentially, the interval slides backward continuously in steps of MSIZE/2 - AHEAD, forming an interval of size 2 × (MSIZE/2 - AHEAD); when the MSIZE/2 bytes pre-read last time are almost fully read (judged via the specified pre-read lead AHEAD, e.g., only 32 KB remain), the older MSIZE/2 - AHEAD of memory is reclaimed, the newer MSIZE/2 - AHEAD is kept for the decompression module to continue accessing, and the next MSIZE/2 bytes are then pre-read. This keeps the memory in use at roughly MSIZE throughout sequential reading; a larger MSIZE yields higher pre-read disk throughput. The step size can also be set to other values, such as MSIZE/2.
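The bookkeeping of this sliding-interval policy can be sketched as a small state machine (class and method names are illustrative, and the test uses toy sizes rather than 64 MB):

```python
class SlidingWindow:
    """Track the three positions (to-drop, last-notified, next-trigger)
    and report, on each read, what to reclaim and what to pre-read."""

    def __init__(self, start, msize, ahead):
        self.to_drop = self.last_madvise = self.next_madvise = start
        self.step = msize // 2 - ahead   # slide distance per trigger
        self.preread = msize // 2        # bytes pre-read per trigger

    def on_read(self, current_pos):
        """Return ((drop_off, drop_len), (preread_off, preread_len)) when
        current_pos has reached the trigger position, else None. The drop
        part is None on the very first trigger (nothing consumed yet)."""
        if current_pos < self.next_madvise:
            return None
        drop = None if self.to_drop == current_pos else (self.to_drop, self.step)
        preread = (current_pos, self.preread)
        self.to_drop, self.last_madvise = self.last_madvise, self.next_madvise
        self.next_madvise = current_pos + self.step
        return drop, preread
```

Because the to-drop position lags two triggers behind, the region handed back to the operating system is always one already consumed by the reader, matching the "older half reclaimed, newer half kept" behavior above.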
In this example, the memory management module decides independently based on the current data-reading position, without recording how the decompression module uses the memory intervals; this improves decision efficiency on the critical path and speeds up loading. Because multithreaded decompression is fast, by the time the older half of the interval is reclaimed, the related decompression tasks have, with high probability, already completed; even if a few unfinished decompression tasks later touch the reclaimed memory, the operating system's page-fault mechanism ensures the required data can be loaded again on demand.
In this example, the process of loading the metadata memory mapping file SFile by the file loading device is as follows:
Step one: the file reading module opens the header of SFile and obtains information such as the data-area position (Offset and Size), the compression type, and the minimum actual data size of a compressed block (e.g., 4 KB); it establishes the memory-mapped file for the data area (for example, by calling a Linux function) and obtains the corresponding memory region; and it initializes the decompression module according to the compression type;
Step two: initialize the memory management module: set the maximum memory usage to BUF_SIZE (e.g., 64 MB; called MSIZE below); set the initial data-reading position CurrentPos to Offset, the start of the memory-mapped file; and set the position of the last pre-read notification, LastMadvise, the position at which the next pre-read must be notified, NextMadvise, and the position to be reclaimed next, ToDrop, to CurrentPos or before CurrentPos;
Step three: after confirming that initialization succeeded, the data loading module continuously sends read requests req to the file reading module. A req can contain the amount of data to read, req.size (e.g., 4, to read a four-byte integer), and the read destination, req.dest (e.g., read 100 bytes into the final data-structure address of some piece of metadata);
Step four: when the memory management module receives a request req, it first judges whether memory-management actions are needed: if CurrentPos >= NextMadvise, the data pre-read last time is almost fully read, and three pieces of work follow:
(a) The memory management module notifies the operating system to reclaim the MSIZE/2 - AHEAD bytes of data starting at ToDrop (if ToDrop equals CurrentPos, this step is skipped). On Linux, the reclamation is done with the madvise function, using the MADV_DONTNEED parameter value. The operating system reclaims the corresponding memory pages asynchronously and quickly. Even if the decompression module later accesses a reclaimed memory address (a very low-probability event), the page-fault mechanism ensures the data is read from disk again as needed.
(b) The memory management module notifies the operating system to pre-read data, loading the MSIZE/2 bytes of file data starting from CurrentPos. On Linux, the pre-read notification is implemented with the madvise function using the MADV_WILLNEED parameter value. The loading is carried out asynchronously by the operating system, and the loaded data enters the system Page Cache.
(c) Slide the available memory window backwards and update the state: ToDrop = LastMadvise, LastMadvise = NextMadvise, NextMadvise = CurrentPos + MSIZE/2 - AHEAD. AHEAD (for example, 32KB) is the amount by which pre-reading is started early; it ensures that the next pre-read is issued before the previously pre-read data has been fully consumed, so that part of the new data has already arrived by the time the previous data is actually exhausted, avoiding the major page faults that wait on the disk between two pre-reads.
Step five, after the memory management operation finishes, the file reading module distinguishes two cases:
(a) If req.size is not smaller than the minimum decompressed data size of a compressed block, the request is for a block of data that needs decompression: the size of the compressed block is read, a decompression task is generated from it together with req.size (that is, the decompressed size), the destination address, and the current position, and the task is handed to the decompression module for processing; CurrentPos is then updated to CurrentPos + the compressed block size, and finally the loading module is informed that the request is being processed asynchronously so that it can continue;
(b) If the requested data amount is smaller than the minimum decompressed data amount of a compressed block, the required data is read directly from the current position, converted to the required data type, and returned to the loading module, and CurrentPos is updated to CurrentPos + req.size.
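The step-five dispatch can be sketched as follows. The 4-byte compressed-size prefix, the MIN_BLOCK constant, and the synchronous `queue_decompress` stub are illustrative assumptions; in the patent the decompression runs asynchronously in a separate module:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MIN_BLOCK = 4096 };  /* assumed minimum decompressed block size */

/* Stand-in for the asynchronous decompression module: a real one would
 * decompress (src, comp_size) into (dst, dst_size) on a worker thread. */
static size_t queued;
static void queue_decompress(const char *src, uint32_t comp_size,
                             size_t dst_size, void *dst)
{
    (void)src; (void)dst_size; (void)dst;
    queued += comp_size;
}

/* Serve one request against the mapped data at offset `cur`;
 * returns the new CurrentPos. */
static size_t serve_request(const char *map, size_t cur,
                            size_t req_size, void *dest)
{
    if (req_size >= MIN_BLOCK) {      /* (a) block needing decompression */
        uint32_t comp_size;
        memcpy(&comp_size, map + cur, sizeof comp_size);
        queue_decompress(map + cur + 4, comp_size, req_size, dest);
        return cur + 4 + comp_size;   /* skip prefix + compressed bytes */
    }
    memcpy(dest, map + cur, req_size);/* (b) small raw value, read in place */
    return cur + req_size;
}
```

Because case (a) only hands the block off and advances CurrentPos, the loading module never blocks on decompression, which is what lets disk reads and decompression overlap.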
Step six, the data loading module returns to step three and continues execution until loading is finished.
Step seven, after the data loading module has read all file data, it waits for the decompression module to complete all decompression tasks, and then notifies the file reading module to release the memory mapping file (for example, by calling the Linux munmap function) and clean up the environment. At this point, the metadata image file has been loaded.
In a specific implementation, when parameters such as Offset and Size are passed to functions such as Linux mmap and madvise, they are first made memory page aligned (page alignment). When reading data at the end of the file, the parameter values may be adjusted so that the range does not run past the end of the file.
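A small helper in the spirit of this adjustment might look as follows (the PAGE constant and function name are illustrative; real code would query the page size with sysconf(_SC_PAGESIZE)):

```c
#include <assert.h>
#include <stddef.h>

enum { PAGE = 4096 };  /* assumed page size */

/* Round `off` down to a page boundary, grow `*len` to compensate, and
 * clamp the range so it never runs past `file_size`.
 * Returns the aligned offset; the adjusted length is left in *len. */
static size_t page_align(size_t off, size_t *len, size_t file_size)
{
    size_t aligned = off & ~(size_t)(PAGE - 1);
    *len += off - aligned;            /* cover the bytes before `off` */
    if (aligned + *len > file_size)
        *len = file_size - aligned;   /* clamp at end of file */
    return aligned;
}
```

Both adjustments matter: mmap requires a page-aligned offset, and madvise requires a page-aligned start address, while a range past end-of-file would touch pages that have no backing data.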
It can be seen that in the above process, through actively managed operating-system pre-reading, efficient determination of when memory management operations are needed, and asynchronous multithreaded decompression, the time-consuming decompression and disk-read operations run fully in parallel with the critical path of file reading, which improves the effective utilization of the disk and the CPU and increases the loading speed. At the same time, by reasonably controlling the timing, position, and amount of pre-reading and memory cleaning, the total memory usage is kept near the specified maximum and memory overrun is avoided.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (16)

1. A file loading method is applied to a file loading device and comprises the following steps:
establishing a memory mapping file of a disk file, sequentially reading data from the memory mapping file, and sending a pre-reading notification to an operating system for multiple times, wherein the operating system is notified each time to pre-read data at a specified position and a specified data amount in the disk file into a memory corresponding to the memory mapping file;
and storing the data read from the memory mapping file into a final data structure of the memory according to the storage format of the data.
2. The method of claim 1, comprising:
the sending the pre-read notification to the operating system a plurality of times includes:
and when the data are sequentially read from the memory mapping file, sending a pre-reading notice to the operating system each time the data reading position reaches the set pre-reading position.
3. The method of claim 2, comprising:
the pre-reading position is set successively according to the following modes:
and sending a pre-reading notification to the operating system once each time the data reading position reaches the pre-reading position set for the last time, and resetting the pre-reading position before the end position of the data pre-read by the operating system in the memory mapping file according to the pre-reading notification.
4. The method of claim 3, comprising:
the reset pre-read position is determined according to the following formula:
Pa=Pb+Ps
wherein Pa is the reset pre-reading position; Pb is the designated position and is equal to the current data reading position or the last set pre-reading position; and Ps is equal to the specified data amount minus the specified pre-reading advance.
5. The method of claim 3 or 4, comprising:
the method further comprises the following steps:
recording the three most recently set pre-reading positions and continuously updating them; and each time the data reading position reaches the most recently set pre-reading position, before resetting the pre-reading position, notifying the operating system to clean the memory area between the two earlier of the three recorded pre-reading positions.
6. The method of any of claims 1-4, comprising:
the specified data volume is less than or equal to M/2+ AHEAD and more than or equal to M/2, wherein M is the maximum memory occupation volume specified by the user, and AHEAD is the pre-reading advance volume specified by the user.
7. The method of any of claims 1-4, wherein:
according to the storage format of the data, storing the data read from the memory mapping file into a final data structure of the memory, including:
after reading data from the memory mapping file each time, judging whether the data volume of the read data is smaller than the minimum data volume of the compressed block of the disk file after decompression:
if so, storing the read data into a final memory data structure according to the storage format of the data;
if not, decompressing the read data in a multithreading and asynchronous mode, and then storing the decompressed data in a final memory data structure according to the storage format of the data.
8. The method of any of claims 1-4, wherein:
the disk file is a metadata memory mapping file of the distributed file system.
9. A file loading device is characterized by comprising a data loading module, a file reading module and a memory management module, wherein:
the data loading module is used for continuously initiating a reading request to the file reading module and storing the obtained data into a final data structure of the memory according to the storage format of the data;
the file reading module is used for establishing a memory mapping file of a disk file, receiving a reading request of the data loading module, sequentially reading data from the memory mapping file, and triggering the memory management module to perform memory management;
the memory management module is configured to perform memory management, where the memory management includes: and sending a pre-reading notice to an operating system for multiple times, and informing the operating system to pre-read the data at the appointed position and the appointed data volume in the disk file to the memory corresponding to the memory mapping file every time.
10. The apparatus of claim 9, wherein:
the memory management module sends a pre-reading notice to an operating system for multiple times, and the pre-reading notice comprises the following steps: and when the data are sequentially read from the memory mapping file, sending a pre-reading notice to the operating system each time the data reading position reaches the set pre-reading position.
11. The apparatus of claim 10, wherein:
the memory management module successively sets the pre-reading position according to the following mode:
and sending a pre-reading notification to the operating system once each time the data reading position reaches the pre-reading position set for the last time, and resetting the pre-reading position before the end position of the data pre-read by the operating system in the memory mapping file according to the pre-reading notification.
12. The apparatus of claim 11, wherein:
the memory management module determines the reset pre-reading position according to the following formula: Pa = Pb + Ps, wherein Pa is the reset pre-reading position; Pb is the designated position and is equal to the current data reading position or the last set pre-reading position; and Ps is equal to the specified data amount minus the specified pre-reading advance.
13. The apparatus of claim 11 or 12, wherein:
the memory management module performs memory management, and further includes: recording the three most recently set pre-reading positions and continuously updating them; and each time the data reading position reaches the most recently set pre-reading position, before resetting the pre-reading position, notifying the operating system to clean the memory area between the two earlier of the three recorded pre-reading positions.
14. The apparatus of any of claims 9-12, wherein:
the specified data amount is less than or equal to M/2 + AHEAD and greater than or equal to M/2, wherein M is the maximum memory occupation specified by the user, and AHEAD is the pre-reading advance specified by the user.
15. The apparatus of any of claims 9-12, wherein:
the file loading device also comprises a decompression module;
after the file reading module reads data from the memory mapped file each time, the method further comprises: judging whether the data volume of the read data is smaller than the minimum data volume of the compressed block of the disk file after decompression: if yes, returning the read data to the data loading module; if not, the read data is handed to the decompression module for processing;
the decompression module is used for decompressing the received data in a multithreading and asynchronous mode and then storing the decompressed data into a final memory data structure according to the storage format of the data.
16. The apparatus of any of claims 9-12, wherein:
the disk file loaded by the file loading device is a metadata memory mapping file of the distributed file system.
CN201610399257.8A 2016-06-07 2016-06-07 File loading method and device Active CN107480150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610399257.8A CN107480150B (en) 2016-06-07 2016-06-07 File loading method and device


Publications (2)

Publication Number Publication Date
CN107480150A CN107480150A (en) 2017-12-15
CN107480150B true CN107480150B (en) 2020-12-08

Family

ID=60594157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610399257.8A Active CN107480150B (en) 2016-06-07 2016-06-07 File loading method and device

Country Status (1)

Country Link
CN (1) CN107480150B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3204323B2 (en) * 1991-07-05 2001-09-04 エヌイーシーマイクロシステム株式会社 Microprocessor with built-in cache memory
CN101315595A (en) * 2008-06-30 2008-12-03 华为技术有限公司 Data reading method and device
CN101814038A (en) * 2010-03-23 2010-08-25 杭州顺网科技股份有限公司 Method for increasing booting speed of computer
CN102483949A (en) * 2009-08-31 2012-05-30 桑迪士克以色列有限公司 Preloading data into a flash storage device
CN102707966A (en) * 2012-04-12 2012-10-03 腾讯科技(深圳)有限公司 Method and device for acceleratively starting operating system, and method, device and terminal for generating prefetched information
CN102750174A (en) * 2012-06-29 2012-10-24 Tcl集团股份有限公司 Method and device for loading file
CN102799456A (en) * 2012-07-24 2012-11-28 上海晨思电子科技有限公司 Method and device for uploading resource files by game engine, and computer
CN103856567A (en) * 2014-03-26 2014-06-11 西安电子科技大学 Small file storage method based on Hadoop distributed file system
WO2014138498A1 (en) * 2013-03-06 2014-09-12 Recupero Gregory Improved client spatial locality through the use of virtual request trackers
CN105487987A (en) * 2015-11-20 2016-04-13 深圳市迪菲特科技股份有限公司 Method and device for processing concurrent sequential reading IO (Input/Output)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant