WO2016155238A1

WO2016155238A1 - File reading method in distributed storage system, and server end

Info

Publication number: WO2016155238A1
Application number: PCT/CN2015/088998
Authority: WO
Inventors: 韩盛中; 李中军; 江俊杰
Original assignee: 中兴通讯股份有限公司
Priority date: 2015-03-27
Filing date: 2015-09-06
Publication date: 2016-10-06
Also published as: CN106161503A

Abstract

A file reading method in a distributed storage system, and a server end. First, the server end acquires a read request from a client via a read thread; then, the server end acquires the corresponding file data from the corresponding disk according to the read request; finally, the server end sends the acquired file data to the client via a pre-established return thread. Compared with the related art, the file data is sent to the client via the pre-established return thread instead of being returned via the read thread. The read thread therefore does not have to wait until the file data has been returned before it is released, and can be released to process the next read request as early as possible. This improves the processing efficiency of read requests and the efficiency of acquiring file data, saves processing time, and enhances the user experience.

Description

File reading method and server in distributed storage system

Technical field

This document relates to, but is not limited to, the field of communications, and specifically relates to a file reading method and a server in a distributed storage system.

Background art

Distributed file systems are increasingly widely used. Audio, video, and image files keep growing in size, and users demand ever higher download speeds. Disks, as slow devices, are increasingly becoming the bottleneck for reading files. Although solid state drive (SSD) technology is developing rapidly, SSDs still cannot replace traditional mechanical disks in terms of either capacity or price. Therefore, how to make effective use of disk read performance so as to satisfy user download speed to the greatest extent has become an urgent problem for distributed file systems. In the related art, a distributed file system processes concurrent user read requests synchronously with multiple threads. As shown in FIG. 1, the client sends a read request to a read thread, and the read thread puts the read request into the queue of the corresponding disk. The processing thread of each disk takes the request out of its queue, performs the read, and returns the result to the read thread, which then returns the read file to the user. In this way, each read thread must wait until the file data has been returned before it can process the next read request, which reduces the processing efficiency of read requests and, in turn, the efficiency of obtaining files.

Summary of the invention

The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.

The embodiments of the invention provide a file reading method and a server in a distributed storage system, which solve the technical problem in the related art that a read thread must wait for the file to be read back before it is released to process the next read request, making the processing of read requests inefficient.

The embodiment of the invention provides a file reading method in a distributed storage system, including:

The server obtains a read request from the client through the read thread;

The server obtains corresponding file data from the corresponding disk according to the read request;

The server sends the acquired file data to the client through a pre-established return thread.
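The three steps can be illustrated with a minimal Python sketch, assuming in-process `queue.Queue` objects stand in for the disk queue and the client connection. `DISK`, `read_thread`, `disk_worker` and `return_thread` are hypothetical names chosen for illustration, not part of the described system. The read thread only enqueues each request and moves on, while a return thread started in advance delivers the data.

```python
import queue
import threading

# Hypothetical in-memory "disk": file handle -> file contents.
DISK = {"fileA": b"0123456789abcdef"}

disk_queue = queue.Queue()     # read requests waiting for the disk
return_queue = queue.Queue()   # file data waiting to be sent back
sent_to_client = []            # stands in for the client connection


def read_thread(requests):
    """Step 1: accept read requests and hand them off without waiting for data."""
    for req in requests:
        disk_queue.put(req)    # returns at once; the read thread is free again
    disk_queue.put(None)       # sentinel: no more requests


def disk_worker():
    """Step 2: read the requested byte range from the simulated disk."""
    while (req := disk_queue.get()) is not None:
        handle, offset, length = req
        return_queue.put((req, DISK[handle][offset:offset + length]))
    return_queue.put(None)


def return_thread():
    """Step 3: pre-established thread that sends file data to the client."""
    while (item := return_queue.get()) is not None:
        sent_to_client.append(item)


# The return thread exists before any request arrives ("pre-established").
workers = [threading.Thread(target=return_thread), threading.Thread(target=disk_worker)]
for t in workers:
    t.start()
read_thread([("fileA", 0, 4), ("fileA", 10, 6)])
for t in workers:
    t.join()
print(sent_to_client)
```

In a real server the return path would write to a network socket; the point of the sketch is only that `read_thread` never blocks on disk I/O.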

In an embodiment of the present invention, the server obtains the corresponding file data from the corresponding disk according to the read request, including:

The server reads the corresponding file data from the corresponding plurality of disks according to the read request, and stores the read file data in the data buffer area;

The server sends the obtained file data to the client through a pre-established return thread, including:

The server determines whether the file data exists in the data buffer. If the file data exists, the file data is immediately sent to the client through a pre-established return thread.

In an embodiment of the present invention, the return thread includes a plurality of sub-return threads, and each sub-return thread corresponds to one data buffer area. Determining whether file data exists in the data buffer area includes: the server queries, through a sub-return thread and according to a preset rule, whether file data exists in the corresponding data buffer area. Sending the file data to the client through the pre-established return thread includes: sending the queried file data to the client through the corresponding sub-return thread.

In an embodiment of the present invention, the method further includes: after the server obtains the read request from the client through the read thread, and before the server reads the corresponding file data from the corresponding disk according to the read request, the server stores the read request in a kernel asynchronous processing queue. The server obtaining the corresponding file data from the corresponding disk according to the read request includes: the server fetches the read request from the kernel asynchronous processing queue according to a preset processing rule, and obtains the corresponding file data from the corresponding disk according to the fetched read request.

In an embodiment of the present invention, the preset processing rule includes:

All read requests in the kernel asynchronous processing queue are fetched according to a preset period; multiple read requests whose sector positions are within a preset range value are combined, and the file data corresponding to the combined read requests is acquired;

Or

Read requests are fetched from the kernel asynchronous processing queue in the order in which they were stored, and the corresponding file data is acquired according to each read request.
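The two preset processing rules can be sketched in Python as follows. This is one possible interpretation, assuming the "preset range value" is a maximum gap in sectors between the end of one request and the start of the next; `SectorRequest`, `merge_nearby`, `fifo` and `max_gap` are hypothetical names. Collecting the batch once per preset period is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectorRequest:
    start: int   # first sector
    count: int   # number of sectors

    @property
    def end(self) -> int:
        return self.start + self.count   # one past the last sector


def merge_nearby(requests, max_gap):
    """Rule 1: sort a batch by sector and merge requests whose gap <= max_gap.

    Returns (merged_request, original_requests) pairs so that each caller's
    data can later be sliced out of the larger read.
    """
    merged = []
    for req in sorted(requests, key=lambda r: r.start):
        if merged and req.start - merged[-1][0].end <= max_gap:
            head, members = merged[-1]
            new_end = max(head.end, req.end)
            merged[-1] = (SectorRequest(head.start, new_end - head.start), members + [req])
        else:
            merged.append((req, [req]))
    return merged


def fifo(requests):
    """Rule 2: serve requests in the order they were queued, without merging."""
    return [(req, [req]) for req in requests]


batch = [SectorRequest(100, 8), SectorRequest(5000, 8),
         SectorRequest(110, 8), SectorRequest(4990, 4)]
for merged_req, members in merge_nearby(batch, max_gap=16):
    print(merged_req, len(members))
```

With `max_gap=16` the four requests collapse into two disk reads (sectors 100 to 117 and 4990 to 5007), while `fifo` keeps all four.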

The embodiment of the invention further provides a server, including a read request acquisition module, a file data acquisition module and a return thread module:

The read request acquisition module is configured to acquire a read request from a client through a read thread;

The file data obtaining module is configured to obtain corresponding file data from the corresponding disk according to the read request;

The return thread module is configured to send the acquired file data to the client through a pre-established return thread.

In an embodiment of the present invention, the file data obtaining module includes a data acquisition submodule and a data cache submodule:

The data acquisition submodule is configured to read the corresponding file data from the corresponding plurality of disks according to the read request; the data cache submodule is configured to store the file data read by the data acquisition submodule in the data buffer area;

The return thread module is further configured to determine whether file data exists in the data buffer area and, if it does, immediately send the file data to the client through a pre-established return thread.

In an embodiment of the present invention, the return thread includes a plurality of sub-return threads, and each sub-return thread corresponds to one data buffer area; the data acquisition submodule is further configured to query, through a sub-return thread and according to a preset query rule, whether file data exists in the corresponding data buffer area; the return thread module is further configured to send the queried file data to the client through the corresponding sub-return thread.

In an embodiment of the present invention, the server further includes a kernel asynchronous processing queue, configured to store the read request after the read request obtaining module acquires it from a client through a read thread. The file data obtaining module further includes a receiving submodule, configured to fetch read requests from the kernel asynchronous processing queue according to a preset processing rule; the data acquisition submodule is further configured to obtain the corresponding file data from the corresponding disk according to the fetched read request.

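In an embodiment of the present invention, the preset processing rule includes:

All read requests in the kernel asynchronous processing queue are fetched according to a preset period; multiple read requests whose sector positions are within a preset range value are combined, and the file data corresponding to the combined read requests is acquired;

or

Read requests are fetched from the kernel asynchronous processing queue in the order in which they were stored, and the corresponding file data is acquired according to each read request.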

The present invention also provides a computer storage medium having stored therein computer executable instructions for performing the above method.

The beneficial effects of the embodiments of the present invention are:

The embodiments of the invention provide a file reading method and a server in a distributed storage system. First, the server obtains a read request from a client through a read thread; then, the server obtains the corresponding file data from the corresponding disk according to the read request; finally, the server sends the obtained file data to the client through a pre-established return thread. Compared with the related art, because the file data is sent to the client through the pre-established return thread instead of through the read thread, the read thread does not have to wait for the file data to be returned before it is released. It can be released as soon as possible to process the next read request, which improves the processing efficiency of read requests and the efficiency of obtaining file data, saves processing time, and enhances the user experience.

Other aspects will be apparent upon reading and understanding the drawings and detailed description.

Brief description of the drawings

FIG. 1 is a schematic flowchart of a file reading method in a distributed storage system in the related art;

FIG. 2 is a schematic flowchart of a file reading method in a distributed storage system according to Embodiment 1 of the present invention;

FIG. 3 is a first schematic structural diagram of a server according to Embodiment 2 of the present invention;

FIG. 4 is a second schematic structural diagram of a server according to Embodiment 2 of the present invention;

FIG. 5 is a third schematic structural diagram of a server according to Embodiment 2 of the present invention;

FIG. 6 is a schematic flowchart of a file reading method in a distributed storage system according to Embodiment 3 of the present invention;

FIG. 7 is a schematic structural diagram of a server in the file reading method in a distributed storage system according to Embodiment 3 of the present invention;

FIG. 8 is a schematic structural diagram of an asynchronous input/output module in the file reading method in a distributed storage system according to Embodiment 3 of the present invention.

Preferred embodiment of the invention

The present application will be further described in detail below with reference to the accompanying drawings.

Embodiment 1:

FIG. 2 is a schematic flowchart of a file reading method in a distributed storage system according to Embodiment 1 of the present invention. The method includes the following steps:

Step S101: The server obtains a read request of the file data from the client by using the read thread.

In this step, the read request obtained by the server from the client carries read request parameters, which include a file handle, an offset, and a length.

Before sending, the client queries the metadata server for the replica location corresponding to the read request (the server, and the disk on that server), and sends the read request to the corresponding server.

The file data here may be the complete file data corresponding to a read request, or part of it. For example, a user wants to read file A, a video file whose data is distributed across Disk 1, Disk 2, and Disk 3 of a video server. The video server then obtains the read request for file A from the client through its read thread.
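The request parameters and the metadata lookup described above can be sketched as follows. The metadata table, the server and disk names, and the choice of always taking the first replica are all assumptions made for illustration; `ReadRequest`, `ReplicaLocation`, `locate` and `route_read` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    file_handle: str
    offset: int    # byte offset within the file
    length: int    # number of bytes to read


@dataclass(frozen=True)
class ReplicaLocation:
    server: str
    disk: str


# Hypothetical metadata server contents: file handle -> replica locations.
METADATA = {
    "fileA": [ReplicaLocation("storage-01", "disk1"),
              ReplicaLocation("storage-02", "disk3")],
}


def locate(req: ReadRequest) -> ReplicaLocation:
    """Ask the (simulated) metadata server where a copy of the file lives."""
    replicas = METADATA.get(req.file_handle)
    if not replicas:
        raise FileNotFoundError(req.file_handle)
    return replicas[0]   # assumption: simply use the first replica


def route_read(req: ReadRequest):
    """Return (server, disk, request): where the client would send the read."""
    loc = locate(req)
    return loc.server, loc.disk, req


print(route_read(ReadRequest("fileA", 4096, 65536)))
```

A real client would then send the request over the network to `loc.server`; replica selection (load, locality) is outside what the text specifies.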

Step S102: The server obtains corresponding file data from the corresponding disk according to the read request.

In this step, the corresponding file data refers to the parts of the requested file that are distributed and stored on the disks. Continuing the example in step S101, after receiving the read request for file A, the server obtains the corresponding parts of file A from Disk 1, Disk 2, and Disk 3 of the server storing file A.

Step S103: The server sends the acquired file data to the client through a pre-established return thread.

In this step, the return thread should be understood as a thread, distinct from the read thread, that is used to return the acquired file data. Continuing the example in step S102, after the parts of file A are obtained from Disk 1, Disk 2, and Disk 3, the file data is sent to the client through the return thread, rather than being returned to the client by the original read thread after reading, as in the related art. In this way the read thread is released quickly and can process the next read request.

Optionally, because reads on different disks may progress at different rates, a slow disk could hold up the processing of several worker threads. Note that worker threads here refers to all threads used to process read requests, including the read thread and the return thread. To avoid this, each disk may process its part of a read request asynchronously. Specifically, obtaining the corresponding file data from the corresponding disk according to the read request in step S102 may include: reading the corresponding file data from the corresponding multiple disks according to the read request, and storing the read file data in the data buffer area; optionally, each piece of file data is placed in the data buffer area as soon as it has been read. Sending the acquired file data to the client through the pre-established return thread in step S103 then includes: determining whether file data exists in the data buffer area and, if it does, immediately sending it to the client through the pre-established return thread. In this way, a disk that finishes its part of a read request first can go on to process the next read request and obtain the corresponding file data, which improves both the efficiency of reading files and the concurrent throughput of the disks.
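The data-buffer arrangement can be sketched in Python, assuming one worker thread per disk and a `queue.Queue` as the data buffer area. The per-disk delays are hypothetical and only simulate a slow disk; `data_buffer`, `disk_worker` and `return_thread` are illustrative names. The return thread sends each piece as soon as it lands in the buffer, so data that is already available is not held back by the slowest disk.

```python
import queue
import threading
import time

data_buffer = queue.Queue()   # filled by disk workers as each read completes
sent = []                     # stands in for the client connection


def disk_worker(disk, request_id, delay_s):
    """Simulate one disk serving its part of a request; delay_s models disk speed."""
    time.sleep(delay_s)
    data_buffer.put((request_id, disk, f"data from {disk}"))


def return_thread(expected):
    """Send each piece as soon as it appears in the buffer (completion order)."""
    for _ in range(expected):
        sent.append(data_buffer.get())


sender = threading.Thread(target=return_thread, args=(3,))
sender.start()
workers = [
    threading.Thread(target=disk_worker, args=("disk1", "req-A", 0.3)),  # slow disk
    threading.Thread(target=disk_worker, args=("disk2", "req-B", 0.0)),
    threading.Thread(target=disk_worker, args=("disk3", "req-C", 0.1)),
]
for w in workers:
    w.start()
for w in workers + [sender]:
    w.join()
print([disk for _, disk, _ in sent])
```

The output order follows disk completion rather than submission order; with these delays the slow `disk1` is delivered last.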

Optionally, to send file data to the client quickly and to prevent a blocked return thread from affecting the data processing of subsequent disks and the disk throughput, the return thread may include multiple sub-return threads, each corresponding to one data buffer area. Determining whether file data exists in the data buffer area may then be performed by the server querying, through each sub-return thread and according to a preset rule, whether file data exists in the corresponding data buffer area; sending the file data to the client through the pre-established return thread may be performed by sending the queried file data to the client through the corresponding sub-return thread. Optionally, the preset query rule here is a polling rule.
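One possible shape for the sub-return threads is sketched below, assuming each data buffer area is a `collections.deque` and the preset query rule is simple polling with a fixed interval. `SubReturnThread`, `send` and `poll_interval_s` are hypothetical names. Because each buffer has exactly one consumer, a stalled buffer only idles its own sub-return thread.

```python
import collections
import threading
import time


class SubReturnThread(threading.Thread):
    """A sub-return thread bound to exactly one data buffer (a deque)."""

    def __init__(self, name, buffer, send, poll_interval_s=0.01):
        super().__init__(name=name, daemon=True)
        self.buffer = buffer                  # filled by the disk side
        self.send = send                      # stands in for the client connection
        self.poll_interval_s = poll_interval_s
        self.stop_event = threading.Event()

    def run(self):
        while not self.stop_event.is_set():
            try:
                item = self.buffer.popleft()  # deque.popleft is thread-safe
            except IndexError:
                # Buffer empty: wait briefly, then poll again.
                self.stop_event.wait(self.poll_interval_s)
                continue
            self.send(self.name, item)


sent = []
sent_lock = threading.Lock()


def send(thread_name, item):
    with sent_lock:
        sent.append((thread_name, item))


buffers = [collections.deque() for _ in range(3)]
threads = [SubReturnThread(f"ret-{i}", buf, send) for i, buf in enumerate(buffers)]
for t in threads:
    t.start()

# Disk-side code would append here as each read completes.
buffers[0].append("chunk-0a")
buffers[2].append("chunk-2a")
buffers[0].append("chunk-0b")

deadline = time.monotonic() + 2.0
while len(sent) < 3 and time.monotonic() < deadline:
    time.sleep(0.01)
for t in threads:
    t.stop_event.set()
    t.join()
print(sorted(sent))
```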

Optionally, to release the read thread sooner, improve the processing efficiency of read requests, and maximize disk throughput, the method further includes, after step S101 and before step S102 (that is, after the server obtains the read request from the client through the read thread and before it reads the corresponding file data from the corresponding disk according to the read request): storing the read request in the kernel asynchronous processing queue. Step S102, in which the server obtains the corresponding file data from the corresponding disk according to the read request, may then be performed by the server fetching the read request from the kernel asynchronous processing queue according to a preset processing rule and obtaining the corresponding file data from the corresponding disk according to that read request. Optionally, the preset processing rule includes: fetching all read requests in the kernel asynchronous processing queue according to a preset period, combining the read requests whose sector positions are within a preset range value, and acquiring the file data corresponding to each of the combined read requests; or fetching read requests from the kernel asynchronous processing queue in the order in which they were stored, and acquiring the file data corresponding to each read request. Of course, other rules may also be set to maximize disk throughput.

Embodiment 2:

The server provided in this embodiment, as shown in FIG. 3, includes a read request obtaining module 201, a file data obtaining module 202, and a returning thread module 203. The read request obtaining module is configured to obtain a read request from the client through the read thread. The file data obtaining module is configured to obtain corresponding file data from the corresponding disk according to the read request; the return thread module is configured to send the acquired file data to the client through a pre-established return thread.

Optionally, as shown in FIG. 4, in the server provided by this embodiment the file data obtaining module 202 includes a data acquisition submodule 2022 and a data buffer submodule 2021. The data acquisition submodule 2022 is configured to read the corresponding file data from the corresponding plurality of disks according to the read request; the data buffer submodule 2021 is configured to store the file data read by the data acquisition submodule in the data buffer area; the return thread module 203 is further configured to determine whether file data exists in the data buffer area and, if it does, immediately send it to the client through the pre-established return thread.

Optionally, the return thread includes multiple sub-return threads, each corresponding to one data buffer area; the data acquisition submodule 2022 is further configured to query, through a sub-return thread and according to a preset query rule, whether file data exists in the corresponding data buffer area; the return thread module 203 is further configured to send the queried file data to the client through the corresponding sub-return thread.

Optionally, as shown in FIG. 5, the server provided by this embodiment further includes a kernel asynchronous processing queue module 204, configured to store the read request acquired by the read request obtaining module 201 in the kernel asynchronous processing queue. The file data obtaining module further includes a receiving submodule 2023, configured to fetch read requests from the kernel asynchronous processing queue according to a preset processing rule; the data acquisition submodule 2022 is further configured to obtain the corresponding file data from the corresponding disk according to the fetched read request.

Optionally, the preset processing rule includes: fetching all read requests in the kernel asynchronous processing queue according to a preset period, combining the read requests whose sector positions are within a preset range value, and acquiring the file data corresponding to each of the combined read requests; or fetching read requests from the kernel asynchronous processing queue in the order in which they were stored, and acquiring the corresponding file data according to each read request. Note that merging read requests within the preset range value means combining read requests whose sectors are close to one another, to reduce unnecessary seek time.
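The effect of serving nearby sectors together can be made concrete with a small calculation. The sketch below uses total head travel in sectors as a rough proxy for seek time (an assumption: real seek time is not linear in distance) and counts how many disk operations remain after merging sectors within a hypothetical `max_gap`.

```python
def total_seek(head, sectors):
    """Total head travel, in sectors, when requests are served in the given order."""
    distance = 0
    for sector in sectors:
        distance += abs(sector - head)
        head = sector
    return distance


def count_after_merge(sectors, max_gap):
    """How many disk operations remain if sectors within max_gap are merged."""
    groups = 0
    prev = None
    for sector in sorted(sectors):
        if prev is None or sector - prev > max_gap:
            groups += 1
        prev = sector
    return groups


arrival = [9000, 120, 8990, 130, 9010, 110]    # order in which requests were queued
print(total_seek(0, arrival))                  # 53390: FIFO order
print(total_seek(0, sorted(arrival)))          # 9010: sorted, one sweep
print(count_after_merge(arrival, max_gap=32))  # 2 merged reads instead of 6
```

Serving the interleaved requests in arrival order makes the head cross the disk five extra times, while sorting and merging leaves two short reads.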

Embodiment 3:

FIG. 6 is a schematic flowchart of a file reading method in a distributed storage system according to Embodiment 3 of the present invention, which includes the following steps:

Step S301: The user (here, the process that calls the file system interface) calls the read-data interface of the file system;

Step S302: The client program obtains the corresponding replica location (server and disk) according to the read request parameters (file handle, offset, length, etc.), and sends the read request to the corresponding server;

Step S303: The server puts the read request into the kernel asynchronous processing queue of the corresponding disk, according to the disk information that the client obtained from the metadata server;

Step S304: The asynchronous input/output (AIO) module of the kernel continuously scans the read requests in the kernel asynchronous processing queue and, within a certain time range, combines read requests whose sectors are close to one another, thereby minimizing seek time. After the disk read requests return, the response data is placed in the response queue, request by request;

Step S305: The polling thread continuously queries the AIO module of the corresponding disk. If there is no data in the waiting area of the AIO module, polling continues until data is available, and the process then proceeds to step S306;

Step S306: After obtaining the data, the polling thread returns it to the client resident program and continues polling the AIO module. The job of the polling thread here is to scan the waiting area of the AIO module continuously and send any data it finds to the client resident program;

Step S307: After obtaining the data, the client resident program returns it to the user.
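The scanning "within a certain time range" in step S304 can be approximated in user space by draining a queue for one time window, as in the Python sketch below. This is an illustration only: it does not use the kernel AIO interface, and `collect_batch`, `window_s` and `max_items` are hypothetical names.

```python
import queue
import time


def collect_batch(q, window_s, max_items=1024):
    """Gather the read requests that arrive within one scan window.

    Waits at most window_s seconds in total and returns whatever arrived,
    possibly an empty list. window_s and max_items are hypothetical knobs.
    """
    batch = []
    deadline = time.monotonic() + window_s
    while len(batch) < max_items:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


requests = queue.Queue()
for sector in (8990, 120, 9000, 130):
    requests.put(sector)          # read threads enqueue and move on
batch = collect_batch(requests, window_s=0.05)
print(sorted(batch))              # [120, 130, 8990, 9000]: ordered for merging
```

A longer window gives more chances to find neighbouring sectors but adds latency to every request in the batch; the text leaves that trade-off to the preset period.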

Optionally, FIG. 7 briefly shows the composition of the server. Note that this structure is only one example of a server; servers with other structures are of course possible. The asynchronous input/output (AIO) module is one form of the file data obtaining module of Embodiment 2, and the polling thread is one form of the return thread. As the figure shows, each read thread can be released to process the next request as soon as it has placed its read request in the kernel asynchronous processing queue, instead of being left hanging. The AIO module then takes out the requests placed in the asynchronous processing queue and processes them (see FIG. 8 for details), sorting and aggregating them to maximize disk throughput.

As shown in FIG. 8, the AIO module is mainly divided into three parts: the AIO receiver, the AIO processor, and the AIO waiting area. The AIO receiver is configured to take read requests out of the AIO queue periodically (for example, it can take them out once) and pass them to the AIO processor. The AIO processor is configured to sort these requests, combine read requests with nearby sector locations to reduce unnecessary seek time, and send the processed request data to the AIO waiting area. The AIO waiting area is configured to hold the request data until the polling thread fetches it. Optionally, the AIO receiver is one form of the receiving submodule, the AIO processor is one form of the data acquisition submodule, and the AIO waiting area is one form of the data buffer submodule.
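The three parts of the AIO module and the polling thread can be modelled as a single-threaded, tick-driven toy in Python. Everything here is a simplifying assumption: each sector holds one byte, the "preset range value" is a maximum sector gap, and `Req`, `AIOModule`, `tick` and `polling_thread_step` are hypothetical names. In a real system each stage would run concurrently and the disk read would go through the kernel.

```python
import collections
from dataclasses import dataclass


@dataclass(frozen=True)
class Req:
    req_id: str
    sector: int   # first sector to read
    count: int    # number of sectors


class AIOModule:
    """Toy model of FIG. 8: AIO receiver -> AIO processor -> AIO waiting area."""

    def __init__(self, disk, max_gap):
        self.aio_queue = collections.deque()     # kernel asynchronous processing queue
        self.waiting_area = collections.deque()  # results awaiting the polling thread
        self.disk = disk                         # hypothetical: sector -> byte value
        self.max_gap = max_gap                   # hypothetical "preset range value"

    def receiver(self):
        """AIO receiver: take out every request queued during this period."""
        batch = []
        while self.aio_queue:
            batch.append(self.aio_queue.popleft())
        return batch

    def processor(self, batch):
        """AIO processor: sort, merge nearby requests, do one read per group."""
        groups = []  # each entry: [start_sector, end_sector, member_requests]
        for r in sorted(batch, key=lambda q: q.sector):
            if groups and r.sector - groups[-1][1] <= self.max_gap:
                groups[-1][1] = max(groups[-1][1], r.sector + r.count)
                groups[-1][2].append(r)
            else:
                groups.append([r.sector, r.sector + r.count, [r]])
        for start, end, members in groups:
            data = bytes(self.disk.get(s, 0) for s in range(start, end))  # merged read
            for r in members:
                offset = r.sector - start
                # Slice each caller's bytes out and park them in the waiting area.
                self.waiting_area.append((r.req_id, data[offset:offset + r.count]))
        return len(groups)

    def tick(self):
        """One period: receive, then process. Returns the number of disk reads."""
        return self.processor(self.receiver())


def polling_thread_step(aio, send):
    """One pass of the polling thread: forward everything in the waiting area."""
    while aio.waiting_area:
        send(*aio.waiting_area.popleft())


aio = AIOModule(disk={s: s % 256 for s in range(1000)}, max_gap=8)
for r in (Req("a", 100, 4), Req("b", 900, 2), Req("c", 106, 2)):
    aio.aio_queue.append(r)   # the read thread enqueues and is free immediately
reads = aio.tick()
print(reads)  # 2

delivered = {}


def send(req_id, data):
    delivered[req_id] = data


polling_thread_step(aio, send)
```

Requests `a` and `c` are close enough to share one read of sectors 100 to 107, so three requests cost two disk operations.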

One of ordinary skill in the art will appreciate that all or part of the above steps may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk. Optionally, all or part of the steps of the above embodiments may also be implemented with one or more integrated circuits. Correspondingly, each module/unit in the foregoing embodiments may be implemented in the form of hardware or in the form of a software functional module. This application is not limited to any specific combination of hardware and software.

The above embodiments are only used to illustrate the technical solutions of the present application and are not intended to limit them; the present application has been described in detail only with reference to preferred embodiments.

Industrial applicability

In the above technical solution, the read thread does not have to wait for the file data to be returned before it is released to process the next read request; it can be released as soon as possible, which improves the processing efficiency of read requests and the efficiency of acquiring file data, saves processing time, and enhances the user experience.

Claims

1. A file reading method in a distributed storage system, comprising:

The server obtains a read request from the client through the read thread;

The server obtains corresponding file data from the corresponding disk according to the read request;

The server sends the acquired file data to the client through a pre-established return thread.

2. The file reading method in a distributed storage system according to claim 1, wherein

The server obtains the corresponding file data from the corresponding disk according to the read request, including:

The server reads the corresponding file data from the corresponding plurality of disks according to the read request, and stores the read file data in the data buffer area;

The server sends the obtained file data to the client through a pre-established return thread, including:

The server determines whether the file data exists in the data buffer. If the file data exists, the file data is immediately sent to the client through a pre-established return thread.

3. The file reading method in a distributed storage system according to claim 2, wherein

The return thread includes a plurality of sub-return threads, and each sub-return thread corresponds to one data buffer area;

Determining whether file data exists in the data buffer area comprises: the server querying, through a sub-return thread and according to a preset rule, whether file data exists in the corresponding data buffer area;

Sending the file data to the client through the pre-established return thread comprises:

The queried file data is sent to the client through the corresponding sub-return thread.

4. The file reading method in a distributed storage system according to any one of claims 1 to 3, further comprising:

After the server obtains the read request from the client through the read thread, and before the server reads the corresponding file data from the corresponding disk according to the read request, the server stores the read request in a kernel asynchronous processing queue;

And the server obtaining the corresponding file data from the corresponding disk according to the read request comprises:

The server fetches the read request from the kernel asynchronous processing queue according to a preset processing rule, and obtains the corresponding file data from the corresponding disk according to the fetched read request.

5. The file reading method in a distributed storage system according to claim 4, wherein the preset processing rule comprises:

All read requests in the kernel asynchronous processing queue are fetched according to a preset period; multiple read requests whose sector positions are within a preset range value are combined, and the file data corresponding to the combined read requests is acquired;

or

Read requests are fetched from the kernel asynchronous processing queue in the order in which they were stored, and the corresponding file data is acquired according to each read request.

6. A server, comprising a read request acquisition module, a file data acquisition module, and a return thread module, wherein:

The read request obtaining module is configured to acquire a read request from a client by using a read thread;

The file data obtaining module is configured to obtain corresponding file data from the corresponding disk according to the read request;

The return thread module is configured to send the acquired file data to the client through a pre-established return thread.

7. The server according to claim 6, wherein the file data acquisition module comprises a data acquisition submodule and a data cache submodule:

The data acquisition submodule is configured to respectively read corresponding file data from the corresponding plurality of disks according to the read request;

The data cache submodule is configured to store the file data read by the data acquisition submodule in the data buffer area;

The return thread module is further configured to determine whether file data exists in the data buffer area and, if it does, immediately send the file data to the client through the pre-established return thread.

8. The server according to claim 7, wherein

The return thread includes a plurality of sub-return threads, and each sub-return thread corresponds to one data buffer area;

The data acquisition submodule is further configured to query, through a sub-return thread and according to a preset query rule, whether file data exists in the corresponding data buffer area;

The return thread module is further configured to send the queried file data to the client through the corresponding sub-return thread.

9. The server according to any one of claims 6 to 8, further comprising a kernel asynchronous processing queue;

The kernel asynchronous processing queue is configured to store the read request after the read request acquisition module acquires it from a client through a read thread;

The file data acquisition module further includes a receiving submodule: the receiving submodule is configured to fetch a read request from the kernel asynchronous processing queue according to a preset processing rule;

The data acquisition submodule is further configured to obtain the corresponding file data from the corresponding disk according to the fetched read request.

10. The server according to claim 9, wherein the preset processing rule comprises:

All read requests in the kernel asynchronous processing queue are fetched according to a preset period; multiple read requests whose sector positions are within a preset range value are combined, and the file data corresponding to the combined read requests is acquired;

or

Read requests are fetched from the kernel asynchronous processing queue in the order in which they were stored, and the corresponding file data is acquired according to each read request.

11. A computer storage medium storing computer-executable instructions for performing the method according to any one of claims 1 to 5.