CN111399784A - Pre-reading and pre-writing method and device for distributed storage

Info

Publication number: CN111399784A
Application number: CN202010495460.1A
Authority: CN
Inventors: 麦剑; 史伟; 闵宇
Original assignee: Guangdong Eflycloud Computing Co Ltd
Current assignee: Guangdong Eflycloud Computing Co Ltd
Priority date: 2020-06-03
Filing date: 2020-06-03
Publication date: 2020-07-10
Anticipated expiration: 2040-06-03
Also published as: CN111399784B

Abstract

The invention discloses a pre-read/write method and device for distributed storage. The method comprises the following steps: a storage client continuously reads data blocks; for each single data block, a statistical prediction module records the data blocks that are read after it, counts the number of times each of those data blocks is read, and sorts them by count; the counting is repeated until statistics have been collected for all data blocks to be read, forming a statistical result; when a storage client next needs to read data blocks, the statistical prediction module predicts, for each data block the storage client reads, the data block to be read after it; according to the statistical result, the data block read most often after each data block is predicted to be the next data block to be read. By effectively predicting the next data block to be read, the invention improves the pre-read efficiency for scattered data and reduces how often data blocks have to be read.

Description

Pre-reading and pre-writing method and device for distributed storage

Technical Field

The present invention relates to the field of distributed storage technologies, and in particular, to a method and an apparatus for pre-reading and pre-writing distributed storage.

Background

Distributed storage is a common storage method in which a piece of data is divided into small blocks that are stored on a plurality of storage devices. Its obvious difference from centralized storage is that the data is scattered across different storage devices.

Current distributed storage divides a piece of data into several parts of a fixed size and then stores the resulting small data blocks across the whole cluster. To reduce the impact of a device failure, the small data blocks are usually spread as widely as possible. However, this scattered storage method has a disadvantage: the pre-read/write function cannot be implemented well.

At present, data is generally pre-read and pre-written when it is accessed; the purpose of pre-reading and pre-writing is to predict the disk data that will be read or written next and to load it in advance. In traditional non-distributed storage, data is stored contiguously, one block after another, so pre-reading and pre-writing are generally performed by predicting the next adjacent block. Once data is placed in distributed storage, however, the contiguous data is divided into small data blocks scattered across different devices, and the system often cannot predict where the data to be read or written next is located. As a result, when pre-reading and pre-writing, the distributed storage system must continually access different small data blocks and then merge them into contiguous data, which greatly increases the data read frequency of the distributed storage system and makes pre-reading and pre-writing inefficient.
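The limitation described above can be illustrated with a small sketch. It is not taken from the patent: the round-robin placement rule, the device count, and all function names are assumptions chosen only to show why an "adjacent block" guess fails once consecutive logical blocks land on different devices.

```python
# Hypothetical illustration; the placement rule and names are assumptions.
NUM_DEVICES = 4


def place(block_index):
    """Toy placement: spread consecutive logical blocks round-robin over devices."""
    return block_index % NUM_DEVICES


def adjacent_readahead(device, offset):
    """Classic read-ahead guess: the next block sits at the next offset on the same device."""
    return device, offset + 1


# Eight logical blocks; block i is stored on device place(i) at offset i // NUM_DEVICES.
layout = [(place(i), i // NUM_DEVICES) for i in range(8)]

# Count how often the adjacent-block guess matches the real location of the next block.
hits = 0
for current, actual_next in zip(layout, layout[1:]):
    if adjacent_readahead(*current) == actual_next:
        hits += 1
```

Under this toy layout the adjacent guess never matches the real location of the next logical block, which is the situation the background section describes.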

Disclosure of Invention

The technical problem to be solved by the invention is to provide a pre-read/write method and device for distributed storage. By recording which data blocks are read and written consecutively and how many times they are read, the invention, when a first data block is read or written, analyzes the historical statistics to predict the next data block to be read or written and its storage location, and performs the pre-read/write operation on that next data block, thereby improving the read/write efficiency of the data blocks.

In order to solve the technical problems, the invention provides the following technical scheme: a pre-reading and writing method for distributed storage comprises the following steps:

S1, the storage client continuously reads data blocks from the distributed storage system;

S2, for a single data block, the statistical prediction module records the data blocks that need to be read after it, counts the number of times each of those data blocks is read, and sorts them in descending order of count;

S3, step S2 is repeated until statistics have been collected for all data blocks that the storage client needs to read, and the resulting statistical result is stored in the statistical prediction module;

S4, when a storage client next needs to read data blocks, then, starting from the first data block read by the storage client, the statistical prediction module predicts, for each data block the storage client reads, the data block to be read after it; according to the statistical result of step S3, the statistical prediction module predicts the data block read most often after each data block as the next data block to be read, and provides it for the storage client to read.
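Steps S1 to S4 can be sketched as a successor-frequency table. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `StatisticalPredictor`, its method names, and the use of string block identifiers are all hypothetical.

```python
from collections import Counter, defaultdict


class StatisticalPredictor:
    """Hypothetical sketch of the statistical prediction module."""

    def __init__(self):
        # block id -> Counter of the blocks that were read immediately after it
        self.successors = defaultdict(Counter)

    def record(self, read_sequence):
        """S2/S3: for every block in an observed read sequence, count its successor."""
        for current, following in zip(read_sequence, read_sequence[1:]):
            self.successors[current][following] += 1

    def ranked(self, block_id):
        """S2: successors of block_id, sorted by read count in descending order."""
        counts = self.successors.get(block_id)
        return counts.most_common() if counts else []

    def predict(self, block_id):
        """S4: the block read most often after block_id, or None if nothing is known."""
        ranking = self.ranked(block_id)
        return ranking[0][0] if ranking else None


# S1: three observed read sequences (made-up block identifiers).
predictor = StatisticalPredictor()
predictor.record(["a", "b", "c"])
predictor.record(["a", "b", "d"])
predictor.record(["a", "b", "c"])
```

With these observations, `predictor.predict("b")` selects `"c"` because it followed `"b"` twice while `"d"` followed it once. How ties between equally frequent successors should be broken is not specified in the text, so this sketch simply relies on the ordering returned by `Counter.most_common`.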

Further, the pre-read/write method for distributed storage also includes step S0, in which the storage client writes data into the distributed storage system, and the distributed storage system divides the data into a plurality of data blocks and then stores the data blocks on different storage devices in a scattered manner.

Further, the pre-read/write method for distributed storage also includes step S5, in which the distributed storage system merges the data blocks that the storage client needs to read into complete data and then sends the complete data to the storage client.
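Steps S0 and S5 bracket the read path: data is split and scattered on write, then merged on read. The sketch below shows one possible round trip. The block size, the round-robin placement, the `blk-N` identifiers, and the in-memory dictionaries standing in for storage devices are all assumptions made for illustration.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny, an assumption for the demo


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """S0: divide the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def scatter(blocks, num_devices):
    """S0: store blocks on different devices; returns the devices and an ordered index."""
    devices = [dict() for _ in range(num_devices)]
    index = []  # (device id, block id) in logical order
    for seq, block in enumerate(blocks):
        device_id = seq % num_devices  # assumed placement rule
        block_id = f"blk-{seq}"
        devices[device_id][block_id] = block
        index.append((device_id, block_id))
    return devices, index


def merge(devices, index):
    """S5: fetch the blocks in logical order and join them into complete data."""
    return b"".join(devices[device_id][block_id] for device_id, block_id in index)


payload = b"distributed storage"
devices, index = scatter(split_into_blocks(payload), num_devices=3)
restored = merge(devices, index)
```

The ordered index is what lets the merge step rebuild the original byte sequence even though neighbouring blocks live on different devices.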

The invention also aims to provide a pre-read/write device for distributed storage, which comprises a storage client, a distributed storage system and a statistical prediction module, wherein the distributed storage system comprises a plurality of storage devices;

the storage client is used for writing data into and reading data from the distributed storage system;

the distributed storage system is used for equally dividing data into a plurality of data blocks and storing the data blocks on different storage devices in a scattered manner;

the statistical prediction module is configured to: when the storage client continuously reads data blocks from the distributed storage system, record, for a single data block, the data blocks that need to be read after it, count the number of times each of those data blocks is read, and sort them in descending order of count; the statistical prediction module collects statistics for all data blocks that the storage client needs to read and stores the resulting statistical result in the statistical prediction module;

the statistical prediction module is further configured to: when the storage client needs to read data blocks, then, starting from the first data block read by the storage client, predict, for each data block the storage client reads, the data block to be read after it; according to the statistical result stored in the statistical prediction module, the statistical prediction module predicts the data block read most often after each data block as the next data block to be read, and provides it for the storage client to read;

and the distributed storage system is also used for merging the data blocks that the storage client needs to read into complete data and then sending the complete data to the storage client.

With the above technical scheme, the invention has at least the following beneficial effects: by providing a statistical prediction module that collects statistics on consecutively read data blocks, the invention can predict in advance the next data block to be read after each data block, thereby improving the pre-read efficiency for scattered data, reducing how often data blocks have to be read, and reducing the operating load on the distributed storage system.

Drawings

FIG. 1 is a flow chart of steps of a pre-read/write method for distributed storage according to the present invention;

FIG. 2 is a block diagram of a pre-read/write apparatus for distributed storage according to the present invention.

Detailed Description

It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict, and the present application is further described in detail with reference to the drawings and specific embodiments.

Example 1

As shown in fig. 1, the present invention provides a pre-read/write method for distributed storage, which includes the steps of:

S10, the storage client writes data into the distributed storage system, and the distributed storage system equally divides the data into a plurality of data blocks and then stores the data blocks on different storage devices in a scattered manner;

S11, the storage client continuously reads data blocks from the distributed storage system;

S12, for a single data block, the statistical prediction module records the data blocks that need to be read after it, counts the number of times each of those data blocks is read, and sorts them in descending order of count;

S13, step S12 is repeated until statistics have been collected for all data blocks that the storage client needs to read, and the resulting statistical result is stored in the statistical prediction module;

S14, when a storage client next needs to read data blocks, then, starting from the first data block read by the storage client, the statistical prediction module predicts, for each data block the storage client reads, the data block to be read after it; according to the statistical result of step S13, the statistical prediction module predicts the data block read most often after each data block as the next data block to be read, and provides it for the storage client to read;

for example, after the storage client reads the first data block, the data block to be read next is the second data block; at this point, according to the statistical result of step S13, the statistical prediction module predicts the data block read most often after the first data block as the second data block that the storage client is likely to read, and places it after the first data block for the storage client to read; of course, if the storage client finds that the predicted data block is not the one it needs, it can discard the predicted data block and look up the correct data block in the distributed storage system to serve as the second data block;

once the second data block is determined, the data block to be read next is the third data block; at this point, according to the statistical result of step S13, the statistical prediction module predicts the data block read most often after the second data block as the third data block that the storage client is likely to read, and places it after the second data block for the storage client to read; of course, if the storage client finds that the predicted data block is not the one it needs, it can discard the predicted data block and look up the correct data block in the distributed storage system to serve as the third data block;

by analogy, the fourth data block, the fifth data block and so on, up to the last data block to be read, are handled in the same way: after the storage client has correctly read each data block, the statistical prediction module predicts the data block to be read after it and provides it for the storage client to read;

S15, the distributed storage system merges the data blocks that the storage client needs to read into complete data and then sends the complete data to the storage client.
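The read loop of steps S14 and S15, including the fallback when a prediction turns out to be wrong, can be sketched as follows. This is an illustration under assumptions rather than the method as claimed: the successor counts in `stats`, the block identifiers, the in-memory `store`, and the function name `read_with_prefetch` are all hypothetical.

```python
from collections import Counter

# Hypothetical successor statistics, as step S13 might leave them.
stats = {
    "A": Counter({"B": 5, "C": 1}),
    "B": Counter({"C": 4, "D": 2}),
    "C": Counter({"D": 3}),
}
# Stand-in for the distributed storage system: block id -> payload.
store = {"A": b"a", "B": b"b", "C": b"c", "D": b"d"}


def read_with_prefetch(wanted, stats, store):
    """Read blocks in `wanted` order, prefetching the most frequent successor each time."""
    data, hits, misses = [], 0, 0
    prefetched = None  # (block id, payload) guessed after the previous block
    for block_id in wanted:
        if prefetched is not None and prefetched[0] == block_id:
            payload = prefetched[1]  # the prediction was right, use the preloaded block
            hits += 1
        else:
            if prefetched is not None:
                misses += 1  # wrong guess: discard it
            payload = store[block_id]  # fetch the correct block from storage
        data.append(payload)
        counts = stats.get(block_id)
        if counts:
            guess = counts.most_common(1)[0][0]
            prefetched = (guess, store[guess])
        else:
            prefetched = None
    # S15: merge the blocks into complete data for the client
    return b"".join(data), hits, misses


result = read_with_prefetch(["A", "B", "D"], stats, store)
```

In this run the guess after `"A"` is correct, while the guess after `"B"` (`"C"`) is discarded because the client actually wants `"D"`, mirroring the discard-and-look-up behaviour described in the example above.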

Example 2

The invention provides a pre-reading and writing device of distributed storage based on the method of embodiment 1, as shown in fig. 2, comprising a storage client, a distributed storage system and a statistical prediction module, wherein the distributed storage system comprises a plurality of storage devices;

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various equivalent changes, modifications, substitutions and alterations can be made herein without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A pre-reading and writing method for distributed storage is characterized by comprising the following steps:

S4, when a storage client next needs to read data blocks, then, starting from the first data block read by the storage client, the statistical prediction module predicts, for each data block the storage client reads, the data block to be read after it; according to the statistical result of step S3, the statistical prediction module predicts the data block read most often after each data block as the next data block to be read, and provides it for the storage client to read.

2. The method of claim 1, further comprising step S0, wherein the storage client writes data into the distributed storage system, and the distributed storage system divides the data into a plurality of data blocks and then stores the data blocks in different storage devices in a distributed manner.

3. The pre-read-write method for distributed storage according to claim 2, further comprising step S5, after the distributed storage system combines the data blocks that the storage client needs to read to form complete data, the complete data is sent to the storage client.

4. The pre-reading and pre-writing device for distributed storage is characterized by comprising a storage client, a distributed storage system and a statistical prediction module, wherein the distributed storage system comprises a plurality of storage devices;