CN111399784A - Pre-reading and pre-writing method and device for distributed storage - Google Patents

Pre-reading and pre-writing method and device for distributed storage

Info

Publication number
CN111399784A
Authority
CN
China
Prior art keywords
data
read
data blocks
block
data block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010495460.1A
Other languages
Chinese (zh)
Other versions
CN111399784B (en)
Inventor
麦剑
史伟
闵宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Eflycloud Computing Co Ltd
Original Assignee
Guangdong Eflycloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Eflycloud Computing Co Ltd
Priority to CN202010495460.1A
Publication of CN111399784A
Application granted
Publication of CN111399784B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a pre-read/write method and device for distributed storage. The method comprises the following steps: a storage client continuously reads data blocks; for each individual data block, a statistical prediction module records the data blocks that are read after it, counts the number of times each of those data blocks is read, and sorts them by that count; this counting is repeated until all the data blocks to be read have been covered, forming a statistical result; when a storage client next needs to read data blocks, the statistical prediction module predicts, for each data block the client reads, the data block to be read after it; according to the statistical result, the data block that has been read the most times after a given data block is predicted to be the next data block to be read. By effectively predicting the next data block to be read, the invention improves the pre-reading efficiency of scattered data and reduces how often data blocks must be read.

Description

Pre-reading and pre-writing method and device for distributed storage
Technical Field
The present invention relates to the field of distributed storage technologies, and in particular, to a method and an apparatus for pre-reading and pre-writing distributed storage.
Background
Distributed storage is a common storage method in which a piece of data is divided into small blocks that are stored on a plurality of storage devices. The obvious difference from centralized storage is that the data is scattered across different storage devices.
Distributed storage currently divides a piece of data into several parts of a fixed size, and then stores the resulting small data blocks dispersedly across the devices of the whole cluster. Usually, in order to reduce the impact of a device failure, the small data blocks are distributed as widely as possible. However, this dispersed storage has a disadvantage: the pre-read/write function cannot be implemented well.
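The fixed-size splitting and dispersed placement described above can be sketched as follows. This is a minimal illustration only: the 4-byte block size, the function names `split_into_blocks`, `place_block` and `store`, and the hash-based placement policy are all assumptions made for the example, since the source does not say how blocks are assigned to devices.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny for illustration (assumption)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_block(object_name: str, index: int, device_count: int) -> int:
    """Pick a device for a block by hashing its identity (hypothetical policy)."""
    digest = hashlib.sha256(f"{object_name}:{index}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % device_count


def store(object_name: str, data: bytes, device_count: int) -> dict:
    """Return {device_id: {block_index: block_bytes}} for one object."""
    devices = {d: {} for d in range(device_count)}
    for index, block in enumerate(split_into_blocks(data)):
        devices[place_block(object_name, index, device_count)][index] = block
    return devices
```

With a placement of this kind, logically adjacent blocks usually land on different devices, so knowing where block n lives says nothing about where block n+1 lives.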
Currently, data is generally pre-read and pre-written when it is accessed: the pre-read/write function predicts the next disk data to be read or written and loads it in advance. In traditional non-distributed storage, data is stored contiguously, one block after another, so pre-reading and pre-writing are generally performed on the next adjacent block. However, once data is placed in distributed storage, contiguous data is divided into small data blocks scattered across different devices, and the system often cannot predict where the data to be read or written next is located. Therefore, to pre-read or pre-write data, the distributed storage system has to access many different small data blocks one after another and then merge them into contiguous data, which greatly increases the frequency of data reads in the distributed storage system and makes pre-reading and pre-writing inefficient.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a pre-read/write method and device for distributed storage. By recording which data blocks are read and written consecutively and how many times they are read, the method can, when the first data block is read or written, analyze the historical statistics to predict the next data block to be read or written and its storage location, and perform the pre-read/write operation on that next data block, thereby improving the read/write efficiency of the data blocks.
In order to solve the technical problems, the invention provides the following technical scheme: a pre-reading and writing method for distributed storage comprises the following steps:
S1, the storage client continuously reads data blocks of the distributed storage system;
S2, for each individual data block, the statistical prediction module records the data blocks that are read after it, counts the number of times each of those data blocks is read, and sorts them in descending order of that count;
S3, step S2 is repeated until all the data blocks that the storage client needs to read have been counted, thereby forming a statistical result, and the statistical result is stored in the statistical prediction module;
S4, when a storage client next needs to read data blocks, starting from the first data block read by the storage client, the statistical prediction module predicts, for each data block the storage client reads, the data block to be read after it; according to the statistical result of step S3, the statistical prediction module takes the data block that has been read the most times after each data block as the next data block to be read, and makes it available for the storage client to read.
Further, the pre-read/write method for distributed storage also includes step S0: the storage client writes data into the distributed storage system, and the distributed storage system divides the data into a plurality of data blocks and stores the data blocks dispersedly on different storage devices.
Further, the pre-read/write method for distributed storage also includes step S5: the distributed storage system merges the data blocks that the storage client needs to read into complete data, and then sends the complete data to the storage client.
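The counting and prediction of steps S2 to S4 can be sketched as a table of successor counts. This is an illustrative reading of the text, not the patented implementation: the class name `StatisticalPredictor`, its method names, and the use of string block identifiers are assumptions introduced here.

```python
from collections import Counter, defaultdict
from typing import Optional


class StatisticalPredictor:
    """Counts, for each block, which block was read immediately after it."""

    def __init__(self) -> None:
        # successors[a][b] = number of times block b was read right after block a
        self.successors = defaultdict(Counter)

    def record(self, read_sequence: list) -> None:
        """Steps S2/S3: update successor counts from one observed read sequence."""
        for current, following in zip(read_sequence, read_sequence[1:]):
            self.successors[current][following] += 1

    def ranked_successors(self, block: str) -> list:
        """Successors of a block as (block, count) pairs, largest count first."""
        if block not in self.successors:
            return []
        return self.successors[block].most_common()

    def predict_next(self, block: str) -> Optional[str]:
        """Step S4: the most frequently observed successor, or None if unseen."""
        ranked = self.ranked_successors(block)
        return ranked[0][0] if ranked else None


predictor = StatisticalPredictor()
predictor.record(["A", "C", "B"])
predictor.record(["A", "C", "D"])
predictor.record(["A", "B"])
print(predictor.predict_next("A"))  # "C" was read after "A" twice, "B" once
```

The text only ever uses the top-ranked successor, so the full descending sort is kept here mainly to mirror step S2; a running maximum per block would serve the prediction equally well.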
The invention also aims to provide a pre-reading and pre-writing device for distributed storage, which comprises a storage client, a distributed storage system and a statistical prediction module, wherein the distributed storage system comprises a plurality of storage devices;
the storage client is used for writing data into and reading data from the distributed storage system;
the distributed storage system is used for dividing data equally into a plurality of data blocks and storing the data blocks dispersedly on different storage devices;
the statistical prediction module is used for: when the storage client continuously reads data blocks of the distributed storage system, recording, for each individual data block, the data blocks that are read after it, counting the number of times each of those data blocks is read, and sorting them in descending order of that count; and, once all the data blocks that the storage client needs to read have been counted and a statistical result has been formed, storing the statistical result in the statistical prediction module;
the statistical prediction module is further used for: when the storage client needs to read data blocks, starting from the first data block read by the storage client, predicting, for each data block the storage client reads, the data block to be read after it; according to the stored statistical result, the statistical prediction module takes the data block that has been read the most times after each data block as the next data block to be read, and makes it available for the storage client to read;
and the distributed storage system is further used for merging the data blocks that the storage client needs to read into complete data, and then sending the complete data to the storage client.
With the above technical scheme, the invention has at least the following beneficial effects: by providing a statistical prediction module that counts the data blocks read consecutively, the invention can predict in advance the next data block to be read after each data block, thereby improving the pre-reading efficiency of scattered data, reducing how often data blocks must be read, and relieving the operating pressure on the distributed storage system.
Drawings
FIG. 1 is a flow chart of steps of a pre-read/write method for distributed storage according to the present invention;
FIG. 2 is a block diagram of a distributed pre-read/write apparatus according to the present invention.
Detailed Description
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict, and the present application is further described in detail with reference to the drawings and specific embodiments.
Example 1
As shown in fig. 1, the present invention provides a pre-read/write method for distributed storage, which includes the steps of:
S10, the storage client writes data into the distributed storage system, and the distributed storage system divides the data equally into a plurality of data blocks and stores the data blocks dispersedly on different storage devices;
S11, the storage client continuously reads data blocks of the distributed storage system;
S12, for each individual data block, the statistical prediction module records the data blocks that are read after it, counts the number of times each of those data blocks is read, and sorts them in descending order of that count;
S13, step S12 is repeated until all the data blocks that the storage client needs to read have been counted, thereby forming a statistical result, and the statistical result is stored in the statistical prediction module;
S14, when a storage client next needs to read data blocks, starting from the first data block read by the storage client, the statistical prediction module predicts, for each data block the storage client reads, the data block to be read after it; according to the statistical result of step S13, the statistical prediction module takes the data block that has been read the most times after each data block as the next data block to be read, and makes it available for the storage client to read;
for example, after the storage client reads the first data block, the data block to be read next is the second data block. At this point, according to the statistical result of step S13, the statistical prediction module takes the data block that has been read the most times after the first data block as the second data block that the storage client is likely to read, and places it after the first data block for the storage client to read. Of course, if the storage client finds that this predicted block is not the data block it needs, it can discard it and look up the correct data block in the distributed storage system to serve as the second data block;
once the second data block has been determined, the data block to be read next is the third data block. At this point, according to the statistical result of step S13, the statistical prediction module takes the data block that has been read the most times after the second data block as the third data block that the storage client is likely to read, and places it after the second data block for the storage client to read. Of course, if the storage client finds that this predicted block is not the data block it needs, it can discard it and look up the correct data block in the distributed storage system to serve as the third data block;
by analogy, the fourth data block, the fifth data block and so on are handled in the same way up to the last data block to be read, so that each data block is read correctly by the storage client, with the data blocks to be read being predicted by the statistical prediction module and then read by the storage client;
S15, the distributed storage system merges the data blocks that the storage client needs to read into complete data, and then sends the complete data to the storage client.
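The read loop of this example, including the fallback when a prediction is wrong, might look like the sketch below. It is a simulation under stated assumptions rather than the patented implementation: `build_statistics`, `read_with_prefetch`, the `fetch` callback standing in for a request to the cluster, and the hit/miss counters are all hypothetical names added for illustration.

```python
from collections import Counter, defaultdict


def build_statistics(history):
    """Successor counts from earlier read sequences (steps S11 to S13)."""
    stats = defaultdict(Counter)
    for sequence in history:
        for current, following in zip(sequence, sequence[1:]):
            stats[current][following] += 1
    return stats


def read_with_prefetch(wanted, stats, fetch):
    """Read the blocks in `wanted`, prefetching the predicted successor each time.

    `fetch(block_id)` stands in for a request to the storage cluster.
    Returns (merged_data, hits, misses).
    """
    parts, hits, misses = [], 0, 0
    prefetched_id, prefetched_data = None, None
    for block_id in wanted:
        if block_id == prefetched_id:
            parts.append(prefetched_data)    # prediction was right: use it
            hits += 1
        else:
            if prefetched_id is not None:
                misses += 1                  # prediction was wrong: discard it
            parts.append(fetch(block_id))    # look up the correct block instead
        # Step S14: prefetch the most frequently read successor, if one is known.
        ranked = stats[block_id].most_common(1) if block_id in stats else []
        prefetched_id = ranked[0][0] if ranked else None
        prefetched_data = fetch(prefetched_id) if prefetched_id is not None else None
    return b"".join(parts), hits, misses     # step S15: merge into complete data


cluster = {"b1": b"AA", "b2": b"BB", "b3": b"CC", "b4": b"DD"}
stats = build_statistics([["b1", "b2", "b3"], ["b1", "b2", "b3"], ["b1", "b4"]])
print(read_with_prefetch(["b1", "b2", "b3"], stats, cluster.__getitem__))
```

In this toy run the sequence b1, b2, b3 is served with two correct predictions, while a client that reads b1 and then b4 pays for one discarded prefetch of b2, which is the cost the text accepts in exchange for the common-case speedup.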
Example 2
Based on the method of Example 1, the invention provides a pre-read/write device for distributed storage which, as shown in FIG. 2, comprises a storage client, a distributed storage system and a statistical prediction module, wherein the distributed storage system comprises a plurality of storage devices;
the storage client is used for writing data into and reading data from the distributed storage system;
the distributed storage system is used for dividing data equally into a plurality of data blocks and storing the data blocks dispersedly on different storage devices;
the statistical prediction module is used for: when the storage client continuously reads data blocks of the distributed storage system, recording, for each individual data block, the data blocks that are read after it, counting the number of times each of those data blocks is read, and sorting them in descending order of that count; and, once all the data blocks that the storage client needs to read have been counted and a statistical result has been formed, storing the statistical result in the statistical prediction module;
the statistical prediction module is further used for: when the storage client needs to read data blocks, starting from the first data block read by the storage client, predicting, for each data block the storage client reads, the data block to be read after it; according to the stored statistical result, the statistical prediction module takes the data block that has been read the most times after each data block as the next data block to be read, and makes it available for the storage client to read;
and the distributed storage system is further used for merging the data blocks that the storage client needs to read into complete data, and then sending the complete data to the storage client.
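One way the three components of this device could be wired together is sketched below. The class and method names, the round-robin placement, the 4-byte block size, and the choice to have the storage system report each consecutive read to the prediction module are all assumptions for illustration; the source names the components and their duties but not their interfaces. The prefetch step itself is omitted to keep the structure visible.

```python
from collections import Counter, defaultdict


class StatisticalPredictionModule:
    """Holds the statistical result: successor counts per data block."""

    def __init__(self):
        self.successors = defaultdict(Counter)

    def observe(self, previous_id, current_id):
        self.successors[previous_id][current_id] += 1

    def predict(self, block_id):
        if block_id not in self.successors:
            return None
        return self.successors[block_id].most_common(1)[0][0]


class DistributedStorageSystem:
    """Splits data into blocks, spreads them over devices, merges them on read."""

    def __init__(self, device_count, predictor, block_size=4):
        self.devices = [{} for _ in range(device_count)]
        self.predictor = predictor
        self.block_size = block_size
        self.layout = {}    # object name -> ordered list of block ids
        self.location = {}  # block id -> device index

    def write(self, name, data):
        ids = []
        for n, offset in enumerate(range(0, len(data), self.block_size)):
            block_id = f"{name}/{n}"
            device = n % len(self.devices)  # round-robin placement (assumption)
            self.devices[device][block_id] = data[offset:offset + self.block_size]
            self.location[block_id] = device
            ids.append(block_id)
        self.layout[name] = ids

    def read(self, name):
        parts, previous_id = [], None
        for block_id in self.layout[name]:
            if previous_id is not None:
                self.predictor.observe(previous_id, block_id)  # feed the statistics
            parts.append(self.devices[self.location[block_id]][block_id])
            previous_id = block_id
        return b"".join(parts)  # merged into complete data for the client


class StorageClient:
    """Writes data into and reads data from the distributed storage system."""

    def __init__(self, storage):
        self.storage = storage

    def write(self, name, data):
        self.storage.write(name, data)

    def read(self, name):
        return self.storage.read(name)
```

After a client has read an object once, `predict` on any of its blocks except the last returns the block that followed it, which is the information a prefetcher would act on.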
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various equivalent changes, modifications, substitutions and alterations can be made herein without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims (4)

1. A pre-reading and writing method for distributed storage is characterized by comprising the following steps:
S1, the storage client continuously reads data blocks of the distributed storage system;
S2, for each individual data block, the statistical prediction module records the data blocks that are read after it, counts the number of times each of those data blocks is read, and sorts them in descending order of that count;
S3, step S2 is repeated until all the data blocks that the storage client needs to read have been counted, thereby forming a statistical result, and the statistical result is stored in the statistical prediction module;
S4, when a storage client next needs to read data blocks, starting from the first data block read by the storage client, the statistical prediction module predicts, for each data block the storage client reads, the data block to be read after it; according to the statistical result of step S3, the statistical prediction module takes the data block that has been read the most times after each data block as the next data block to be read, and makes it available for the storage client to read.
2. The pre-read/write method for distributed storage according to claim 1, further comprising step S0: the storage client writes data into the distributed storage system, and the distributed storage system divides the data into a plurality of data blocks and stores the data blocks dispersedly on different storage devices.
3. The pre-read/write method for distributed storage according to claim 2, further comprising step S5: the distributed storage system merges the data blocks that the storage client needs to read into complete data, and then sends the complete data to the storage client.
4. A pre-read/write device for distributed storage, characterized by comprising a storage client, a distributed storage system and a statistical prediction module, wherein the distributed storage system comprises a plurality of storage devices;
the storage client is used for writing data into and reading data from the distributed storage system;
the distributed storage system is used for dividing data equally into a plurality of data blocks and storing the data blocks dispersedly on different storage devices;
the statistical prediction module is used for: when the storage client continuously reads data blocks of the distributed storage system, recording, for each individual data block, the data blocks that are read after it, counting the number of times each of those data blocks is read, and sorting them in descending order of that count; and, once all the data blocks that the storage client needs to read have been counted and a statistical result has been formed, storing the statistical result in the statistical prediction module;
the statistical prediction module is further used for: when the storage client needs to read data blocks, starting from the first data block read by the storage client, predicting, for each data block the storage client reads, the data block to be read after it; according to the stored statistical result, the statistical prediction module takes the data block that has been read the most times after each data block as the next data block to be read, and makes it available for the storage client to read;
and the distributed storage system is further used for merging the data blocks that the storage client needs to read into complete data, and then sending the complete data to the storage client.
CN202010495460.1A 2020-06-03 2020-06-03 Pre-reading and pre-writing method and device for distributed storage Active CN111399784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010495460.1A CN111399784B (en) 2020-06-03 2020-06-03 Pre-reading and pre-writing method and device for distributed storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010495460.1A CN111399784B (en) 2020-06-03 2020-06-03 Pre-reading and pre-writing method and device for distributed storage

Publications (2)

Publication Number Publication Date
CN111399784A (en) 2020-07-10
CN111399784B CN111399784B (en) 2020-10-16

Family

ID=71437619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010495460.1A Active CN111399784B (en) 2020-06-03 2020-06-03 Pre-reading and pre-writing method and device for distributed storage

Country Status (1)

Country Link
CN (1) CN111399784B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1465019A (en) * 2000-10-24 2003-12-31 皇家菲利浦电子有限公司 Method and device for prefetching a referenced resource
US8307156B1 (en) * 2002-07-31 2012-11-06 Western Digital Technologies, Inc. Adaptively modifying pre-read operations within a rotating media storage device
CN103885776A (en) * 2014-03-24 2014-06-25 广州华多网络科技有限公司 Program accelerating method and device thereof
CN107943711A (en) * 2016-10-12 2018-04-20 慧荣科技股份有限公司 Data storage device and data maintenance method thereof
CN106777047A (en) * 2016-12-09 2017-05-31 郑州云海信息技术有限公司 A kind of metadata read method and its device for distributed system
CN106844740A (en) * 2017-02-14 2017-06-13 华南师范大学 Data pre-head method based on memory object caching system
CN107562806A (en) * 2017-08-08 2018-01-09 上海交通大学 Mix the adaptive perception accelerated method and system of memory file system
CN109976679A (en) * 2019-04-11 2019-07-05 苏州浪潮智能科技有限公司 A kind of distributed type assemblies volume pre-head method, system, equipment and computer media

Also Published As

Publication number Publication date
CN111399784B (en) 2020-10-16

Similar Documents

Publication Publication Date Title
CN109164980B (en) Aggregation optimization processing method for time sequence data
US8214608B2 (en) Behavioral monitoring of storage access patterns
US20190005102A1 (en) Multi-representation Storage of Time Series Data
US10685306B2 (en) Advisor generating multi-representations of time series data
US20210096777A1 (en) Method for predicting lba information, and ssd
US20120323923A1 (en) Sorting Data in Limited Memory
WO2017162086A1 (en) Task scheduling method and device
US20140297606A1 (en) Method and device for processing a time sequence based on dimensionality reduction
US11880716B2 (en) Parallelized segment generation via key-based subdivision in database systems
US8606836B2 (en) Apparatus and method for frequency division and filtering
CN111061758A (en) Data storage method, device and storage medium
CN111399784B (en) Pre-reading and pre-writing method and device for distributed storage
CN111597088A (en) Data warehouse data monitoring method, warehouse system and electronic equipment
CN114185885A (en) Streaming data processing method and system based on column storage database
CN108829355B (en) Garbage recovery method and device
CN114185919A (en) Slow query warning method, electronic equipment and storage medium
US8325188B1 (en) Method and system for implementing a waveform viewer
US11392510B2 (en) Management method of cache files in storage space and recording device for storing cache files
CN112764684A (en) Hard disk performance identification method and system of storage system
CN115858172A (en) Processor instruction execution statistical method and device and processor system
CN114860160B (en) Capacity expansion resource prediction method and system for Hadoop data platform
US11775515B2 (en) Dataset optimization framework
CN112612722B (en) Variable-length data management method, device, computer equipment and storage medium
US9922109B1 (en) Adaptive column set composition
CN112732189A (en) Data storage method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant