CN114564154A - Data reading method based on distributed storage - Google Patents

Publication number
CN114564154A
Authority
CN
China
Prior art keywords
read
read operation
merging
request message
read request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210194691.8A
Other languages
Chinese (zh)
Other versions
CN114564154B (en)
Inventor
刘杰
孟祥瑞
罗浩
安祥文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210194691.8A
Publication of CN114564154A
Application granted
Publication of CN114564154B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data reading method based on distributed storage, comprising the following steps: receiving a plurality of read operations, each carrying a read request message; executing a merging process among the plurality of read operations, that is, merging the read request messages of the read operations so that the size of a merged read request message is an integer multiple of the basic storage unit; and passing the merged read operations to a read process, which reads the data. Because the merging process is applied to the read request messages of the received read operations, most merged read request messages have a size that is an integer multiple of the basic storage unit and are therefore stripe-aligned, which effectively improves data reading efficiency and read performance.

Description

Data reading method based on distributed storage
Technical Field
The invention relates to the technical field of distributed storage, in particular to a data reading method based on distributed storage.
Background
With the continuous development of information technology, data has come to be valued as a precious resource, and processing data resources quickly to obtain the expected results has become one of the key problems in turning resources into assets. Data is generated by people's activities in work and life; collecting, analyzing, and processing it yields useful information and realizes the conversion from resource to asset, which has driven the rapid development of big data and high-performance computing. Data storage, one of the core elements of data resources, has likewise entered a period of rapid development. A traditional network storage system uses a centralized storage server to store all data; the storage server becomes the bottleneck of system performance and the focal point of reliability and security concerns, and it cannot meet the needs of large-scale storage applications. A distributed network storage system adopts a scalable architecture that improves the reliability, availability, and access efficiency of the system and is easy to expand, so it has been adopted by more and more enterprises.
In a scenario where iSCSI (Internet Small Computer System Interface) is connected to distributed storage, storage volumes are mounted on a computing node and are read and written through a virtual machine. In a large-scale deployment, computing nodes and virtual machine nodes are managed elastically according to actual usage, so batch and automated operation is particularly important. By default, a 1M read request initiated by a virtual machine to a data disk is split into two IO patterns, [504k, 504k, 16k] and [512k, 512k], and issued to the distributed storage. The distributed storage processes these two non-stripe-aligned IO patterns inefficiently, which degrades read performance.
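The arithmetic behind the two split patterns can be checked with a short Python sketch. It reads the first pattern as 504k + 504k + 16k and takes the 1M basic storage unit from the first embodiment; both readings are assumptions, and the names `BASIC_UNIT` and `is_stripe_aligned` are hypothetical.

```python
KIB = 1024
BASIC_UNIT = 1024 * KIB  # assumed 1M basic storage unit


def is_stripe_aligned(length: int, unit: int = BASIC_UNIT) -> bool:
    """Treat a request as aligned when its length is a positive multiple of the unit."""
    return length > 0 and length % unit == 0


# The two split patterns described above (first one read as 504k + 504k + 16k).
split_a = [504 * KIB, 504 * KIB, 16 * KIB]
split_b = [512 * KIB, 512 * KIB]

for name, parts in (("A", split_a), ("B", split_b)):
    # Each pattern adds up to the full 1M request, but no single piece is aligned.
    print(name, sum(parts) == BASIC_UNIT, [is_stripe_aligned(p) for p in parts])
```

Each piece on its own fails the check, while the sum of the pieces passes it, which is the situation the merging process is meant to restore.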
Disclosure of Invention
To solve the above technical problem, the invention provides a data reading method based on distributed storage that improves data reading efficiency in distributed storage.
In order to achieve the above object, the present application proposes a first technical solution:
A data reading method based on distributed storage comprises the following steps:
receiving a plurality of read operations carrying read request messages;
executing a merging process among the plurality of read operations, that is, merging the read request messages of the read operations, so that the size of a read request message after merging is an integer multiple of the basic storage unit;
and passing the read operations that have gone through the merging process to a read process to read the data.
In an embodiment of the present invention, before the merging process is executed among the plurality of read operations, the method further includes:
adding the received read operations to a first queue and periodically transferring them from the first queue to a second queue.
In an embodiment of the present invention, executing the merging process among the plurality of read operations specifically includes:
taking a read operation in the first queue as the source read operation and a read operation in the second queue as the destination read operation, and executing the merging process, wherein new read operations can continue to be added to the first queue;
in the second queue, taking a read operation that has not gone through the merging process as the source read operation and a read operation that has gone through the merging process as the destination read operation, and executing the merging process;
and executing the merging process among the remaining read operations in the second queue.
In an embodiment of the present invention, the merging process specifically includes:
judging whether the source read operation and the destination read operation meet a merging condition;
if the merging condition is met, inserting the read request message of the source read operation into the read request message of the destination read operation;
and if the merging condition is not met, sending the source read operation to the second queue.
In an embodiment of the present invention, after the read request message of the source read operation is inserted into the read request message of the destination read operation, the method further includes:
adding a merge flag to both the source read operation and the destination read operation whose read request messages were merged.
In an embodiment of the present invention, after the read request message of the source read operation is inserted into the read request message of the destination read operation, the method further includes:
recording a merged message record for the source read operation and the destination read operation.
In an embodiment of the present invention, after the merging process is executed among the plurality of read operations, the method further includes:
judging whether the size of the read request message is evenly divisible by the basic storage unit or whether the read operation has reached a waiting time threshold;
if the size of the read request message is evenly divisible by the basic storage unit, or the read operation has reached the waiting time threshold, continuing to the read process;
if the size of the read request message is not evenly divisible by the basic storage unit and the read operation has not reached the waiting time threshold, adding the read operation to the second queue.
In an embodiment of the present invention, the read process specifically includes:
judging whether the read operation has a merge flag; if not, returning a reply message according to the original process and ending the read process; if a merge flag is present, traversing the merged message record.
In an embodiment of the present invention, after the merged message record is traversed, the method further includes:
splitting the read data and the merged read request message according to the merged message record, forming reply messages from the split data and the corresponding read request messages, returning the reply messages, and ending the read process.
In order to achieve the above object, the present application proposes a second technical solution:
a distributed storage based data reading system, the system comprising:
the object storage module receives the read operation from the client module;
the operation merging module executes a merging process of the read operation;
the integer division judging module is used for judging whether the read request message of the read operation can be integer divided by the basic storage unit;
and the data reading module is used for reading the data in the object storage module.
In order to achieve the above object, the present application proposes a third technical solution:
a computer-readable storage medium characterized by: the computer-readable storage medium stores a program that, when executed by a processor, causes the processor to perform the steps of a distributed storage based data reading method.
Compared with the prior art, the technical scheme of the invention has the following advantages:
The data reading method based on distributed storage applies the merging process to the read request messages of the received read operations, so that most merged read request messages have a size that is an integer multiple of the basic storage unit and are therefore stripe-aligned, which effectively improves data reading efficiency and read performance.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a first flow chart of the distributed storage based data reading method of the present invention;
FIG. 2 is a flow chart of the merging process of the distributed storage based data reading method of the present invention;
FIG. 3 is a second flow chart of the distributed storage based data reading method of the present invention;
FIG. 4 is a system structure diagram of the distributed storage based data reading system of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention; embodiments in which steps are interchanged to achieve the same or similar effects also fall within the scope of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
The first embodiment is as follows:
Referring to FIG. 1, FIG. 1 is a first flow chart of the distributed storage based data reading method of the present invention.
The method of the embodiment comprises the following steps:
receiving a read operation with a read request message;
Each read operation carries a read request message, and the corresponding data is read according to that message. The read operation is therefore received first, and the read request message it carries is then read.
Executing a merging process among a plurality of read operations, that is, merging the read request messages of the read operations, so that the size of a read request message after merging is an integer multiple of the basic storage unit;
In the prior art, requests issued to distributed storage are generally not stripe-aligned; in that case data reading is inefficient and read performance suffers considerably. The invention executes a merging process over a plurality of read operations. Each read operation corresponds to one read request message, and the corresponding data is read according to that message. Merging read operations essentially means merging their read request messages. After the merging process, most read request messages have a size that is an integer multiple of the basic storage unit and are therefore stripe-aligned, which effectively improves data reading efficiency and read performance. The basic storage unit is 1M in size.
Passing the read operations that have gone through the merging process to a read process to read the data.
Once the merging process has brought most read request messages to a size that is an integer multiple of the basic storage unit, so that they are stripe-aligned, the merged read operations enter the read process, and the data specified by each read request message is read, completing the data read operation.
In one embodiment, before the merging process is executed among the plurality of read operations, the method further includes:
adding the received read operations to the first queue and periodically transferring them from the first queue to the second queue.
In the merging process, two queues are generally set up: the first queue stores newly received read operations, and the second queue stores merged read operations. When the merging process starts, both queues are empty. Each read operation received is added to the first queue, and the read operations in the first queue are periodically transferred to the second queue, so that the merging process can be executed between the read operations in the first queue and those in the second queue.
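The two-queue arrangement described above might be sketched in Python as follows. This is only an illustration under stated assumptions: the class `ReadQueues`, the type `ReadOp`, and the method names are hypothetical, and the periodic timer that would call `transfer()` is left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ReadOp:
    """Hypothetical read operation: start index and length of its request."""
    index: int
    length: int


class ReadQueues:
    def __init__(self) -> None:
        self.first: deque = deque()   # newly received read operations
        self.second: deque = deque()  # read operations held for merging

    def receive(self, op: ReadOp) -> None:
        """Add a newly received read operation to the first queue."""
        self.first.append(op)

    def transfer(self) -> int:
        """Move everything currently in the first queue to the second queue.

        In a real system this would be invoked periodically; here it is
        called by hand. Returns the number of operations moved.
        """
        moved = len(self.first)
        while self.first:
            self.second.append(self.first.popleft())
        return moved


queues = ReadQueues()
queues.receive(ReadOp(0, 504 * 1024))
queues.receive(ReadOp(504 * 1024, 504 * 1024))
moved = queues.transfer()
```

Both queues start empty, as in the embodiment, and arrival order is preserved across the transfer.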
In one embodiment, after the received read operation is added to the first queue, the method further includes:
recording the enqueue time of the read operation.
After each read operation is added to the first queue, its enqueue time is recorded so that it can later be judged whether the read operation has reached the waiting time threshold. A read operation that has reached the threshold goes to the read process; one that has not is sent to the second queue.
In one embodiment, executing the merging process among the plurality of read operations specifically includes:
taking a read operation in the first queue as the source read operation and a read operation in the second queue as the destination read operation, and executing the merging process, wherein new read operations can continue to be added to the first queue;
After the transfer, the read operations in the first queue are traversed. Each read operation in the first queue is taken as a source read operation and a read operation in the second queue as the destination read operation, and the merging process is executed on each pair in turn, that is, on the read request message of the source read operation and that of the destination read operation. For example, when read operation A is merged with read operation B, the read request message of A is inserted into the read request message of B; A is the source read operation and B is the destination read operation. Merging brings the size of the read request message to an integer multiple of the basic storage unit, so the message is stripe-aligned and data reading efficiency improves. Since one read operation corresponds to one read request message, merging read operations is in substance merging read request messages. While the merging process runs, new read operations continue to be added to the first queue, and each traversal of the first queue feeds the new read operations into the merging process.
In the second queue, taking a read operation that has not gone through the merging process as the source read operation and a read operation that has gone through the merging process as the destination read operation, and executing the merging process;
When the first queue is merged with the second queue, the read request messages from the first queue are inserted into those in the second queue. After the traversal of the first queue is complete, most read request messages in the second queue have therefore been merged. The read operations from the first queue that could not be merged with any read operation in the second queue are then transferred to the second queue, where they may still be mergeable. The second queue is therefore traversed: each read operation that has gone through the merging process is taken as a destination read operation and each read operation that has not as a source read operation, and the merging process is executed again on the mergeable read operations, so that most merged read request messages are evenly divisible by the basic storage unit.
And executing the merging process among the remaining read operations in the second queue.
After the read operations in the first queue have been merged with those in the second queue, some read operations may not have met the merging condition; these are sent to the second queue, and the merging process within the second queue is executed. After that, there may still be read operations that do not meet the merging condition, or merged read operations in the second queue whose read request message size is not evenly divisible by the basic storage unit. The merging process can then be executed once more among the already-merged read operations in the second queue so that the size becomes evenly divisible. Through these merging passes, most mergeable read operations are merged and most read request messages become evenly divisible by the basic storage unit, which greatly improves data reading efficiency. Some read operations may never be mergeable; their effect on the efficiency of the overall data reading process is small and can be ignored.
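The successive merging passes can be illustrated with a simplified Python sketch. It represents each read request message as an `(index, length)` tuple and collapses the second and third passes into one loop that repeats until nothing more can be merged, which is a simplification rather than the exact procedure of the embodiment. The function names `merge_pass` and `merge_queues` are hypothetical.

```python
def merge_pass(sources, targets):
    """Try to merge each source extent into one target extent (in place).

    Extents are (index, length) tuples. Returns the sources that were not
    consecutive with any target and therefore could not be merged.
    """
    leftover = []
    for s_idx, s_len in sources:
        for i, (t_idx, t_len) in enumerate(targets):
            if s_idx + s_len == t_idx or t_idx + t_len == s_idx:
                targets[i] = (min(s_idx, t_idx), s_len + t_len)
                break
        else:
            leftover.append((s_idx, s_len))
    return leftover


def merge_queues(first, second):
    """Merge first-queue extents into the second queue, then merge within it."""
    second = list(second)
    # Pass 1: first-queue operations are sources, second-queue ones targets.
    # Unmerged sources are moved into the second queue.
    second.extend(merge_pass(first, second))
    # Later passes: keep merging inside the second queue until stable.
    changed = True
    while changed:
        changed = False
        for i in range(len(second)):
            rest = second[:i] + second[i + 1:]
            if not merge_pass([second[i]], rest):  # empty leftover: merged
                second = rest
                changed = True
                break
    return sorted(second)


K = 1024
result = merge_queues([(0, 504 * K), (1008 * K, 16 * K)], [(504 * K, 504 * K)])
```

In this example the three pieces of a split 1M request end up as a single extent of 1024k starting at index 0; extents that are never consecutive with anything simply remain in the result unmerged.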
In one embodiment, the merging process specifically includes:
judging whether the source read operation and the destination read operation meet a merging condition;
if the merging condition is met, inserting the read request message of the source read operation into the read request message of the destination read operation;
and if the merging condition is not met, sending the source read operation to the second queue.
Not all read operations can be merged; only read operations that meet the merging condition can. The merging condition is that the two read request messages to be merged are consecutive. A read request message is described by an index value and a length, and continuity can be left continuity or right continuity. Take three read request messages A, B, and C. If the index value of A plus the length of A equals the index value of B, then B is left-consecutive with A, the message on its left. If the index value of B plus the length of B equals the index value of C, then B is right-consecutive with C, the message on its right. Two read request messages can therefore be merged only if they are consecutive. If the merging condition is met, the source and destination read operations are merged, that is, the read request message of the source read operation is inserted into that of the destination read operation; if not, the source read operation is sent to the second queue to take part in the merging of the read operations in the second queue.
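The continuity test on index value and length can be written down directly. The following Python sketch is a minimal illustration; the type `ReadRequest` and the functions `ends_where_next_begins`, `can_merge`, and `merge` are hypothetical names, not part of the described method.

```python
from dataclasses import dataclass


@dataclass
class ReadRequest:
    index: int   # index value (start position) of the request
    length: int  # length of the request


def ends_where_next_begins(a: ReadRequest, b: ReadRequest) -> bool:
    """True when index(a) + length(a) == index(b), i.e. a sits directly left of b."""
    return a.index + a.length == b.index


def can_merge(src: ReadRequest, dst: ReadRequest) -> bool:
    """Merging condition: the two requests are consecutive on either side."""
    return ends_where_next_begins(src, dst) or ends_where_next_begins(dst, src)


def merge(src: ReadRequest, dst: ReadRequest) -> ReadRequest:
    """Return the request covering both src and dst; they must be consecutive."""
    if not can_merge(src, dst):
        raise ValueError("requests are not consecutive")
    return ReadRequest(min(src.index, dst.index), src.length + dst.length)


K = 1024
A = ReadRequest(0, 504 * K)
B = ReadRequest(504 * K, 504 * K)
C = ReadRequest(1008 * K, 16 * K)
merged = merge(merge(A, B), C)
```

Here A and B are consecutive, B and C are consecutive, and A and C are not, so A can reach C only after it has been merged with B.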
In one embodiment, after the read request message of the source read operation is inserted into the read request message of the destination read operation, the method further includes:
adding a merge flag to both the source read operation and the destination read operation whose read request messages were merged.
Both the merging process and the read process need to know whether a read operation has been merged, and they act differently depending on the answer. A flag is therefore needed to indicate whether the merging process has been executed, namely the merge flag. When read request messages are merged, a merge flag is added to both the source read operation and the destination read operation, so that it can later be determined from the flag whether a read operation has been merged.
In one embodiment, after the read request message of the source read operation is inserted into the read request message of the destination read operation, the method further includes:
recording a merged message record for the source read operation and the destination read operation.
When read request messages are merged, a merged message record of the source read operation and the destination read operation is recorded. Merging the read request messages improves the efficiency of the overall data reading process, but when the data has been read and a reply message is returned later, each original read request message must be packaged and returned with its corresponding read data, where the original read request message is the read request message as it was before merging. With the merged message record, the original read request messages can be recovered and the read data split accordingly, so that each portion of read data corresponds one-to-one with an original read request message and is packaged with it into the reply message.
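One possible way to keep the merge flag and the merged message record together is sketched below in Python. The layout is an assumption made for illustration: the record is stored on the destination read operation as a list of the original `(index, length)` pairs, and the names `ReadOp`, `originals`, and `merge_into` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReadOp:
    index: int
    length: int
    merged: bool = False                        # merge flag
    record: list = field(default_factory=list)  # original (index, length) pairs

    def originals(self) -> list:
        """Original requests covered by this operation (itself if never merged)."""
        return self.record if self.record else [(self.index, self.length)]


def merge_into(src: ReadOp, dst: ReadOp) -> ReadOp:
    """Insert src's request into dst, flag both, and record the originals.

    Assumes the caller has already checked that src and dst are consecutive.
    """
    combined = sorted(dst.originals() + src.originals())
    dst.index = combined[0][0]
    dst.length = sum(length for _, length in combined)
    dst.record = combined
    src.merged = True
    dst.merged = True
    return dst


a = ReadOp(0, 4)
b = ReadOp(4, 4)
c = ReadOp(8, 8)
merge_into(a, b)  # b now covers [0, 8)
merge_into(c, b)  # b now covers [0, 16)
```

After both merges, `b.record` still lists the three original requests, which is what the read process needs later in order to split the data it reads.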
In one embodiment, after the merging process is executed among the plurality of read operations, the method further includes:
judging whether the size of the read request message is evenly divisible by the basic storage unit or whether the read operation has reached a waiting time threshold;
After the merging process there are two possible results: either the merged read request message is evenly divisible by the basic storage unit, or it is not, whether because the read operation did not meet the merging condition or because the merged message is still not a multiple of the unit. A read operation whose message is evenly divisible is sent to the read process for data reading. A read operation whose message is not evenly divisible cannot stay in the second queue indefinitely, so a waiting time threshold is set to keep such read operations from lingering in the second queue and hurting read efficiency. The waiting time threshold is generally set to 70 ms to 80 ms, preferably 75 ms, measured from the enqueue time.
If the size of the read request message is evenly divisible by the basic storage unit, or the waiting time threshold has been reached, continuing to the read process;
After the merging process, if the size of the read request message is evenly divisible by the basic storage unit, the read operation enters the read process directly to read the data. Some read operations, however, never meet the merging condition and are never merged. Their effect on overall efficiency is small and can be ignored, but they cannot stay in the second queue indefinitely. A waiting time threshold is therefore set: the enqueue time is recorded when a read operation enters the first queue, and once the read operation reaches the waiting time threshold it is sent to the read process for data reading regardless of whether its read request message is evenly divisible by the basic storage unit.
If the size of the read request message is not evenly divisible by the basic storage unit and the waiting time threshold has not been reached, the read operation is added to the second queue.
Because new read operations are continuously added to the first queue, and from there to the second queue, a read operation that is not evenly divisible by the basic storage unit and has not reached the waiting time threshold may still be merged with a new read operation so that the merged read request message becomes evenly divisible. Such read operations are therefore sent to the second queue to continue with the merging process.
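The decision made after the merging process reduces to two checks, which the following Python sketch illustrates. It assumes a 1M basic storage unit and the preferred 75 ms threshold; the function name `next_step` and the returned strings are hypothetical, and times are passed in explicitly so that the logic is independent of any particular clock.

```python
BASIC_UNIT = 1024 * 1024   # 1M basic storage unit
WAIT_THRESHOLD_S = 0.075   # preferred waiting time threshold of 75 ms


def next_step(length: int, enqueue_time: float, now: float,
              unit: int = BASIC_UNIT,
              threshold: float = WAIT_THRESHOLD_S) -> str:
    """Decide where a read operation goes after the merging process.

    Returns "read" when the request size is evenly divisible by the unit or
    the operation has waited at least `threshold` seconds since it was
    enqueued; otherwise returns "second_queue" so it can keep merging.
    """
    if length % unit == 0:
        return "read"
    if now - enqueue_time >= threshold:
        return "read"
    return "second_queue"


aligned = next_step(BASIC_UNIT, enqueue_time=0.0, now=0.0)
waiting = next_step(504 * 1024, enqueue_time=0.0, now=0.010)
expired = next_step(504 * 1024, enqueue_time=0.0, now=0.080)
```

In a running system `now` would typically come from a monotonic clock such as Python's `time.monotonic()`, sampled with the same clock that recorded the enqueue time.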
In one embodiment, the read process specifically includes:
judging whether the read operation has a merge flag; if no merge flag exists, returning a reply message according to the original flow and ending the read process; if a merge flag exists, traversing the merge message record.
The read process executes the corresponding data read according to the read request message. After the merging process, the read request messages have been merged, yet the read data must still be packaged with the original read request messages and returned. The read process therefore first judges whether the read operation has a merge flag and chooses how to return the reply accordingly. A read operation that has not undergone the merging process directly returns the read data and the read request message according to the original flow, and the read process ends. For a read operation that has undergone the merging process, the merge message record is traversed, the read data and the merged read request message are split according to that record, and the split data and the corresponding original read request messages are packaged into reply messages and returned.
In one embodiment, after traversing the merge message record, the method further includes:
splitting the read data and the merged read request message according to the merge message record of the read request message, forming reply messages from the split data and the corresponding read request messages, returning the reply messages, and ending the read process.
If a merge flag exists, the merge message record is traversed, and the read data and the merged read request message are split according to it, so that each split original read request message corresponds one-to-one with a piece of split data. Each read request message is then packaged with its corresponding read data to form a reply message, and the reply message is returned.
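The splitting step can be illustrated with a short sketch. It assumes, purely for illustration, that each original read request is identified by an id, an offset and a length, and that the merge message record is simply the list of original requests kept on the merged operation. The names `ReadRequest`, `ReadOp` and `build_replies` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReadRequest:
    """Hypothetical original read request message."""
    req_id: int
    offset: int
    length: int


@dataclass
class ReadOp:
    """Hypothetical read operation; `requests` plays the role of the merge message record."""
    offset: int
    length: int
    merged: bool = False                            # the merge flag
    requests: list = field(default_factory=list)    # original ReadRequest objects


def build_replies(op: ReadOp, data: bytes) -> list:
    """Pair every original request with its own slice of the data that was read."""
    if not op.merged:
        # Original flow: the data belongs to the single original request.
        return [(op.requests[0].req_id, data)]
    replies = []
    for req in op.requests:                         # traverse the merge message record
        start = req.offset - op.offset              # position inside the merged range
        replies.append((req.req_id, data[start:start + req.length]))
    return replies
```

A merged operation covering bytes 0 to 7 that was built from a 3-byte and a 5-byte request would thus yield two replies, each carrying only the bytes its original request asked for.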
Example two:
Referring to Fig. 2, Fig. 2 is a flowchart of the merging process of the data reading method based on distributed storage according to the present invention.
S100, judging whether the source read operation and the destination read operation meet a merging condition;
It is judged whether the read request messages of the two read operations in the merging process satisfy the continuous merging condition. If the condition is satisfied, step S200 is executed; if not, the source read operation is sent to the second queue and the merging process ends;
S200, inserting the read request message of the source read operation into the read request message of the destination read operation;
If the continuous merging condition is satisfied, the read request message of the source read operation is inserted into the read request message of the destination read operation.
S300, adding a merge flag to both the source read operation and the destination read operation whose read request messages are merged;
When the read request messages are merged, a merge flag is added to both the source read operation and the destination read operation, so that it can later be judged from the merge flag whether a read operation has been merged.
S400, recording the merge message record of the source read operation and the destination read operation.
Once the merge message record of the source read operation and the destination read operation has been recorded, the original read request messages can be recovered from it and the read data can be split according to them, so that each piece of read data corresponds to one original read request message. Each piece of read data is then packaged with its original read request message and returned as a reply message.
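Steps S100 to S400 can be sketched roughly as below. The text does not spell out the continuous merging condition, so this sketch assumes it means that the two byte ranges are adjacent. That interpretation, and the names `ReadOp`, `can_merge` and `merge`, are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ReadOp:
    """Hypothetical read operation covering the byte range [offset, offset + length)."""
    offset: int
    length: int
    merged: bool = False                               # merge flag (S300)
    merge_record: list = field(default_factory=list)   # original (offset, length) pairs (S400)


def can_merge(src: ReadOp, dst: ReadOp) -> bool:
    """S100: assumed merging condition, the two ranges must be adjacent."""
    return (src.offset == dst.offset + dst.length
            or dst.offset == src.offset + src.length)


def merge(src: ReadOp, dst: ReadOp) -> bool:
    """Merge src into dst; return False if the merging condition is not met."""
    if not can_merge(src, dst):
        return False  # the caller would send src to the second queue
    # S400: remember the original requests so the reply can be split later.
    if not dst.merge_record:
        dst.merge_record.append((dst.offset, dst.length))
    dst.merge_record.extend(src.merge_record or [(src.offset, src.length)])
    # S200: fold the source range into the destination range.
    dst.offset = min(src.offset, dst.offset)
    dst.length += src.length
    # S300: flag both operations as merged.
    src.merged = dst.merged = True
    return True
```

Merging a 3072-byte request that starts at offset 1024 into a 1024-byte request that starts at offset 0 produces one 4096-byte request, which is a whole multiple of an assumed 4096-byte basic storage unit.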
Example three:
Referring to Fig. 3, Fig. 3 is a second flowchart of the data reading method based on distributed storage according to the present invention.
The data reading method based on distributed storage of this embodiment comprises the following steps:
S10, receiving read operations carrying read request messages;
Each read operation carries a read request message, and the corresponding data is read according to that message. The read operation is therefore received first, and the read request message it carries is read.
S20, adding the received read operations to the first queue, and periodically transferring the read operations in the first queue to the second queue;
When the merging process first starts, the first queue and the second queue are both empty. After a read operation is received, it is added to the first queue, and the read operations in the first queue are periodically transferred to the second queue, so that the merging process can be executed between the read operations in the first queue and those in the second queue.
S30, taking the read operations in the first queue as source read operations and the read operations in the second queue as destination read operations, and executing the merging process;
After the transfer, the read operations in the first queue are traversed. Each read operation in the first queue is taken as a source read operation and each read operation in the second queue as a destination read operation, and the merging process is executed on them in turn, that is, on the read request message of the source read operation and the read request message of the destination read operation.
S40, in the second queue, taking the read operations that have not undergone the merging process as source read operations and the read operations that have undergone the merging process as destination read operations, and executing the merging process;
The read operations in the second queue are traversed: those that have undergone the merging process are taken as destination read operations and those that have not are taken as source read operations, so that the mergeable read operations in the second queue are merged again and most of the merged read request messages become evenly divisible by the basic storage unit.
S50, executing the merging process among the remaining read operations in the second queue;
The merging process is executed among the remaining read operations in the second queue, so that every read operation that can be merged is merged. Through these merging processes, most mergeable read operations are merged and most read request messages become evenly divisible by the basic storage unit, which greatly improves data reading efficiency.
S60, judging whether the size of the read request message is evenly divisible by the basic storage unit or whether the waiting time threshold has been reached; if the size of the read request message is evenly divisible by the basic storage unit or the waiting time threshold has been reached, continuing to execute the read process; if the size of the read request message is not evenly divisible by the basic storage unit and the waiting time threshold has not been reached, adding the read operation to the second queue.
After the merging process is executed, it is judged whether the size of the read request message is evenly divisible by the basic storage unit or whether the waiting time threshold has been reached. If the size of the read request message is evenly divisible by the basic storage unit, the read operation directly enters the read process of step S70. If the read operation has reached the waiting time threshold, measured from the enqueue time recorded when it entered the first queue, it is sent to the read process of step S70 for data reading regardless of whether its read request message is evenly divisible by the basic storage unit. This is because read operations that cannot be merged have little influence on the efficiency of the overall data reading process, whereas leaving them in the second queue indefinitely would reduce that efficiency. Meanwhile, new read operations continue to be added to the first queue and, in turn, to the second queue, so a read operation that is not evenly divisible by the basic storage unit and has not reached the waiting time threshold may still be merged with a new read operation so that the merged read request message becomes evenly divisible by the basic storage unit. Therefore, if the size of the read request message is not evenly divisible by the basic storage unit and the waiting time threshold has not been reached, the read operation is added to the second queue and the process returns to step S30.
S70, judging whether the read operation has a merge flag; if no merge flag exists, returning a reply message according to the original flow and ending the read process; if a merge flag exists, traversing the merge message record.
The read process executes the corresponding data read according to the read request message. After the merging process, the read request messages have been merged, yet the read data must still be packaged with the original read request messages and returned. The read process therefore first judges whether the read operation has a merge flag and chooses how to return the reply accordingly. A read operation that has not undergone the merging process directly returns the read data and the read request message according to the original flow, and the read process ends. For a read operation that has undergone the merging process, the read data and the merged read request message are split according to the merge message record, and the split data and the corresponding original read request messages are packaged into reply messages and returned; in that case the flow proceeds to step S80.
S80, if a merge flag exists, traversing the merge message record and splitting the read data and the merged read request message according to it, so that each split original read request message corresponds one-to-one with a piece of split data; packaging each read request message with its corresponding read data to form a reply message, returning the reply message, and ending the read process.
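The two-queue merging of steps S20 to S50 can be sketched as one merging round. This is a simplified illustration under stated assumptions: the merging condition is again assumed to be adjacency of byte ranges, steps S40 and S50 are collapsed into a single loop that keeps merging pairs in the second queue until nothing changes, and the merge message record is omitted for brevity. The names `ReadOp`, `try_merge` and `merge_round` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(eq=False)  # compare operations by identity so deque.remove() targets the exact object
class ReadOp:
    """Hypothetical read operation covering the byte range [offset, offset + length)."""
    offset: int
    length: int
    merged: bool = False


def try_merge(src: ReadOp, dst: ReadOp) -> bool:
    """Merge src into dst if their ranges are adjacent (assumed merging condition)."""
    if src.offset == dst.offset + dst.length:
        pass                      # src follows dst, dst keeps its offset
    elif dst.offset == src.offset + src.length:
        dst.offset = src.offset   # src precedes dst
    else:
        return False
    dst.length += src.length
    src.merged = dst.merged = True
    return True


def merge_round(first_queue: deque, second_queue: deque) -> None:
    # S30: operations from the first queue are sources, second-queue operations are destinations.
    while first_queue:
        src = first_queue.popleft()
        if not any(try_merge(src, dst) for dst in second_queue):
            second_queue.append(src)   # no partner found, wait in the second queue
    # S40 and S50 (collapsed): keep merging inside the second queue until no pair fits.
    changed = True
    while changed:
        changed = False
        for src in list(second_queue):
            for dst in second_queue:
                if dst is not src and try_merge(src, dst):
                    second_queue.remove(src)
                    changed = True
                    break
            if changed:
                break
```

After a round, the caller would apply the check of step S60 to every operation left in the second queue and hand the aligned or timed-out ones to the read process.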
Example four:
Referring to Fig. 4, Fig. 4 is a structural diagram of the data reading system based on distributed storage according to the present invention.
The data reading system based on distributed storage of this embodiment includes:
The object storage module is used for receiving read operations from the client module;
The client sends read operations carrying read request messages to the object storage module, and the object storage module receives them so that the merging process can be executed on them.
The operation merging module is used for executing the merging process of the read operations;
Executing the merging process of the read operations means executing the merging process on their read request messages. The operation merging module merges the read request messages so that the size of most merged read request messages is an integral multiple of the basic storage unit. The read request messages are thus stripe-aligned, and stripe alignment effectively improves data reading efficiency and read performance.
The integer division judging module is used for judging whether the read request message of a read operation is evenly divisible by the basic storage unit;
After the merging process is executed, the integer division judging module judges whether the size of the read request message is evenly divisible by the basic storage unit; if so, the read operation enters the read process directly to perform the data read operation.
The data reading module is used for reading the data in the object storage module.
A read operation that has undergone the merging process enters the read process, and the data reading module reads the corresponding data according to the read request message to complete the data read operation.
In one embodiment, the data reading system further includes:
The operation transfer module is used for transferring read operations between the queues;
After a read operation is received, the operation transfer module adds it to the first queue and periodically transfers the read operations in the first queue to the second queue, so that the merging process can be executed between the read operations in the first queue and those in the second queue.
The time recording module is used for recording the enqueue time of each read operation;
After each read operation is added to the first queue, the time recording module records its enqueue time, so that it can later be judged whether the read operation has reached the waiting time threshold. A read operation that has reached the threshold enters the read process; one that has not is sent to the second queue.
The condition judgment module is used for judging whether the merging condition of the read operations is met;
Before the merging process is executed, the condition judgment module first judges whether the two read operations meet the merging condition. If so, the source read operation and the destination read operation are merged, that is, the read request message of the source read operation is inserted into the read request message of the destination read operation. If not, the source read operation is sent to the second queue to take part in the merging of the read operations in the second queue.
The message recording module is used for recording the merge message record of the read operations.
Once the message recording module has recorded the merge message record of the source read operation and the destination read operation, the original read request messages can be recovered from it and the read data can be split according to them, so that each piece of read data corresponds to one original read request message. Each piece of read data can then be packaged with its original read request message and returned as a reply message.
In one embodiment, the data reading system further includes:
The time judging module is used for judging whether a read operation has reached the waiting time threshold;
After the merging process is executed, some read operations may never satisfy the merging condition and therefore never undergo the merging process. Such read operations have little influence on the efficiency of the overall data reading process and can be ignored, but they cannot be allowed to stay in the second queue indefinitely. A waiting time threshold is therefore set. The enqueue time is recorded when a read operation enters the first queue, and once the time judging module determines that the read operation has reached the waiting time threshold, the read operation is sent to the read process for data reading regardless of whether its read request message is evenly divisible by the basic storage unit.
The merging marking module is used for adding merge flags to the read operations that take part in the merging process;
When the read request messages are merged, the merging marking module adds a merge flag to both the source read operation and the destination read operation, so that it can later be judged from the merge flag whether a read operation has been merged.
The merging judgment module is used for judging whether a read operation has been merged;
When the read process is executed, the merging judgment module first judges whether the read operation has a merge flag. If no merge flag exists, a reply message is returned according to the original flow and the read process ends; if a merge flag exists, the merge message record is traversed in order to split the read data.
The message reply module is used for returning reply messages;
The read data and the merged read request message are split according to the merge message record of the read request message, and reply messages are formed from the split data and the corresponding read request messages. The message reply module is responsible for returning the reply messages, after which the read process ends.
The client module is used for sending read operations to the object storage module.
The client initiates a read operation and sends the read operation carrying the read request message to the object storage module.
Example five:
The present embodiment provides a computer-readable storage medium storing a program that, when executed by a processor, causes the processor to perform the steps of the data reading method based on distributed storage in the above-described embodiments.
As will be appreciated by one of skill in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is to be noted that the foregoing describes only exemplary embodiments of the invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made by those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in some detail by the above embodiments, the invention is not limited to the above embodiments, but may include other equivalent embodiments without departing from the spirit of the invention, and the scope of the invention is determined by the scope of the appended claims.

Claims (11)

1. A data reading method based on distributed storage, characterized by comprising the following steps:
receiving a plurality of read operations carrying read request messages;
executing a merging process among the plurality of read operations, namely executing the merging process on the read request messages of the read operations, so that the size of a read request message after the merging process is an integral multiple of a basic storage unit;
causing the read operations that have undergone the merging process to enter a read process, and performing the data read operation.
2. The data reading method based on distributed storage according to claim 1, wherein before the executing of the merging process among the plurality of read operations, the method further comprises:
adding the received read operations to a first queue, and periodically transferring the read operations in the first queue to a second queue.
3. The data reading method based on distributed storage according to claim 1, wherein the executing of the merging process among the plurality of read operations specifically includes:
taking the read operations in the first queue as source read operations and the read operations in the second queue as destination read operations, and executing the merging process, wherein new read operations can continuously be added to the first queue;
in the second queue, taking the read operations that have not undergone the merging process as source read operations and the read operations that have undergone the merging process as destination read operations, and executing the merging process;
and executing the merging process among the remaining read operations in the second queue.
4. The data reading method based on distributed storage according to claim 1, wherein the merging process specifically includes:
judging whether the source read operation and the destination read operation meet a merging condition;
if the merging condition is met, inserting the read request message of the source read operation into the read request message of the destination read operation;
and if the merging condition is not met, sending the source read operation that does not meet the merging condition to the second queue.
5. The data reading method based on distributed storage according to claim 4, wherein the inserting of the read request message of the source read operation into the read request message of the destination read operation further comprises:
adding a merge flag to both the source read operation and the destination read operation whose read request messages are merged.
6. The data reading method based on distributed storage according to claim 4, wherein after the read request message of the source read operation is inserted into the read request message of the destination read operation, the method further comprises:
recording the merge message record of the source read operation and the destination read operation.
7. The data reading method based on distributed storage according to claim 1, wherein after the executing of the merging process among the plurality of read operations, the method further comprises:
judging whether the size of the read request message is evenly divisible by the basic storage unit or whether a waiting time threshold has been reached;
if the size of the read request message is evenly divisible by the basic storage unit or the waiting time threshold has been reached, continuing to execute the read process;
if the size of the read request message is not evenly divisible by the basic storage unit and the waiting time threshold has not been reached, adding the read operation to the second queue.
8. The data reading method based on distributed storage according to claim 1, wherein the read process specifically includes:
judging whether the read operation has a merge flag; if no merge flag exists, returning a reply message according to the original flow and ending the read process; if a merge flag exists, traversing the merge message record.
9. The data reading method based on distributed storage according to claim 8, wherein after the traversing of the merge message record, the method further comprises:
splitting the read data and the merged read request message according to the merge message record of the read request message, forming reply messages from the split data and the corresponding read request messages, returning the reply messages, and ending the read process.
10. A data reading system based on distributed storage, characterized in that the system comprises:
an object storage module, used for receiving read operations from a client module;
an operation merging module, used for executing the merging process of the read operations;
an integer division judging module, used for judging whether the read request message of a read operation is evenly divisible by the basic storage unit;
and a data reading module, used for reading the data in the object storage module.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program which, when executed by a processor, causes the processor to perform the steps of the method of any one of claims 1 to 9.
CN202210194691.8A 2022-03-01 2022-03-01 Data reading method based on distributed storage Active CN114564154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210194691.8A CN114564154B (en) 2022-03-01 2022-03-01 Data reading method based on distributed storage

Publications (2)

Publication Number Publication Date
CN114564154A true CN114564154A (en) 2022-05-31
CN114564154B CN114564154B (en) 2023-08-18

Family

ID=81715077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210194691.8A Active CN114564154B (en) 2022-03-01 2022-03-01 Data reading method based on distributed storage

Country Status (1)

Country Link
CN (1) CN114564154B (en)

Also Published As

Publication number Publication date
CN114564154B (en) 2023-08-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant