CN113392087A - Data access method and computing device

Data access method and computing device

Info

Publication number
CN113392087A
Authority
CN
China
Prior art keywords
version
data
index data
access
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110601072.1A
Other languages
Chinese (zh)
Inventor
尚灿芳
黄贵
王剑英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Innovation Co
Original Assignee
Alibaba Singapore Holdings Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Singapore Holdings Pte Ltd filed Critical Alibaba Singapore Holdings Pte Ltd
Priority to CN202110601072.1A
Publication of CN113392087A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 - Design, administration or maintenance of databases
    • G06F16/219 - Managing data history or versioning
    • G06F16/22 - Indexing; Data structures therefor; Storage structures
    • G06F16/2228 - Indexing structures
    • G06F16/2246 - Trees, e.g. B+trees

Abstract

The embodiments of the present application provide a data access method and a computing device. The method comprises the following steps: when all read requests access first-version index data, detecting a version update of the index data and performing the following cutover (traffic-switching) operation, wherein the first version is the current latest version: following each version update of the index data, controlling read requests to switch gradually, from few to many, to access the latest-version index data, and controlling the remaining read requests to access historical-version index data; and stopping the cutover operation upon detecting that all read requests have switched to access the latest-version index data. The technical solution provided by the embodiments of the present application reduces the impact on data reading efficiency.

Description

Data access method and computing device
Technical Field
The embodiments of the present application relate to the technical field of data processing, and in particular to a data access method and a computing device.
Background
A storage system built on an LSM-Tree (Log-Structured Merge-Tree) structure writes data in append-only fashion and never updates data in place. A write operation first goes to memory; when the in-memory data reaches a corresponding threshold, it is frozen into a layer and then written to the persistent storage medium, where it is merged with the data already stored there. Data on the persistent storage medium is also stored in layers, and the data in each layer, after reaching a corresponding threshold, is merged with the data in the next layer. All written data, whether in memory or on the persistent storage medium, is stored sorted by primary key (Key).
An existing storage system with an LSM-Tree architecture stores data on the persistent storage medium in the form of data blocks and builds index information for each data block so that it can be located quickly. The index data is organized from the in-memory data and the index information of the data blocks; it follows that both data write operations and data merge operations update the index data. When a read request for target data corresponding to a target key is received, the in-memory data recorded in the index data is accessed first. If the target data is not in the in-memory data, the target data block containing the target data is determined from the index information of the data blocks; the cache data is then read first, and if the target data block is in the cache data, the target data is read from the cache; otherwise the persistent storage medium is read layer by layer until the target data block is found and the target data is read.
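For illustration only, this read path can be sketched as follows in Python; the names (IndexData, locate_block, read) and the dict-based layer layout are assumptions made for this sketch, not structures defined by the present application:

# A minimal, runnable sketch of the read path described above.
class IndexData:
    """One version of the index data: in-memory data plus block index info."""
    def __init__(self, active_memtable, immutable_memtables, block_index):
        self.active_memtable = active_memtable          # dict: key -> value
        self.immutable_memtables = immutable_memtables  # list of dicts
        self.block_index = block_index                  # list of ((low, high), block_id)

def locate_block(block_index, key):
    """Find the data block whose key range covers the target key."""
    for (low, high), block_id in block_index:
        if low <= key <= high:
            return block_id
    return None

def read(key, index_data, block_cache, persistent_layers):
    # 1. Search the in-memory data recorded in the index data.
    if key in index_data.active_memtable:
        return index_data.active_memtable[key]
    for memtable in index_data.immutable_memtables:
        if key in memtable:
            return memtable[key]
    # 2. Locate the target data block via the index information.
    block_id = locate_block(index_data.block_index, key)
    if block_id is None:
        return None
    # 3. Read the cache data first; otherwise read the persistent
    #    storage medium layer by layer (L0, L1, L2, ...).
    block = block_cache.get(block_id)
    if block is None:
        for layer in persistent_layers:
            if block_id in layer:
                block = layer[block_id]
                block_cache[block_id] = block  # cache the hot block
                break
    return block.get(key) if block is not None else None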
Because a merge operation reorders the data of adjacent layers, generates new data blocks, and updates the index data while the cache data has not yet been updated, a read request served from the updated index data whose target data lies in a new data block not present in the cache must be read from the persistent storage medium. A large number of such read requests hitting the persistent storage medium causes jitter in read performance and read delay, affecting data reading efficiency.
Disclosure of Invention
The embodiments of the present application provide a data access method and a computing device, to solve the prior-art technical problem that data reading efficiency is adversely affected.
In a first aspect, an embodiment of the present application provides a data access method, including:
when all read requests access first-version index data, detecting a version update of the index data and performing the following cutover operation, wherein the first version is the current latest version:
following each version update of the index data, controlling read requests to switch gradually, from few to many, to access the latest-version index data, and controlling the remaining read requests to access historical-version index data;
and stopping the cutover operation upon detecting that all read requests have switched to access the latest-version index data.
Optionally, after stopping the cutover operation, the method further comprises:
deleting the historical-version index data.
Optionally, the method further comprises:
releasing the data blocks in the persistent storage medium that are not referenced by any version of the index data.
Optionally, the controlling the remaining read requests to access historical-version index data comprises:
controlling the remaining read requests to access the first-version index data.
Optionally, the method further comprises:
after access to any historical-version index data stops, deleting that historical-version index data.
Optionally, the cutover operation further comprises:
determining whether new data is written;
if so, controlling all read requests to switch to access the latest-version index data;
if not, following each version update of the index data, controlling read requests to switch gradually, from few to many, to access the latest-version index data and controlling the remaining read requests to access historical-version index data.
Optionally, the cutover operation further comprises:
following the version updates of the index data, determining a cutover proportion corresponding to each version update;
controlling a corresponding number of read requests to switch to access the latest-version index data according to the cutover proportion;
and controlling the remaining read requests to access historical-version index data.
Optionally, the determining, following the version updates of the index data, the cutover proportion corresponding to each version update comprises:
following the version updates of the index data, determining the cutover proportion corresponding to each version update by increasing it from zero by an equal increment ratio each time.
Optionally, the method further comprises:
adjusting the increment ratio in combination with the current flush operation frequency and/or compaction operation frequency.
In a second aspect, an embodiment of the present application provides a computing device, comprising a storage component and a processing component; the storage component stores one or more computer instructions; the one or more computer instructions are called and executed by the processing component; and the processing component is configured to implement the data access method according to the first aspect.
In a third aspect, an embodiment of the present application provides a storage engine, including:
a detection module, used for detecting version updates of the index data and triggering execution of the cutover operation when all read requests access the first-version index data;
a cutover module, used for executing the cutover operation so as to, following each version update of the index data, control read requests to switch gradually, from few to many, to access the latest-version index data and control the remaining read requests to access historical-version index data; and for stopping execution of the cutover operation when all read requests have switched to access the latest-version index data.
In a fourth aspect, an embodiment of the present application provides a storage system based on an LSM-tree architecture, including a persistent storage medium and a storage engine as described in the third aspect.
In the embodiments of the present application, when all read requests access first-version index data, where the first version is the current latest version, a version update of the index data is detected and the following cutover operation is performed: following each version update of the index data, read requests are controlled to switch gradually, from few to many, to access the latest-version index data while the remaining read requests access historical-version index data; and the cutover operation is stopped upon detecting that all read requests have switched to the latest-version index data. Read traffic is thus migrated to the latest-version index data from few requests to many as versions update, with part of the read requests still accessing historical-version index data, instead of all traffic being migrated to the latest-version index data at once. Because the cache data is updated correspondingly during the gradual migration, a gradual cache-warming effect is achieved: the read delay caused by read-performance jitter from directly accessing the latest-version index data is avoided, and data reading efficiency is improved.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart illustrating one embodiment of a data access method provided herein;
FIG. 2 is a schematic diagram of cutover proportions in one implementation of an embodiment of the present application;
FIG. 3 is a flow chart illustrating a further embodiment of a method for data access provided herein;
FIG. 4 is a schematic diagram illustrating an embodiment of a storage engine provided herein;
FIG. 5 illustrates a schematic structural diagram of one embodiment of a computing device provided herein.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
In some of the flows described in the specification, claims and drawings of this application, a number of operations appear in a particular order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein or in parallel. The operation numbers, e.g. 101, 102, etc., are merely used to distinguish different operations, and the numbers themselves do not imply any execution order. Additionally, the flows may include more or fewer operations, and these operations may be performed sequentially or in parallel. It should be noted that the descriptions of "first", "second", etc. herein are used to distinguish different messages, devices, modules, etc., do not imply a sequential order, and do not require that "first" and "second" be of different types.
The technical solution of the embodiments of the present application is mainly applied to a storage system adopting an LSM-Tree (Log-Structured Merge-Tree) architecture.
For convenience of understanding, technical terms that may appear in the embodiments of the present application are explained first:
Key-Value storage system: a storage system that stores data as key-value pairs.
LSM-Tree: a data storage structure suited to Key-Value storage systems. Data is written in append-only fashion: a write operation is first inserted into memory, and the in-memory data is written to the persistent storage medium after reaching a corresponding threshold. The persistent storage medium stores data in layers, and the data of each layer, after reaching a corresponding threshold, is written into the next layer. Because data is written in append-only fashion, multiple versions of data may exist for the same key, with newer versions at higher levels and older versions possibly at lower levels. A read operation starts from the highest level and searches downward level by level; it can terminate once the latest version of the data is found.
LSM-tree storage system: a storage system adopting the LSM-tree architecture.
Compact (compaction): in an LSM-tree storage system, because data is written in append-only fashion there is data redundancy, so data merging (merge) is performed through compact operations, which continuously merge the data of adjacent layers and write it to the lower layer. Specifically, the data of two or more adjacent layers to be merged is read out and sorted by key; if the same key has multiple versions, only the newest version is retained and the old versions are deleted, and the result is written to the lower layer.
Active-Memtable: the active memory table in memory in an LSM-tree storage system. A write operation is first written into the active memory table.
Immutable-Memtable: the frozen memory table in memory in an LSM-tree storage system. After the active memory table is full, it is converted (switch) into a frozen memory table, which is then flushed (flush) to the persistent storage medium.
Persistent storage medium: a storage device that stores data persistently, for example a magnetic disk or an optical disk. Data on the persistent storage medium in an LSM-tree storage system is stored in layers. Assuming three layers L0, L1 and L2, a flush operation divides the data in a frozen memory table into data blocks and flushes them into layer L0, where they can be merged with the existing L0 data through a compact operation; data is likewise merged among L0, L1 and L2 through compact operations.
Data block: each layer of the persistent storage medium stores data in the form of data blocks. To locate data blocks quickly, corresponding index information is set for each data block; the index information refers to the metadata of a data block and may include the data block size, a data block identifier, a key range, and the like. Both flush and compact operations may generate new data blocks, but they introduce no new data; only switch operations introduce new data.
Index data: for convenience of data management, a current LSM-tree storage system organizes index data from the in-memory data (i.e., the Active-Memtable and the Immutable-Memtable) and the index information of each data block. A read operation first accesses the index data: it searches the Active-Memtable, then the Immutable-Memtable, then locates data blocks based on the index information and searches each layer of the persistent storage medium in turn. Switch, flush and compact operations all update the index data. Taking a persistent storage medium with three layers L0, L1 and L2 as an example, the index data before and after an update may differ in the Active-Memtable, the Immutable-Memtable, and the index information for L0, L1 and L2, and so on.
Cache data: to improve data reading efficiency, data blocks on the persistent storage medium that meet certain conditions, such as high access heat or recent writes, are cached in memory as hot data blocks, called cache data. When a read operation locates a data block based on the index information, the cache data is searched first; if the cache hits the data block, it is read directly, otherwise each layer of the persistent storage medium is searched.
Reference: recording the index information of a data block in index data means that the data block is referenced by that index data. For example, if data block 1 of layer L0 and data block 2 of layer L1 are merged to generate data blocks 3 and 4, which are written to layer L1, then data blocks 3 and 4 are new data blocks, data blocks 1 and 2 are old data blocks, and the old data blocks are not released immediately. Because index data updates produce different versions of index data, a given data block may be referenced by multiple versions, so the number of index-data versions referencing a data block can be represented by a reference count. If the reference count of a data block is 0, i.e., it is referenced by no version of the index data, the data block is released.
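For illustration only, such reference counting can be sketched as follows; BlockManager and its methods are assumed names for this sketch, not part of the present application:

# Illustrative sketch of reference-counted data blocks across index versions.
class BlockManager:
    def __init__(self):
        self.ref_counts = {}  # block_id -> number of index versions referencing it

    def add_version(self, referenced_block_ids):
        """Record that a newly generated index version references these blocks."""
        for block_id in referenced_block_ids:
            self.ref_counts[block_id] = self.ref_counts.get(block_id, 0) + 1

    def drop_version(self, referenced_block_ids, release_block):
        """A version of index data is deleted; release blocks whose count hits 0."""
        for block_id in referenced_block_ids:
            self.ref_counts[block_id] -= 1
            if self.ref_counts[block_id] == 0:
                del self.ref_counts[block_id]
                release_block(block_id)  # reclaim the block on the persistent medium

# Example: block 1, referenced by two versions, survives deleting one version.
manager = BlockManager()
manager.add_version([1, 2])          # historical version references blocks 1, 2
manager.add_version([1, 3])          # new version references blocks 1, 3
manager.drop_version([1, 2], print)  # prints 2: only block 2 is released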
Cache invalidation: if a data block in the cache data is not referenced by a given version of the index data, a read performed through that index data cannot hit the block in the cache; the cache data is invalid relative to that index data.
As described above, a compact operation reorders the data of adjacent layers, generates new data blocks and updates the index data, but the cache data is not updated immediately. If a read request served from the updated index data targets data located in a new data block that is not yet cached, the target data must be read from the persistent storage medium; a large number of such read requests accessing the persistent storage medium causes read-performance jitter and read delay and affects data reading efficiency.
To improve data reading efficiency, the inventors studied the problem and found that in an LSM-tree storage system the switch, flush and compact operations all update the index data, but only the switch operation introduces new data into the storage system; flush and compact operations introduce no new data and only change the data organization. For example, a piece of data located in data block A may be located in data block B after a compact operation, and as long as the storage system holds both data block A and data block B, the data can be obtained from either. Accordingly, the inventors considered temporarily retaining the historical-version index data after a flush or compact operation: when new-version index data is generated, the historical-version index data is not deleted, so old data blocks in the cache data remain referenced and are not immediately released from the persistent storage medium. The old data blocks are referenced by the historical-version index data but not by the new-version index data, while the new data blocks are referenced by the new-version index data. When the requested target data exists in both the old and the new data blocks, the same read operation succeeds whether it accesses the new-version or the historical-version index data; a read against the historical-version index data can be served directly from the cache data, ensuring data reading efficiency, whereas a read against the new-version index data misses the cache and must read the persistent storage medium. The inventors further considered that the cache data is itself updated over time: if reads simply kept accessing the historical-version index data after a flush or compact operation, the cache data would gradually be replaced, reads would begin to fail in the cache even relative to the historical-version index data, and the persistent storage medium would again have to be read.
Based on the above line of thought, the inventors arrived at the technical solution of the present application through a series of studies. In the embodiments of the present application, when all read requests access first-version index data, where the first version is the current latest version, a version update of the index data is detected and the following cutover operation is performed: following each version update of the index data, read requests are controlled to switch gradually, from few to many, to access the latest-version index data while the remaining read requests access historical-version index data; the cutover operation is stopped upon detecting that all read requests have switched to the latest-version index data. Read traffic thus migrates to the latest-version index data from few requests to many, with part of the read requests still accessing the historical-version index data, rather than all traffic migrating to the latest-version index data at once. Because the cache data is updated correspondingly during the gradual migration, a gradual cache-warming effect is achieved: the read delay caused by read-performance jitter from directly accessing the latest-version index data is avoided, data reading efficiency is improved, and the cache hit rate is improved.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a flowchart of an embodiment of a data access method provided in an embodiment of the present application, where a technical solution of the embodiment may be executed by a storage engine, and the method may include the following steps:
101: in the case where all read requests access the first version index data, the index data version update is detected, and the stream cut operation of step 102 is performed.
The first version index data refers to the current latest version index data, and is named as the first version index data for convenience of description.
An all read request may refer to an all read request received at one time.
The version update of the index data may be implemented by detecting whether to generate a new version of index data or detecting whether to execute a merge operation, which may specifically be detecting whether to execute a flush operation or a compact operation, and if detecting that the flush operation or the compact operation is executed, it may be considered that the version of the index data is updated.
102: and following the version update of the index data, controlling the gradual switching from the least reading requests to the access of the index data of the latest version, and controlling the rest reading requests to access the index data of the historical version.
The remaining read requests may refer to the remaining read requests of the received read requests except for accessing the latest version of index data.
The version update following the index data can be performed, for a received read request, the step-by-step switching from a few read requests to the latest version index data access and the step-by-step switching from the few read requests to the historical version index data access of the rest read requests are controlled. Starting from the first version, as the number of version updates is increased, more and more read requests are controlled to access the latest version index data until all the latest version index data are accessed, and less read requests access the historical version index data until the access amount is zero.
Alternatively, after each version update of the index data, for a received read request, a first number of read requests in the received read request are controlled to access the index data of the latest version, and a second number of read requests other than the first number of read requests are controlled to access the index data of the historical version. Wherein the first number increases with increasing number of version updates from 0, and the second number decreases with increasing number of version updates until 0. The first number may be the same or different in number per increment and the second number may be the same or different in number per decrement.
103: and stopping executing the stream switching operation under the condition of detecting that all the read requests are switched to access the index data of the latest version.
That is, the second number is 0 due to the stream switching operation, and the stream switching operation may be stopped when all the read traffic accesses the latest version index data. At this time, the latest version index data is used as the first version index data, and the process returns to step 101 to continue the execution.
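For illustration only, steps 101 to 103 can be sketched as a small controller; the class name, the random-split routing and the 30% default increment are assumptions of this sketch rather than a required implementation:

import random

class CutoverController:
    def __init__(self, increment=0.3):
        self.increment = increment  # cutover proportion added per version update
        self.ratio = 0.0            # fraction of reads routed to the latest version
        self.active = False         # whether a cutover operation is in progress

    def on_version_update(self, new_data_written=False):
        """Step 101/102: react to a detected version update of the index data."""
        if new_data_written:          # switch operation introduced new data:
            self.ratio = 1.0          # switch all read requests at once
        else:
            self.active = True
            self.ratio = min(1.0, self.ratio + self.increment)
        if self.ratio >= 1.0:
            self.active = False       # step 103: stop the cutover operation

    def route(self, latest_version, first_version):
        """Step 102: choose which index-data version a read request accesses."""
        return latest_version if random.random() < self.ratio else first_version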
For example, a read request may carry the target key corresponding to the target data requested. The in-memory data in the index data is read first; if target data corresponding to the target key exists there, the read operation ends. Otherwise, the target data block corresponding to the target key is determined based on the target key and the key ranges in the index information; the cache data is then searched first for the target data block, and if it is found the read is served directly, otherwise the persistent storage medium is read layer by layer. The detailed reading process is not repeated here.
In this embodiment, read traffic is migrated gradually, from few requests to many, to access the latest-version index data while part of the read requests still access historical-version index data, instead of all read traffic being migrated to the latest-version index data at once. The cache data is updated correspondingly during the gradual migration, achieving a gradual cache-warming effect; the read delay caused by read-performance jitter from directly accessing the latest-version index data is avoided, data reading efficiency is improved, and the cache hit rate is improved.
For example, suppose the requested target data is located in data block 1 before a merge operation, and data block 1 also has a copy in the cache data. After the merge operation the target data is located in data block 2, and the updated index data no longer references data block 1. Using the pre-merge index data, the target data can be obtained from data block 1 in the cache without reading the persistent storage medium; using the post-merge index data, the target data can only be obtained from data block 2 on the persistent storage medium. With the technical solution of the present application, most read traffic initially accesses the historical-version index data, which ensures data reading efficiency. The cache data is then updated gradually as versions update: data block 2 is cached as its access heat rises, and by the time most read traffic accesses the latest-version index data, the target data can more likely be obtained directly from the cache, improving the cache hit rate while still ensuring data reading efficiency. The technical solution of the present application therefore warms the cache data gradually, reduces read-performance jitter, and improves data reading efficiency.
In some embodiments, after stopping the cutover operation, the method may further include:
deleting the historical-version index data.
This avoids data redundancy and improves storage performance. In addition, data blocks in the persistent storage medium that are not referenced by any version of the index data can be released.
After the historical-version index data is deleted, the data blocks it referenced are no longer referenced by it and their reference counts decrease correspondingly. A reference count of 0 indicates a data block referenced by no version of the index data, so data blocks with a reference count of 0 can be released, that is, deleted from the storage system.
As an optional way, when the storage system contains multiple historical-version index data, the remaining read requests may be divided into groups, with read requests in different groups accessing different historical-version index data; that is, the multiple historical versions share the read traffic. The remaining read requests may be split evenly according to the number of historical versions, or they may be split unevenly, for example with the historical versions bearing read shares from small to large in chronological order.
As another alternative, the remaining read requests may be controlled to access the same historical-version index data.
Optionally, across different version updates of the index data, the remaining read requests are controlled to access the same historical-version index data.
Furthermore, as yet another alternative, the remaining read requests may be controlled to access the first-version index data.
That is, after each version update of the index data, the remaining read requests are controlled to access the first-version index data.
In some embodiments, the method may further include:
after access to any historical-version index data stops, deleting that historical-version index data.
If the remaining read requests access the first-version index data, the historical-version index data generated between the first version and the latest version can be deleted, reducing storage-space usage. Because deleting index data changes the references to data blocks, data blocks no longer referenced by any version of the index data can also be deleted, further reducing storage-space usage.
In some embodiments, when performing the cutover operation, it may first be determined whether new data is written;
if so, all read requests are controlled to switch to access the latest-version index data;
if not, following each version update of the index data, read requests are controlled to switch gradually, from few to many, to access the latest-version index data, and the remaining read requests are controlled to access historical-version index data.
Optionally, whether new data is written may be determined by comparing the index data of two adjacent versions.
Because the index data includes the active memory table, the active memory tables in two adjacent versions of the index data can be compared; if they differ, it can be determined that new data has been written, otherwise no new data has been written.
Alternatively, it can be detected whether a switch operation has occurred; if so, it is determined that new data has been written, otherwise it is considered that no new data has been written.
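For illustration only, both checks can be sketched in a few lines; the field and event names are assumptions of this sketch:

def new_data_written(prev_index, curr_index):
    """Compare the active memory tables of two adjacent index-data versions."""
    return prev_index.active_memtable is not curr_index.active_memtable

def new_data_written_via_switch(operations_since_last_update):
    """Alternatively, detect whether a switch operation has occurred."""
    return "switch" in operations_since_last_update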
In the cutover operation, following each version update of the index data, a first quantity of read requests to access the latest-version index data and a second quantity to access the historical-version index data may be determined, and the corresponding numbers of read requests are controlled to access the different versions of the index data accordingly. The first and second quantities corresponding to each version update may be preset or determined in other ways.
As an alternative implementation, the cutover operation may include:
following the version updates of the index data, determining a cutover proportion corresponding to each version update;
controlling a corresponding number of read requests to switch to access the latest-version index data according to the cutover proportion;
and controlling the remaining read requests to access historical-version index data.
The cutover proportion increases as the number of version updates grows; the increment may be the same or different each time.
For example, assuming one million read requests are received at one time after a given version update and the cutover proportion is 30%, the access quantity corresponding to the cutover proportion is 300,000 read requests: 300,000 read requests are switched to access the latest-version index data, and the remaining 700,000 still access the historical-version index data.
Optionally, following the version updates of the index data, the cutover proportion corresponding to each version update is determined by increasing it from zero by the same increment each time.
Equal increments balance the read-delay rate after each version update of the index data and keep data reading efficiency similar across updates, ensuring the user's reading experience.
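For illustration only, the equal-increment schedule can be sketched with integer percentages; the 30% increment is merely an example value:

def cutover_percent(update_count, increment_percent=30):
    """Cutover proportion (in %) after the update_count-th version update,
    increasing from zero by the same increment each time, capped at 100%."""
    return min(100, update_count * increment_percent)

# With a 30% increment, successive version updates give 30%, 60%, 90%, 100%:
assert [cutover_percent(k) for k in range(1, 5)] == [30, 60, 90, 100]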
The increment ratio can be set according to actual conditions and can also be adjusted dynamically. It may be determined in combination with the current flush operation frequency and/or compaction (compact) operation frequency; thus, in some embodiments, the method may further include: adjusting the increment ratio in combination with the current flush operation frequency and/or compaction operation frequency.
In addition, the determination may take into account the flush data volume and the merge data volume, where the flush data volume refers to the amount of data written to the persistent storage medium by flush operations and the merge data volume refers to the amount of data merged by compact operations.
Increment ratios corresponding to different flush operation frequencies, flush data volumes, compaction operation frequencies and merge data volumes can be preset.
Alternatively, the total cutover time may be determined in combination with the switch operation frequency, and the increment ratio then determined from the total cutover time together with the current flush operation frequency or compaction operation frequency.
For example, based on the flush operation frequency and the total cutover time, the predicted number of version updates within the total cutover time can be determined; similarly, the predicted number can be determined from the compaction operation frequency and the total cutover time. The increment ratio may then be determined from the predicted number of version updates: for example, with 4 predicted version updates, the increment may be set to 25% or more, for example 30%.
Of course, the predicted number of version updates may also be determined from both the flush operation frequency and the compaction operation frequency, for example by taking their average or minimum as a predetermined frequency and combining the predetermined frequency with the total cutover time. If the predetermined frequency is one version update every 20 seconds and the total cutover time is 60 seconds, the predicted number of version updates is 3, and so on.
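For illustration only, deriving the increment ratio from a predicted number of version updates can be sketched as follows; the rounding rule and the safety margin are assumptions of this sketch:

import math

def increment_percent(seconds_per_update, total_cutover_seconds, margin=5):
    """Predict how many version updates fit in the total cutover time, then
    choose an increment of at least 100 / predicted_updates percent, plus a
    small safety margin (the margin is an assumption of this sketch)."""
    predicted_updates = total_cutover_seconds // seconds_per_update
    return math.ceil(100 / predicted_updates) + margin

print(increment_percent(15, 60))  # 4 predicted updates -> at least 25% -> 30
print(increment_percent(20, 60))  # 3 predicted updates -> at least 34% -> 39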
It should be noted that the above are only examples of possible ways to determine the increment ratio, which the present application does not specifically limit.
For ease of understanding, assume an increment ratio of 30% and let the first-version index data be SV0. A version update on the basis of SV0 yields index data SV1; a version update on the basis of SV1 yields SV2; a version update on the basis of SV2 yields SV3; and so on up to SVm, where m is a positive integer. With reference to the cutover proportion schematic shown in fig. 2:
While all read requests access SV0, execution of a flush or compact operation is detected and SV1 is generated. At this point a cutover operation can be started toward SV1. The cutover proportion corresponding to SV1 is 0 + 30% = 30%, meaning that after SV1 is generated, of the large number of read requests received at one time, 30% are controlled to access SV1 and 70% still access SV0.
A further flush or compact operation then generates SV2, whose corresponding cutover proportion is 30% + 30% = 60%. The cutover toward SV1 can be stopped and redirected toward SV2; that is, after SV2 is generated, of the read requests received at one time, 60% are controlled to access SV2 and 40% still access SV0.
A further flush or compact operation then generates SV3, whose corresponding cutover proportion is 60% + 30% = 90%. The cutover toward SV2 can be stopped and redirected toward SV3; that is, after SV3 is generated, 90% of the read requests received at one time are controlled to access SV3 and 10% still access SV0.
A further flush or compact operation then generates SV4, whose corresponding cutover proportion is 100%. The cutover toward SV3 can be stopped and redirected toward SV4: after SV4 is generated, all read requests received at one time are switched to access SV4, and the traffic to SV0 drops to 0. The cutover is complete and the cutover operation can be stopped; the current SV4 can then serve as SV0 and the above operations continue.
Note that if, after SV2 is obtained, a switch operation is performed and generates SV5, then because the switch operation introduces new data, all of the read requests received at one time are switched to access SV5 and the traffic to SV0 drops to 0, ensuring normal data access.
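For illustration only, the walkthrough, including the switch-operation case, can be sketched as follows; the operation names are illustrative:

def next_ratio(current_percent, operation, increment=30):
    """Cutover proportion after one version-updating operation."""
    if operation == "switch":              # switch writes new data:
        return 100                         # cut all read traffic over at once
    if operation in ("flush", "compact"):  # no new data: step by the increment
        return min(100, current_percent + increment)
    return current_percent

ratio = 0
for op in ["flush", "compact", "switch"]:  # generates SV1, SV2, then SV5
    ratio = next_ratio(ratio, op)
    print(op, "->", ratio)                 # flush -> 30, compact -> 60, switch -> 100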
In a practical application, the data access process may be executed according to the flow shown in fig. 3; the data access method shown in fig. 3 may include the following steps:
301: When all read requests access the first-version index data, a version update of the index data is detected and the cutover operation is performed. The cutover operation may include the following steps 302 to 306.
302: It is determined whether new data is written; if not, step 303 is performed, and if so, step 306 is performed.
303: The cutover proportion corresponding to each version update is determined.
304: A corresponding number of read requests are controlled to switch to access the latest-version index data according to the cutover proportion.
305: The remaining read requests are controlled to access the first-version index data.
306: All read requests are controlled to switch to access the latest-version index data.
307: Upon detecting that all read requests have switched to access the latest-version index data, execution of the cutover operation is stopped.
308: After access to any historical-version index data stops, that historical-version index data is deleted.
309: Data blocks in the persistent storage medium that are not referenced by any version of the index data are released.
The specific operations of each step are detailed in the corresponding embodiments above and are not repeated here.
An embodiment of the present application further provides a storage engine; as shown in fig. 4, the storage engine may include:
a detection module 401, configured to detect version updates of the index data and trigger execution of the cutover operation when all read requests access the first-version index data;
a cutover module 402, configured to execute the cutover operation so as to, following each version update of the index data, control read requests to switch gradually, from few to many, to access the latest-version index data and control the remaining read requests to access historical-version index data; and to stop executing the cutover operation when all read requests have switched to access the latest-version index data.
The storage engine shown in fig. 4 may execute the data access method described in the embodiment shown in fig. 1 or fig. 3, and the implementation principle and the technical effect are not described again. The specific manner in which each module and unit of the storage engine in the above embodiments perform operations has been described in detail in the embodiments related to the method, and will not be elaborated herein.
In one possible design, the storage engine of the embodiment shown in fig. 4 may be configured in a computing device. Thus, as shown in fig. 5, an embodiment of the present application further provides a computing device, which may include a storage component 501 and a processing component 502; the storage component 501 stores one or more computer instructions for the processing component 502 to call and execute, thereby implementing the data access method of the embodiments shown in fig. 1 or fig. 3.
The processing component 502 may include one or more processors executing computer instructions to perform all or part of the steps of the method described above. Of course, the processing elements may also be implemented as one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components configured to perform the above-described methods.
The storage component 501 is configured to store various types of data to support operations on a computing device. The memory components may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Of course, a computing device may also necessarily include other components, such as input/output interfaces, communication components, and so forth.
The input/output interface provides an interface between the processing components and peripheral interface modules, which may be output devices, input devices, etc.
The communication component is configured to facilitate wired or wireless communication between the computing device and other devices, and the like.
The computing device may be a physical device or an elastic computing host provided by a cloud computing platform, and the computing device may be a cloud server, and the processing component, the storage component, and the like may be a basic server resource rented or purchased from the cloud computing platform.
The embodiment of the present application further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a computer, the data access method of the embodiment shown in fig. 1 or fig. 3 may be implemented.
In addition, an embodiment of the present application further provides a storage system based on an LSM-tree architecture, which includes a persistent storage medium and a storage engine as described in fig. 4.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A method of data access, comprising:
when all read requests access first-version index data, detecting a version update of the index data and performing the following cutover operation, wherein the first version is the current latest version:
following each version update of the index data, controlling read requests to switch gradually, from few to many, to access latest-version index data, and controlling the remaining read requests to access historical-version index data;
and stopping the cutover operation upon detecting that all read requests have switched to access the latest-version index data.
2. The method of claim 1, wherein after stopping the cutover operation, the method further comprises:
deleting the historical-version index data.
3. The method of claim 1, further comprising:
releasing the data blocks in the persistent storage medium that are not referenced by any version of the index data.
4. The method of claim 1, wherein the controlling the remaining read requests to access historical-version index data comprises:
controlling the remaining read requests to access the first-version index data.
5. The method of claim 4, further comprising:
after access to any historical-version index data stops, deleting that historical-version index data.
6. The method of claim 1, wherein the cutover operation further comprises:
determining whether new data is written;
if so, controlling all read requests to switch to access the latest-version index data;
if not, performing the step of, following each version update of the index data, controlling read requests to switch gradually, from few to many, to access the latest-version index data and controlling the remaining read requests to access historical-version index data.
7. The method of claim 1, wherein the cutover operation further comprises:
following the version updates of the index data, determining a cutover proportion corresponding to each version update;
controlling a corresponding number of read requests to switch to access the latest-version index data according to the cutover proportion;
and controlling the remaining read requests to access historical-version index data.
8. The method of claim 7, wherein the determining, following the version updates of the index data, the cutover proportion corresponding to each version update comprises:
following the version updates of the index data, determining the cutover proportion corresponding to each version update by increasing it from zero by an equal increment ratio each time.
9. The method of claim 8, further comprising:
adjusting the increment ratio in combination with the current flush operation frequency and/or compaction operation frequency.
10. A computing device, comprising a storage component and a processing component; the storage component stores one or more computer instructions; the one or more computer instructions are called and executed by the processing component; and the processing component is configured to implement the data access method of any one of claims 1 to 9.
CN202110601072.1A 2021-05-31 2021-05-31 Data access method and computing device Pending CN113392087A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110601072.1A CN113392087A (en) 2021-05-31 2021-05-31 Data access method and computing device


Publications (1)

Publication Number Publication Date
CN113392087A 2021-09-14

Family

ID=77619441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110601072.1A Pending CN113392087A (en) 2021-05-31 2021-05-31 Data access method and computing device

Country Status (1)

Country Link
CN (1) CN113392087A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113760830A (en) * 2021-09-22 2021-12-07 国网信息通信产业集团有限公司 System and method for storing and editing distributed files
CN113760830B (en) * 2021-09-22 2024-01-30 国网信息通信产业集团有限公司 Distributed file storage editable system and method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240313

Address after: # 03-06, Lai Zan Da Building 1, 51 Belarusian Road, Singapore

Applicant after: Alibaba Innovation Co.

Country or region after: Singapore

Address before: Room 01, 45th Floor, AXA Building, 8 Shanton Road, Singapore

Applicant before: Alibaba Singapore Holdings Ltd.

Country or region before: Singapore