CN117827818A - Data storage method and device - Google Patents

Publication number
CN117827818A
Authority
CN
China
Prior art keywords
data
index engine
data table
index
storage device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211202134.2A
Other languages
Chinese (zh)
Inventor
王顺卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd filed Critical Huawei Cloud Computing Technologies Co Ltd
Priority to CN202211202134.2A priority Critical patent/CN117827818A/en
Priority to PCT/CN2023/104709 priority patent/WO2024066597A1/en
Publication of CN117827818A publication Critical patent/CN117827818A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Abstract

The present invention relates to the field of data processing, and in particular to a data storage method and apparatus. The method comprises: providing a configuration interface through which a user configures the load characteristic of a data table as a read-intensive load or a write-intensive load; when the configuration interface indicates that the user has configured the load characteristic of the data table as a read-intensive load, instructing the storage device to store the data in the data table according to a first index engine, where the first index engine matches the read-intensive load; and when the configuration interface indicates that the user has configured the load characteristic of the data table as a write-intensive load, instructing the storage device to store the data in the data table according to a second index engine, where the second index engine matches the write-intensive load. The method enables the index engine of the data table to match the load characteristic caused by the service, thereby improving the access performance of the data table.

Description

Data storage method and device
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data storage method and apparatus.
Background
A database is a repository that stores a large amount of data. To ensure query efficiency, the data in a database must be organized according to a certain structure; that is, the data is stored using an index engine. Index engines commonly used in databases currently include the log-structured merge tree (LSM tree), the B+ tree, and the like.
Different access operations bring different load characteristics to a database, and different index engines suit different load characteristics. For example, a log-structured merge tree structure is better suited to loads in which write operations occur more frequently, and a B+ tree structure is better suited to loads in which read operations occur more frequently.
The access operations to the database by different users may be different, as may the resulting load characteristics. Therefore, no matter what kind of index engine is used in the database, it is difficult to adapt to the load characteristics caused by the access operations of a plurality of users, and thus the access performance to the database is reduced.
Disclosure of Invention
The embodiments of the present application provide a data storage method and device in which the index engine of a data table can be configured by a user or adjusted according to changes in the load characteristic of the data table.
In a first aspect, a data storage method is provided. The method is applied to a control device in a storage system, where the storage system further includes a storage device that stores a data table of a user. The method comprises: providing a configuration interface through which the user configures the load characteristic of the data table as a read-intensive load or a write-intensive load; when the configuration interface indicates that the user has configured the load characteristic of the data table as a read-intensive load, instructing the storage device to store the data in the data table according to a first index engine, where the first index engine matches the read-intensive load; and when the configuration interface indicates that the user has configured the load characteristic of the data table as a write-intensive load, instructing the storage device to store the data in the data table according to a second index engine, where the second index engine matches the write-intensive load.
By the method, the user can configure the index engine of the data table, so that the user can adjust the index engine of the data table according to the change of the service served by the data table, the index engine is matched with the load characteristic caused by the changed service, and the access performance of the data table is improved.
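The configuration flow of the first aspect can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class names `ControlDevice` and `StorageDevice`, the method names, and the engine labels are all hypothetical, and the storage device is reduced to a dictionary that records which engine each table uses.

```python
from enum import Enum


class LoadCharacteristic(Enum):
    READ_INTENSIVE = "read-intensive"
    WRITE_INTENSIVE = "write-intensive"


# Hypothetical engine labels; the text only says that the first engine matches
# read-intensive loads and the second matches write-intensive loads.
FIRST_INDEX_ENGINE = "first index engine (read-friendly)"
SECOND_INDEX_ENGINE = "second index engine (write-friendly)"


class StorageDevice:
    """Stand-in for the storage device; records the engine used per table."""

    def __init__(self):
        self.engine_of_table = {}

    def store_with_engine(self, table, engine):
        self.engine_of_table[table] = engine


class ControlDevice:
    """Stand-in for the control device that exposes the configuration interface."""

    def __init__(self, storage):
        self.storage = storage

    def configure_load_characteristic(self, table, load):
        # Pick the engine that matches the configured load characteristic
        # and instruct the storage device to use it for the table.
        if load is LoadCharacteristic.READ_INTENSIVE:
            engine = FIRST_INDEX_ENGINE
        elif load is LoadCharacteristic.WRITE_INTENSIVE:
            engine = SECOND_INDEX_ENGINE
        else:
            raise ValueError(f"unsupported load characteristic: {load!r}")
        self.storage.store_with_engine(table, engine)
        return engine
```

Calling `configure_load_characteristic` again with a different load characteristic models the user reconfiguring the table when the service it serves changes.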
In one possible implementation, the first index engine includes at least a B+ tree structure and the second index engine includes at least a log structured merge tree structure.
In this implementation, the B+ tree structure is a read-friendly index engine, so including the B+ tree structure in the first index engine improves the match between the first index engine and the read-intensive load. The log-structured merge tree structure is a write-friendly index engine, so including the log-structured merge tree structure in the second index engine improves the match between the second index engine and the write-intensive load.
In one possible implementation, before the storage device is instructed to store the data in the data table according to the first index engine, the index engine of the data table is the second index engine. Instructing the storage device to store the data in the data table according to the first index engine includes: instructing the storage device to migrate the index engine of the data table from the second index engine to a third index engine, where the structure of the third index engine is between the structure of the first index engine and the structure of the second index engine; and then instructing the storage device to migrate the index engine of the data table from the third index engine to the first index engine. The third index engine may be referred to as a hybrid index engine.
In the implementation mode, the index engine of the data table can be switched from a write-friendly index engine to a hybrid index engine, and then is switched from the hybrid index engine to a read-friendly index engine, so that gradual switching of the index engines is realized, and index engine change expense caused by switching among index engines with large structural difference can be avoided.
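The gradual switch can be pictured as walking an ordered list of engines one adjacent step at a time. The sketch below is illustrative only: the engine labels and the function name `migration_path` are hypothetical, and the reverse direction (read-friendly to write-friendly through the hybrid engine) is an assumption made by symmetry; the text above describes only the write-friendly to read-friendly direction.

```python
# Engines ordered from write-friendly to read-friendly. The labels are
# hypothetical; the text names them second, third (hybrid), and first.
ENGINE_ORDER = ["second (write-friendly)", "third (hybrid)", "first (read-friendly)"]


def migration_path(current, target):
    """Return the engines to migrate through, one adjacent step at a time."""
    start = ENGINE_ORDER.index(current)
    end = ENGINE_ORDER.index(target)
    step = 1 if end >= start else -1
    # Exclude the current engine; include every intermediate engine and the target.
    return [ENGINE_ORDER[i] for i in range(start + step, end + step, step)]
```

For a table currently on the second index engine and targeting the first, the path passes through the hybrid engine before reaching the read-friendly engine, so no single step crosses a large structural difference.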
In one possible implementation, the data table has a local secondary index, and the configuration interface is further used by the user to configure the load characteristic of the local secondary index as a read-intensive load or a write-intensive load. When the configuration interface indicates that the user has configured the load characteristic of the local secondary index as a read-intensive load, the storage device is instructed to store the data under the local secondary index according to the first index engine; when the configuration interface indicates that the user has configured the load characteristic of the local secondary index as a write-intensive load, the storage device is instructed to store the data under the local secondary index according to the second index engine.
In the implementation mode, the user can configure the index engine of the local secondary index in the data table, so that the user can adjust the index engine of the local secondary index according to the change of the service served by the local secondary index, the index engine is matched with the load characteristic caused by the changed service, and the access performance of the local secondary index is improved.
In one possible implementation, the data in the data table is stored in the form of key-value pairs.
In the implementation mode, the method can be applied to the key value database, and the access performance of the key value database can be improved.
In a second aspect, a data storage method is provided. The method is applied to a control device in a storage system, where the storage system further includes a storage device that stores a data table. The method comprises: when the index engine of the data table is a fourth index engine, monitoring, by the control device, the operation amplification of the data table, where the operation amplification includes the read amplification of reading data from the data table or the write amplification of writing data into the data table; when the operation amplification includes the read amplification and the read amplification is greater than a first threshold, instructing the storage device to store the data in the data table according to a fifth index engine, where the fifth index engine matches the read-intensive load better than the fourth index engine does; and when the operation amplification includes the write amplification and the write amplification is greater than a second threshold, instructing the storage device to store the data in the data table according to a sixth index engine, where the sixth index engine matches the write-intensive load better than the fourth index engine does.
The method can monitor the read amplification or the write amplification of the data table, judge whether the current index engine of the data table is matched with the load characteristic of the data table or not and judge the change direction of the index engine according to the read amplification or the write amplification, so that the index engine can be adjusted towards the direction of matching the load characteristic of the data table, the index engine is matched with the load characteristic of the data table, and the access performance of the data table can be improved.
In one possible implementation, the operation amplification includes both read amplification and write amplification. Instructing the storage device to store the data in the data table according to the fifth index engine when the operation amplification includes the read amplification and the read amplification is greater than the first threshold includes: when the read amplification is greater than the first threshold and the write amplification is less than a third threshold, instructing the storage device to store the data in the data table according to the fifth index engine.
In the implementation mode, the index engine of the data table can be adjusted towards the direction of the read-friendly index engine under the condition that the read amplification is larger and the write amplification is smaller, so that the read amplification and the write amplification can be balanced, and the comprehensive access performance of the data table is improved.
In one possible implementation, the operation amplification includes both read amplification and write amplification. Instructing the storage device to store the data in the data table according to the sixth index engine when the operation amplification includes the write amplification and the write amplification is greater than the second threshold includes: when the write amplification is greater than the second threshold and the read amplification is less than a fourth threshold, instructing the storage device to store the data in the data table according to the sixth index engine.
In the implementation mode, the index engine of the data table can be adjusted towards the direction of the write-friendly index engine under the condition that the write amplification is relatively large and the read amplification is relatively small, so that the read amplification and the write amplification can be balanced, and the comprehensive access performance of the data table is improved.
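The two combined conditions can be written as a single decision function. The sketch below is a hedged illustration: the function name `choose_direction` and the returned labels are hypothetical, the text gives no concrete threshold values, and checking the read condition before the write condition is an assumption, since the text does not say what happens if both conditions hold at once.

```python
def choose_direction(read_amp, write_amp, first_threshold, second_threshold,
                     third_threshold, fourth_threshold):
    """Return which way to adjust the index engine, or None to keep it."""
    # Large read amplification with small write amplification:
    # move toward a more read-friendly engine (the fifth index engine).
    if read_amp > first_threshold and write_amp < third_threshold:
        return "fifth index engine (more read-friendly)"
    # Large write amplification with small read amplification:
    # move toward a more write-friendly engine (the sixth index engine).
    if write_amp > second_threshold and read_amp < fourth_threshold:
        return "sixth index engine (more write-friendly)"
    # Neither condition holds: keep the current (fourth) index engine.
    return None
```

When both amplifications are large, neither branch fires and the current engine is kept, which reflects the stated goal of balancing read amplification against write amplification.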
In one possible implementation, the fifth index engine includes at least a B+ tree structure and the sixth index engine includes at least a log-structured merge tree structure.
In this implementation, the B+ tree structure is a read-friendly index engine, so including the B+ tree structure in the fifth index engine improves the match between the fifth index engine and the read-intensive load. The log-structured merge tree structure is a write-friendly index engine, so including the log-structured merge tree structure in the sixth index engine improves the match between the sixth index engine and the write-intensive load.
In one possible implementation, the data table has a local secondary index (LSI), and the operation amplification is the amplification generated by operating on the data under the local secondary index. Instructing the storage device to store the data in the data table according to the fifth index engine includes: instructing the storage device to store the data under the local secondary index according to the fifth index engine. Alternatively, instructing the storage device to store the data in the data table according to the sixth index engine includes: instructing the storage device to store the data under the local secondary index according to the sixth index engine.
In the implementation manner, the index engine of the local secondary index can be adjusted according to the operation amplification of the index engine of the local secondary index in the data table, so that the load characteristics of the index engine and the local secondary index are matched, and the access performance of the local secondary index in the data table can be improved.
In a third aspect, a data storage device is provided. The data storage device is configured in a control device in a storage system, where the storage system further includes a storage device that stores a data table of a user. The data storage device includes: a providing module, configured to provide a configuration interface through which the user configures the load characteristic of the data table as a read-intensive load or a write-intensive load; and an indicating module, configured to instruct the storage device to store the data in the data table according to a first index engine when the configuration interface indicates that the user has configured the load characteristic of the data table as a read-intensive load, where the first index engine matches the read-intensive load. The indicating module is further configured to instruct the storage device to store the data in the data table according to a second index engine when the configuration interface indicates that the user has configured the load characteristic of the data table as a write-intensive load, where the second index engine matches the write-intensive load.
In one possible implementation, the first index engine includes at least a B+ tree structure and the second index engine includes at least a log structured merge tree structure.
In one possible implementation, before the storage device is instructed to store the data in the data table according to the first index engine, the index engine of the data table is the second index engine. The indicating module is further configured to: instruct the storage device to migrate the index engine of the data table from the second index engine to a third index engine, where the structure of the third index engine is between the structure of the first index engine and the structure of the second index engine; and then instruct the storage device to migrate the index engine of the data table from the third index engine to the first index engine.
In one possible implementation, the data table has a local secondary index, and the configuration interface is further used by the user to configure the load characteristic of the local secondary index as a read-intensive load or a write-intensive load. The indicating module is further configured to: when the configuration interface indicates that the user has configured the load characteristic of the local secondary index as a read-intensive load, instruct the storage device to store the data under the local secondary index according to the first index engine; and when the configuration interface indicates that the user has configured the load characteristic of the local secondary index as a write-intensive load, instruct the storage device to store the data under the local secondary index according to the second index engine.
In a fourth aspect, a data storage device is provided. The data storage device is configured in a control device in a storage system, where the storage system further includes a storage device that stores a data table. The data storage device includes: a monitoring module, configured to monitor the operation amplification of the data table when the index engine of the data table is a fourth index engine, where the operation amplification includes the read amplification of reading data from the data table or the write amplification of writing data into the data table; and an indicating module, configured to instruct the storage device to store the data in the data table according to a fifth index engine when the operation amplification includes the read amplification and the read amplification is greater than a first threshold, where the fifth index engine matches the read-intensive load better than the fourth index engine does. The indicating module is further configured to instruct the storage device to store the data in the data table according to a sixth index engine when the operation amplification includes the write amplification and the write amplification is greater than a second threshold, where the sixth index engine matches the write-intensive load better than the fourth index engine does.
In one possible implementation, the operation amplification includes both read amplification and write amplification; the indicating module is configured to: when the read amplification is greater than the first threshold and the write amplification is less than a third threshold, instruct the storage device to store the data in the data table according to the fifth index engine.
In one possible implementation, the operation amplification includes both read amplification and write amplification; the indicating module is configured to: when the write amplification is greater than the second threshold and the read amplification is less than a fourth threshold, instruct the storage device to store the data in the data table according to the sixth index engine.
In one possible implementation, the fifth index engine includes at least a B+ tree structure and the sixth index engine includes at least a log-structured merge tree structure.
In one possible implementation, the data table has a local secondary index (LSI), and the operation amplification is the amplification generated by operating on the data under the local secondary index; the indicating module is configured to: instruct the storage device to store the data under the local secondary index according to the fifth index engine; or instruct the storage device to store the data under the local secondary index according to the sixth index engine.
In a fifth aspect, a cluster of computing devices is provided, comprising at least one computing device, each computing device comprising a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method as provided in the first aspect or the method as provided in the second aspect.
In a sixth aspect, there is provided a computer program product comprising instructions which, when executed by a cluster of computing devices, cause the cluster of computing devices to perform the method as provided in the first aspect or the method as provided in the second aspect.
In a seventh aspect, a computer readable storage medium is provided, comprising computer program instructions which, when executed by a cluster of computing devices, perform the method as provided in the first aspect or the method as provided in the second aspect.
According to the data storage method and device, a user can configure the index engine of the data table or adjust the index engine of the data table according to the load characteristic of the data table, so that the index engine of the data table is matched with the load characteristic of the data table, and the access performance of the data table is improved.
Drawings
FIG. 1A is a schematic diagram of a log structured merge tree;
FIG. 1B is a schematic diagram of a B+ tree structure;
FIG. 2 is a schematic diagram of a storage system according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an initial load characteristic configuration submodule according to an embodiment of the present application;
FIG. 4 is a flow chart of a data storage scheme provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of an indexing engine provided in an embodiment of the present application;
FIG. 6 is a flow chart of a data storage scheme provided by an embodiment of the present application;
FIG. 7 is a flow chart of a data storage scheme provided by an embodiment of the present application;
FIG. 8 is a flowchart of a data storage method according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of a data storage method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a data storage device according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a data storage device according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a computing device cluster according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a computing device cluster according to an embodiment of the present disclosure;
FIG. 15 is a schematic structural diagram of a computing device according to an embodiment of the present application;
FIG. 16 is a schematic diagram of a computing device cluster according to an embodiment of the present disclosure;
FIG. 17 is a schematic structural diagram of a computing device cluster according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. Wherein, in the embodiments of the present application, "a plurality" means "at least two".
A key-value database stores data as key-value pairs, where a key is the unique identifier of a piece of data and a value is the data content, which can be anything from a simple value to a complex object. A key-value database has advantages such as a simple data storage form, distributed processing capability, and fast response. In addition, the key-value database is a non-relational (NoSQL) database and can handle large-scale data storage. Accordingly, key-value databases are widely used in computer systems, particularly in cloud storage.
The key value database may serve multiple users. Illustratively, the user may be a tenant of the cloud storage. A user may query data in a database through an index (index) engine. Wherein the indexing engine is a data storage structure for efficient querying of data. In embodiments of the present application, the indexing engine may also be referred to as a data storage structure.
Specifically, a user may create a data table in a key-value database to store and manage the user's data. The data table uses a local primary index (LPI) to establish a key-to-value mapping, and uses an index engine to store that mapping so as to store the data in the data table. The key identifies the data and may be, for example, the name of the data. The value is the data itself and may be defined according to business requirements, for example a student's score in one or more subjects. Establishing the key-to-value mapping with the local primary index means that the keys and values are stored according to the data storage manner of the local primary index.
In some embodiments, a data table may be composed of one or more data partition instances. A partition instance may be used to store the data of a specified key range, and that data may be referred to as the data in the partition instance. A partition instance can establish, through a local primary index, a mapping from the keys in its key range to the corresponding values, so as to store the data in the partition instance. When a data table is composed of multiple data partition instances, the local primary indexes of those partition instances may use the same index engine; that is, multiple data partition instances in the same data table may use the same index engine to store data. Thus, the index engine of the local primary index of a data partition instance in a data table may be referred to as the index engine of the data table.
In some embodiments, a value may include multiple pieces of information (for example, the value is an object composed of complex composite information). To improve query efficiency, a local secondary index (LSI) can be constructed that maps one of these pieces of information to the key. That piece of information may be referred to as a subvalue. The local secondary index also uses an index engine to store the mapping from subvalue to key.
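As an illustration of how a key can be routed to the partition instance that owns its key range, the sketch below keeps the lower bounds of the key ranges in sorted order and uses a binary search. The bounds, partition names, and the function name `partition_for` are all hypothetical; the text does not describe how key ranges are assigned or looked up.

```python
import bisect

# Hypothetical partition layout: each partition instance covers keys in
# [lower_bound, next_lower_bound). The bounds below are illustrative only.
PARTITION_LOWER_BOUNDS = ["a", "h", "p"]
PARTITION_NAMES = ["partition-0", "partition-1", "partition-2"]


def partition_for(key):
    """Return the partition instance whose key range contains the key."""
    # bisect_right finds the first lower bound strictly greater than the key;
    # the partition just before that position owns the key.
    idx = bisect.bisect_right(PARTITION_LOWER_BOUNDS, key) - 1
    if idx < 0:
        raise KeyError(f"key {key!r} is below the first key range")
    return PARTITION_NAMES[idx]
```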
When querying the value corresponding to one or more keys, the keys can first be screened in the local secondary index by the subvalue corresponding to each key. The value is then queried in the local primary index using the screened keys, which improves query efficiency. Take as an example a data table or data partition instance that stores student scores, where the key is the student number and the value includes the student's scores in multiple subjects such as Chinese, mathematics, English, physics, and chemistry. The local primary index builds a mapping from the student number to the scores in these subjects. The local secondary index builds a mapping from the Chinese score to the student number; that is, the subvalue is the Chinese score. Suppose the user queries the scores in each subject of the students whose Chinese score is above 90. The student numbers of the students whose Chinese score is above 90 are first screened in the local secondary index. The values are then queried in the local primary index according to the screened student numbers, yielding the scores in each subject of the students whose Chinese score is above 90. Thus, the local secondary index can improve query efficiency.
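The two-step lookup in the student score example can be sketched with plain dictionaries. The student numbers and scores are made up for illustration, and the dictionaries stand in for the index engines; a real index engine would use an ordered structure rather than scanning every subvalue.

```python
# Local primary index: student number (key) -> scores per subject (value).
primary_index = {
    "S001": {"Chinese": 95, "Mathematics": 88, "English": 76},
    "S002": {"Chinese": 82, "Mathematics": 91, "English": 90},
    "S003": {"Chinese": 91, "Mathematics": 70, "English": 85},
}

# Local secondary index: Chinese score (subvalue) -> student numbers (keys).
secondary_index = {}
for student_no, scores in primary_index.items():
    secondary_index.setdefault(scores["Chinese"], []).append(student_no)


def scores_with_chinese_above(threshold):
    # Step 1: screen keys in the secondary index by subvalue.
    keys = [k for score, ks in secondary_index.items() if score > threshold for k in ks]
    # Step 2: fetch the full values from the primary index by key.
    return {k: primary_index[k] for k in sorted(keys)}
```

Calling `scores_with_chinese_above(90)` returns the full score records of students S001 and S003 without inspecting the full record of S002.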
One or more local secondary indexes may be constructed for a data table. The corresponding subvalues in different local secondary indexes may be different or partially the same.
In the following description, a local primary index may be simply referred to as a primary index, and a local secondary index may be simply referred to as a secondary index. In addition, when the local primary index and the local secondary index are not particularly distinguished, they may be simply referred to as indexes.
The above examples introduce indexes in a key-value database. Next, load characteristics of the key-value database are described.
The access operations on the database may include read operations and write operations; accordingly, the load characteristics may include read-intensive loads, write-intensive loads, and hybrid loads. A hybrid load is a load characteristic between the read-intensive load and the write-intensive load; it indicates that read operations and write operations occur at similar frequencies, that is, they are relatively balanced.
When the ratio of the occurrence frequency of the read operation to the occurrence frequency of the write operation is greater than the threshold value A1, the load characteristic is specifically a read-intensive load. The threshold A1 is a preset value. Illustratively, the threshold A1 is greater than or equal to 4. In one example, the threshold A1 is 9. When the ratio of the frequency of occurrence of the write operation to the frequency of occurrence of the read operation is greater than the threshold A2, the load characteristic is a write-intensive load. The threshold A2 is a preset value. Illustratively, the threshold A2 is greater than or equal to 4. In one example, the threshold A2 is 9. When the ratio of the occurrence frequency of the read operation to the occurrence frequency of the write operation is not greater than the threshold A1 and the ratio of the occurrence frequency of the write operation to the occurrence frequency of the read operation is not greater than the threshold A2, the load characteristic is a hybrid load.
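The threshold rule above translates directly into a small classifier. In the sketch below the function name `classify_load` is hypothetical, the default thresholds use the example value 9 given above, and comparing by cross-multiplication (so that a zero frequency does not cause a division by zero) is an implementation choice not stated in the text.

```python
def classify_load(read_freq, write_freq, a1=9, a2=9):
    """Classify a load from read/write frequencies using thresholds A1 and A2."""
    # read/write > a1 is checked as read > a1 * write to avoid dividing by zero.
    if read_freq > a1 * write_freq:
        return "read-intensive"
    # write/read > a2 is checked as write > a2 * read for the same reason.
    if write_freq > a2 * read_freq:
        return "write-intensive"
    # Neither ratio exceeds its threshold: the load is hybrid.
    return "hybrid"
```

Because both thresholds are at least 4, the two conditions cannot hold at the same time, so the order of the checks does not affect the result.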
The access operation to the index, that is, the access operation performed in the index engine adopted by the index, brings load characteristics to the index. Different access operations result in the index carrying different load characteristics. The load characteristic caused by the access operation for the local primary index is called a primary index load characteristic, and the load characteristic caused by the access operation for the local secondary index is called a secondary index load characteristic.
The load characteristics of a data table depend on the type of service that the data table serves and on the type of user behavior (for example, the user's access operations on the data table, or queries on the data in the data table through a local secondary index). For example, for a data table serving a live-broadcast service, during the peak period of the service (for example, 8 p.m. to 10 p.m.) the data table is read and written at high frequency, and the load characteristic is a hybrid load. During the off-peak period (for example, midnight to 6 a.m.), the user dumps and backs up the data in the data table, the data is read at high frequency, and the load characteristic is a read-intensive load. For a data table serving a data backup service, the load characteristic is typically a write-intensive load. When data is restored from the backup data table to a data table that has lost data, a large amount of data must be read from the backup data table, and its load characteristic changes to a read-intensive load.
A local secondary index provides quick access to the data in the data table, and different local secondary indexes can record different subvalues, so that the data in the data table can be queried through different subvalues. Thus, the load characteristics of the different local secondary indexes within a data table depend on the user's query behavior. For example, consider a data table with two local secondary indexes, denoted LSI 1 and LSI 2. In a given period, if the user queries the data table using only the subvalue recorded in LSI 1 as the query condition, LSI 1 carries a hybrid load and LSI 2 carries a write-intensive load. Conversely, if the user queries using only the subvalue recorded in LSI 2, LSI 1 carries a write-intensive load and LSI 2 carries a hybrid load.
It will be appreciated that the access operation for the secondary index is not constant, and thus the load characteristics of the same secondary index at different times may also be different.
Thus, the load characteristics of the primary index or the load characteristics of the secondary index are typically dynamically changed.
In addition, in the following description, when the primary index load characteristic and the secondary index load characteristic are not specifically distinguished, they may be referred to simply as data table load characteristics.
Some index engines are suited to read operations: their read amplification is small and their read latency is low, so read operations perform well. Such an index engine may be referred to as a read-friendly index engine. Other index engines are suited to write operations: their write amplification is small and their write latency is low, so write operations perform well. Such an index engine may be referred to as a write-friendly index engine.
Specifically, the write-friendly index engine writes data in an append-only (append) mode, which gives a high write speed, low write latency, and higher write performance. However, append-only writing defers the updating of the data in the data table, so multiple historical versions of a key value are retained in the data table, which increases read amplification and read latency.
The log-structured merge tree (LSM tree) structure is a typical write-friendly index engine. The LSM tree stores data in the form of logs. As shown in fig. 1A, the LSM tree includes a data area and an index area (manifest). The data area is the area of the LSM tree in which data is stored. The data area may be located on a hard disk and is used to store data persistently. The data area includes a plurality of storage layers from top to bottom (the C1, C2, C3, C4, C5, and C6 layers shown in fig. 1A). Among the plurality of storage layers, an upper layer has a smaller storage space and a lower layer has a larger storage space. When data is written, it is written to the uppermost layer, i.e., the C1 layer. When the data amount of the C1 layer reaches the preset value D1, the data of the C1 layer and the data of the next layer (i.e., the C2 layer) are merged (compaction), and the merged data is handed over to the C2 layer. When the data amount of the C2 layer reaches a preset value D2, the data of the C2 layer and the data of the next layer (i.e., the C3 layer) are merged, the merged data is handed over to the C3 layer, and so on, so that old data is continuously moved to lower layers and new data can be continuously written to the upper layer. In addition, the index area may be used to speed up locating the position of a particular key value in the storage layers.
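The layered write path described above can be sketched as a toy in-memory model. The class name `TinyLSM`, the use of one dictionary per storage layer, and counting the data amount of a layer in keys rather than bytes are all simplifying assumptions; a real LSM tree appends sorted runs to disk and maintains a manifest, which this sketch does not attempt to model.

```python
class TinyLSM:
    """Toy layered store: layers[0] plays the role of C1, layers[1] of C2, etc."""

    def __init__(self, capacities):
        # capacities[i] is the preset value (D1, D2, ...) for layer i;
        # the last layer is never merged further down.
        self.capacities = list(capacities)
        self.layers = [dict() for _ in self.capacities]

    def put(self, key, value):
        # New data always lands in the uppermost layer.
        self.layers[0][key] = value
        self._merge_down(0)

    def _merge_down(self, i):
        # When a layer reaches its preset size, merge it into the next layer
        # and hand the merged data over to that layer, cascading if needed.
        while i + 1 < len(self.layers) and len(self.layers[i]) >= self.capacities[i]:
            merged = dict(self.layers[i + 1])
            merged.update(self.layers[i])  # the upper layer is newer, so it wins
            self.layers[i + 1] = merged
            self.layers[i] = {}
            i += 1

    def get(self, key):
        # Search top-down so that the newest version of a key is found first.
        for layer in self.layers:
            if key in layer:
                return layer[key]
        return None
```

In this model, an older version of a key can remain in a lower layer after a newer version has been written to the top layer, which is the source of the read amplification discussed in this section.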
The read-friendly index engine stores data by updating it promptly rather than deferring the update. In a read-friendly index engine, only a single version of each key value is retained, so the read-friendly index engine is beneficial to data reading and is better suited to read operations. When data is written into the read-friendly index engine, the previous version of the key value must be read, updated, and then written back, which incurs a larger write overhead. Thus, the read-friendly index engine causes write amplification and is not well suited to write operations.
Wherein, the B+ tree (B+ tree) structure is a typical read-friendly index engine. As shown in fig. 1B, the data area is composed of leaf nodes in which key values are stored in order. The B+ tree structure has an index area for accelerating the positioning of a leaf node where a certain key value is located.
Since the load characteristics of the data table may vary, it may vary from a read-intensive load to a write-intensive load, or from a write-intensive load to a read-intensive load. Therefore, no matter the data table adopts a read-friendly index engine or a write-friendly index engine, the situation that the index engine is not matched with the load characteristic can occur, so that the access delay is increased, the service operation is affected, and the user experience is affected.
In view of the foregoing, the present application provides a data storage scheme that may provide an index engine configuration interface so that a user may configure an index engine of a data table at any time. The data in the data table may then be stored in accordance with a user configured data storage structure. It will be appreciated that the load characteristics of a data table are related to the traffic served by the data table. For example, the load characteristics of a data table serving a data backup service are typically write-intensive loads. When the data is restored from the data table serving the data backup service to the data lost data table, a large amount of data needs to be read from the data table serving the data backup service, and the load characteristic of the data table serving the data backup service is converted into a read intensive load. And the user can set or predict when the data table serves which service. Therefore, a user can configure an index engine matched with the load characteristic brought by the service after switching before or during the service switching of the data table, so that the data table can provide better access performance and the read or write time delay is reduced.
Next, an exemplary description will be given of a data storage scheme provided in an embodiment of the present application.
First, embodiments of the present application provide a storage system 100 that may be used to implement a data storage scheme. As shown in fig. 2, the storage system 100 includes a control device 110 and a storage device 120.
The storage device 120 is a device or apparatus for persistently storing data. In some embodiments, the storage device 120 may be a hard disk, such as a solid-state drive (SSD). In other embodiments, the storage device 120 may be another form of device or apparatus capable of persistently storing data. The specific implementation form of the storage device 120 is not specifically limited in the embodiments of the present application.
As shown in fig. 2, the storage device 120 may include a plurality of data tables, such as a data table T1 and a data table T2. The data tables T1 and T2 may belong to a database. The database may specifically be a key-value database. Some of the plurality of data tables may belong to the same user, and some may belong to different users. As shown in fig. 2, a data table, such as the data table T1, may include a local primary index. In some embodiments, the data table T1 may also include a local secondary index B1 and a local secondary index B2.
The control device 110 is a module or component having data processing capabilities. In some embodiments, the control device 110 may be a physical device, such as a server or a processor. In some embodiments, the control device 110 may be a virtual device, such as a virtual machine (VM) or a container. The specific implementation form of the control device 110 is not specifically limited in the embodiments of the present application.
The control device 110 is used for controlling or adjusting the index engine of a data table in the storage device 120. As shown in fig. 2, the control device 110 may include a configuration module 111 and a processing module 112. The configuration module 111 may be used by a user to configure the load characteristics of the data table. The processing module 112 may configure an index engine that matches the load characteristics configured by the user and instruct the storage device 120 to store data according to that index engine. More specifically, the configuration module 111 may provide the user with a configuration interface through which the user inputs load characteristics. The processing module 112 may obtain the load characteristic input by the user from the configuration module 111 and configure an index engine matching the load characteristic, so as to instruct the storage device 120 to store the data in the user's data table according to that index engine.
In some embodiments, as shown in fig. 2, the configuration module 111 may include an initial load characteristic configuration submodule 111A, and the processing module 112 may include an index engine initialization submodule 112A. The initial load characteristic configuration submodule 111A may provide an initial load characteristic configuration interface to the user when the user creates a data table in the database 121. The user may input an initial load characteristic through the configuration interface. The index engine initialization submodule 112A may configure an index engine matching the initial load characteristic. Thereafter, the index engine initialization submodule 112A may instruct the storage device 120 to use that index engine as the initial index engine of the data table newly created by the user.
In some embodiments, as shown in fig. 3, the initial load characteristic configuration submodule 111A may include a primary index initial load characteristic configuration submodule 111A1. The primary index initial load characteristic configuration submodule 111A1 may provide a primary index initial load characteristic configuration interface to the user when the user creates a data table in the database 121. The user may input a primary index initial load characteristic through the configuration interface. The index engine initialization submodule 112A may configure, according to the primary index initial load characteristic, an index engine matching it. Thereafter, the index engine initialization submodule 112A may instruct the storage device 120 to use that index engine as the initial index engine of the primary index of the data table newly created by the user.
In some embodiments, as shown in fig. 3, the initial load characteristic configuration submodule 111A may include a secondary index initial load characteristic configuration submodule 111A2. The secondary index initial load characteristic configuration submodule 111A2 may provide a secondary index initial load characteristic configuration interface to the user when the user creates a secondary index in a data table. The user may input a secondary index initial load characteristic through the configuration interface. The index engine initialization submodule 112A may configure, according to the secondary index initial load characteristic, an index engine matching it. Thereafter, the index engine initialization submodule 112A may instruct the storage device 120 to use that index engine as the initial index engine of the secondary index newly created by the user.
Returning to fig. 2, in some embodiments, the configuration module 111 may include a load characteristic adjustment submodule 111B, and the processing module 112 may include an index engine adjustment submodule 112B. The load characteristic adjustment submodule 111B may provide a load characteristic adjustment interface to the user after creation of the data table is completed. The user may input a new load characteristic through the adjustment interface. The index engine adjustment submodule 112B may configure, according to the new load characteristic, an index engine matching it, and instruct the storage device 120 either to migrate the index engine of the user's data table to the index engine matching the new load characteristic or to switch the index engine of the user's data table to the index engine matching the new load characteristic.
In this embodiment, migration may be understood as a gradual change. For example, suppose there are a first structure, a second structure, and a third structure, where the difference between the first structure and the second structure is large and the third structure lies between them; that is, the difference between the first structure and the third structure and the difference between the second structure and the third structure are each smaller than the difference between the first structure and the second structure. Migrating from the first structure to the second structure then means first switching from the first structure to the third structure and then switching from the third structure to the second structure. In this way, the index engine change overhead caused by switching directly between structures with a large difference can be avoided.
In some embodiments, as shown in fig. 2, the control device 110 further includes a load characteristic sensing module 113. The load characteristic sensing module 113 may sense the load characteristics of the data table. Specifically, the load characteristic sensing module 113 may sense the operation amplification of operations performed on the data table. When the operation amplification is greater than a preset threshold, it can be determined that the current index engine of the data table does not match the current load characteristic of the data table and that the index engine needs to be adjusted. The type of the current load characteristic may be determined based on the specific type of operation.
Specifically, the operations may include read operations and write operations, and the operation amplification correspondingly includes read amplification and write amplification. Operation amplification refers to the ratio of the amount of data actually operated on to the amount of data requested to be operated on. Read amplification refers to the ratio of the amount of data actually read to the amount of data requested to be read, and write amplification refers to the ratio of the amount of data actually written to the amount of data requested to be written.
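The ratio defined above can be written as a small helper. The function name `operation_amplification` and the use of byte counts as the unit of data amount are assumptions for illustration; the application does not prescribe a unit or an interface.

```python
def operation_amplification(actual_bytes: int, requested_bytes: int) -> float:
    """Ratio of the data actually read (or written) to the data requested."""
    if requested_bytes <= 0:
        raise ValueError("requested_bytes must be positive")
    return actual_bytes / requested_bytes
```

For example, if serving a 4 KiB read requires the index engine to read five 4 KiB blocks from the storage device, the read amplification under this definition is 5.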
The load characteristic sensing module 113 may sense the read amplification of read operations on the data table. When the read amplification is greater than a preset threshold A3, it may be determined that the structure of the current index engine of the data table does not match the current load characteristic, and the index engine of the data table needs to be adjusted in the direction of the read-friendly index engine. The load characteristic sensing module 113 may also sense the write amplification of write operations on the data table. In one example, when the read amplification is greater than the preset threshold A3 and the write amplification is less than a preset threshold A4, it may be determined that the structure of the current index engine of the data table does not match the current load characteristic of the data table, and that the index engine of the data table needs to be adjusted in the direction of the read-friendly index engine. Adjusting the index engine of the data table in the direction of the read-friendly index engine means that the structure of the index engine after adjustment is better suited to, or better matched with, a read-intensive load than the structure of the index engine before adjustment.
The threshold A3 and the threshold A4 may be set in advance empirically or experimentally. In one example, the threshold A3 may be 20 and the threshold A4 may be 10. In another example, the threshold A3 may be 30 and the threshold A4 may be 15. In yet another example, the threshold A3 may be 40 and the threshold A4 may be 20. The threshold A3 and the threshold A4 are not specifically limited in the embodiments of the present application.
Wherein the load characteristics sensing module 113 may sense a write amplification of a write operation to the data table. When the write amplification is greater than a preset threshold value A5, it may be determined that the current index engine of the data table does not match the current load characteristic of the data table, and that the index engine of the data table needs to be adjusted in the direction of the write-friendly index engine. In one example, when the write amplification is greater than a preset threshold A5 and the read amplification is less than a preset threshold A6, it may be determined that the current index engine of the data table does not match the current load characteristics of the data table and that the index engine of the data table needs to be adjusted in the direction of the write-friendly index engine. The index engine of the data table is adjusted towards the direction of the write-friendly index engine, so that the structure of the index engine after adjustment is more suitable for or more matched with the write-intensive load than the structure of the index engine before adjustment.
The threshold A5 and the threshold A6 may be set in advance empirically or experimentally. In one example, the threshold A5 may be 20 and the threshold A6 may be 10. In another example, the threshold A5 may be 30 and the threshold A6 may be 15. In yet another example, the threshold A5 may be 40 and the threshold A6 may be 20. The threshold A5 and the threshold A6 are not specifically limited in the embodiments of the present application.
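The two-threshold variants described above (A3 with A4, and A5 with A6) can be combined into one decision function, sketched below. The function name `adjustment_direction`, the string return values, and the defaults of 20 and 10 (taken from the first example values above) are assumptions for illustration, and the sketch covers only the variant in which both the read amplification and the write amplification are checked.

```python
from typing import Optional


def adjustment_direction(read_amp: float, write_amp: float,
                         a3: float = 20.0, a4: float = 10.0,
                         a5: float = 20.0, a6: float = 10.0) -> Optional[str]:
    """Return the direction in which the index engine should be adjusted, if any."""
    # High read amplification with low write amplification:
    # move toward the read-friendly index engine.
    if read_amp > a3 and write_amp < a4:
        return "read-friendly"
    # High write amplification with low read amplification:
    # move toward the write-friendly index engine.
    if write_amp > a5 and read_amp < a6:
        return "write-friendly"
    # Otherwise the current index engine is treated as matching the load.
    return None
```

With these assumed defaults, a read amplification of 25 and a write amplification of 5 would call for adjustment toward the read-friendly index engine, while amplifications of 25 and 25 would leave the index engine unchanged.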
The above examples introduce the storage system 100. Next, a data storage scheme provided in an embodiment of the present application is described in connection with the storage system 100 by way of example. Here, the data table T1 may be set to correspond to the user 200, and the data table T1 is used to store data of the service of the user 200.
Referring to fig. 4, the control device 110 may perform step 401 to provide a configuration interface to the user 200. The configuration interface is used by the user to input the load characteristic of the data table. The load characteristic input by the user through the configuration interface can be a read-intensive load, a write-intensive load, or a hybrid load. That is, the configuration interface is used by the user to configure whether the load characteristic of the data table T1 is a read-intensive load, a write-intensive load, or a hybrid load. For example, when the user does not input a load characteristic to the configuration interface, that is, when the control device 110 does not receive a load characteristic input by the user, the control device 110 may treat the load characteristic configured by the user as a default characteristic. In one example, the default characteristic may be a hybrid load.
The user 200 may perform step 403 to input the load characteristic E1. The load characteristic E1 may be a read-intensive load, a write-intensive load, or a hybrid load, among others.
In some embodiments, the storage system 100 may also include a client (not shown) located on the user 200 side. In step 401, the control device 110 may provide the configuration interface to the client. The user may make an input on the client so as to input the load characteristic E1 to the configuration interface. The foregoing merely illustrates one way in which the user inputs the load characteristic to the configuration interface and is not limiting. Other ways supported by the prior art may also be used for the user to input the load characteristic to the configuration interface, and are not described in detail here.
In some embodiments, the control device 110 may provide the configuration interface to the user 200 when the user 200 creates the data table T1. The configuration interface is then used by the user 200 to configure the initialization load characteristic. That is, the load characteristic E1 is an initialization load characteristic.
In some embodiments, the control device 110 may provide the configuration interface to the user 200 while the data table T1 is in use. The configuration interface is then used by the user 200 to adjust the load characteristic. That is, the load characteristic E1 is a load characteristic actively adjusted by the user. It will be appreciated that the user may have the data table T1 serve different services in different periods of time, and that different services cause different load characteristics. Thus, when the service served by the data table T1 changes, the user can input the load characteristic caused by the changed service through the configuration interface. In other words, the user can actively adjust the load characteristic.
In some embodiments, the configuration interface may be used for a user to configure the load characteristics of the primary index and/or the load characteristics of the secondary index. That is, the load characteristic E1 may be the load characteristic of the primary index, the load characteristic of the secondary index, or both the load characteristic of the primary index and the load characteristic of the secondary index.
In some embodiments, the configuration interface may be specifically an application programming interface (application programming interface, API).
In one example, the configuration interface is specifically used by the user 200 to configure the initialization load characteristic of the primary index, and the function of the configuration interface may be InitTableStore (workloadType type). The parameter workloadType type takes one of the values read, write, or default. When the value of workloadType type is read, the load characteristic E1 is a read-intensive load. When the value of workloadType type is write, the load characteristic E1 is a write-intensive load. When the value of workloadType type is default, the load characteristic E1 is a hybrid load.
In one example, the configuration interface is specifically used by the user 200 to configure the initialization load characteristic of the secondary index, and the function of the configuration interface may be InitIndexStore (workloadType type). The parameter workloadType type takes one of the values read, write, or default. When the value of workloadType type is read, the load characteristic E1 is a read-intensive load. When the value of workloadType type is write, the load characteristic E1 is a write-intensive load. When the value of workloadType type is default, the load characteristic E1 is a hybrid load.
In one example, the configuration interface is used by the user 200 to adjust the load characteristic, and the function of the configuration interface may be ChangeStore (workloadType oldType, workloadType newType). The parameter workloadType oldType represents the load characteristic before adjustment, and the parameter workloadType newType represents the load characteristic after adjustment. The load characteristic E1 is the adjusted load characteristic; that is, the parameter workloadType newType indicates the load characteristic E1. Each of the parameters workloadType oldType and workloadType newType can take any of the three values read, write, and default. As described above, when the value is read, the load characteristic is a read-intensive load. When the value is write, the load characteristic is a write-intensive load. When the value is default, the load characteristic is a hybrid load.
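The three interface functions named above can be mirrored in a short sketch. Only the function names InitTableStore, InitIndexStore, and ChangeStore and the three parameter values come from the text; the Python spellings `init_table_store`, `init_index_store`, and `change_store`, the `WorkloadType` enum, the string return values, and the choice to return the resolved load characteristic are assumptions for illustration and do not describe an actual client library.

```python
from enum import Enum


class WorkloadType(Enum):
    READ = "read"
    WRITE = "write"
    DEFAULT = "default"


# Mapping from the parameter value to the load characteristic it denotes.
_LOAD_CHARACTERISTIC = {
    WorkloadType.READ: "read-intensive load",
    WorkloadType.WRITE: "write-intensive load",
    WorkloadType.DEFAULT: "hybrid load",
}


def init_table_store(workload_type: WorkloadType = WorkloadType.DEFAULT) -> str:
    """Resolve the initialization load characteristic of the primary index."""
    return _LOAD_CHARACTERISTIC[workload_type]


def init_index_store(workload_type: WorkloadType = WorkloadType.DEFAULT) -> str:
    """Resolve the initialization load characteristic of a secondary index."""
    return _LOAD_CHARACTERISTIC[workload_type]


def change_store(old_type: WorkloadType, new_type: WorkloadType) -> tuple:
    """Resolve the load characteristics before and after an adjustment."""
    return _LOAD_CHARACTERISTIC[old_type], _LOAD_CHARACTERISTIC[new_type]
```

Making `DEFAULT` the default argument reflects the case described earlier in which the user supplies no load characteristic and the hybrid load is assumed.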
With continued reference to fig. 4, the control device 110 may execute step 405 to determine an index engine E11 that matches the load characteristic E1. When the load characteristic E1 is a read-intensive load, the index engine E11 is a read-friendly index engine. When the load characteristic E1 is a write-intensive load, then the index engine E11 is a write-friendly index engine. When the load characteristic E1 is a hybrid load, the index engine E11 is a hybrid index engine. Wherein the structure of the hybrid index engine is intermediate between the structure of the read-friendly index engine and the structure of the write-friendly index engine.
In some embodiments, the index engine may be built based on a B+ tree structure and a log-structured merge tree structure. The read-friendly index engine includes at least a B+ tree structure, and the write-friendly index engine includes at least a log-structured merge tree structure. The hybrid index engine may include both a B+ tree structure and a log-structured merge tree structure, with the B+ tree structure located below the log-structured merge tree structure.
In fig. 5, a plurality of index engines are shown in sequence from left to right. The leftmost index engine can be used as a read-friendly index engine, the rightmost index engine can be used as a write-friendly index engine, and the index engines in the middle can be used as hybrid index engines. Accordingly, the write performance of the index engines increases from left to right, and the read performance increases from right to left.
In one illustrative example, as shown in fig. 5, the read-friendly index engine is embodied as a B+ tree structure. The write-friendly index engine is composed of a B+ tree structure and a log-structured merge tree structure, with the B+ tree structure located below the log-structured merge tree structure. The hybrid index engine is also composed of a B+ tree structure and a log-structured merge tree structure, with the B+ tree structure located below the log-structured merge tree structure. Compared with the write-friendly index engine, the log-structured merge tree structure in the hybrid index engine has fewer storage layers. That is, the log-structured merge tree structure in the write-friendly index engine has more storage layers, and the log-structured merge tree structure in the hybrid index engine has fewer storage layers.
In this case, as described above, data is written to the uppermost layer, i.e., the C1 layer, when data is written. When the data amount of the C1 layer reaches the preset value D1, the data of the C1 layer and the data of the next layer (i.e., the C2 layer) are merged (compaction), and the merged data is handed over to the C2 layer. When the data amount of the C2 layer reaches the preset value D2, the data of the C2 layer and the data of the next layer (i.e., the C3 layer) are merged, the merged data is handed over to the C3 layer, and so on. Therefore, the more storage layers the log-structured merge tree structure has, the better the data is aggregated in the layers above the B+ tree and the less data is written into the B+ tree structure, so that write amplification and write latency can be reduced.
However, the more storage layers the log-structured merge tree structure has, the more historical versions of the data there may be, which increases read amplification and read latency. Therefore, to reduce read amplification and read latency, the number of storage layers of the log-structured merge tree structure needs to be reduced. Thus, by adjusting the number of storage layers in the log-structured merge tree structure, index engines with different read and write performance can be built, namely a read-friendly index engine, a write-friendly index engine, and a hybrid index engine. The number of storage layers of the log-structured merge tree structure in the hybrid index engine may be reduced or increased, thereby biasing the hybrid index engine toward read-friendly or write-friendly.
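The relationship between load characteristic, index engine, and the number of log-structured merge tree storage layers can be sketched as a lookup table. The `IndexEngine` dataclass, the `match_engine` function, and in particular the layer counts 2 and 4 are hypothetical values chosen only to show the ordering; the application states that the hybrid engine has fewer storage layers than the write-friendly engine but gives no specific numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEngine:
    name: str
    lsm_layers: int  # storage layers stacked above the B+ tree; 0 means a pure B+ tree


# Hypothetical layer counts: only their ordering follows the description.
ENGINES = {
    "read-intensive": IndexEngine("read-friendly", 0),
    "hybrid": IndexEngine("hybrid", 2),
    "write-intensive": IndexEngine("write-friendly", 4),
}


def match_engine(load_characteristic: str) -> IndexEngine:
    """Return the index engine that matches the given load characteristic."""
    return ENGINES[load_characteristic]
```

Under this model, biasing a hybrid engine toward read-friendly or write-friendly corresponds to lowering or raising its `lsm_layers` value.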
Returning to fig. 4, after the index engine E11 is determined in step 405, the control device 110 may perform step 407 to instruct the storage device 120 to store the data in the data table T1 according to the index engine E11. Specifically, as shown in fig. 4, the control device 110 may perform step 4071 to send indication information to the storage device 120, where the indication information may include an identifier of the index engine E11. The storage device 120 may perform step 4072 to store, in response to the indication information, the data in the data table T1 according to the index engine E11.
When the load characteristic E1 is the load characteristic of the primary index, the control device 110 instructs the storage device 120 to store the data in the whole data table T1 according to the index engine E11, or store the data in the data partition instance corresponding to the primary index. Specifically, the indication information may include the identifier of the data table T1 or the identifier of the data partition instance while including the identifier of the index engine E11, and thus, the storage device 120 stores the data in the entire data table T1 or the data in the data partition instance corresponding to the primary index according to the index engine E11 according to the identifier of the data table T1 or the identifier of the data partition instance.
When the load characteristic E1 is the load characteristic of a secondary index, the control device 110 instructs the storage device 120 to store the data under the secondary index according to the index engine E11. Specifically, the indication information may include the identifier of the secondary index in addition to the identifier of the index engine E11, so that the storage device 120 stores the data under the secondary index identified by that identifier according to the index engine E11.
In some embodiments, the index engine of the data table T1 is an index engine E12 before step 407 is performed. The index engine E12 is a read-friendly index engine and the index engine E11 is a write-friendly index engine; alternatively, the index engine E11 is a read-friendly index engine and the index engine E12 is a write-friendly index engine. In this case, in step 407, the control device 110 may instruct the storage device 120 to first store the data in the data table T1 according to the hybrid index engine and then store the data in the data table T1 according to the index engine E11. The storage device 120 may first change the index engine of the data table T1 from the index engine E12 to the hybrid index engine, and then change the index engine of the data table T1 from the hybrid index engine to the index engine E11. In this way, the index engine storing the data is gradually migrated from the index engine E12 to the index engine E11, so that the overhead of changing the index engine can be reduced.
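The gradual migration through the hybrid index engine can be sketched as computing the sequence of engines to switch to. The function name `migration_path` and the representation of engines as three ordered strings are assumptions for illustration; an implementation could equally step through several hybrid configurations with different numbers of storage layers.

```python
# Engines ordered from most read-friendly to most write-friendly.
ENGINE_ORDER = ["read-friendly", "hybrid", "write-friendly"]


def migration_path(current: str, target: str) -> list:
    """Return the engines to switch to, in order, to go from current to target."""
    i = ENGINE_ORDER.index(current)
    j = ENGINE_ORDER.index(target)
    step = 1 if j >= i else -1
    # Walk through every intermediate engine and drop the starting one.
    return [ENGINE_ORDER[k] for k in range(i, j + step, step)][1:]
```

For a change from the read-friendly engine to the write-friendly engine, the path contains the hybrid engine first and the write-friendly engine second, which matches the two-step change described above.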
In the data storage scheme provided by the embodiment of the application, the user can configure the index engine of the data table, so that the user can adjust the index engine of the data table in time or at any time when the service served by the data table changes, and the index engine is matched with the load characteristic caused by the changed service, so that the access performance of the data table can be improved.
In connection with the storage system 100 shown in fig. 2, the embodiments of the present application also provide a data storage scheme. Next, the example introduces this scheme.
As shown in fig. 6, the control device 110 may perform step 601 to monitor the operation amplification of the data table T1. Illustratively, step 601 may be performed periodically, and the average of the operation amplification values monitored within one execution period may be taken as the operation amplification of that period. The execution period of step 601 may be preset, for example to 10 minutes or to 20 minutes.
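The per-period averaging in step 601 can be illustrated with a short sketch. The function name `amplification_per_period`, the `(timestamp, amplification)` sample format, and the choice to skip periods that contain no samples are assumptions made for the example; only the 10-minute period comes from the text.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def amplification_per_period(
    samples: Iterable[Tuple[float, float]], period_seconds: float = 600.0
) -> List[float]:
    """Average (timestamp, amplification) samples into one value per period.

    period_seconds defaults to 600 s, matching the 10-minute example.
    """
    buckets: Dict[int, List[float]] = {}
    for timestamp, amplification in samples:
        # Samples whose timestamps fall in the same period share a bucket.
        buckets.setdefault(int(timestamp // period_seconds), []).append(amplification)
    # One averaged value per execution period, in chronological order.
    return [mean(values) for _, values in sorted(buckets.items())]


if __name__ == "__main__":
    print(amplification_per_period([(0, 2.0), (300, 4.0), (600, 10.0)]))
```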
The operation amplification may include a read amplification of reading data from the data table T1 or a write amplification of writing data to the data table T1. The operation amplification may also include both the read amplification and the write amplification.
In the following description, unless otherwise specified, read amplification refers to the read amplification of reading data from the data table T1, and write amplification refers to the write amplification of writing data to the data table T1.
With continued reference to fig. 6, the control device 110 may execute step 603 to determine whether the operation amplification is greater than the threshold y1.
In some embodiments, the operation amplification includes a read amplification and the threshold y1 includes a threshold A3. In step 603, it may be determined whether the read amplification is greater than the threshold A3. If the read amplification is greater than the threshold A3, it may be determined that the current index engine of the data table T1 does not match the current load characteristic of the data table T1, and that the index engine of the data table T1 needs to be adjusted in the direction of a read-friendly index engine.
In one illustrative example of this embodiment, the operation amplification includes a read amplification and a write amplification, and the threshold y1 includes a threshold A3 and a threshold A4. In step 603, it may be determined whether the read amplification is greater than the threshold A3 and whether the write amplification is less than the threshold A4. If the read amplification is greater than the threshold A3 and the write amplification is less than the threshold A4, it may be determined that the current index engine of the data table T1 does not match the current load characteristic of the data table T1, and that the index engine of the data table T1 needs to be adjusted in the direction of a read-friendly index engine.
For the threshold A3 and the threshold A4, refer to the description above; details are not repeated here.
In some embodiments, the operation amplification includes a write amplification and the threshold y1 includes a threshold A5. In step 603, it may be determined whether the write amplification is greater than the threshold A5. If the write amplification is greater than the threshold A5, it may be determined that the current index engine of the data table T1 does not match the current load characteristic of the data table T1, and that the index engine of the data table T1 needs to be adjusted in the direction of a write-friendly index engine.
In one illustrative example of this embodiment, the operation amplification includes a write amplification and a read amplification, and the threshold y1 includes a threshold A5 and a threshold A6. In step 603, it may be determined whether the write amplification is greater than the threshold A5 and whether the read amplification is less than the threshold A6. If the write amplification is greater than the threshold A5 and the read amplification is less than the threshold A6, it may be determined that the current index engine of the data table T1 does not match the current load characteristic of the data table T1, and that the index engine of the data table T1 needs to be adjusted in the direction of a write-friendly index engine.
For the threshold A5 and the threshold A6, refer to the description above; details are not repeated here.
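The two-threshold variants of the check in step 603 can be summarized in a small sketch. The function name `adjustment_direction` and the returned strings are hypothetical, and the thresholds A3 to A6 are passed in as parameters because this section does not restate their values.

```python
from typing import Optional


def adjustment_direction(
    read_amp: float,
    write_amp: float,
    a3: float,
    a4: float,
    a5: float,
    a6: float,
) -> Optional[str]:
    """Return "read-friendly", "write-friendly", or None (no adjustment)."""
    # Read amplification too high while write amplification is still low.
    if read_amp > a3 and write_amp < a4:
        return "read-friendly"
    # Write amplification too high while read amplification is still low.
    if write_amp > a5 and read_amp < a6:
        return "write-friendly"
    return None


if __name__ == "__main__":
    # Threshold values below are placeholders chosen only for the demo.
    print(adjustment_direction(12.0, 2.0, a3=10.0, a4=5.0, a5=10.0, a6=5.0))
```

Checking the opposite amplification against a second threshold keeps the adjustment from being triggered when both amplifications are already high, in which case moving in either direction would make the other one worse.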
With continued reference to fig. 6, the control device 110 may execute step 605 to adjust the index engine of the data table T1 to the index engine E21 in the direction that decreases the operation amplification.
Specifically, when it is determined in step 603 that the index engine of the data table T1 needs to be adjusted in the direction of a read-friendly index engine, the index engine E21 is an index engine that is more suitable for read operations than the current index engine of the data table T1. That is, the index engine E21 matches a read-intensive load better than the current index engine of the data table T1 does, so the read amplification of the data table T1 when its data is stored according to the index engine E21 is smaller than the current read amplification of the data table T1.
In some embodiments, as described above, the index engine may be biased toward read-friendly or write-friendly by adjusting the number of storage layers of the log-structured merge tree in the index engine. The index engine becomes more read-friendly as the number of storage layers of the log-structured merge tree decreases. Thus, when it is determined in step 603 that the index engine of the data table T1 needs to be adjusted in the direction of a read-friendly index engine, an index engine whose structure has N fewer storage layers than the current index engine of the data table T1 may be used as the index engine E21. Here, a storage layer is a storage layer of the log-structured merge tree, and N is an integer greater than or equal to 1. The value of N may be preset; for example, N may be 1, 2, or 3.
When it is determined in step 603 that the index engine of the data table T1 needs to be adjusted in the direction of a write-friendly index engine, the index engine E21 is an index engine that is more suitable for write operations than the current index engine of the data table T1. That is, the index engine E21 matches a write-intensive load better than the current index engine of the data table T1 does, so the write amplification of the data table T1 when its data is stored according to the index engine E21 is smaller than the current write amplification of the data table T1.
In some embodiments, as described above, the index engine may be biased toward read-friendly or write-friendly by adjusting the number of storage layers of the log-structured merge tree in the index engine. The index engine becomes more write-friendly as the number of storage layers of the log-structured merge tree increases. Thus, when it is determined in step 603 that the index engine of the data table T1 needs to be adjusted in the direction of a write-friendly index engine, an index engine whose structure has M more storage layers than the current index engine of the data table T1 may be used as the index engine E21. Here, a storage layer is a storage layer of the log-structured merge tree, and M is an integer greater than or equal to 1. The value of M may be preset; for example, M may be 1, 2, or 3.
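The layer adjustment of step 605 amounts to simple arithmetic on the number of storage layers of the log-structured merge tree, as the following sketch shows. The function name `adjust_layers`, the direction strings, and in particular the `min_layers` floor are assumptions for illustration; the text only states that N and M are preset integers greater than or equal to 1.

```python
def adjust_layers(
    current_layers: int,
    direction: str,
    n: int = 1,
    m: int = 1,
    min_layers: int = 1,
) -> int:
    """Return the number of LSM-tree storage layers of index engine E21."""
    if n < 1 or m < 1:
        raise ValueError("N and M must be integers greater than or equal to 1")
    if direction == "read-friendly":
        # Fewer layers bias the engine toward reads.
        # Assumption: never go below min_layers (not stated in the text).
        return max(min_layers, current_layers - n)
    if direction == "write-friendly":
        # More layers bias the engine toward writes.
        return current_layers + m
    raise ValueError(f"unknown direction: {direction!r}")


if __name__ == "__main__":
    print(adjust_layers(4, "read-friendly", n=2))
    print(adjust_layers(4, "write-friendly", m=1))
```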
The control device 110 may also perform step 607, instructing the storage device 120 to store the data in the data table T1 according to the index engine E21.
Specifically, as shown in fig. 6, the control device 110 may perform step 6071 to send indication information to the storage device 120, where the indication information may include an identifier of the index engine E21. The storage device 120 may perform step 6072 to store, in response to the indication information, the data in the data table T1 according to the index engine E21.
When the operation amplification monitored in step 601 is the operation amplification of the primary index, the control device 110 instructs the storage device 120 to store, according to the index engine E21, the data in the entire data table T1 or the data in the data partition instance corresponding to the primary index. Specifically, the indication information may include the identifier of the data table T1 or the identifier of the data partition instance in addition to the identifier of the index engine E21, so that the storage device 120 can use that identifier to locate the data in the entire data table T1 or in the data partition instance corresponding to the primary index and store it according to the index engine E21.
When the operation amplification monitored in step 601 is the operation amplification of the local secondary index of the data table T1, the control device 110 instructs the storage device 120 to store the data under the local secondary index according to the index engine E21. Specifically, the indication information may include the identifier of the local secondary index in addition to the identifier of the index engine E21, so that the storage device 120 uses the identifier of the local secondary index to locate the data under that index and stores it according to the index engine E21.
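One way to picture the indication information of step 6071 is as a small record that carries the engine identifier together with exactly one scope identifier. The `Indication` class, its field names, and the `build_indication` helper are hypothetical; the application does not specify a message format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Indication:
    """Hypothetical layout of the indication information."""
    engine_id: str
    table_id: Optional[str] = None
    partition_id: Optional[str] = None
    local_secondary_index_id: Optional[str] = None


def build_indication(
    engine_id: str,
    monitored: str,
    *,
    table_id: Optional[str] = None,
    partition_id: Optional[str] = None,
    lsi_id: Optional[str] = None,
) -> Indication:
    """Pick the scope identifier that matches the monitored index."""
    if monitored == "primary":
        # Primary index: target one partition instance or the whole table.
        if partition_id is not None:
            return Indication(engine_id, partition_id=partition_id)
        if table_id is None:
            raise ValueError("a table or partition identifier is required")
        return Indication(engine_id, table_id=table_id)
    if monitored == "local_secondary":
        if lsi_id is None:
            raise ValueError("a local secondary index identifier is required")
        return Indication(engine_id, local_secondary_index_id=lsi_id)
    raise ValueError(f"unknown index kind: {monitored!r}")


if __name__ == "__main__":
    print(build_indication("E21", "primary", table_id="T1"))
```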
With continued reference to fig. 6, after step 607, the control device 110 may execute step 601 and step 603 again. When the operation amplification is not greater than the threshold y1, adjustment of the index engine of the data table T1 may stop. When the operation amplification is greater than the threshold y1, step 605 and step 607 may be performed again, as described above.
Thus, by iterating steps 601 to 607, the index engine of the data table T1 may be adjusted dynamically so that it matches the load characteristics of the data table T1 as closely as possible and the operation amplification is reduced.
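The iteration over steps 601 to 607 can be sketched as a loop that keeps adjusting until the monitored amplification no longer exceeds the threshold. For brevity the sketch covers only the read-friendly direction. The `tune_until_below_threshold` name, the `max_rounds` guard, and the callback standing in for monitoring are assumptions, and the linear model used in the demo is made up purely to exercise the loop.

```python
from typing import Callable, List


def tune_until_below_threshold(
    measure_read_amp: Callable[[int], float],
    layers: int,
    threshold: float,
    n: int = 1,
    max_rounds: int = 10,
) -> List[int]:
    """Return the sequence of LSM-tree layer counts visited by the loop."""
    history = [layers]
    for _ in range(max_rounds):
        # Steps 601 and 603: monitor and compare with the threshold.
        if measure_read_amp(layers) <= threshold:
            break
        # Assumption: stop once no further layer can be removed.
        if layers - n < 1:
            break
        # Step 605: move N layers toward a read-friendly engine.
        layers -= n
        # Step 607 would send the indication information at this point.
        history.append(layers)
    return history


if __name__ == "__main__":
    # Toy model (not from the source): read amplification grows with layers.
    print(tune_until_below_threshold(lambda k: 3.0 * k, layers=4, threshold=7.0))
```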
According to the data storage scheme provided by the embodiment of the application, the dynamic change of the operation amplification of the data table can be perceived, and the index engine of the data table is dynamically adjusted according to the dynamic change of the operation amplification, so that the index engine is matched with the load characteristic of the data table, and the access performance of the data table can be improved.
In connection with the storage system 100 shown in fig. 2, the embodiments of the present application also provide a data storage scheme, which is described below by way of example.
As shown in fig. 7, the control device 110 may perform step 701 to monitor the read and write operations of the data table T1. Illustratively, step 701 may be performed periodically. In step 701, the total number of read operations and the total number of write operations in one execution period may be monitored to obtain a monitoring result for that execution period. That is, the monitoring result includes the total number of read operations and the total number of write operations on the data table T1 in the execution period. The execution period may be preset, for example to 1 hour or to 2 hours.
The control device 110 may execute step 703 to obtain the load characteristic E3 according to the monitoring result. When the ratio of the total number of read operations to the total number of write operations is greater than the threshold A1, the read-intensive load is taken as the load characteristic E3. When the ratio of the total number of write operations to the total number of read operations is greater than the threshold A2, the write-intensive load is taken as the load characteristic E3. For the threshold A1 and the threshold A2, refer to the description above; details are not repeated here.
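The classification in step 703 can be sketched as follows. The function name `load_characteristic` and the returned strings are hypothetical, and the thresholds A1 and A2 are parameters (assumed positive) because their values are not restated in this section. The ratios are compared by multiplication, which is equivalent for positive counts and avoids dividing by zero when one of the counts is 0.

```python
from typing import Optional


def load_characteristic(
    reads: int, writes: int, a1: float, a2: float
) -> Optional[str]:
    """Return the load characteristic E3 derived from one monitoring result."""
    # reads / writes > A1, rewritten without division.
    if reads > a1 * writes:
        return "read-intensive"
    # writes / reads > A2, rewritten without division.
    if writes > a2 * reads:
        return "write-intensive"
    # Neither ratio exceeds its threshold: no clear load characteristic.
    return None


if __name__ == "__main__":
    # Threshold values below are placeholders chosen only for the demo.
    print(load_characteristic(900, 100, a1=3.0, a2=3.0))
```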
Next, the control device 110 may execute step 705 to migrate to the index engine matching the load characteristic E3, and obtain the index engine E31.
When the load characteristic E3 is a read-intensive load, the structure of the index engine E31 has N fewer storage layers than the current structure of the index engine of the data table T1. Here, a storage layer is a storage layer of the log-structured merge tree, and N is an integer greater than or equal to 1. The value of N may be preset; for example, N may be 1, 2, or 3.
When the load characteristic E3 is a write-intensive load, the structure of the index engine E31 has M more storage layers than the current structure of the index engine of the data table T1. Here, a storage layer is a storage layer of the log-structured merge tree, and M is an integer greater than or equal to 1. The value of M may be preset; for example, M may be 1, 2, or 3.
Then, the control device 110 may perform step 707 to instruct the storage device 120 to store the data in the data table T1 according to the index engine E31. Step 707 may include step 7071, sending indication information to a storage device. Step 707 may also include step 7072, storing the data in data table T1, according to indexing engine E31. The implementation of step 407 and steps 4071 and 4072 in fig. 4 may be referred to, and will not be described herein.
According to the data storage scheme provided by the embodiment of the application, the dynamic change of the load characteristic of the data table can be perceived, and the index engine of the data table is dynamically adjusted according to the dynamic change of the load characteristic, so that the index engine is matched with the load characteristic of the data table, and the access performance of the data table can be improved.
Based on the data storage scheme described above, the embodiments of the present application provide a data storage method. It will be appreciated that the method is combined with the data storage scheme described above and that specific execution of the relevant steps in the method may refer to execution of the corresponding steps in the data storage scheme.
The method is applied to a control device in a storage system (e.g., the control device 110 in the storage system 100), which further includes a storage device (e.g., the storage device 120 in the storage system 100) storing a data table of a user. As shown in fig. 8, the method includes the following steps.
Step 801, providing a configuration interface, where the configuration interface is used by the user to configure the load characteristic of the data table as either a read-intensive load or a write-intensive load. Reference is made in particular to the implementation of the description of step 401 in fig. 4 above.
Step 803a, when the configuration interface indicates that the user has configured the load characteristic of the data table as a read-intensive load, instructing the storage device to store the data in the data table according to a first index engine; the first index engine matches the read-intensive load. Reference is made in particular to the implementation of the description of steps 403-407 in fig. 4 above.
Step 803b, when the configuration interface indicates that the user has configured the load characteristic of the data table as a write-intensive load, instructing the storage device to store the data in the data table according to a second index engine; the second index engine matches the write-intensive load. Reference is made in particular to the implementation of the description of steps 403-407 in fig. 4 above.
In some embodiments, the first index engine includes at least a B+ tree structure and the second index engine includes at least a log-structured merge tree structure.
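The mapping from the configured load characteristic to an index engine in steps 803a and 803b can be pictured as a lookup table. The `ENGINE_BY_LOAD` table, the `handle_configuration` function, and the dictionary it returns are hypothetical stand-ins for the instruction sent to the storage device.

```python
from typing import Dict, Tuple

# Load characteristic -> (engine, structure that the engine at least includes).
ENGINE_BY_LOAD: Dict[str, Tuple[str, str]] = {
    "read-intensive": ("first index engine", "B+ tree"),
    "write-intensive": ("second index engine", "log-structured merge tree"),
}


def handle_configuration(table_id: str, configured_load: str) -> Dict[str, str]:
    """Build the instruction for the storage device from the user's choice."""
    try:
        engine, structure = ENGINE_BY_LOAD[configured_load]
    except KeyError:
        raise ValueError(f"unsupported load characteristic: {configured_load!r}") from None
    return {"table": table_id, "engine": engine, "structure": structure}


if __name__ == "__main__":
    print(handle_configuration("T1", "write-intensive"))
```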
In some embodiments, the index engine in the data table is the second index engine before the instructing the storage means to store the data in the data table in accordance with the first index engine; the instructing the storage device to store the data in the data table according to a first index engine includes: instructing the storage device to migrate an index engine in the data table from the second index engine to a third index engine, the structure of the third index engine being interposed between the structure of the first index engine and the structure of the second index engine; and then instructs the storage device to migrate the index engine of the data table from the third index engine to the first index engine. Reference is made in particular to the implementation of step 407 in fig. 4 described above.
In some embodiments, the data table has a local secondary index, the configuration interface is further configured to provide the user with configuring a load characteristic of the local secondary index as either a read-intensive load or a write-intensive load; when the configuration interface indicates that the user configures the load characteristic of the local secondary index to be a read intensive load, the storage device is instructed to store data under the local secondary index according to the first index engine; and when the configuration interface indicates that the user configures the load characteristic of the local secondary index to be a write-intensive load, the storage device is instructed to store data under the local secondary index according to a second index engine. Reference is made in particular to the implementation of step 407 in fig. 4 described above.
With the data storage method provided by the embodiments of the application, the user can configure the index engine of the data table. When the service served by the data table changes, the user can promptly adjust the index engine so that it matches the load characteristic produced by the changed service, which can improve the access performance of the data table.
The embodiment of the application also provides a data storage method, which is applied to a control device in a storage system (for example, the control device 110 in the storage system 100), and the storage system further comprises a storage device (for example, the storage device 120 in the storage system 100), and the storage device stores a data table. As shown in fig. 9, the method includes the following steps.
Step 901, when the index engine of the data table is a fourth index engine, the control device monitors the operation amplification of the data table, where the operation amplification includes a read amplification of reading data from the data table or a write amplification of writing data to the data table. Reference is made in particular to the implementation described above for step 601 in fig. 6.
Step 903a, when the operation amplification includes the read amplification and the read amplification is greater than a first threshold, instructing the storage device to store the data in the data table according to a fifth index engine; wherein the degree of matching of the fifth index engine with the read-intensive load is greater than the degree of matching of the fourth index engine with the read-intensive load. Reference is made in particular to the implementation of the description of steps 603-607 in fig. 6 above.
Step 903b, when the operation amplification includes the write amplification and the write amplification is greater than a second threshold, instructing the storage device to store the data in the data table according to a sixth index engine; wherein the degree of matching of the sixth index engine with the write-intensive load is greater than the degree of matching of the fourth index engine with the write-intensive load. Reference is made in particular to the implementation of the description of steps 603-607 in fig. 6 above.
In some embodiments, the operation amplification includes both the read amplification and the write amplification. In this case, instructing the storage device to store the data in the data table according to a fifth index engine when the operation amplification includes the read amplification and the read amplification is greater than a first threshold includes: when the read amplification is greater than the first threshold and the write amplification is less than a third threshold, instructing the storage device to store the data in the data table according to the fifth index engine. Reference is made in particular to the implementation of the description of steps 603-607 in fig. 6 above.
In some embodiments, the operation amplification includes both the read amplification and the write amplification. In this case, instructing the storage device to store the data in the data table according to a sixth index engine when the operation amplification includes the write amplification and the write amplification is greater than a second threshold includes: when the write amplification is greater than the second threshold and the read amplification is less than a fourth threshold, instructing the storage device to store the data in the data table according to the sixth index engine. Reference is made in particular to the implementation of the description of steps 603-607 in fig. 6 above.
In some embodiments, the fifth index engine includes at least a B+ tree structure and the sixth index engine includes at least a log-structured merge tree structure.
In some embodiments, the data table has a local secondary index (LSI), and the operation amplification is the amplification produced by operating on data under the local secondary index. The instructing the storage device to store the data in the data table according to a fifth index engine includes: instructing the storage device to store the data under the local secondary index according to the fifth index engine. Alternatively, the instructing the storage device to store the data in the data table according to a sixth index engine includes: instructing the storage device to store the data under the local secondary index according to the sixth index engine. Reference is made in particular to the implementation described above for step 607 in fig. 6.
According to the data storage method provided by the embodiment of the invention, the dynamic change of the operation amplification of the data table can be perceived, and the index engine of the data table is dynamically adjusted according to the dynamic change of the operation amplification, so that the index engine is matched with the load characteristic of the data table, and the access performance of the data table can be improved.
The present embodiments provide a data storage device 1000. The apparatus 1000 may be configured as a control device in a storage system further comprising a storage device storing a data table of a user. As shown in fig. 10, the apparatus 1000 includes:
a providing module 1010, configured to provide a configuration interface, where the configuration interface is configured to enable the user to configure the load characteristic of the data table as a read-intensive load or a write-intensive load;
an indicating module 1020, configured to instruct the storage device to store data in the data table according to a first index engine when the configuration interface indicates that the user has configured the load characteristic of the data table as a read-intensive load; the first index engine matches the read-intensive load;
the indicating module 1020 is further configured to instruct the storage device to store data in the data table according to a second index engine when the configuration interface indicates that the user configures the load characteristic of the data table to be a write-intensive load; the second index engine is matched to the write-intensive load.
The providing module 1010 and the indicating module 1020 may each be implemented by software or by hardware. Illustratively, the implementation of the providing module 1010 is described next; the indicating module 1020 may be implemented in the same way.
As an example of a software functional unit, the providing module 1010 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Further, there may be one or more such computing instances. For example, the providing module 1010 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers for running the code may be distributed in the same region or in different regions. Further, they may be distributed in the same availability zone (AZ) or in different AZs, each AZ comprising one data center or multiple geographically close data centers. Typically, a region may comprise a plurality of AZs.
Also, multiple hosts/virtual machines/containers for running the code may be distributed in the same virtual private cloud (virtual private cloud, VPC) or in multiple VPCs. In general, one VPC is disposed in one region, and a communication gateway is disposed in each VPC for implementing inter-connection between VPCs in the same region and between VPCs in different regions.
As an example of a hardware functional unit, the providing module 1010 may include at least one computing device, such as a server. Alternatively, the providing module 1010 may be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be implemented as a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The multiple computing devices included in the providing module 1010 may be distributed in the same region or in different regions. They may be distributed in the same AZ or in different AZs. Likewise, they may be distributed in the same VPC or in multiple VPCs. The plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
It should be noted that, in other embodiments, the providing module 1010 may be configured to perform any step in the method shown in fig. 8, and the indicating module 1020 may be configured to perform any step in the method shown in fig. 8. The steps that the providing module 1010 and the indicating module 1020 are responsible for implementing may be specified as desired, and the providing module 1010 and the indicating module 1020 implement different steps in the method shown in fig. 8 to implement the overall functions of the data storage device 1000.
The embodiment of the application also provides a data storage device 1100. The apparatus 1100 may be configured as a control device in a storage system that also includes a storage device that stores a data table. As shown in fig. 11, the apparatus 1100 includes:
a monitoring module 1110, configured to monitor, by the control device, an operation amplification of the data table, where the index engine of the data table is a fourth index engine, where the operation amplification includes a read amplification for reading data from the data table or a write amplification for writing data to the data table;
an indicating module 1120, configured to instruct the storage device to store the data in the data table according to a fifth index engine when the operation amplification includes the read amplification and the read amplification is greater than a first threshold; wherein the degree of matching of the fifth index engine with the read-intensive load is greater than the degree of matching of the fourth index engine with the read-intensive load;
the indicating module 1120 is further configured to instruct the storage device to store the data in the data table according to a sixth index engine when the operation amplification includes the write amplification and the write amplification is greater than a second threshold; wherein the degree of matching of the sixth index engine with the write-intensive load is greater than the degree of matching of the fourth index engine with the write-intensive load.
Wherein, the monitoring module 1110 and the indicating module 1120 may be implemented by software or may be implemented by hardware. The implementation manner of the monitoring module 1110 and the indicating module 1120 may refer to the implementation manner of the providing module 1010, and is specifically described above, and will not be described herein.
It should be noted that, in other embodiments, the monitoring module 1110 may be used to perform any step in the method shown in fig. 9, and the indicating module 1120 may be used to perform any step in the method shown in fig. 9. The steps that the monitoring module 1110 and the indicating module 1120 are responsible for implementing may be specified as needed, and the monitoring module 1110 and the indicating module 1120 implement different steps in the method shown in fig. 9 to implement the overall functions of the data storage device 1100.
The present application also provides a computing device 1200. As shown in fig. 12, a computing device 1200 includes: a bus 1202, a processor 1204, a memory 1206, and a communication interface 1208. The processor 1204, the memory 1206, and the communication interface 1208 communicate via the bus 1202. Computing device 1200 may be a server or a terminal device. It should be understood that the present application is not limited to the number of processors, memories in computing device 1200.
The bus 1202 may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, only one line is shown in fig. 12, but not only one bus or one type of bus. The bus 1202 may include a path for transferring information between various components of the computing device 1200 (e.g., the memory 1206, the processor 1204, the communication interface 1208).
The processor 1204 may include any one or more of a central processing unit (central processing unit, CPU), a graphics processor (graphics processing unit, GPU), a Microprocessor (MP), or a digital signal processor (digital signal processor, DSP).
The memory 1206 may include volatile memory, such as random access memory (RAM). The memory 1206 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 1206 has stored therein executable program code that is executed by the processor 1204 to implement the functions of the aforementioned providing module 1010 and the indicating module 1020, respectively, to implement the method illustrated in fig. 8. That is, the memory 1206 has instructions stored thereon for performing the method of FIG. 8.
Communication interface 1208 enables communication between computing device 1200 and other devices or communication networks using a transceiver module such as, but not limited to, a network interface card, transceiver, or the like.
The embodiment of the application also provides a computing device cluster. The cluster of computing devices includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop, notebook, or smart phone.
As shown in fig. 13, the cluster of computing devices includes at least one computing device 1200. The same instructions for performing the method shown in fig. 8 may be stored in memory 1206 in one or more computing devices 1200 in a cluster of computing devices.
In some possible implementations, part of the instructions for performing the method shown in fig. 8 may also be stored in the memory 1206 of one or more computing devices 1200 in the cluster of computing devices, respectively. In other words, a combination of one or more computing devices 1200 may collectively execute instructions for performing the method shown in FIG. 8.
It should be noted that, the memory 1206 in different computing devices 1200 in the computing device cluster may store different instructions for performing part of the functions of the apparatus 1000. That is, the instructions stored in the memory 1206 in the different computing devices 1200 may implement the functionality of one or more of the providing module 1010, the indicating module 1020.
In some possible implementations, one or more computing devices in a cluster of computing devices may be connected through a network. The network may be a wide area network, a local area network, or the like. Fig. 14 shows one possible implementation. As shown in fig. 14, two computing devices 1200A and 1200B are connected by a network. Specifically, each computing device connects to the network through its communication interface. In this possible implementation, instructions for performing the functions of the providing module 1010 are stored in the memory 1206 of the computing device 1200A, and instructions for performing the functions of the indicating module 1020 are stored in the memory 1206 of the computing device 1200B.
It should be appreciated that the functionality of computing device 1200A shown in fig. 14 may also be performed by multiple computing devices 1200. Likewise, the functionality of computing device 1200B may also be performed by multiple computing devices 1200.
The embodiment of the application also provides another computing device cluster. The connection between computing devices in the computing device cluster may be similar to the connection of the computing device cluster described with reference to fig. 13 and 14. In contrast, the memory 1206 in one or more computing devices 1200 in the cluster of computing devices may have stored therein the same instructions for performing the method of FIG. 8.
In some possible implementations, part of the instructions for performing the method shown in fig. 8 may also be stored in the memory 1206 of one or more computing devices 1200 in the cluster of computing devices, respectively. In other words, a combination of one or more computing devices 1200 may collectively execute instructions for performing the method shown in FIG. 8.
The present application also provides a computing device 1500. As shown in fig. 15, the computing device 1500 includes: a bus 1502, a processor 1504, a memory 1506, and a communication interface 1508. The processor 1504, the memory 1506, and the communication interface 1508 communicate via the bus 1502. The computing device 1500 may be a server or a terminal device. It should be understood that the present application does not limit the number of processors or memories in the computing device 1500.
Wherein the bus 1502, the processor 1504, the memory 1506, and the communication interface 1508 may be implemented with reference to the bus 1202, the processor 1204, the memory 1206, and the communication interface 1208, respectively.
The memory 1506 stores executable program code, and the processor 1504 executes this code to implement the functions of the aforementioned monitoring module 1110 and indication module 1120, respectively, thereby implementing the method illustrated in fig. 9. That is, the memory 1506 stores instructions for performing the method of fig. 9.
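The decision that the monitoring module 1110 and the indication module 1120 make together can be sketched as follows. This is an illustrative Python sketch only: it combines the threshold conditions of claims 5 to 7 into one function, takes the read and write amplification values as already-measured inputs (how they are measured is not shown here), and uses made-up threshold values. The names `Thresholds` and `choose_engine` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    read_high: float   # "first threshold": read amplification above this is too high
    write_high: float  # "second threshold": write amplification above this is too high
    write_low: float   # "third threshold": write amplification must be below this
    read_low: float    # "fourth threshold": read amplification must be below this


def choose_engine(read_amp: float, write_amp: float, t: Thresholds) -> Optional[str]:
    """Return the engine to switch to, or None to keep the current (fourth) engine."""
    if read_amp > t.read_high and write_amp < t.write_low:
        return "fifth"   # better matched to read-intensive loads
    if write_amp > t.write_high and read_amp < t.read_low:
        return "sixth"   # better matched to write-intensive loads
    return None


# Illustrative values only; the application does not specify thresholds.
t = Thresholds(read_high=4.0, write_high=10.0, write_low=5.0, read_low=2.0)
print(choose_engine(6.0, 3.0, t))    # high read amplification, low write amplification
print(choose_engine(1.5, 12.0, t))   # high write amplification, low read amplification
print(choose_engine(6.0, 12.0, t))   # both high: no switch in this sketch
```

Requiring the opposite amplification to be low before switching avoids moving a table whose reads and writes are both heavily amplified, since neither alternative engine would clearly suit such a load.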
Communication interface 1508 enables communication between computing device 1500 and other devices or communication networks using a transceiver module such as, but not limited to, a network interface card, transceiver, or the like.
The embodiment of the application also provides a computing device cluster. The cluster of computing devices includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop, notebook, or smart phone.
As shown in fig. 16, the cluster of computing devices includes at least one computing device 1500. The same instructions for performing the method shown in fig. 9 may be stored in memory 1506 of one or more computing devices 1500 in a cluster of computing devices.
In some possible implementations, portions of the instructions for performing the method shown in fig. 9 may also be stored in the memory 1506 of one or more computing devices 1500 in the computing device cluster, respectively. In other words, a combination of one or more computing devices 1500 may collectively execute instructions for performing the method shown in FIG. 9.
It should be noted that, the memories 1506 in different computing devices 1500 in the computing device cluster may store different instructions for performing part of the functions of the apparatus 1100. That is, the instructions stored by the memory 1506 in the different computing devices 1500 may implement the functionality of one or more of the monitoring module 1110 and the indication module 1120.
In some possible implementations, one or more computing devices in a cluster of computing devices may be connected through a network. The network may be a wide area network, a local area network, or the like. Fig. 17 shows one possible implementation. As shown in fig. 17, two computing devices 1500A and 1500B are connected by a network. Specifically, each computing device connects to the network through its communication interface. In this possible implementation, instructions for performing the functions of the monitoring module 1110 are stored in the memory 1506 of the computing device 1500A, and instructions for performing the functions of the indication module 1120 are stored in the memory 1506 of the computing device 1500B.
It should be appreciated that the functionality of computing device 1500A shown in fig. 17 may also be performed by multiple computing devices 1500. Likewise, the functionality of computing device 1500B may also be performed by multiple computing devices 1500.
The embodiment of the application also provides another computing device cluster. The connection between computing devices in the computing device cluster may be similar to the connection of the computing device cluster described with reference to fig. 16 and 17. In contrast, the same instructions for performing the method of FIG. 9 may be stored in memory 1506 of one or more computing devices 1500 in the computing device cluster.
In some possible implementations, portions of the instructions for performing the method shown in fig. 9 may also be stored in the memory 1506 of one or more computing devices 1500 in the computing device cluster, respectively. In other words, a combination of one or more computing devices 1500 may collectively execute instructions for performing the method shown in FIG. 9.
Embodiments of the present application also provide a computer program product comprising instructions. The computer program product may be software or a program product that contains instructions and is capable of running on a computing device or being stored in any usable medium. The computer program product, when run on at least one computing device, causes the at least one computing device to perform the method of fig. 8.
Embodiments of the present application also provide a computer program product comprising instructions. The computer program product may be software or a program product that contains instructions and is capable of running on a computing device or being stored in any usable medium. The computer program product, when run on at least one computing device, causes the at least one computing device to perform the method of fig. 9.
Embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium may be any usable medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state drive), etc. The computer-readable storage medium includes instructions that instruct a computing device to perform the method shown in fig. 8.
Embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium may be any usable medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state drive), etc. The computer-readable storage medium includes instructions that instruct a computing device to perform the method shown in fig. 9.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of protection of the technical solutions of the embodiments of the present invention.

Claims (21)

1. A data storage method, characterized in that the data storage method is applied to a control device in a storage system, the storage system further comprises a storage device, and the storage device stores a data table of a user; the method comprises the following steps:
providing a configuration interface for the user to configure the load characteristic of the data table as a read-intensive load or a write-intensive load;
when the configuration interface indicates that the user configures the load characteristic of the data table to be a read intensive load, the storage device is instructed to store data in the data table according to a first index engine; the first index engine is matched with the read-intensive load;
when the configuration interface indicates that the user configures the load characteristic of the data table to be a write-intensive load, the storage device is instructed to store data in the data table according to a second index engine; the second index engine is matched to the write-intensive load.
2. The method of claim 1, wherein the first index engine comprises at least a B+ tree structure and the second index engine comprises at least a log-structured merge (LSM) tree structure.
3. The method of claim 1 or 2, wherein the index engine in the data table is the second index engine before the instructing the storage device to store the data in the data table in accordance with a first index engine;
The instructing the storage device to store the data in the data table according to a first index engine includes:
instructing the storage device to migrate the index engine of the data table from the second index engine to a third index engine, the structure of the third index engine being intermediate between the structure of the first index engine and the structure of the second index engine;
and then instructing the storage device to migrate the index engine of the data table from the third index engine to the first index engine.
4. A method according to any of claims 1-3, wherein the data table has a local secondary index, the configuration interface being further for the user to configure the load characteristic of the local secondary index as either a read-intensive load or a write-intensive load;
when the configuration interface indicates that the user configures the load characteristic of the local secondary index to be a read intensive load, the storage device is instructed to store data under the local secondary index according to the first index engine;
and when the configuration interface indicates that the user configures the load characteristic of the local secondary index to be a write-intensive load, the storage device is instructed to store data under the local secondary index according to a second index engine.
5. A data storage method, characterized in that the data storage method is applied to a control device in a storage system, the storage system further comprises a storage device, and the storage device stores a data table; the method comprises the following steps:
monitoring, by the control device, operation amplification of the data table when the index engine of the data table is a fourth index engine, wherein the operation amplification comprises read amplification of reading data from the data table or write amplification of writing data to the data table;
when the operation amplification comprises the read amplification and the read amplification is greater than a first threshold, instructing the storage device to store the data in the data table according to a fifth index engine; wherein the degree of matching of the fifth index engine with a read-intensive load is greater than the degree of matching of the fourth index engine with the read-intensive load;
when the operation amplification comprises the write amplification and the write amplification is greater than a second threshold, instructing the storage device to store the data in the data table according to a sixth index engine; wherein the degree of matching of the sixth index engine with a write-intensive load is greater than the degree of matching of the fourth index engine with the write-intensive load.
6. The method of claim 5, wherein the operation amplification includes both the read amplification and the write amplification;
when the operation amplification includes the read amplification and the read amplification is greater than a first threshold, the instructing the storage device to store the data in the data table according to a fifth index engine includes:
when the read amplification is greater than the first threshold and the write amplification is less than a third threshold, instructing the storage device to store the data in the data table according to the fifth index engine.
7. The method of claim 5 or 6, wherein the operation amplification includes both the read amplification and the write amplification;
when the operation amplification includes the write amplification and the write amplification is greater than a second threshold, the instructing the storage device to store the data in the data table according to a sixth index engine includes:
when the write amplification is greater than the second threshold and the read amplification is less than a fourth threshold, instructing the storage device to store the data in the data table according to the sixth index engine.
8. The method of any of claims 5-7, wherein the fifth index engine comprises at least a B+ tree structure and the sixth index engine comprises at least a log-structured merge (LSM) tree structure.
9. The method according to any one of claims 5 to 8, wherein the data table has a local secondary index (LSI), and the operation amplification is amplification generated by operating on data under the local secondary index;
the instructing the storage device to store the data in the data table according to a fifth index engine includes: instructing the storage device to store data under the local secondary index according to the fifth index engine; or,
the instructing the storage device to store the data in the data table according to a sixth index engine includes: instructing the storage device to store the data under the local secondary index according to the sixth index engine.
10. A data storage device, characterized in that the data storage device is configured in a control device in a storage system, the storage system further comprises a storage device, and the storage device stores a data table of a user; the data storage device includes:
a providing module, configured to provide a configuration interface, wherein the configuration interface is used by the user to configure the load characteristic of the data table as a read-intensive load or a write-intensive load;
the indicating module is used for indicating the storage device to store data in the data table according to a first index engine when the configuration interface indicates that the user configures the load characteristic of the data table to be a read intensive load; the first index engine is matched with the read-intensive load;
The indicating module is further configured to instruct the storage device to store data in the data table according to a second index engine when the configuration interface indicates that the user configures the load characteristic of the data table to be a write intensive load; the second index engine is matched to the write-intensive load.
11. The data storage device of claim 10, wherein the first index engine comprises at least a B+ tree structure and the second index engine comprises at least a log-structured merge (LSM) tree structure.
12. The data storage device of claim 10 or 11, wherein the index engine in the data table is the second index engine before the storage device is instructed to store data in the data table in accordance with a first index engine; the indication module is further configured to:
instruct the storage device to migrate the index engine of the data table from the second index engine to a third index engine, the structure of the third index engine being intermediate between the structure of the first index engine and the structure of the second index engine;
and then instruct the storage device to migrate the index engine of the data table from the third index engine to the first index engine.
13. The data storage device of any of claims 10-12, wherein the data table has a local secondary index, the configuration interface further configured for the user to configure a load characteristic of the local secondary index as either a read-intensive load or a write-intensive load; the indication module is further configured to:
when the configuration interface indicates that the user configures the load characteristic of the local secondary index to be a read intensive load, the storage device is instructed to store data under the local secondary index according to the first index engine;
and when the configuration interface indicates that the user configures the load characteristic of the local secondary index to be a write-intensive load, the storage device is instructed to store data under the local secondary index according to a second index engine.
14. A data storage device, characterized in that the data storage device is configured in a control device in a storage system, the storage system further comprises a storage device, and the storage device stores a data table; the data storage device includes:
a monitoring module, configured to monitor operation amplification of the data table when the index engine of the data table is a fourth index engine, wherein the operation amplification comprises read amplification of reading data from the data table or write amplification of writing data to the data table;
an indication module, configured to instruct the storage device to store data in the data table according to a fifth index engine when the operation amplification includes the read amplification and the read amplification is greater than a first threshold; wherein the degree of matching of the fifth index engine with a read-intensive load is greater than the degree of matching of the fourth index engine with the read-intensive load;
the indication module is further configured to instruct the storage device to store the data in the data table according to a sixth index engine when the operation amplification includes the write amplification and the write amplification is greater than a second threshold; wherein the degree of matching of the sixth index engine with a write-intensive load is greater than the degree of matching of the fourth index engine with the write-intensive load.
15. The data storage device of claim 14, wherein the operation amplification includes both the read amplification and the write amplification;
the indication module is configured to: when the read amplification is greater than the first threshold and the write amplification is less than a third threshold, instruct the storage device to store the data in the data table according to the fifth index engine.
16. The data storage device of claim 14 or 15, wherein the operation amplification includes both the read amplification and the write amplification;
the indication module is configured to: when the write amplification is greater than the second threshold and the read amplification is less than a fourth threshold, instruct the storage device to store the data in the data table according to the sixth index engine.
17. The data storage device of any of claims 14-16, wherein the fifth index engine comprises at least a B+ tree structure and the sixth index engine comprises at least a log-structured merge (LSM) tree structure.
18. The data storage device according to any one of claims 14 to 17, wherein the data table has a local secondary index (LSI), the operation amplification being amplification generated by operating on data under the local secondary index;
the indication module is configured to:
instruct the storage device to store data under the local secondary index according to the fifth index engine; or,
instruct the storage device to store the data under the local secondary index according to the sixth index engine.
19. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory;
the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any of claims 1-4 or the method of any of claims 5-9.
20. A computer program product containing instructions that, when executed by a cluster of computing devices, cause the cluster of computing devices to perform the method of any of claims 1-4 or the method of any of claims 5-9.
21. A computer readable storage medium comprising computer program instructions which, when executed by a cluster of computing devices, perform the method of any of claims 1-4 or the method of any of claims 5-9.
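The two-step migration recited in claims 3 and 12 can be illustrated with a short Python sketch. It assumes the storage device exposes a single migrate operation and tracks engines by the labels "first", "second", and "third"; the class `StorageDevice` and the function `switch_to_first_engine` are hypothetical names, and the sketch does not model how data is actually reorganized between index structures.

```python
from typing import List


class StorageDevice:
    """Hypothetical storage device that tracks a table's index engine."""

    def __init__(self, engine: str) -> None:
        self.engine = engine
        self.history: List[str] = [engine]

    def migrate(self, source: str, target: str) -> None:
        # Refuse a migration that does not start from the current engine.
        if self.engine != source:
            raise ValueError(f"table uses {self.engine!r}, not {source!r}")
        self.engine = target
        self.history.append(target)


def switch_to_first_engine(device: StorageDevice) -> None:
    """Control-device side: move the table to the first index engine."""
    if device.engine == "second":
        # Two-step migration through the intermediate (third) engine.
        device.migrate("second", "third")
        device.migrate("third", "first")
    elif device.engine != "first":
        device.migrate(device.engine, "first")


device = StorageDevice("second")
switch_to_first_engine(device)
print(device.history)  # ['second', 'third', 'first']
```

Passing through an intermediate structure splits one large reorganization into two smaller ones, which is presumably why the claims recite a third index engine whose structure lies between the other two.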
CN202211202134.2A 2022-09-29 2022-09-29 Data storage method and device Pending CN117827818A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211202134.2A CN117827818A (en) 2022-09-29 2022-09-29 Data storage method and device
PCT/CN2023/104709 WO2024066597A1 (en) 2022-09-29 2023-06-30 Data storage method and apparatus

Publications (1)

Publication Number Publication Date
CN117827818A true CN117827818A (en) 2024-04-05

Family

ID=90475955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211202134.2A Pending CN117827818A (en) 2022-09-29 2022-09-29 Data storage method and device

Country Status (2)

Country Link
CN (1) CN117827818A (en)
WO (1) WO2024066597A1 (en)

Also Published As

Publication number Publication date
WO2024066597A1 (en) 2024-04-04

Legal Events

Date Code Title Description
PB01 Publication