CN114297243A - Remote storage service local cache management method for cloud database - Google Patents

Remote storage service local cache management method for cloud database

Info

Publication number
CN114297243A
Authority
CN
China
Prior art keywords
data
local cache
request
cache management
server
Prior art date
Legal status
Pending
Application number
CN202111678121.8A
Other languages
Chinese (zh)
Inventor
赵伟
寇韦韦
Current Assignee
Tianjin Nankai University General Data Technologies Co ltd
Original Assignee
Tianjin Nankai University General Data Technologies Co ltd
Priority date
Filing date
Publication date
Application filed by Tianjin Nankai University General Data Technologies Co., Ltd.
Priority to CN202111678121.8A
Publication of CN114297243A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a local cache management method for the remote storage service of a cloud database. A proxy server receives all requests and merges and optimizes them; a cache management server then searches the local cache according to the merged and optimized requests. If the requested data exists in the local cache, the requested information is packaged and fed back to the requester. If the requested data does not exist in the local cache, the path of the request is fed back to the requester, the requester retrieves the relevant information directly from the object server, and the object server feeds the relevant information back to the cache management server. The management method of the invention enables rapid capacity expansion and reduction of the cluster. After cached data is obtained, a metadata management mechanism can be adopted to record the cached data files together with their buffer states, which facilitates data access; the positions from which data is read and to which data is written are controlled and recorded, which improves data read/write efficiency.

Description

Remote storage service local cache management method for cloud database
Technical Field
The invention belongs to the technical field of local cache management for the remote storage of cloud databases, and particularly relates to a local cache management method for the remote storage of a cloud database.
Background
Because a cloud database separates storage resources from computing resources, the resource specifications and capacities of storage and of computation can be planned independently. Expansion, reduction, and release of computing resources can therefore be completed quickly without extra data-relocation cost, and storage and computation can each be provisioned in the way that best suits the user. Cloud databases also satisfy users' customization requirements well, so they are used more and more widely: each cloud provider can design products freely according to its own conditions and offer certain services to users, and because users do not need to purchase software or hardware and the service provider supplies the necessary components, the cloud model reduces user cost. However, the management node and the storage node of a cloud database are usually not deployed on the same node. When a user sends a request to the management node and the management node forwards it to the storage node, the returned result is sent over the network, so request efficiency is affected by network latency, packet loss, service concurrency, and similar conditions. Therefore, three roles can be adopted for management, namely a remote-storage local cache, cache management, and remote proxy management, to improve data-reading efficiency.
Disclosure of Invention
In view of the above, the present invention is directed to a local cache management method for the remote storage service of a cloud database, so as to address the efficiency problems brought about by the separation of storage and computation.
In order to achieve the above purpose, the technical solution of the invention is implemented as follows:
a local cache management method of a remote storage service for a cloud database comprises the following steps:
s1, all requesters issue request commands to the object server through the client, and the proxy server receives all requests;
s2, the proxy server merges and optimizes the requests according to the positions of all the requests;
s3, sending the merged and optimized requests to a cache management server, and searching the local cache by the cache management server according to the merged and optimized requests;
s4, if the data of the request exists in the local cache, packaging and feeding back the information of the request to the requester;
and S5, if the data of the request does not exist in the local cache, feeding back the path of the request to the requester, directly searching the relevant information in the object server by the requester, feeding back the relevant information of the request to the cache management server by the object server, feeding back the relevant information to the requester by the cache management server, and storing the relevant information in the local cache by the cache management server.
Further, in step S1, the proxy server hands the received requests over to the host for execution, and during execution the host applies an LRU mechanism to historical and new requests for swapping memory in and out and for flushing and evicting entries when the cache overflows.
Further, the proxy server adopts a write-priority algorithm to order all the requests, raising the task priority of write operations and lowering the priority of other read-data requests.
Further, the proxy server preferentially searches for the requested data in the local memory, and manages, controls, and records the positions from which data is read and to which data is written.
Further, the proxy server rapidly resolves the address contained in the request, locates the specific position of the object, and resolves the address into a website location, a file location, and a block location.
Further, the process by which the proxy server merges and optimizes requests in step S2 is as follows:
the merging process merges requests that access the same object server address into a single request;
the optimization process filters out and deletes requests whose request time is too long or that are invalid.
Further, in step S3, the local cache is stored on disk in blocks, metadata records the buffer status, and each block record carries an identifier.
Furthermore, the local cache is stored with the same file size as the object storage, and when a data file reaches a set threshold, subsequent data is written into the next file.
Compared with the prior art, the remote storage service local cache management method for the cloud database has the following advantages:
(1) In the local cache management method for the remote storage service of a cloud database according to the invention, establishing a local cache role allows a high-performance solid-state disk to be used as the local cache, which accelerates access to data. The cache stores data and manages it in blocks, with metadata recording the buffer state; when data is read remotely, modifications can be made in an append-only manner without overwriting, which reduces random operations on the buffer and other efficiency-reducing operations.
(2) In the local cache management method for the remote storage service of a cloud database according to the invention, a metadata management mechanism is adopted after cached data is obtained, which facilitates data access. The cache management server manages the metadata, records the storage information of the metadata, and uses a kv storage database that supports transactions to ensure the accuracy of metadata records. During the execution of remote interactive requests, the nodes generate many data-reading requests; the problems of synchronization, remote pressure, scheduling, and the like are solved by using the proxy management server (proxy master).
(3) In the local cache management method for the remote storage service of a cloud database according to the invention, the proxy management server optimizes and merges multiple requests, identifies unnecessary requests in advance, and reduces interfering operations on the remote machine, so synchronization is easy to implement. A proxy mechanism is introduced, real-time data requests are handed to the host for execution, centralized management makes the metadata easy to manage, and an LRU mechanism is used for swapping memory in and out and for flushing and evicting entries when the buffer overflows.
(4) In the local cache management method for the remote storage service of a cloud database according to the invention, the proxy management server uses a write-priority algorithm so that tasks containing write transactions are executed first, giving write tasks a higher weight. The requested data is preferentially searched for in the storage cache, and the positions from which data is read and to which data is written are controlled and recorded, which improves data read/write efficiency and the efficiency of manual inspection.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is a framework diagram of the local cache management method for the remote storage service of a cloud database according to an embodiment of the present invention;
fig. 2 is a framework diagram of the local cache server according to an embodiment of the present invention;
fig. 3 is a framework diagram of the cache management server according to an embodiment of the present invention;
fig. 4 is a framework diagram of the proxy server according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, e.g., as a fixed connection, a removable connection, or an integral connection; as a mechanical connection or an electrical connection; or as a direct connection, an indirect connection through an intermediate medium, or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific situation.
The present invention will be described in detail below with reference to the embodiments and the attached drawings.
As shown in fig. 1 to 4, a local cache management method for a remote storage service of a cloud database includes the following steps:
The client establishes a connection with the server:
The first step: the client establishes a connection with the oss server (object server) through an api (cloud vendor interface) provided by the cloud vendor.
The second step: after security authentication (the cloud user has its own permissions), data is read by specifying the bucket, folder, and file (this is the interaction with the oss).
The third step: the data demander establishes a connection with the data master and sends the required data file, data offset, and data size to the data master. The master then merges similar data requests as far as possible, that is, requests for the same bucket, folder, and file, reads the data with reasonable concurrency, and returns it to the data demander; the data then lands as a local buffer, so the next time, provided the data has not changed, it does not need to be fetched from the oss again.
The request processing process comprises the following steps:
The first step: the requester sends a request to the client, and the proxy server receives all the requests and performs IO merging, IO optimization, and removal of useless IO, which reduces the access pressure on the object server. The proxy role manages IO in the manner of a kv database and can process IO in batch mode (handling a certain number of IOs at a time); for the specific steps, refer to the description of the proxy role functions in fig. 4. The proxy server can also precisely locate the requested oss (object server); for example, the address can be split into several parts: url (the specific website location), bucket (the folder location), block (the data block), and so on.
The second step: the cache master (cache management role) processes the merged and optimized IO and manages it with a kv database, that is, each piece of data is stored against a key, and the value records the corresponding address information, data offset, and so on; in other words, data is managed through a buffer map. For requested data that can be found in the local cache, information such as the file position is packaged and sent to the requester, which then reads from the corresponding position in the local cache. If the requested content is not in the local cache, the requester is told the location of the specific object server; the requester finds the relevant data directly in the object server and the data is returned to the requester, and at the same time the returned data is reported to the cache master (cache management role), which stores it in the local cache and updates the data's marked state, so that the next time the same data is requested it can be found in the local cache, which improves data-access efficiency. Details are shown and explained in fig. 2 and fig. 3.
The third step: data obtained by the requester from the object server and placed into the local cache should be stored in a manner consistent with the remote object server, that is, in a one-to-one mapping. For example, if the object server stores data in the form pid (process number)-folder-file, the local cache may use corresponding blocks (data blocks), for example block1, block2, block3, which makes data management convenient and improves the efficiency of finding data. Details are shown and explained in fig. 2.
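By way of illustration only, the overall flow of the above three steps (and of steps S1 to S5) can be sketched in Python as follows; all class names, method names, and the data layout in this sketch are assumptions made for the example and are not part of the invention:

# Illustrative sketch only: class and method names are assumptions made for
# this example, not identifiers defined by the patent.

class ObjectServer:
    """Stands in for the remote object storage (oss)."""
    def __init__(self, objects):
        self.objects = objects                     # path -> data

    def read(self, path):
        return self.objects.get(path)


class CacheMaster:
    """Cache management role: checks the local cache before the remote store."""
    def __init__(self, object_server):
        self.local_cache = {}                      # path -> data
        self.object_server = object_server

    def handle(self, path):
        if path in self.local_cache:               # S4: hit, serve from the local cache
            return ("local", self.local_cache[path])
        data = self.object_server.read(path)       # S5: miss, go to the object server
        self.local_cache[path] = data              # keep it for the next request
        return ("remote", data)


class ProxyServer:
    """Proxy role: receives all requests, merges duplicates, forwards them."""
    def __init__(self, cache_master):
        self.cache_master = cache_master

    def submit(self, requests):
        merged = list(dict.fromkeys(requests))     # S2: merge identical addresses
        return {p: self.cache_master.handle(p) for p in merged}   # S3


oss = ObjectServer({"bucket1/f1/f1_0": b"block data"})
proxy = ProxyServer(CacheMaster(oss))
print(proxy.submit(["bucket1/f1/f1_0", "bucket1/f1/f1_0"]))   # the duplicate is merged away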
As shown in FIG. 2, the local storage cache role (local cache)
After the local cache agent filters out the valid IO, the data to be accessed is stored locally according to the request, and the size of the locally stored file is kept consistent with that of the remote object storage, which makes it convenient for the requester to read data. The remote object server stores data in the form pid-folder-file; if folder f1 contains f1_0, f1_1, and f1_2 (each file 16M in size), then the local storage uses block1 for f1_0, block2 for f1_1, and block3 for f1_2.
By establishing a local buffer role and using a high-performance ssd (solid-state disk) as the local buffer, access to data is accelerated. Data is stored in the storage cache (cache memory) and managed in blocks, with metadata recording the buffer state; when data is read from the remote side, modifications can be made in an append-only manner without overwriting, which reduces random operations on the buffer and other efficiency-reducing operations. Data is stored into data files by block, and when a data file reaches the set threshold size, writing continues in the next file.
The local data files should be consistent with the object server storage, for example in units of 16M, and the local storage and the object server storage should be in a one-to-one mapping. The object server stores data as folders and files, where one folder corresponds to the storage of several files (one folder corresponds to the storage of one operation). The local cache takes the block as its storage unit, that is, 1 block corresponds to 1 file on the object storage.
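A minimal Python sketch of such append-only block files with threshold-based rollover is given below; the 16M threshold matches the example above, while the directory and file names are illustrative assumptions:

import os

BLOCK_SIZE = 16 * 1024 * 1024          # assumed 16M threshold per block file


class BlockWriter:
    """Append-only block files: when the current file reaches the threshold,
    writing continues in the next file (block1, block2, ...)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.block_no = 1
        os.makedirs(cache_dir, exist_ok=True)

    def _current_path(self):
        return os.path.join(self.cache_dir, "block%d" % self.block_no)

    def append(self, data):
        path = self._current_path()
        size = os.path.getsize(path) if os.path.exists(path) else 0
        if size > 0 and size + len(data) > BLOCK_SIZE:
            self.block_no += 1                      # roll over to the next block file
            path = self._current_path()
            size = 0
        with open(path, "ab") as f:                 # append only, never overwrite
            f.write(data)
        return path, size                           # where the metadata map should point


writer = BlockWriter("local_cache")                 # illustrative directory name
print(writer.append(b"some block data"))            # e.g. ('local_cache/block1', 0)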
The data files are stored in storage segments (buckets). By analogy with a hard disk, an object is a file and a storage segment is a folder, and both can be located through uniform resource identifiers. By default, an object and a storage segment can only be accessed by their creator; other visitors can be granted coarse-grained or fine-grained access through authorization.
the coarse and fine particle sizes are: access control can be divided into coarse-grained and fine-grained according to the thickness degree of a control object, and a certain layer for defining access to the whole database table or a view derived from a basic table is generally called coarse-grained access control, while fine-grained control is to refine security control to the row level or the column level of the database.
As shown in FIG. 3, the cache master role (cache management server)
The cache master (cache management role) looks up the request from the first step in the kv database, searching in the bit map (the local kv-database key-value map); if the data to be found already exists, the requested data is returned to the requester directly. Because the bit map records the specific position and offset of the file, the cache master packages the query result, namely the position of the file, for the requester and tells the requester where to obtain the corresponding data, that is, it provides the corresponding uri to the requester.
If the data does not exist in the local cache, the cache master (cache management role) returns the location (uri) of the object storage server to the requester, and the requester searches the corresponding position of the object storage it was given; after the corresponding data is found it is cached into the cache map, the related information is recorded into the kv database again, and the state is marked as finished.
The cache master role manages the metadata, records the state of the buffer blocks, and records the storage information of the metadata, using a kv storage database that supports transactions to ensure the accuracy of the metadata records. This provides centralized management and makes the metadata easy to manage.
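A minimal in-memory Python sketch of this metadata map is shown below; a real implementation would use a transactional kv storage database, and the field names here are assumptions made for the example:

from dataclasses import dataclass

@dataclass
class CacheEntry:
    block_file: str          # local block file holding the data
    offset: int              # byte offset inside the block file
    length: int
    remote_uri: str          # where the data originally came from
    state: str               # e.g. "cached" once the write-back has finished


class MetadataMap:
    """Key -> CacheEntry map standing in for the transactional kv store."""

    def __init__(self):
        self.entries = {}

    def lookup(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None                              # miss: go to the object server
        return (entry.block_file, entry.offset, entry.length)   # packed hit result

    def record(self, key, block_file, offset, length, remote_uri):
        # Called after data fetched from the object server has landed in the cache.
        self.entries[key] = CacheEntry(block_file, offset, length, remote_uri, "cached")


meta = MetadataMap()
meta.record("bucket1/f1/f1_0", "block1", 0, 4096, "https://oss.example.com/bucket1/f1/f1_0")
print(meta.lookup("bucket1/f1/f1_0"))                # ('block1', 0, 4096)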
A kv storage database is a key-value database, a non-relational database that stores data using a simple key-value method. A key-value database stores data as a set of key-value pairs, with the key serving as a unique identifier; keys and values can be anything from simple objects to complex compound objects.
Key-value stores are highly partitionable and allow horizontal scaling at a scale that other types of databases cannot achieve.
The K-V database provides complete ACID properties, which ensures the security of the metadata. On top of the K-V database, a stateless service-node layer can be built to accept access requests from the compute layer and serve system metadata. Above the service-node layer is a scheduling layer with load balancing, which ensures the high availability of the stateless service layer; the high availability of the data itself is guaranteed by the underlying K-V database. This type of database mainly uses a hash table, in which each specific key has a pointer to its specific data. For IT systems, the key/value model has the advantage of being simple and easy to deploy.
As shown in FIG. 4, the proxy master role (proxy server)
The first step: the proxy role processes the IO of all requesters. Requests are classified, and requests found to overlap are merged. The processing includes IO merging: for example, if several requesters need to access the same object-storage address, those IOs can be merged into one so as to reduce the access pressure on the object server; this is what is meant by optimizing IO. Removing invalid IO means filtering out invalid urls (addresses) sent by requesters; such IOs are removed directly and are not sent to the remote object server.
The second step: the proxy server orders all the requests of the requesters. For example, if a request contains a data-writing task, its priority is raised and the priority of other data-reading requests is lowered, so that write-first operation is achieved; among data requests, the priority of write operations should be the highest, which ensures that write transactions are completed as soon as possible.
The third step: the proxy can quickly resolve the oss (object server) address contained in the request, which makes it easy to locate the specific position on the object server. The address can be resolved into three parts: url (website location), bucket (file location), and block (block location); based on this parsing, the file location to be accessed is located quickly.
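A minimal Python sketch of this address resolution is shown below; it assumes addresses of the form https://host/bucket/block, while real object-storage address formats vary by provider:

from urllib.parse import urlparse

def parse_object_address(address):
    """Split an object-store address into (site, bucket, block).

    Assumes addresses of the form https://host/bucket/block...; real
    object-storage address schemes vary by provider.
    """
    parsed = urlparse(address)
    site = "%s://%s" % (parsed.scheme, parsed.netloc)
    parts = parsed.path.strip("/").split("/", 1)
    bucket = parts[0] if parts and parts[0] else ""
    block = parts[1] if len(parts) > 1 else ""
    return site, bucket, block


print(parse_object_address("https://oss.example.com/bucket1/f1/f1_0"))
# ('https://oss.example.com', 'bucket1', 'f1/f1_0')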
Using a remote proxy management role (proxy master) solves the problems of synchronization, remote pressure, scheduling, and the like. The proxy can optimize and merge multiple requests: for IO (request) merging, the IOs that can be merged are merged (for example, many requests that access the address of the same object-storage server can be merged into one IO request), and for those that cannot be merged, unnecessary requests can be identified in advance, which reduces interfering operations on the remote machine and makes synchronization easy to implement. With the proxy role introduced, real-time data requests are handed to the host for execution, and while processing requests the proxy role uses an LRU mechanism for swapping memory in and out and for flushing and evicting entries when the buffer overflows.
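A minimal Python sketch of the merge-and-filter step is shown below; the request representation and the validity check are placeholder assumptions made for the example:

def merge_and_filter(io_requests):
    """Collapse requests for the same address into one IO and drop invalid ones.

    Each element of io_requests is assumed to be an address string; the
    validity check below is only a placeholder for a real filter.
    """
    seen = set()
    merged = []
    for addr in io_requests:
        if not addr.startswith("https://"):          # placeholder "invalid url" test
            continue                                  # removed, never sent to the object server
        if addr in seen:
            continue                                  # merged: same address already queued
        seen.add(addr)
        merged.append(addr)
    return merged


requests = [
    "https://oss.example.com/bucket1/f1",
    "https://oss.example.com/bucket1/f1",             # duplicate, merged into one IO
    "not-a-valid-url",                                # invalid, removed
]
print(merge_and_filter(requests))                     # only one request survives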
The proxy role uses a write-priority algorithm so that tasks containing write transactions are executed first, giving write tasks a higher weight. The requested data is preferentially searched for in the storage cache (local memory), and the positions from which data is read and to which data is written are controlled and recorded, which improves data read/write efficiency and the efficiency of manual inspection. The main task of the proxy role is to manage the requesters' IO; this management can also use an approach similar to a kv database (a key-value database) and can process a certain amount of IO at a time in batch mode.
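A minimal Python sketch of this write-first ordering is shown below; the request representation is an assumption made for the example:

def order_requests(requests):
    """Write requests first, reads after; the order within each group is kept."""
    return sorted(requests, key=lambda r: 0 if r["op"] == "write" else 1)


queue = [
    {"op": "read",  "target": "bucket1/f1"},
    {"op": "write", "target": "bucket1/f2"},
    {"op": "read",  "target": "bucket1/f3"},
]
print(order_requests(queue))      # the write on bucket1/f2 moves to the front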
LRU memory swap-in and swap-out: the Least Recently Used algorithm replaces the page that has not been used for the longest time. For example, suppose there are only 4 page frames in total, so only 4 pages can be resident; when the frames are full and a new page must be brought in, the page that has not been used for the longest time (that is, the least recently used page) is replaced and the new page takes its place.
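A minimal Python sketch of such an LRU cache, with a capacity of 4 pages to match the example above:

from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: when full, the page untouched longest is evicted."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.pages = OrderedDict()

    def get(self, key):
        if key not in self.pages:
            return None
        self.pages.move_to_end(key)                   # mark as most recently used
        return self.pages[key]

    def put(self, key, value):
        if key in self.pages:
            self.pages.move_to_end(key)
        self.pages[key] = value
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)            # evict the least recently used page


cache = LRUCache(capacity=4)
for page in ["p1", "p2", "p3", "p4"]:
    cache.put(page, page.upper())
cache.get("p1")                                       # p1 becomes most recently used
cache.put("p5", "P5")                                 # p2, the least recently used, is evicted
print(list(cache.pages))                              # ['p3', 'p4', 'p1', 'p5']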
specific embodiment IO merge:
there is an IO list: A. b, C, D, E
A=D+B,B=C+E,C=D+E
The order of calculation proceeds as follows:
The case where IO can be merged:
d and e are both leaf IO reads. If the nodes are ordered topologically and d and e turn out to be parallel, 2 threads could be started at the same time to read d and e separately; although the multithreading does not add time, it costs one more IO operation, and IO is the bottleneck: if the system issues more IO operations, jitter and delay result. If instead d and e are IO-merged, that is, d and e are read out in a single IO, the IO capacity of the system can be roughly doubled.
The cases where IO cannot be merged:
There are 2 cases:
First, if d and e access different databases, their IOs cannot be merged and 2 separate reads are required.
Second, merging increases complexity when there are dependencies. For example, suppose the three nodes d, e, and c all involve IO: d and e can be IO-merged, but c must wait for the join of d and e to determine which data it needs to read. When merging, therefore, only d and e can be merged first; c cannot be merged with them, because only after the join of d and e is it known which data c needs to look up.
A join of 2 data sets finds their common part. For example, one data set holds mathematics scores and the other holds geography scores; to find the students whose mathematics score is above 60 and whose geography score is above 90, the intersection of the 2 data sets must be found, that is, the 2 data sets must be joined.
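A minimal Python sketch of the two cases is shown below; the data values and the join rule are illustrative assumptions, and the point is only that d and e are fetched in one merged IO while c requires a second IO after the join:

# Illustrative data: d and e are key sets in the same store, so one merged IO
# can read both; which c rows are needed is only known after joining d and e.
store = {
    "d": {1, 2, 3},
    "e": {2, 3, 4},
    "c:2": "c row for key 2",
    "c:3": "c row for key 3",
}

def batched_read(keys):
    """One IO fetching several keys at once (prints each IO it issues)."""
    print("1 IO for %s" % sorted(keys))
    return {k: store[k] for k in keys}

# Mergeable: d and e are independent leaves, so a single IO reads both.
values = batched_read(["d", "e"])

# Not mergeable with the above: only the join result of d and e tells us
# which c rows to fetch, so c needs a second, separate IO.
join_keys = values["d"] & values["e"]                 # {2, 3}
c_rows = batched_read(["c:%d" % k for k in sorted(join_keys)])
print(c_rows)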
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A local cache management method of a remote storage service for a cloud database is characterized by comprising the following steps:
s1, all requesters issue request commands to the object server through the client, and the proxy server receives all requests;
s2, the proxy server merges and optimizes the requests according to the positions of all the requests;
s3, sending the merged and optimized requests to a cache management server, and searching the local cache by the cache management server according to the merged and optimized requests;
s4, if the data of the request exists in the local cache, packaging and feeding back the information of the request to the requester;
and S5, if the data of the request does not exist in the local cache, feeding back the path of the request to the requester, directly searching the relevant information in the object server by the requester, feeding back the relevant information of the request to the cache management server by the object server, feeding back the relevant information to the requester by the cache management server, and storing the relevant information in the local cache by the cache management server.
2. The local cache management method for the remote storage service of the cloud database according to claim 1, wherein: in step S1, the proxy server hands the received requests over to the host for execution, and during execution the host applies an LRU mechanism to historical and new requests for swapping memory in and out and for flushing and evicting entries when the cache overflows.
3. The local cache management method for the remote storage service of the cloud database according to claim 1, wherein: the proxy server adopts a write-priority algorithm to order all the requests, raising the task priority of write operations and lowering the priority of other read-data requests.
4. The local cache management method for the remote storage service of the cloud database according to claim 1, wherein: the proxy server preferentially searches for the requested data in the local memory, and manages, controls, and records the positions from which data is read and to which data is written.
5. The local cache management method for the remote storage service of the cloud database according to claim 1, wherein: the proxy server rapidly resolves the address contained in the request, locates the specific position of the object, and resolves the address into a website location, a file location, and a block location.
6. The local cache management method for the remote storage service of the cloud database according to claim 1, wherein: the process by which the proxy server merges and optimizes the requests in step S2 is as follows:
the merging process merges requests that access the same object server address into a single request;
the optimization process filters out and deletes requests whose request time is too long or that are invalid.
7. The local cache management method for the remote storage service of the cloud database according to claim 1, wherein: in step S3, the local cache is stored on disk in blocks, metadata records the buffer status, and each block record carries an identifier.
8. The local cache management method for the remote storage service of the cloud database according to claim 1, wherein: the local cache is stored with the same file size as the object storage, and when a data file reaches a set threshold, subsequent data is written into the next file.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111678121.8A 2021-12-31 2021-12-31 CN114297243A (en) Remote storage service local cache management method for cloud database

Publications (1)

Publication Number Publication Date
CN114297243A (en) 2022-04-08

Family

ID=80976138


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination