CN115586871A

CN115586871A - Data appending and writing method, device, equipment and medium for cloud computing scene

Info

Publication number: CN115586871A
Application number: CN202211336566.2A
Authority: CN
Inventors: 易正利
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-10-28
Filing date: 2022-10-28
Publication date: 2023-01-10
Anticipated expiration: 2042-10-28
Also published as: CN115586871B

Abstract

The disclosure provides a data append-write method, apparatus, device and medium for a cloud computing scenario. It relates to the field of artificial intelligence, in particular to cloud computing, cloud storage and cloud database technologies, and can be applied to intelligent cloud scenarios. The specific implementation is as follows: in response to data being appended to a first data segment of a data stream, determining a physical address of the data; updating a first index information table in a storage engine according to the mapping relation between the logical address at which the data is written to the block device and the physical address, wherein the first index information table is used for indexing data to be read in the data stream; and updating a second index information table in the storage engine according to the mapping relation, wherein the second index information table is used for determining the validity of the data written in a second data segment of the data stream. The disclosed technology enables efficient index management of data, provides stable index access performance, and improves the overall performance of the storage system corresponding to the cloud disk.

Description

Data additional writing method, device, equipment and medium for cloud computing scene

Technical Field

The disclosure relates to the field of artificial intelligence, in particular to cloud computing, cloud storage and cloud database technologies, which can be applied to intelligent cloud scenes.

Background

Cloud Disk Service (CDS) is a secure, reliable and highly elastic storage service. It can serve as an extended block storage component of a cloud server, providing highly available, high-capacity data storage for the cloud server.

Disclosure of Invention

The disclosure provides a data appending and writing method, device, equipment and medium for a cloud computing scene.

According to an aspect of the present disclosure, a data appending and writing method for a cloud computing scene is provided, which is applied to a cloud disk, where the cloud disk includes a data stream and a storage engine, and includes:

determining a physical address of data in response to the data being appended to a first data segment of the data stream;

updating a first index information table in the storage engine according to the mapping relation between the logical address at which the data is written to the block device and the physical address, wherein the first index information table is used for indexing data to be read in the data stream; and

updating a second index information table in the storage engine according to the mapping relation, wherein the second index information table is used for determining the validity of the data written in a second data segment of the data stream.

According to another aspect of the present disclosure, there is provided a data appending and writing apparatus for a cloud computing scenario, including:

a first determining module, configured to determine a physical address of data in response to the data being appended to a first data segment of the data stream;

a first updating module, configured to update a first index information table in the storage engine according to the mapping relation between the logical address at which the data is written to the block device and the physical address, wherein the first index information table is used for indexing data to be read in the data stream; and

a second updating module, configured to update a second index information table in the storage engine according to the mapping relation, wherein the second index information table is used for determining the validity of the data written in a second data segment of the data stream.

According to another aspect of the present disclosure, an electronic device is provided, which is applied to a cloud disk, where the cloud disk includes a data stream and a storage engine, and includes:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method according to any one of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method according to any one of the embodiments of the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.

According to the technology disclosed by the invention, the efficient index management of the data can be realized, the stable index access performance is provided, and the overall performance of the storage system corresponding to the cloud disk is improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

fig. 1 is a schematic flowchart of a data append write method for a cloud computing scenario according to an embodiment of the disclosure;

fig. 2 is an application schematic diagram of a data append write method for a cloud computing scenario according to an embodiment of the present disclosure;

fig. 3 is an application diagram of a data append writing method for a cloud computing scenario according to an embodiment of the present disclosure;

fig. 4 is a schematic diagram of a data appending apparatus oriented to a cloud computing scenario, in accordance with an embodiment of the present disclosure;

fig. 5 is a block diagram of an electronic device for implementing a data append write method for a cloud computing scenario according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

As shown in fig. 1, an embodiment of the present disclosure provides a data appending and writing method for a cloud computing scenario, which is applied to a cloud disk, where the cloud disk includes a data stream and a storage engine, and includes:

Step S101: in response to data being appended to a first data segment of the data stream, determining a physical address of the data.

Step S102: updating a first index information table in the storage engine according to the mapping relation between the logical address at which the data is written to the block device and the physical address, wherein the first index information table is used for indexing data to be read in the data stream.

Step S103: updating a second index information table in the storage engine according to the mapping relation, wherein the second index information table is used for determining the validity of the data written in a second data segment of the data stream.

According to the embodiments of the present disclosure, it should be noted that:

The data appending and writing method of the embodiments of the disclosure can be applied to any one or more cloud disks (Cloud Disk Service) in an existing storage system. When it is applied to a plurality of cloud disks of a storage system, each cloud disk has its own data stream and storage engine.

The storage engine of the cloud disk may adopt any database instance having a data persistence function in the prior art, which is not specifically limited herein.

The data stream of the cloud disk can be understood as an ordered sequence of data segments, with a starting point and an ending point, used for storing data. The append-written data stream is divided into a plurality of data segments (segments), and data is appended within a segment. At any given time only one segment accepts writes; when that segment is full, the data stream creates a new segment and continues appending there. Each time a segment is created, the segment id increases by 1. When a segment contains many holes, its space is reclaimed, so the segment is also the basic unit of space reclamation.
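As an illustration only, the segment behaviour described above can be sketched in Python. The class names `Segment` and `DataStream`, the 8-byte `SEGMENT_CAPACITY`, and the `(segment id, offset, length)` return shape are assumptions made for this sketch and do not appear in the disclosure.

```python
from typing import List, Tuple

SEGMENT_CAPACITY = 8  # bytes per segment; hypothetical and tiny, for illustration


class Segment:
    def __init__(self, seg_id: int) -> None:
        self.seg_id = seg_id
        self.data = bytearray()

    def free(self) -> int:
        return SEGMENT_CAPACITY - len(self.data)


class DataStream:
    """Append-only stream in which only the newest segment accepts writes."""

    def __init__(self) -> None:
        self.segments: List[Segment] = [Segment(seg_id=1)]

    def append(self, payload: bytes) -> List[Tuple[int, int, int]]:
        """Append payload and return (segment id, offset, length) per piece."""
        placed = []
        pos = 0
        while pos < len(payload):
            active = self.segments[-1]
            if active.free() == 0:
                # The active segment is full: create a new one with id + 1.
                active = Segment(seg_id=active.seg_id + 1)
                self.segments.append(active)
            n = min(active.free(), len(payload) - pos)
            placed.append((active.seg_id, len(active.data), n))
            active.data += payload[pos:pos + n]
            pos += n
        return placed


stream = DataStream()
first = stream.append(b"abcde")     # fits entirely in segment 1
second = stream.append(b"fghijkl")  # fills segment 1, then spills into segment 2
```

In this sketch a payload larger than the remaining space is split, so one write can yield several physical locations, one per segment it touches.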

The first data segment may be understood as a data segment that can be currently written with data additionally, among a plurality of data segments of the data stream.

Appending the data to the first data segment of the data stream may mean that the data is written to a single first data segment or to a plurality of first data segments; the number of first data segments written depends on the size of the data. When there are multiple first data segments, the data has a corresponding physical address in each of them.

The physical address of the data may include the id (identifier) of the first data segment into which the data is written and the location (offset, i.e. internal offset) within that first data segment. For example, if the physical addresses of the data are segment1_offset1 and segment1_offset2, the data is written in the first data segment with id 1 in the data stream, specifically at offsets offset1 and offset2 of that segment. offset1 and offset2 also represent the position of the first data segment with id 1 in the entire data stream.

A block device may be understood as a user volume, i.e. a block of storage space determined by a user in a storage system. A block device consists of a number of data blocks (blocks), each having its own id. The logical address of the data may include the id of the data block into which the data is written and the specific storage location (offset, i.e. internal offset) within the data block. For example, the logical addresses may be written as block1_offset1 and block1_offset2, indicating that the data is stored in the data block with id 1, specifically at offsets offset1 and offset2 within that block.

Because an append write changes the location at which data is stored, a mapping from the logical address (LBA) to the physical address (PBA), that is, the first index information table, needs to be maintained. The first index information table is used for storing index information between the logical address and the physical address.

The first index information table and the second index information table may be understood as two column families, each storing the mapping relation between the logical address and the physical address as key-value pairs. The key-value pairs of the first index information table are the reverse of those of the second index information table, so that the second index information table serves as the reverse index of the first; that is, the second index information table stores the reverse index information corresponding to the index information of the first index information table.
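A minimal sketch of the two tables follows, with plain Python dictionaries standing in for the persistent key-value tables. The function name `record_append` and the tuple encoding of addresses are assumptions for illustration. The sketch also assumes that an overwrite leaves the older reverse entry in place, which is consistent with the validity check described later in this description.

```python
from typing import Dict, Tuple

LogicalAddr = Tuple[int, int]   # (block id, offset inside the block)
PhysicalAddr = Tuple[int, int]  # (segment id, offset inside the segment)

# First table: logical address -> physical address (used by reads).
data_table: Dict[LogicalAddr, PhysicalAddr] = {}
# Second table: physical address -> logical address (used for validity checks).
segment_table: Dict[PhysicalAddr, LogicalAddr] = {}


def record_append(lba: LogicalAddr, pba: PhysicalAddr) -> None:
    """Record one appended write in both tables."""
    data_table[lba] = pba
    segment_table[pba] = lba


record_append((1, 1), (1, 1))  # block1_offset1 lands at segment1_offset1
record_append((1, 2), (1, 2))  # block1_offset2 lands at segment1_offset2
record_append((1, 2), (1, 3))  # overwrite: block1_offset2 now lives at segment1_offset3
```

After the overwrite, the forward entry for block1_offset2 points to the new location, while the reverse entry for segment1_offset2 still names block1_offset2. That disagreement is what marks the old copy as stale.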

According to the embodiments of the disclosure, efficient index management of data can be realized, stable index access performance is provided, and the overall performance of the storage system corresponding to the cloud disk is improved. The first index information table and the second index information table are stored in the storage engine of the cloud disk rather than in the block device (such as a memory). This avoids two problems of keeping index information in volatile storage: the index takes a long time to recover when the system restarts, and the amount of index information is limited by the memory capacity. Because the data stream is written by appending, the latency increase caused by random writes is avoided. Storing the two tables in the storage engine of the cloud disk also reduces read and write latency, since the required data can be located quickly through the tables, which preserves read performance. Moreover, because the tables are stored in the storage engine of the cloud disk rather than on additional hardware, the cost of the storage system corresponding to the cloud disk is reduced. In short, storing the first and second index information in the storage engine of the cloud disk provides stable, low-latency data access while effectively reducing the cost of the storage system, so that a cost-effective storage system can be designed.

In one example, the block device may include a memory.

In one example, when the data is log data generated in real time as storage behavior, the data stream may be understood as a log stream (log).

In one example, the storage engine may be RocksDB. RocksDB is a widely used, full-featured, high-performance open-source key-value storage engine; it offers very good write performance and relatively fast point and range queries, and supports data persistence.

In one implementation manner, the data appending and writing method for a cloud computing scenario according to the embodiment of the present disclosure includes steps S101 to S103, where step S101: determining a physical address of the data in response to the data being appended to a first data segment of the data stream, comprising:

in response to the data being appended to a first data segment of the data stream, determining a unique identification code of the first data segment and location information of the first data segment in the data stream; and

determining the physical address of the data according to the unique identification code of the first data segment and the location information of the first data segment in the data stream.

The first data segment may be understood as the data segment in the data stream to which data can currently be written. If the size of the data is larger than the storage capacity of one first data segment, the data is split and stored across a plurality of first data segments. In that case, the physical address of the data comprises the unique identification codes and location information of the plurality of first data segments.

The position information of the first data segment in the data stream can be understood as the internal offset (offset) of the first data segment.

According to the embodiment of the disclosure, the physical address corresponding to the data written into the data stream can be accurately determined.

In one implementation manner, the data appending and writing method for a cloud computing scenario according to the embodiment of the present disclosure includes steps S101 to S103, where step S102: updating a first index information table in a storage engine according to the mapping relation between the logical address and the physical address of the data writing block device, wherein the updating comprises the following steps:

determining a first key-value pair of the data according to the mapping relation between the logical address at which the data is written to the block device and the physical address at which it is written to the data stream, wherein the key of the first key-value pair is the logical address and the value of the first key-value pair is the physical address; and

updating the first index information table in the storage engine according to the first key-value pair, wherein the first index information table is used for indexing data to be read in the data stream.

That is, the logical address serves as the key of the first key-value pair and the physical address serves as its value. As shown in fig. 2 and fig. 3, data_table can be understood as the first index information table, in which block1_offset1 ... block3_offset5 are logical addresses, i.e. the specific storage addresses of data in the data blocks of the block device. ext_1 ... ext_5 in the table are physical addresses, i.e. the specific storage addresses of data in the data stream. Each of ext_1 ... ext_5 contains the id of the first data segment into which the data is written and the specific location (offset) at which it is written in that segment.

The data to be read can be understood as data that needs to be obtained and fed back from the data written in the data stream.

According to the embodiments of the disclosure, the first index information table, built with the logical address as the key and the physical address as the value, makes it possible to locate the data to be read in the data stream quickly and accurately in response to a data reading instruction.

In one implementation manner, the data appending and writing method for a cloud computing scenario according to the embodiment of the present disclosure includes steps S101 to S103, where step S103: according to the mapping relation, updating a second index information table in the storage engine, wherein the updating comprises the following steps:

determining a second key-value pair of the data according to the mapping relation, wherein the key of the second key-value pair is the physical address and the value of the second key-value pair is the logical address; and

updating the second index information table in the storage engine according to the second key-value pair, wherein the second index information table is used for determining the validity of the data written in the second data segment of the data stream.

That is, the physical address serves as the key of the second key-value pair and the logical address serves as its value. As shown in fig. 2 and fig. 3, segment_table can be understood as the second index information table, in which block1_offset1 ... block3_offset5 are logical addresses, i.e. the specific storage addresses of data in the data blocks of the block device. seg1_off1 ... seg2_off2 in the table are physical addresses, i.e. the specific storage addresses of data in the data stream. Each of seg1_off1 ... seg2_off2 contains the id of the first data segment into which the data is written and the specific location (offset) within that segment.

The second data segment may be understood as a data segment of the data stream for which the validity of the written data needs to be determined; that is, a data segment that contains data holes, whose valid data needs to be migrated and whose space needs to be emptied and reclaimed.

According to the embodiment of the disclosure, based on the second index information table constructed by taking the physical address as the key and the logical address as the value, the validity of the data written in the second data segment of the data stream can be accurately determined.

In an implementation manner, the data appending and writing method for a cloud computing scenario according to the embodiment of the present disclosure includes steps S101 to S103, and further includes:

Step S104: in response to the data holes in the second data segment exceeding a threshold, determining, according to the actual physical address of the second data segment, the corresponding logical address of the second data segment in the second index information table.

Step S105: determining, in the first index information table, the pre-stored physical address corresponding to that logical address.

Step S106: in response to the actual physical address matching the pre-stored physical address, determining that the data written in the second data segment is valid data.

Step S107: migrating the valid data to a target data segment.

When data is deleted from a data segment of the data stream, or is overwritten with data of smaller size, data holes are left in the data segment, and these holes reduce the space utilization of the cloud disk.

The actual physical address of the second data segment can be understood as the actual id and location information of the second data segment in the data stream.

As shown in fig. 2 and fig. 3, if data holes exist in the second data segment seg1 (segment 1), seg1 is taken as the actual physical address. Based on segment_table, it can be determined that block1_offset1 corresponding to seg1_off1, block1_offset2 corresponding to seg1_off2, and block2_offset3 corresponding to seg1_off3 are the logical addresses corresponding to the second data segment in the second index information table. Based on data_table, ext_1 corresponding to block1_offset1, ext_2 corresponding to block1_offset2, and ext_3 corresponding to block2_offset3 are then determined as the pre-stored physical addresses in the first index information table for those logical addresses. If the physical address recorded in ext_1 for block1_offset1 is segment1_offset1, it matches seg1_off1 (segment1_offset1), which corresponds to block1_offset1 in the second index information table, so the data written at offset1 of segment1 of the second data segment is valid. If the physical address recorded in ext_2 for block1_offset2 is segment1_offset3, it does not match seg1_off2 (segment1_offset2), which corresponds to block1_offset2 in the second index information table, so the data written at offset2 of segment1 of the second data segment is invalid.

The target data segment may be understood as a data segment to which data can be currently written additionally in the data stream.

According to the embodiment of the disclosure, by determining the valid data in the second data segment with the data hole, the valid data in the second data segment can be subjected to data migration, and meanwhile, the second data segment after the data migration can be emptied, recycled and reused, so that the space utilization rate of the cloud disk is improved.
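The check in steps S104 to S106 can be sketched as follows, using the two entries of the worked example whose forward mappings are stated (block1_offset1 and block1_offset2). The function name `classify_segment` and the tuple encoding of addresses are assumptions for illustration, and dictionaries stand in for the two index tables.

```python
from typing import Dict, List, Tuple

Addr = Tuple[int, int]  # (id, offset) for both logical and physical addresses


def classify_segment(
    seg_id: int,
    data_table: Dict[Addr, Addr],
    segment_table: Dict[Addr, Addr],
) -> Tuple[List[Addr], List[Addr]]:
    """Split the physical addresses of one segment into valid and invalid."""
    valid: List[Addr] = []
    invalid: List[Addr] = []
    for pba, lba in sorted(segment_table.items()):
        if pba[0] != seg_id:
            continue
        # Valid only if the forward index still points back at this location.
        if data_table.get(lba) == pba:
            valid.append(pba)
        else:
            invalid.append(pba)
    return valid, invalid


# seg1_off1 -> block1_offset1 and seg1_off2 -> block1_offset2 (reverse index)
segment_table = {(1, 1): (1, 1), (1, 2): (1, 2)}
# block1_offset1 -> segment1_offset1, block1_offset2 -> segment1_offset3 (forward index)
data_table = {(1, 1): (1, 1), (1, 2): (1, 3)}

valid, invalid = classify_segment(1, data_table, segment_table)
```

Here `valid` holds segment1_offset1 and `invalid` holds segment1_offset2, matching the outcome of the worked example.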

In an implementation manner, the data appending and writing method for a cloud computing scenario according to the embodiment of the present disclosure includes steps S101 to S107, and further includes:

Step S108: in response to the actual physical address not matching the pre-stored physical address, determining that the data written in the second data segment is invalid data.

Step S109: clearing the invalid data from the second data segment.

As shown in fig. 2 and fig. 3, assuming that data holes exist in the second data segment seg1 (segment 1), seg1 is taken as the actual physical address. Based on segment_table, it can be determined that block1_offset1 corresponding to seg1_off1, block1_offset2 corresponding to seg1_off2, and block2_offset3 corresponding to seg1_off3 are the logical addresses corresponding to the second data segment in the second index information table. Based on data_table, ext_1 corresponding to block1_offset1, ext_2 corresponding to block1_offset2, and ext_3 corresponding to block2_offset3 are then determined as the pre-stored physical addresses in the first index information table for those logical addresses. If the physical address recorded in ext_1 for block1_offset1 is segment1_offset1, it matches seg1_off1 (segment1_offset1), which corresponds to block1_offset1 in the second index information table, so the data written at offset1 of segment1 of the second data segment is valid. If the physical address recorded in ext_2 for block1_offset2 is segment1_offset3, it does not match seg1_off2 (segment1_offset2), which corresponds to block1_offset2 in the second index information table, so the data written at offset2 of segment1 of the second data segment is invalid.

According to the embodiment of the disclosure, the valid data in the second data segment with the data hole is determined, so that the valid data in the second data segment can be migrated, and meanwhile, the invalid data in the second data segment can be emptied by determining the invalid data in the second data segment, so that the second data segment after data migration can be emptied, recycled and reused, and the space utilization rate of the cloud disk is improved.

In one example, as shown in fig. 2 and 3, deleting or overwriting data deletes or changes the index information in data_table, which leaves holes in the log. Holes in the log stream lower the space utilization. The log stream is therefore divided into segments; when a segment has many holes, its valid data is moved to the head of the log stream, after which the old segment can be emptied, completing the space cleanup. Specifically:

for the segment to be reclaimed (the second data segment), obtain all key-value pairs under its segment id from segment_table (the second index information table), where each value records a block key of data_table (the first index information table);

using the block key obtained in the previous step, read the index information ext of that key from data_table, and check whether the segment id and segment offset recorded in ext match the segment id and segment offset read from segment_table in the previous step; if they match, the data is still valid and needs to be moved; otherwise, the data is invalid;

move the still-valid data in the segment to a new segment, then write the new index information to data_table, and write the mapping from the new segment id and segment offset to the block key into segment_table;

after the valid data has been moved out of the segment, the original segment can be emptied, completing reclamation of the segment space.
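The four steps above can be sketched end to end as follows. This is an illustrative outline, not the disclosed implementation: `reclaim_segment` is a hypothetical name, each segment is modelled as a list of records whose list index plays the role of the segment offset, and deleting the old reverse entries when the segment is released is an assumption the text does not spell out.

```python
from typing import Dict, List, Tuple

Addr = Tuple[int, int]  # (id, offset)


def reclaim_segment(
    seg_id: int,
    target_id: int,
    segments: Dict[int, List[bytes]],
    data_table: Dict[Addr, Addr],
    segment_table: Dict[Addr, Addr],
) -> None:
    """Move still-valid records out of seg_id, then release the segment."""
    target = segments.setdefault(target_id, [])
    # Step 1: collect every reverse entry whose key belongs to the segment.
    entries = sorted(
        (pba, lba) for pba, lba in segment_table.items() if pba[0] == seg_id
    )
    for pba, lba in entries:
        # Step 2: the record is valid only if data_table still points here.
        if data_table.get(lba) == pba:
            # Step 3: append to the target segment and update both tables.
            new_pba = (target_id, len(target))
            target.append(segments[seg_id][pba[1]])
            data_table[lba] = new_pba
            segment_table[new_pba] = lba
        del segment_table[pba]  # assumption: old reverse entries are dropped
    # Step 4: nothing valid remains, so the old segment can be emptied.
    del segments[seg_id]


segments = {1: [b"A", b"B-old", b"C"], 2: [b"B-new"]}
data_table = {(1, 0): (1, 0), (1, 1): (2, 0), (2, 0): (1, 2)}
segment_table = {(1, 0): (1, 0), (1, 1): (1, 1), (1, 2): (2, 0), (2, 0): (1, 1)}

reclaim_segment(1, 2, segments, data_table, segment_table)
```

In the usage example, the record `b"B-old"` is skipped because block (1, 1) was already rewritten into segment 2, while `b"A"` and `b"C"` are appended to segment 2 and re-indexed.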

In an implementation manner, the data appending and writing method for a cloud computing scene according to the embodiment of the present disclosure includes steps S101 to S103, and further includes:

writing the current log state information of the data stream into the cloud disk in the form of a checkpoint at a preset writing frequency. The checkpoint is used, when the storage system corresponding to the cloud disk is restarted, for loading the data stream and the first index information table and the second index information table corresponding to the data stream.

According to the embodiments of the disclosure, when the storage system is restarted, the log stream can be loaded starting from the checkpoint most recently written to the cloud disk, achieving a fast restart. The storage engine can likewise be loaded quickly, and the first index information table and the second index information table in the storage engine can be updated based on the reloaded log stream.

In one example, to shorten the restart time of the storage system, the storage engine may periodically write the log application state of the data stream to disk as a checkpoint; when the storage system restarts, data can be loaded from the checkpoint to avoid reading the full log. Because the storage engine RocksDB persists data, only the log application state needs to be written to disk at a checkpoint. On restart: load the RocksDB data; read the checkpoint and load the log application state; then, starting from the checkpoint, load the log and build in the storage engine the first index information table and the second index information table for the reloaded data of the data stream.
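The restart sequence can be sketched as below. The checkpoint format (a JSON file holding the count of applied log records), the function names, and the use of in-memory dictionaries in place of persisted RocksDB tables are all assumptions made for this illustration; the disclosure does not specify what the log application state contains.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (logical address, physical address)


def apply_record(record: Record, data_table: Dict[str, str],
                 segment_table: Dict[str, str]) -> None:
    lba, pba = record
    data_table[lba] = pba
    segment_table[pba] = lba


def write_checkpoint(path: str, applied: int) -> None:
    """Persist only the log application state, not the index itself."""
    with open(path, "w") as f:
        json.dump({"applied": applied}, f)


def restart(path: str, log: List[Record], data_table: Dict[str, str],
            segment_table: Dict[str, str]) -> int:
    """Replay only the log records written after the checkpoint."""
    with open(path) as f:
        applied = json.load(f)["applied"]
    for record in log[applied:]:
        apply_record(record, data_table, segment_table)
    return len(log) - applied


log = [
    ("block1_offset1", "seg1_off1"),
    ("block1_offset2", "seg1_off2"),
    ("block2_offset3", "seg1_off3"),
    ("block1_offset2", "seg2_off1"),  # written after the checkpoint
]

# Tables as the storage engine would have persisted them at checkpoint time.
data_table: Dict[str, str] = {}
segment_table: Dict[str, str] = {}
for rec in log[:3]:
    apply_record(rec, data_table, segment_table)

with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "checkpoint.json")
    write_checkpoint(ckpt, applied=3)
    replayed = restart(ckpt, log, data_table, segment_table)
```

Only the single record written after the checkpoint is replayed, which is the point of the design: restart cost depends on the log tail rather than on the full log.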

In response to a data read instruction, determining a logical address of the data to be read.

Determining the corresponding physical address in the first index information table according to the logical address of the data to be read.

Indexing the data to be read from the data stream according to the corresponding physical address in the first index information table.

According to the embodiment of the disclosure, by using the first index information table, data corresponding to the data reading instruction can be quickly acquired from the data stream and can be fed back to a sending object of the data reading instruction.
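A minimal sketch of this read path follows. The function name `read_data`, the explicit `length` argument, and the in-memory layout of the stream are assumptions for illustration; the text does not say how the length of a stored record is tracked.

```python
from typing import Dict, Tuple

Addr = Tuple[int, int]  # (id, offset)


def read_data(
    lba: Addr,
    length: int,
    data_table: Dict[Addr, Addr],
    segments: Dict[int, bytes],
) -> bytes:
    """Resolve a logical address through data_table and read from the stream."""
    seg_id, offset = data_table[lba]  # raises KeyError if never written
    return segments[seg_id][offset:offset + length]


segments = {1: b"hello world"}
data_table = {(1, 0): (1, 0), (1, 1): (1, 6)}

result = read_data((1, 1), 5, data_table, segments)
```

The read needs a single lookup in the first index information table; the second index information table is not consulted.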

In one implementation manner, the data appending and writing method for a cloud computing scenario according to the embodiment of the present disclosure includes steps S101 to S103, where, before step S101 (determining a physical address of the data in response to the data being appended to a first data segment of the data stream), the method further includes:

in response to a data append write instruction, determining at least one first data segment from the data stream based on the size of the data; and

writing the data into the at least one first data segment.

When the size of the data is larger than the storage space of one data segment, the data needs to be stored across a plurality of first data segments. At any given time, data is written to only one first data segment; once that segment is full, writing continues in the next first data segment.

In one implementation, the data appending and writing method for a cloud computing scenario according to the embodiment of the present disclosure includes steps S101 to S103, where step S102, updating first index information of a first index information table in a storage engine according to a mapping relation between the logical address of the data written to the block device and the physical address, includes:

The unique identification code of the data block and the writing position information of the data in the data block are determined according to the data block of the block device into which the data is written.

The logical address of the data is determined according to the unique identification code of the data block and the writing position information of the data in the data block.

The user volume (i.e., the block device) is partitioned into a plurality of data blocks (blocks) based on logical addresses, and each block has a unique identification code, denoted as block id.
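A logical address built from a block id and a position inside the block can be sketched as below. The block size of 4 MiB and the rule that block ids are consecutive integers derived from the byte offset in the volume are assumptions for illustration; the disclosure only requires that each block has a unique identification code.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size in bytes


def logical_address(volume_offset, block_size=BLOCK_SIZE):
    """Map a byte offset in the user volume to (block_id, offset_in_block)."""
    block_id, offset_in_block = divmod(volume_offset, block_size)
    return block_id, offset_in_block
```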

In one example, as shown in fig. 2 and 3, the storage engine includes a plurality of disks (i.e., cloud disks), each disk containing one storage log stream (i.e., the data stream) and one RocksDB instance (i.e., the storage engine). The storage engine also includes a block device (Memory), whose data blocks (blocks) are used to store the logical addresses of data. The mapping relation between the logical addresses of the memory and the physical addresses of the log stream is stored in the RocksDB instance. The log state information of the memory and the log stream is periodically written to the cloud disk in the form of a checkpoint. Each RocksDB instance contains two column families: data_table (i.e., the first index information table) and segment_table (i.e., the second index information table). The data_table stores the index information of user data: the key is block id + offset within the block, and the value is the location information of the user data in the log stream. The segment_table stores the reverse index information of segments: the key is segment id + offset within the segment, and the value is the corresponding key in data_table. The segment_table is introduced mainly to facilitate space recovery: instead of scanning the whole data_table to determine the valid data in a given segment, the key-value pairs for that segment are read directly from segment_table, and their values identify the data_table keys involved, from which the valid data is determined.
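One way to lay out the keys of the two tables so that all entries of a segment sit next to each other is sketched below. The fixed-width big-endian encoding and the helper names are assumptions, not taken from the disclosure, and a dictionary stands in for the ordered key-value store; an ordered store would serve the same lookup with a prefix scan rather than a full pass.

```python
import struct


def data_key(block_id, offset_in_block):
    # Key of data_table: block id + offset within the block.
    return struct.pack(">QQ", block_id, offset_in_block)


def segment_key(segment_id, offset_in_segment):
    # Key of segment_table: segment id + offset within the segment.
    # Big-endian unsigned integers sort bytewise in numeric order.
    return struct.pack(">QQ", segment_id, offset_in_segment)


def entries_for_segment(segment_table, segment_id):
    """Return the (key, value) pairs of one segment, ordered by offset."""
    prefix = struct.pack(">Q", segment_id)
    return [(k, v) for k, v in sorted(segment_table.items()) if k.startswith(prefix)]
```

Each returned value is itself a data_table key, so the caller can look it up in data_table without scanning that table.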

In one example, as shown in fig. 2 and 3, the data writing flow is as follows:

The user data is written into the log; the index information of the data in the db data_table is updated according to the logical address of the user data; and the reverse index information of the segment in the db segment_table is updated.
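The three write steps can be sketched as follows. The class `AppendStore`, the in-memory `bytearray` log and the segment size are hypothetical, and for brevity a payload is assumed not to cross a segment boundary.

```python
SEGMENT_SIZE = 1024  # assumed segment size in bytes


class AppendStore:
    def __init__(self):
        self.log = bytearray()   # append-only log stream
        self.data_table = {}     # (block_id, block_off) -> (segment_id, seg_off, length)
        self.segment_table = {}  # (segment_id, seg_off) -> (block_id, block_off)

    def write(self, logical, payload):
        # Step 1: append the user data to the log.
        log_off = len(self.log)
        self.log += payload
        segment_id, seg_off = divmod(log_off, SEGMENT_SIZE)
        # Step 2: update the forward index for the logical address.
        self.data_table[logical] = (segment_id, seg_off, len(payload))
        # Step 3: update the reverse index of the segment.
        self.segment_table[(segment_id, seg_off)] = logical
        return segment_id, seg_off
```

When the same logical address is written twice, the forward index points only at the newer location, while the reverse index keeps both entries; the older entry is what a later validity check would find to be stale.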

In one example, as shown in fig. 2 and 3, the data reading flow is as follows:

The corresponding index information ext is looked up in the db data_table according to the logical address; if the ext corresponding to the logical address is found, the data is read from the log according to the ext and returned to the user, and the read flow ends; if no ext is found, null data is returned to the user.
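The read flow can be sketched in a few lines. Here `ext` is assumed to be an (offset in log, length) pair, and "null data" is represented as zero-filled bytes of the requested length; both choices are assumptions for illustration.

```python
def read(data_table, log, logical, length):
    """Look up ext for a logical address and read the data from the log."""
    ext = data_table.get(logical)
    if ext is None:
        # No index entry: return null data (assumed here to be zero bytes).
        return bytes(length)
    log_off, ext_len = ext
    return bytes(log[log_off:log_off + ext_len])
```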

As shown in fig. 4, an embodiment of the present disclosure provides a data appending and writing apparatus for a cloud computing scenario, which is applied to a cloud disk, where the cloud disk includes a data stream and a storage engine, and the apparatus includes:

a first determining module 410, configured to determine a physical address of the data in response to the data being appended to a first data segment of the data stream;

a first updating module 420, configured to update a first index information table in the storage engine according to a mapping relationship between the logical address of the data written to the block device and the physical address, where the first index information table is used to index data to be read in the data stream; and

a second updating module 430, configured to update a second index information table in the storage engine according to the mapping relationship, where the second index information table is used to determine the validity of data written in a second data segment of the data stream.

In one embodiment, the first determining module 410 includes:

a first determining submodule, configured to determine, in response to the data being appended to the first data segment of the data stream, the unique identification code of the first data segment and the position information of the first data segment in the data stream; and

a second determining submodule, configured to determine the physical address of the data according to the unique identification code of the first data segment and the position information of the first data segment in the data stream.

In one embodiment, the first update module 420 includes:

a third determining submodule, configured to determine a first key-value pair of the data according to the mapping relationship between the logical address of the data written to the block device and the physical address, where the key of the first key-value pair is the logical address and the value of the first key-value pair is the physical address; and

a first updating submodule, configured to update the first index information table in the storage engine according to the first key-value pair, where the first index information table is used to index data to be read in the data stream.

In one embodiment, the second update module 430 includes:

a fourth determining submodule, configured to determine a second key-value pair of the data according to the mapping relationship, where the key of the second key-value pair is the physical address and the value of the second key-value pair is the logical address; and

a second updating submodule, configured to update the second index information table in the storage engine according to the second key-value pair, where the second index information table is used to determine the validity of data written in the second data segment of the data stream.

In one embodiment, the data appending and writing device for the cloud computing scenario further includes:

a second determining module, configured to determine, in response to a data hole in the second data segment exceeding a threshold, the logical address corresponding to the second data segment in the second index information table according to the actual physical address of the second data segment;

a third determining module, configured to determine, in the first index information table, the pre-stored physical address of the logical address corresponding to the second data segment in the second index information table;

a fourth determining module, configured to determine the data written in the second data segment as valid data in response to the actual physical address matching the pre-stored physical address; and

a migration module, configured to migrate the valid data to a target data segment.

In one embodiment, the data appending and writing device for the cloud computing scenario further comprises:

a fifth determining module, configured to determine the data written in the second data segment as invalid data in response to the actual physical address not matching the pre-stored physical address; and

a clearing module, configured to clear the invalid data from the second data segment.
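The validity check, migration and clearing performed by the second to fifth determining modules, the migration module and the clearing module can be sketched as follows. The function names, the tuple form of addresses and the slot numbering in the target segment are assumptions for the example; copying the actual bytes between segments is omitted, and only the two index tables are updated.

```python
def classify_segment(segment_id, data_table, segment_table):
    """Split the reverse-index entries of one segment into valid and invalid."""
    valid, invalid = [], []
    for physical, logical in sorted(segment_table.items()):
        if physical[0] != segment_id:
            continue
        # Valid only if the forward index still points at this physical address.
        if data_table.get(logical) == physical:
            valid.append((physical, logical))
        else:
            invalid.append((physical, logical))
    return valid, invalid


def reclaim(segment_id, target_segment_id, data_table, segment_table):
    """Migrate valid entries to an (assumed empty) target segment, drop the rest."""
    valid, invalid = classify_segment(segment_id, data_table, segment_table)
    for slot, (physical, logical) in enumerate(valid):
        new_physical = (target_segment_id, slot)  # slot numbering is an assumption
        data_table[logical] = new_physical
        segment_table[new_physical] = logical
        del segment_table[physical]
    for physical, _ in invalid:
        del segment_table[physical]
    return len(valid), len(invalid)
```

After `reclaim` returns, no reverse-index entry refers to the reclaimed segment, so in this sketch the segment can be reused.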

a checkpoint module, configured to write the current log state information of the data stream into the cloud disk in the form of a checkpoint based on a preset writing frequency. The checkpoint is used for loading the data stream, and the first index information table and the second index information table corresponding to the data stream, when the storage system corresponding to the cloud disk is restarted.

a sixth determining module, configured to determine, in response to a data read instruction, the logical address of the data to be read;

a seventh determining module, configured to determine the corresponding physical address in the first index information table according to the logical address of the data to be read; and

an indexing module, configured to index the data to be read from the data stream according to the corresponding physical address in the first index information table.

an eighth determining module, configured to determine, in response to a data append write instruction, at least one first data segment from the data stream according to the size of the data; and

a writing module, configured to correspondingly write the data into the at least one first data segment.

a ninth determining module, configured to determine the unique identification code of the data block and the writing position information of the data in the data block according to the data block of the block device into which the data is written; and

a tenth determining module, configured to determine the logical address of the data according to the unique identification code of the data block and the writing position information of the data in the data block.

For a description of specific functions and examples of each module and sub-module of the apparatus in the embodiment of the present disclosure, reference may be made to the description of corresponding steps in the foregoing method embodiments, and details are not repeated here.

In the technical solution of the disclosure, the acquisition, storage, application and other handling of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 5, the device 500 comprises a computing unit 501 which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard or a mouse; an output unit 507, such as various types of displays and speakers; a storage unit 508, such as a magnetic disk or an optical disk; and a communication unit 509, such as a network card, a modem, or a wireless communication transceiver. The communication unit 509 allows the device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 executes the respective methods and processes described above, such as a data append write method oriented to a cloud computing scenario. For example, in some embodiments, the data append write method for a cloud computing scenario may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 500 via ROM502 and/or communications unit 509. When the computer program is loaded into the RAM503 and executed by the computing unit 501, one or more steps of the data append write method described above for the cloud computing scenario may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the data append write method for a cloud computing scenario by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combining a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims

1. A data appending and writing method for a cloud computing scenario, applied to a cloud disk, the cloud disk comprising a data stream and a storage engine, the method comprising:

determining a physical address of data in response to data being appended to a first data segment of the data stream;

updating a first index information table in the storage engine according to the mapping relation between the logical address and the physical address of the data writing block device, wherein the first index information table is used for indexing data to be read in the data stream; and

2. The method of claim 1, wherein said determining a physical address of the data in response to the data being appended to the first data segment of the data stream comprises:

determining a unique identification code of a first data segment and position information of the first data segment in the data stream in response to data being additionally written to the first data segment of the data stream;

3. The method of claim 1, wherein the updating the first index information table in the storage engine according to the mapping relationship between the logical address and the physical address of the data write block device comprises:

determining a first key-value pair of the data according to a mapping relation between a logical address of the data writing block device and the physical address, wherein a key of the first key-value pair is the logical address, and a value of the first key-value pair is the physical address;

and updating a first index information table in the storage engine according to the first key-value pair, wherein the first index information table is used for indexing data to be read in the data stream.

4. The method of claim 1, wherein the updating the second index information table in the storage engine according to the mapping relationship comprises:

determining a second key-value pair of the data according to the mapping relation, wherein a key of the second key-value pair is the physical address, and a value of the second key-value pair is the logical address;

and updating a second index information table in the storage engine according to the second key-value pair, wherein the second index information table is used for determining the validity of the data written in the second data segment of the data stream.

5. The method of any of claims 1 to 4, further comprising:

responding to the fact that a data hole in the second data segment exceeds a threshold value, and determining a corresponding logical address of the second data segment in the second index information table according to an actual physical address of the second data segment;

determining a pre-stored physical address corresponding to a logical address of the second data segment in the second index information table in the first index information table;

determining the data written in the second data segment as valid data in response to the actual physical address matching the pre-stored physical address;

and migrating the valid data to a target data segment.

6. The method of claim 5, further comprising:

in response to the actual physical address not matching the pre-stored physical address, determining the data written in the second data segment as invalid data;

and emptying the invalid data from the second data segment.

7. The method of any of claims 1 to 4, further comprising:

writing the current log state information of the data stream into the cloud disk in the form of a checkpoint based on a preset writing frequency; the checkpoint is used for loading the data stream, and a first index information table and a second index information table corresponding to the data stream, when a storage system corresponding to the cloud disk is restarted.

8. The method of any of claims 1 to 4, further comprising:

in response to a data read instruction, determining a logical address of data to be read;

determining a corresponding physical address in the first index information table according to the logical address of the data to be read;

9. The method of any of claims 1 to 4, the determining a physical address of the data in response to the data being appended to a first data segment of the data stream, before, further comprising:

in response to a data appending and writing instruction, determining at least one first data segment from the data stream according to the size of data;

and correspondingly writing the data into the at least one first data segment.

10. The method according to any one of claims 1 to 4, wherein the updating the first index information of the first index information table in the storage engine according to the mapping relationship between the logical address and the physical address of the data write block device, before, further comprises:

determining a unique identification code of the data block and writing position information of the data in the data block according to the data block of the block device in which the data is written;

11. A data appending and writing apparatus for a cloud computing scenario, applied to a cloud disk, wherein the cloud disk comprises a data stream and a storage engine, the apparatus comprising:

a first determining module for determining a physical address of data in response to data being appended to a first data segment of the data stream;

a first updating module, configured to update a first index information table in the storage engine according to a mapping relationship between a logical address of the data write block device and the physical address, where the first index information table is used to index data to be read in the data stream; and

and the second updating module is used for updating a second index information table in the storage engine according to the mapping relation, wherein the second index information table is used for determining the validity of the data written in a second data segment of the data stream.

12. The apparatus of claim 11, wherein the first determining module comprises:

a first determining sub-module, configured to determine, in response to data being additionally written to a first data segment of the data stream, a unique identification code of the first data segment and location information of the first data segment in the data stream;

13. The apparatus of claim 11, wherein the first update module comprises:

a third determining submodule, configured to determine a first key-value pair of the data according to a mapping relationship between a logical address of the data write block device and the physical address, where a key of the first key-value pair is the logical address, and a value of the first key-value pair is the physical address;

14. The apparatus of claim 11, wherein the second update module comprises:

a fourth determining submodule, configured to determine a second key-value pair of the data according to the mapping relationship, where a key of the second key-value pair is the physical address, and a value of the second key-value pair is the logical address;

and the second updating submodule is used for updating a second index information table in the storage engine according to the second key value pair, wherein the second index information table is used for determining the validity of the data written in the second data segment of the data stream.

15. The apparatus of any of claims 11 to 14, further comprising:

a second determining module, configured to determine, in response to a data hole in the second data segment exceeding a threshold, a corresponding logical address of the second data segment in the second index information table according to an actual physical address of the second data segment;

a third determining module, configured to determine a pre-stored physical address of a logical address, corresponding to the second data segment in the second index information table, in the first index information table;

a fourth determining module, configured to determine, in response to that the actual physical address matches the pre-stored physical address, that the data written in the second data segment is valid data;

and a migration module, configured to migrate the valid data to a target data segment.

16. The apparatus of claim 15, further comprising:

a fifth determining module, configured to determine, in response to that the actual physical address does not match the pre-stored physical address, data written in the second data segment as invalid data;

17. The apparatus of any of claims 11 to 14, further comprising:

a checkpoint module, configured to write the current log state information of the data stream into the cloud disk in the form of a checkpoint based on a preset writing frequency; the checkpoint is used for loading the data stream, and a first index information table and a second index information table corresponding to the data stream, when a storage system corresponding to the cloud disk is restarted.

18. The apparatus of any of claims 11 to 14, further comprising:

a sixth determining module, configured to determine, in response to the data reading instruction, a logical address of the data to be read;

a seventh determining module, configured to determine, according to the logical address of the data to be read, a corresponding physical address in the first index information table;

and the indexing module is used for indexing the data to be read from the data stream according to the corresponding physical address in the first index information table.

19. The apparatus of any of claims 11 to 14, further comprising:

an eighth determining module, configured to determine, in response to a data appending and writing instruction, at least one first data segment from the data stream according to a size of data;

and the writing module is used for correspondingly writing the data into the at least one first data segment.

20. The apparatus of any of claims 11 to 14, further comprising:

a ninth determining module, configured to determine, according to a data block of a block device in which the data is written, a unique identification code of the data block and writing location information of the data in the data block;

21. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 10.

22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1 to 10.

23. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 10.