CN117742609A - Data processing method and device based on distributed storage - Google Patents

Publication number: CN117742609A
Application number: CN202311787090.9A
Authority: CN (China)
Prior art keywords: data, written, container, target, index
Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 熊江, 丁林, 刘志民, 李钊, 黄岩
Applicant and current assignee: Yunhe Enmo Beijing Information Technology Co., Ltd.

Abstract

The application discloses a data processing method and device based on distributed storage. The method comprises the following steps: a node in the distributed storage system receives a data write request, where the write request comprises data to be written and a first index; the data to be written is written into a target data container among a plurality of data containers to obtain written data and a container position descriptor of the written data; and the first index is updated based on the container position descriptor and a first identifier to obtain a second index, and the state of the state machine corresponding to the node is updated according to the second index. The method and device solve the technical problem in the related art of write amplification caused by the write-ahead log mechanism, under which both a log file and a data file must be written to the storage area.

Description

Data processing method and device based on distributed storage
Technical Field
The application relates to the technical field of distributed storage, and in particular to a data processing method and device based on distributed storage.
Background
In a distributed storage system, in order to restore the state that existed before a failure when the system crashes, a write-ahead log (WAL) mechanism is generally used: the full log entry is written before the data is written to its final location. After the system crashes and restarts, replaying the log guarantees recovery of all data that had been successfully committed. The write-ahead log splits the host write into two phases: writing the log and applying it to the state machine. Writing the log turns random writes into sequential writes; when applying the log, success can be returned as soon as the data reaches the write cache.
For a traditional mechanical hard disk, WAL significantly improves read-write performance while ensuring data durability and atomicity. However, WAL also introduces the technical problem of write amplification: within a single node, the same piece of data is written twice.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the application provide a data processing method and device based on distributed storage, which at least solve the technical problem in the related art of write amplification caused by the write-ahead log mechanism, under which both a log file and a data file must be written to the storage area.
According to an aspect of an embodiment of the present application, there is provided a data processing method based on distributed storage, including: a node in a distributed storage system receives a data write request, where the write request includes data to be written and a first index, the first index indicates the correspondence between a logical address of the data to be written and a first identifier, and the first identifier is an identifier of metadata of the data to be written; the data to be written is written into a target data container among a plurality of data containers to obtain written data and a container position descriptor of the written data, where the target data container includes a plurality of storage units, and the container position descriptor records information of the target data container in which the written data is located and information of the target storage unit in which the written data is located; the first index is updated based on the container position descriptor and the first identifier to obtain a second index, and the state of the state machine corresponding to the node is updated according to the second index, where the second index indicates, in the metadata, the correspondence between the information of the storage unit in which the written data is located and the container position descriptor.
Optionally, the capacity of every storage unit in the plurality of storage units is the same value x, where x is a natural number.
Optionally, writing the data to be written into a target data container among the plurality of data containers includes: in the case where the capacity of the data to be written is not equal to k·x for any positive integer k, acquiring the offset of the data to be written relative to a target storage unit; determining initial data in the target storage unit according to the capacity of the target storage unit and the offset; combining the initial data with the data to be written to obtain target data; and writing the target data into the target storage unit in the target data container.
Optionally, in the case where the remaining space of the plurality of data containers is below a first preset threshold, and/or at preset time intervals, data in a first data container among the plurality of data containers is deleted, where the first data container is any data container other than the current data container.
Optionally, deleting the data in the first data container includes: determining the number of first storage units in each of the plurality of data containers, where a first storage unit is a storage unit with no correspondence to any container position descriptor; in the case where the number of first storage units in the first data container exceeds a second preset threshold, copying the data in the second storage units to the current data container, where a second storage unit is a storage unit that has a correspondence with a container position descriptor; and after the data in the second storage units has been copied to the current data container, deleting the first data container.
Optionally, copying the data in a second storage unit to the current data container includes: modifying the second index of the copied data according to the information of the current data container in which the data is now located and the information of the current storage unit in which the data is now located.
Optionally, writing the data to be written into a target data container among the plurality of data containers includes: writing the data to be written into the target data container in an append-only manner.
According to still another aspect of the embodiments of the present application, there is further provided a data processing method based on distributed storage, including: a node in a distributed storage system receives a data read request, where the data read request includes the data to be read and a logical address of the data to be read; metadata of the data to be read is determined through a second index based on the logical address, where the second index indicates the correspondence between the logical address of the data to be read and a second identifier, the second identifier is an identifier of the metadata of the data to be read, and the metadata includes the correspondence between the information of the storage unit in which the data to be read is located and a container position descriptor, the container position descriptor recording information of the target data container in which the data to be read is located and information of the target storage unit in which it is located; and the information of the target data container and of the target storage unit in which the data to be read is located is determined according to the container position descriptor, and the data to be read is read based on that information.
According to still another aspect of the embodiments of the present application, there is further provided a data processing apparatus based on distributed storage, including: a receiving module configured to receive a data write request, where the write request includes data to be written and a first index, the first index indicating the correspondence between a logical address of the data to be written and a first identifier, and the first identifier being an identifier of metadata of the data to be written; a writing module configured to write the data to be written into a target data container among a plurality of data containers to obtain written data and a container position descriptor of the written data, where the target data container includes a plurality of storage units, and the container position descriptor records information of the target data container and of the target storage unit in which the written data is located; and an updating module configured to update the first index based on the container position descriptor and the first identifier to obtain a second index, and to update the state of the state machine corresponding to the node in the distributed storage system according to the second index, where the second index indicates, in the metadata, the correspondence between the information of the storage unit in which the written data is located and the container position descriptor.
According to still another aspect of the embodiments of the present application, there is further provided a non-volatile storage medium, where the storage medium includes a stored program, and when the program runs, a device on which the storage medium is located is controlled to execute the above data processing method based on distributed storage.
According to still another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory and a processor for running a program stored in the memory, where the program, when run, executes the above data processing method based on distributed storage.
In the embodiments of the application, a node in a distributed storage system receives a data write request, where the write request includes data to be written and a first index, the first index indicating the correspondence between a logical address of the data to be written and a first identifier, and the first identifier being an identifier of metadata of the data to be written; the data to be written is written into a target data container among a plurality of data containers to obtain written data and a container position descriptor of the written data, where the target data container includes a plurality of storage units, and the container position descriptor records information of the target data container and of the target storage unit in which the written data is located; the first index is updated based on the container position descriptor and the first identifier to obtain a second index, and the state of the state machine corresponding to the node is updated according to the second index, where the second index indicates, in the metadata, the correspondence between the information of the storage unit in which the written data is located and the container position descriptor.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a schematic diagram of a distributed storage-based data processing method according to the related art;
FIG. 2 is a flow chart of a data processing method based on distributed storage according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a distributed storage based data processing method according to an embodiment of the present application;
FIG. 4 is a flow chart of another distributed storage based data processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another distributed storage-based data processing method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of deleting data in a first data container according to an embodiment of the present application;
FIG. 7 is a block diagram of a data processing apparatus based on distributed storage according to an embodiment of the present application;
FIG. 8 is a hardware block diagram of a computer terminal for a data processing method based on distributed storage according to an embodiment of the present application.
Detailed Description
In order to make the solution of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For better understanding of the embodiments of the present application, technical terms related in the embodiments of the present application are explained below:
The RAFT algorithm is a distributed consensus algorithm used to manage log replication and fault tolerance. Through an election mechanism, RAFT ensures that the system has a leader, which is responsible for receiving client requests and replicating them to the other nodes. The core of RAFT is to divide node states into three roles, leader, follower, and candidate, and to ensure through mechanisms such as elections and heartbeats that there is only one leader in the system.
The write-ahead log (WAL) mechanism is a common logging mechanism in database systems. Under WAL, when the database performs write operations, these operations are first recorded into a special log file and then applied to the database file. This ensures the stability and consistency of the database during writes, because even if an unexpected situation occurs during a write operation, the database can be restored and repaired from the records in the log file. WAL is commonly used to ensure database durability and reliability, as well as to improve the performance of database systems.
In a distributed storage system, in order to restore the pre-failure state when the system crashes abnormally, a WAL mechanism is generally adopted: the full log entry is written before the data is written to its final location. After the system crashes and restarts, replaying the log guarantees recovery of all successfully committed data. FIG. 1 is a schematic diagram of a data processing method based on distributed storage according to the related art. In a typical RAFT-based distributed storage system, the data writing process is as follows:
1. The host sends a read-write request to the master node.
2. The master node generates a log and replicates it to the slave nodes; log entries are appended to the log storage area in order.
3. After a majority of the nodes have written the log successfully, the master node applies the log to the state machine: space is allocated in the data storage area and the index information is updated; the data is written into the write cache, at which point success is returned to the client; and the write cache is flushed to the data storage area.
4. The master node notifies the slave nodes to apply the data to the state machine.
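Within a single node, the two WAL phases described here (append to the log, then apply to the state machine) can be sketched minimally as follows. All names (`WalNode`, `write`, `recover`) are illustrative, not from the patent; the point is that the same payload is stored twice, which is the write amplification discussed below.

```python
# Minimal single-node sketch of the write-ahead-log flow described above.
# Names are illustrative, not from the patent.

class WalNode:
    def __init__(self):
        self.log = []          # sequential log area (first copy of the data)
        self.data = {}         # final data area, keyed by logical address

    def write(self, addr, payload):
        # Phase 1: append the full payload to the log (sequential write).
        self.log.append((addr, payload))
        # Phase 2: apply the entry to the state machine -- a second copy
        # of the same data; this duplication is the write amplification.
        self.data[addr] = payload

    def recover(self):
        # After a crash, replaying the log rebuilds the data area.
        self.data = {addr: payload for addr, payload in self.log}

node = WalNode()
node.write("vol1/chunk0", b"hello")
node.data.clear()              # simulate losing the data area in a crash
node.recover()
assert node.data["vol1/chunk0"] == b"hello"
```

Replaying later log entries over earlier ones also reproduces the "committed data is never partially updated" property the text attributes to WAL.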
For a traditional mechanical hard disk, WAL significantly improves read-write performance while ensuring data durability and atomicity. Specifically:
Data durability: the log contains the complete data content; once committed, the data is fully persisted and cannot be lost.
Data atomicity: when the system crashes and restarts, it replays the committed logs and only then provides service to the outside, so partially updated data never occurs.
High performance: WAL splits the host write into two phases, writing the log and applying it to the state machine. Writing the log turns random writes into sequential writes; when applying the log, success can be returned once the data is written to the write cache (without waiting for the cache to be flushed to disk), turning synchronous writes into asynchronous writes.
On the other hand, WAL also brings write amplification: within a single node, the same piece of data is written twice. The first time, when the log is written, the full data is written into the log area; the second time, when the log is applied to the state machine, the full data is written, in an overwrite fashion, to the data area where it is finally stored.
The high performance described above applies to mechanical hard disks. With the rapid adoption of non-volatile memory disks, the advantages of the above mechanism gradually disappear, because such disks have the following advantages over mechanical hard disks: 1. performance close to that of memory; 2. little performance gap between random access and sequential access.
Therefore, for non-volatile memory disks, WAL provides little performance benefit, while the write amplification it introduces incurs additional computational and storage resource overhead; in particular, in a distributed storage system, the write amplification grows significantly as the number of replicas increases.
In summary, in order to solve the above-mentioned problems, related solutions are provided in the embodiments of the present application, and the following detailed description is provided.
In accordance with the embodiments of the present application, a method embodiment of a data processing method based on distributed storage is provided. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as by a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one here.
FIG. 2 is a flow chart of a data processing method based on distributed storage according to an embodiment of the present application, as shown in FIG. 2, the method includes the steps of:
Step S202: a node in the distributed storage system receives a data write request, where the write request includes data to be written and a first index. The first index indicates the correspondence between the logical address of the data to be written and a first identifier, and the first identifier is the identifier of the metadata of the data to be written.
A master node in the distributed storage system receives the data write request sent by a host and forwards it to a plurality of standby nodes. In a distributed storage system, the logical address of the data to be written is a unique, system-generated identifier that identifies the location and storage information of the data. The logical address may contain an identifier of the node or server where the data resides, as well as the specific location of the data within that node. Through the logical address, the system can accurately locate the storage location and write the data there.
The metadata of the data to be written records information such as the data's type, size, storage location, version, permissions, creation time, modification time, access time, backup state, compression mode, and encryption state. The identifier of the metadata is, for example, the identification information (ID) of the metadata.
In step S204, the data to be written is written into a target data container in the plurality of data containers, so as to obtain the written data and a container location descriptor of the written data, where the target data container includes a plurality of storage units, and the container location descriptor is used to record information of the target data container where the written data is located and information of the target storage unit where the written data is located.
Illustratively, if the ID of the target data container in which the written data is located is Container5 and the ID of the target storage unit in which it is located is Block5, the container location descriptor of the written data records: Container5, Block5.
It should be noted that, the physical storage space corresponding to the node may be divided into a plurality of target data containers, where the capacity of the target data container is a fixed value, for example: 4MB. The physical storage space of a node refers to the actual storage space owned by each node in the distributed storage system. These storage spaces may be hard disks, solid state disks, or other storage devices for storing data and files in a distributed system.
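The division of a node's physical space into fixed-size containers of fixed-size storage units, with append-only writes returning a container position descriptor, can be sketched as follows. The sizes (4 MB containers, 4 KB blocks) come from the examples in the text; the class and method names are illustrative assumptions.

```python
# Sketch of fixed-size containers split into fixed-size blocks, with
# append-only writes that return a container position descriptor (CPD).
# Names are illustrative; sizes follow the examples in the text.

CONTAINER_SIZE = 4 * 1024 * 1024              # 4 MB per container
BLOCK_SIZE = 4 * 1024                         # 4 KB per storage unit
BLOCKS_PER_CONTAINER = CONTAINER_SIZE // BLOCK_SIZE  # 1024 blocks

class Container:
    def __init__(self, container_id):
        self.id = container_id
        self.blocks = {}       # block index -> bytes
        self.next_block = 0    # append-only cursor; no overwrites

    def append(self, payload):
        """Append one block of data; return its CPD (container id, block id)."""
        if self.next_block >= BLOCKS_PER_CONTAINER:
            raise IOError("container full")
        block = self.next_block
        self.blocks[block] = payload
        self.next_block += 1
        return (self.id, block)

c = Container(5)
cpd = c.append(b"x" * BLOCK_SIZE)
assert cpd == (5, 0)           # data landed in Container 5, Block 0
assert BLOCKS_PER_CONTAINER == 1024
```

The CPD returned by `append` is exactly the pair of facts the text says the descriptor must record: which container and which storage unit hold the data.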
In addition, the data to be written can only be appended to the target data container; it cannot be written by overwriting.
According to further optional embodiments of the present application, the capacity of each of the plurality of storage units is the same value x, where x is a natural number. When appending the data to be written to the target data container, if the capacity of the data to be written is not equal to k·x (k being a positive integer), the offset of the data to be written relative to the target storage unit is acquired; initial data in the target storage unit is determined according to the capacity of the target storage unit and the offset; the initial data is combined with the data to be written to obtain target data; and the target data is written into the target storage unit in the target data container.
For example, the data to be written is chunk1, whose length is 2 KB and whose offset relative to the target storage unit is 2 KB. During the append write, the initial data at offset 0 of the target storage unit, with a length of 2 KB, is read first. The initial data and the data to be written are then combined into a new data block, i.e., the target data. Finally, the target data is written into the target storage unit in the target data container.
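The read-merge-write handling of a write that does not fill a whole storage unit can be sketched as below. The function name and the 4 KB block size are assumptions for illustration; the merge logic follows the chunk1 example above.

```python
# Sketch of merging a partial write with the initial data already in the
# target storage unit, as in the chunk1 example. Names are illustrative.

BLOCK_SIZE = 4096  # assumed storage-unit capacity x

def merge_partial_write(old_block: bytes, offset: int, new_data: bytes) -> bytes:
    """Merge incoming data (shorter than a full block) with the block's
    initial data, producing the full target block to be appended."""
    assert offset + len(new_data) <= BLOCK_SIZE
    merged = bytearray(old_block.ljust(BLOCK_SIZE, b"\x00"))
    merged[offset:offset + len(new_data)] = new_data
    return bytes(merged)

# The text's example: a 2 KB chunk written at offset 2 KB keeps the
# first 2 KB of initial data and fills in the second half.
old = b"A" * 2048 + b"\x00" * 2048
target = merge_partial_write(old, 2048, b"B" * 2048)
assert target[:2048] == b"A" * 2048
assert target[2048:] == b"B" * 2048
```

Because containers are append-only, the merged block is written to a fresh storage unit rather than updating the old one in place.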
Step S206, updating the first index based on the container position descriptor and the first identifier to obtain a second index, and updating the state of the state machine corresponding to the node according to the second index, wherein the second index is used for indicating the corresponding relation between the information of the storage unit where the written data is located and the container position descriptor in the metadata.
Preferably, the data processing method based on distributed storage in this embodiment is based on the RAFT algorithm. In the distributed storage system, each node has a state machine, and a state machine changes its own state by executing commands from the log. RAFT keeps the state machines of the various nodes consistent by means of log replication: when a client sends a command to the distributed system, the command is appended to the system's log, and the RAFT algorithm then guarantees that all nodes replicate the log and apply it to their respective state machines, ensuring system consistency. In this way, regardless of which node handles the client request, the state machines of all nodes eventually reach a consistent state.
In some optional embodiments of the present application, in a case where a remaining space of the plurality of data containers is below a first preset threshold, and/or at preset time intervals, deleting data in a first data container in the plurality of data containers, where the first data container is any data container in the plurality of data containers except for a current data container, specifically including the following steps:
Determining the number of first storage units in each of a plurality of data containers, wherein the first storage units are storage units which have no corresponding relation with the container position descriptors; copying data in a second storage unit to a current data container under the condition that the number of the first storage units in the first data container exceeds a second preset threshold, wherein the second storage unit is a storage unit with a corresponding relation with a container position descriptor; after copying the data in the second storage unit to the current data container, the first data container is deleted.
Preferably, copying the data in a second storage unit to the current data container may be achieved as follows: the second index of the data copied to the current data container is modified according to the information of the current data container in which the data is now located and the information of the current storage unit in which it is now located.
It will be appreciated that when data is overwritten at the same location, or when a volume is deleted, the old data blocks are no longer referenced by any index and become garbage data, whose space must be reclaimed periodically.
FIG. 6 is a schematic diagram of deleting data in a first data container according to an embodiment of the present application. As shown in FIG. 6, each of the plurality of data containers is scanned, and for each data container the following operations are performed:
Step S601: count the number of unreferenced storage units (first storage units); if this number does not exceed the second preset threshold, continue scanning the next data container; otherwise, jump to step S602.
Step S602: copy the data of the storage units still being referenced (second storage units) into the current data container, and modify the corresponding index records.
Step S603: after the data of all referenced storage units has been copied and the indexes modified, delete the entire data container (the first data container).
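Steps S601 to S603 can be sketched as a small reclamation routine. The data layout (`dict` of containers, index mapping keys to CPDs), the function name, and the threshold semantics are illustrative assumptions; the control flow follows the three steps above.

```python
# Sketch of the container reclamation in steps S601-S603: scan each
# container, and when its count of unreferenced blocks exceeds a
# threshold, copy the referenced blocks into the current container,
# fix their index records, and delete the old container.
# Names and data layout are illustrative.

def reclaim(containers, index, current, threshold):
    """containers: {cid: {block: data}}; index: {key: (cid, block)} (CPDs)."""
    referenced = set(index.values())
    for cid in list(containers):
        if cid == current:
            continue                          # never reclaim the current container
        blocks = containers[cid]
        unreferenced = [b for b in blocks if (cid, b) not in referenced]
        if len(unreferenced) <= threshold:
            continue                          # S601: not enough garbage, skip
        for key, (c, b) in list(index.items()):
            if c == cid:                      # S602: copy still-referenced data
                new_block = len(containers[current])
                containers[current][new_block] = blocks[b]
                index[key] = (current, new_block)   # modify the index record
        del containers[cid]                   # S603: drop the whole container

containers = {0: {0: b"live", 1: b"dead", 2: b"dead"}, 1: {}}
index = {"chunk_a": (0, 0)}
reclaim(containers, index, current=1, threshold=1)
assert 0 not in containers                # old container deleted
assert containers[1][0] == b"live"        # live data relocated
assert index["chunk_a"] == (1, 0)         # index now points at the new CPD
```

Note that only the index record changes for relocated data; readers following the CPD after reclamation transparently reach the new location.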
FIG. 3 is a schematic diagram of a data processing method based on distributed storage according to an embodiment of the present application. As shown in FIG. 3, the log description information and the log data (Chunk) are separated when the log is written.
Step S301: the physical storage space is managed using data Containers. The size of a Container is fixed, and the physical storage space can be divided into a plurality of Containers; content can only be appended, never overwritten; and the storage space within a Container may be further divided into a plurality (e.g., 1024) of fixed-size (e.g., 4 KB) Blocks.
In step S302, the data portion in the log is directly written into the Container, and after the writing is successful, a Container location descriptor (Container Position Descriptor, CPD) is returned, where the CPD is used to locate in which Block of which Container the data is stored.
In step S303, when a node applies the log, the applied request no longer carries the data itself but carries the CPD.
In step S304, the CPD corresponding to the Block in the metadata (Chunk Meta) is updated using the CPD carried in the request.
Through the above steps, the data portion of the write-ahead log is written into the target data container to obtain a container position descriptor; the container position descriptor is carried when the log is applied, and the index information of the data block is updated using it. The write-ahead log and the data-block index therefore share the same copy of the data, which reduces the number of times the data is written and achieves the technical effect of improving data read-write efficiency.
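Steps S301 to S304 can be sketched end to end as follows. The class and method names are illustrative assumptions; the essential point, as in the text, is that the payload is written once and the applied log carries only the CPD.

```python
# Sketch of the write path in steps S301-S304: the log's data portion is
# written to a container once, the applied log carries only the returned
# CPD, and applying the log updates the chunk metadata with that CPD.
# Names are illustrative.

class Node:
    def __init__(self):
        self.container = {}    # block id -> data (single container, id 0)
        self.chunk_meta = {}   # chunk key -> CPD (Block -> CPD mapping)

    def write_data(self, payload):
        # S302: write the data portion directly into the container
        block = len(self.container)
        self.container[block] = payload
        return (0, block)      # CPD: (container id, block id)

    def apply_log(self, chunk_key, cpd):
        # S303/S304: the applied request carries the CPD, not the data,
        # and the CPD is recorded in the chunk metadata
        self.chunk_meta[chunk_key] = cpd

node = Node()
cpd = node.write_data(b"payload")      # the only copy of the data
node.apply_log(("vol1", 7), cpd)
assert node.chunk_meta[("vol1", 7)] == (0, 0)
assert node.container[0] == b"payload"
```

Compared with the WAL sketch earlier, `write_data` plus `apply_log` store the payload exactly once, which is the claimed removal of the duplicate write.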
FIG. 4 is a flow chart of another distributed storage based data processing method according to an embodiment of the present application, as shown in FIG. 4, including the steps of:
step S402, a node in the distributed storage system receives a data reading request, where the data reading request includes: data to be read and logical addresses of the data to be read.
Step S404: based on the logical address of the data to be read, the metadata of the data to be read is determined through a second index, where the second index indicates the correspondence between the logical address of the data to be read and a second identifier, the second identifier is an identifier of the metadata of the data to be read, and the metadata includes the correspondence between the information of the storage unit in which the data to be read is located and a container position descriptor, which records information of the target data container and of the target storage unit in which the data to be read is located.
Step S406, according to the container position descriptor, determining the information of the target data container where the data to be read is located and the information of the target storage unit where the data to be read is located, and reading the data to be read based on the information of the target data container where the data to be read is located and the information of the target storage unit where the data to be read is located.
FIG. 5 is a schematic diagram of another distributed storage-based data processing method according to an embodiment of the present application, as shown in FIG. 5, including the steps of:
in step S501, a node in the distributed storage system receives a data read request carrying the logical address of the data to be read, for example: the ID of the volume where the data to be read is located (vol_id) and the ID of the data block where the data to be read is located (chunk_id). Based on these two IDs, the ID of the bucket where the data to be read is located is determined using a hash algorithm.
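The bucket lookup in step S501 might look like the sketch below. The embodiment only says "a hash algorithm" is used, so the concrete hash function (SHA-256) and the bucket count are assumptions made for illustration.

```python
import hashlib

NUM_BUCKETS = 1024  # assumed bucket count; the embodiment does not specify one

def bucket_id(vol_id: int, chunk_id: int) -> int:
    """Map (vol_id, chunk_id) to a bucket ID with a stable hash."""
    key = f"{vol_id}:{chunk_id}".encode()
    digest = hashlib.sha256(key).digest()
    # Fold the digest into the bucket range deterministically.
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

bid = bucket_id(1, 7)  # the same (vol_id, chunk_id) always yields the same bucket
```

A stable hash of this kind lets every node compute the same bucket ID locally, with no coordination, which is why it is suitable as the key of the Chunk index in the next step.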
In step S502, the ID of the bucket where the data to be read is located is used as the key of the Chunk index to determine the metadata of the data to be read. The Chunk index includes the following information: (1) the unique identifier of the data block, used to uniquely identify each data block so that it can be located and accessed in the storage system; (2) the location information of the storage node, recording the node on which each data block is stored so that the data block can be located and accessed quickly when needed; and (3) the metadata information of the data block, including its size, creation time, permissions, and the like, used to manage and control access to and use of the data block.
In step S503, the metadata includes the mapping between the Block and the CPD, and the CPD is obtained through this mapping.
In step S504, through the CPD, the system can precisely locate which Block of which Container actually stores the data.
In step S505, the data is read from the specified Container and returned to the host side.
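Steps S502–S505 amount to two lookups followed by a read. The in-memory structures below (a chunk_meta table mapping a Block to its CPD, and containers holding the Blocks) are illustrative assumptions used only to make the flow concrete:

```python
# Illustrative state: container 3 holds two Blocks, and Chunk Meta maps
# (vol_id, chunk_id) -> CPD, represented here as (container_id, block_id).
containers = {3: [b"other-data", b"hello"]}
chunk_meta = {(1, 7): (3, 1)}

def read(vol_id: int, chunk_id: int) -> bytes:
    # S502/S503: look up the metadata to obtain the CPD for this Block.
    container_id, block_id = chunk_meta[(vol_id, chunk_id)]
    # S504/S505: the CPD pinpoints the Container and Block; read and return.
    return containers[container_id][block_id]

data = read(1, 7)  # -> b"hello"
```

Note that the read path never consults the log: the CPD stored in the metadata is sufficient to reach the data directly.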
FIG. 7 is a block diagram of a data processing apparatus based on distributed storage according to an embodiment of the present application, as shown in FIG. 7, the apparatus includes:
the receiving module 70 is configured to receive a data write request, where the write request includes: data to be written and a first index, the first index indicating the correspondence between the logical address of the data to be written and a first identifier, and the first identifier being the identifier of the metadata of the data to be written.
The writing module 72 is configured to write the data to be written into a target data container among a plurality of data containers, obtaining written data and a container position descriptor of the written data, where the target data container includes a plurality of storage units and the container position descriptor records the information of the target data container and of the target storage unit where the written data is located.
The updating module 74 is configured to update the first index based on the container position descriptor and the first identifier to obtain a second index, and to update the state of the state machine corresponding to a node in the distributed storage system according to the second index, where the second index indicates the correspondence, in the metadata, between the information of the storage unit where the written data is located and the container position descriptor.
Note that each module in fig. 7 may be a program module (for example, a set of program instructions implementing a specific function) or a hardware module. In the latter case it may take, but is not limited to, the following forms: each module is implemented as its own processor, or the functions of several modules are realized by a single processor.
It should be noted that, the preferred implementation manner of the embodiment shown in fig. 7 may refer to the related description of the embodiment shown in fig. 2, which is not repeated herein.
Fig. 8 shows a block diagram of the hardware architecture of a computer terminal for implementing the data processing method based on distributed storage. As shown in fig. 8, the computer terminal 80 may include one or more processors 802 (shown as 802a, 802b, ..., 802n in the figure; the processor 802 may include, but is not limited to, a processing device such as a microcontroller (MCU) or a programmable logic device (FPGA)), a memory 804 for storing data, and a transmission module 806 for communication functions. In addition, the terminal may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. Those of ordinary skill in the art will appreciate that the configuration shown in fig. 8 is merely illustrative and does not limit the configuration of the electronic device described above. For example, the computer terminal 80 may also include more or fewer components than shown in fig. 8, or have a different configuration from that shown in fig. 8.
It should be noted that the one or more processors 802 and/or other data processing circuits described above may be referred to herein generally as "data processing circuits". A data processing circuit may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Furthermore, the data processing circuit may be a single stand-alone processing module, or may be incorporated, in whole or in part, into any of the other elements of the computer terminal 80. As referred to in the embodiments of the present application, the data processing circuit serves as a kind of processor control (for example, selecting the path of a variable-resistance termination connected to an interface).
The memory 804 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the data processing method based on distributed storage in the embodiments of the present application. The processor 802 executes the software programs and modules stored in the memory 804, thereby performing various functional applications and data processing, that is, implementing the above data processing method based on distributed storage. The memory 804 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 804 may further include memory located remotely from the processor 802, which may be connected to the computer terminal 80 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission module 806 is used to receive or transmit data via a network. The specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 80. In one example, the transmission module 806 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission module 806 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 80.
It should be noted here that, in some alternative embodiments, the computer terminal shown in fig. 8 may include hardware elements (including circuits), software elements (including computer code stored on a computer readable medium), or a combination of both hardware and software elements. It should be noted that fig. 8 is only one example of a specific example, and is intended to illustrate the types of components that may be present in the computer terminal described above.
It should be noted that the computer terminal shown in fig. 8 is configured to execute the data processing method based on distributed storage shown in fig. 2, so the foregoing explanation of that method also applies to this electronic device and is not repeated here.
An embodiment of the present application also provides a nonvolatile storage medium including a stored program, where the program, when running, controls the device in which the storage medium is located to execute the data processing method based on distributed storage.
The stored program performs the following functions: a node in the distributed storage system receives a data write request, where the write request includes: data to be written and a first index, the first index indicating the correspondence between the logical address of the data to be written and a first identifier, and the first identifier being the identifier of the metadata of the data to be written; the data to be written is written into a target data container among a plurality of data containers to obtain written data and a container position descriptor of the written data, where the target data container includes a plurality of storage units and the container position descriptor records the information of the target data container and of the target storage unit where the written data is located; the first index is updated based on the container position descriptor and the first identifier to obtain a second index, and the state of the state machine corresponding to the node is updated according to the second index, where the second index indicates the correspondence, in the metadata, between the information of the storage unit where the written data is located and the container position descriptor.
An embodiment of the present application also provides an electronic device, including: a memory and a processor for running a program stored in the memory, where the program, when running, executes the data processing method based on distributed storage.
The processor is configured to run a program that performs the following functions: a node in the distributed storage system receives a data write request, where the write request includes: data to be written and a first index, the first index indicating the correspondence between the logical address of the data to be written and a first identifier, and the first identifier being the identifier of the metadata of the data to be written; the data to be written is written into a target data container among a plurality of data containers to obtain written data and a container position descriptor of the written data, where the target data container includes a plurality of storage units and the container position descriptor records the information of the target data container and of the target storage unit where the written data is located; the first index is updated based on the container position descriptor and the first identifier to obtain a second index, and the state of the state machine corresponding to the node is updated according to the second index, where the second index indicates the correspondence, in the metadata, between the information of the storage unit where the written data is located and the container position descriptor.
The foregoing embodiment numbers of the present application are merely for description and do not represent the relative merits of the embodiments.
In the foregoing embodiments of the present application, each embodiment is described with its own emphasis; for any part not detailed in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division of the units may be a logical functional division, and there may be other divisions in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed may be through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the related art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make improvements and modifications without departing from the principles of the present application, and such improvements and modifications shall also fall within the protection scope of the present application.

Claims (11)

1. A data processing method based on distributed storage, comprising:
a node in a distributed storage system receives a data write request, wherein the write request comprises: data to be written and a first index, wherein the first index is used for indicating a correspondence between a logical address of the data to be written and a first identifier, and the first identifier is an identifier of metadata of the data to be written;
writing the data to be written into a target data container in a plurality of data containers to obtain written data and container position descriptors of the written data, wherein the target data container comprises a plurality of storage units, and the container position descriptors are used for recording information of the target data container where the written data are located and information of the target storage unit where the written data are located;
Updating the first index based on the container position descriptor and the first identifier to obtain a second index, and updating the state of a state machine corresponding to the node according to the second index, wherein the second index is used for indicating the corresponding relation between the information of the storage unit where the written data is located and the container position descriptor in the metadata.
2. The method of claim 1, wherein each of the plurality of storage units has a uniform capacity of x, wherein x is a natural number.
3. The method of claim 2, wherein writing the data to be written to a target data container of a plurality of data containers comprises:
acquiring the offset of the data to be written relative to the target storage unit under the condition that the capacity of the data to be written does not meet k x, wherein k is a positive integer;
determining initial data in the target storage unit according to the capacity of the target storage unit and the offset;
combining the initial data with the data to be written to obtain target data;
and writing the target data into the target storage unit in the target data container.
4. The method according to claim 1, wherein the method further comprises: and deleting the data in a first data container in the plurality of data containers under the condition that the residual space of the plurality of data containers is lower than a first preset threshold value and/or according to a preset time interval, wherein the first data container is any data container except the current data container in the plurality of data containers.
5. The method of claim 4, wherein deleting data in a first data container of the plurality of data containers comprises:
determining the number of first storage units in each of the plurality of data containers, wherein the first storage units are storage units which have no corresponding relation with the container position descriptors;
copying data in a second storage unit to the current data container under the condition that the number of the first storage units in a first data container exceeds a second preset threshold, wherein the second storage unit is a storage unit with a corresponding relation with the container position descriptor;
after copying the data in the second storage unit to the current data container, deleting the first data container.
6. The method of claim 5, wherein copying data in a second storage unit to the current data container comprises: and modifying the second index of the data copied to the current data container according to the information of the current data container where the data is located and the information of the current storage unit where the data is located.
7. The method of claim 1, wherein writing the data to be written to a target data container of a plurality of data containers comprises: and writing the data to be written into the target data container in a plurality of data containers in an additional writing mode.
8. A data processing method based on distributed storage, comprising:
a node in a distributed storage system receives a data read request, wherein the data read request comprises: data to be read and a logic address of the data to be read;
determining metadata of the data to be read through a second index based on the logical address of the data to be read, wherein the second index is used for indicating the corresponding relation between the logical address of the data to be read and a second identifier, the second identifier is an identifier of the metadata of the data to be read, and the metadata of the data to be read comprises: the corresponding relation between the information of the storage unit where the data to be read is located and the container position descriptor, wherein the container position descriptor is used for recording the information of the target data container where the data to be read is located and the information of the target storage unit where the data to be read is located;
And determining the information of the target data container where the data to be read is located and the information of the target storage unit where the data to be read is located according to the container position descriptor, and reading the data to be read based on the information of the target data container where the data to be read is located and the information of the target storage unit where the data to be read is located.
9. A data processing apparatus based on distributed storage, comprising:
the receiving module is used for receiving a data write request, wherein the write request comprises: data to be written and a first index, wherein the first index is used for indicating a correspondence between a logical address of the data to be written and a first identifier, and the first identifier is an identifier of metadata of the data to be written;
the writing module is used for writing the data to be written into a target data container in a plurality of data containers to obtain written data and container position descriptors of the written data, wherein the target data container comprises a plurality of storage units, and the container position descriptors are used for recording information of the target data container where the written data are located and information of the target storage unit where the written data are located;
And the updating module is used for updating the first index based on the container position descriptor and the first identifier to obtain a second index, and updating the state of a state machine corresponding to a node in the distributed storage system according to the second index, wherein the second index is used for indicating the corresponding relation between the information of the storage unit where the written data is located and the container position descriptor in the metadata.
10. A non-volatile storage medium, characterized in that the non-volatile storage medium comprises a stored program, wherein the program, when run, controls a device in which the non-volatile storage medium is located to perform the distributed storage based data processing method of any one of claims 1 to 8.
11. An electronic device, comprising: a memory and a processor for executing a program stored in the memory, wherein the program is executed to perform the distributed storage-based data processing method of any one of claims 1 to 8.
CN202311787090.9A 2023-12-22 2023-12-22 Data processing method and device based on distributed storage Pending CN117742609A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311787090.9A CN117742609A (en) 2023-12-22 2023-12-22 Data processing method and device based on distributed storage


Publications (1)

Publication Number Publication Date
CN117742609A true CN117742609A (en) 2024-03-22

Family

ID=90254420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311787090.9A Pending CN117742609A (en) 2023-12-22 2023-12-22 Data processing method and device based on distributed storage

Country Status (1)

Country Link
CN (1) CN117742609A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination