CN116560560A - Method for storing data and related device

Method for storing data and related device

Publication number: CN116560560A
Application number: CN202210114211.2A
Applicant / Assignee: Huawei Technologies Co Ltd
Inventors: 冯锐, 王中天, 邵鑫, 肖林厂
Legal status: Pending
Original language: Chinese (zh)

Classifications

    • G06F3/061 Improving I/O performance
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

Embodiments of the invention provide a method for storing data and a related device. In the method, a computing node determines the data units in a target file that are processed by the same process and sends those data units to the same storage node. Because all data units belonging to one process are written to a single storage node, the data units of that process can later be read from that one storage node alone, without searching every storage node for the data units to be read. Consequently, when the data units of one process need to be read or written, the read/write operations touch only the single storage node that stores that process's data units, which effectively reduces read/write overhead.

Description

Method for storing data and related device
Technical Field
Embodiments of the invention relate to the field of storage, and in particular to a method and a related device for storing data.
Background
A computer cluster is a type of computer system: a set of loosely integrated computers whose software and hardware are connected so that they cooperate closely to perform computing work.
According to functionality and architecture, computer clusters can be divided into the following categories: high-availability (high availability, HA) clusters, load-balancing (load balancing) clusters, high-performance computing (high performance computing, HPC) clusters, and so on.
The computer devices in a computer cluster may be referred to as nodes. A computer cluster typically includes a plurality of computing nodes responsible for performing computing tasks and a plurality of storage nodes responsible for storing data. To obtain higher execution efficiency, a computing task is generally divided into equal-sized slices, which are distributed to a plurality of processes on a plurality of computing nodes for execution. The process executing each slice accesses the data that slice needs, so there are two typical models by which a computer cluster application accesses data: 1) the N-N model; 2) the N-1 model, where the N-1 model further includes an N-1 segment model and an N-1 stride model.
FIG. 1 shows a schematic diagram of an N-N model, an N-1 segment model, and an N-1 stride model.
As shown in (a) of fig. 1, in the N-N model, each process accesses a separate file, and there is no conflict between the files accessed by different processes.
As shown in FIGS. 1 (b) and (c), in the N-1 segment model and the N-1 stride model, different processes access different locations of the same file. The N-1 segment model and the N-1 stride model differ in the input/output (IO) granularity of each access and in whether the region accessed by each process is contiguous.
The N-1 model uses only one file and is therefore simpler for applications to use. However, in the N-1 stride model the locations that each process reads and writes are discontinuous in the file and the IO granularity is small, which easily causes large read/write overhead. How to reduce the read/write overhead of the N-1 stride model is therefore an urgent problem to be solved.
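To make the stride pattern concrete, the following sketch (illustrative Python; the function name, unit size, and process count are assumptions, not part of the original disclosure) computes the file offsets touched by each process in the N-1 stride model:

```python
def stride_offsets(rank: int, num_processes: int, unit_size: int, rounds: int):
    """File offsets accessed by process `rank` in the N-1 stride model."""
    stride = num_processes * unit_size   # distance between two units of one process
    return [r * stride + rank * unit_size for r in range(rounds)]

# Example: 6 processes, 4 KiB units, 3 rounds.
for rank in range(6):
    print(rank, stride_offsets(rank, 6, 4096, 3))
# Each process touches small, widely separated offsets -> many small IOs.
```

As the output shows, every process issues small IOs to non-contiguous offsets, which is exactly the access pattern the method below is designed to optimize.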
Disclosure of Invention
Embodiments of the invention provide a method and a related device for storing data, which can reduce the read/write overhead of data.
In a first aspect, the present invention provides a method of storing data, the method comprising: a computing node obtains all data units corresponding to a target process in a target file, where the target file includes a plurality of data units, each data unit in the plurality of data units corresponds to one of M processes, each of the M processes has at least two corresponding data units in the target file, the target process is one of the M processes, the target process is run by the computing node, and M is a positive integer greater than or equal to 2; and the computing node writes all data units corresponding to the target process to a target storage node, where the target storage node is one of a plurality of available storage nodes of the computing node.
With this technical solution, the data units corresponding to one process are all written to the same storage node. When writing data, the data units corresponding to one process therefore need to be sent to only one storage node rather than to several storage nodes separately. Furthermore, because the data units belonging to one process are all written to the same storage node, the data units of that process can be read from that single storage node, without searching all storage nodes for the data units to be read. Thus, when the data units of one process need to be read or written, the read/write operations touch only the single storage node that stores that process's data units, which effectively reduces read/write overhead.
With reference to the first aspect, in a possible implementation manner of the first aspect, the processes corresponding to any two adjacent data units in the plurality of data units are different.
With reference to the first aspect, in a possible implementation manner of the first aspect, the obtaining, by the computing node, all data units corresponding to the target process in the target file includes: the computing node determines all data units corresponding to the identification information of the target process in the target file.
With reference to the first aspect, in a possible implementation manner of the first aspect, before the computing node writes all data units corresponding to the target process to the target storage node, the method further includes: the computing node determines a storage node corresponding to the identification information of the target process from a plurality of storage nodes as the target storage node.
With reference to the first aspect, in a possible implementation manner of the first aspect, the target file includes multiple sets of data units, each set of data units in the multiple sets of data units includes M data units, the M data units respectively correspond to the M processes, positions of data units corresponding to a same process in the target file in any two sets of data units in the multiple sets of data units are the same, and the computing node acquires all data units corresponding to the target process in the target file, including: the computing node determines the data unit located at the corresponding position of the target process in each group of data units according to the length of the data unit, the position of each data unit in the plurality of data units in the target file, and the process number M.
With reference to the first aspect, in a possible implementation manner of the first aspect, before the computing node writes all data units corresponding to the target process to the target storage node, the method further includes: the computing node determines a storage node corresponding to a target location from a plurality of storage nodes as the target storage node, the target location being a location of a data unit corresponding to the target process in each group of data units.
With reference to the first aspect, in a possible implementation manner of the first aspect, the writing, by the computing node, of all data units corresponding to the target process to the target storage node includes: the computing node divides all data units corresponding to the target process into K groups of data units, where each group of data units in the K groups includes a plurality of data units corresponding to the target process, and K is a positive integer greater than or equal to 1; the computing node writes the K groups of data units into a cache of the computing node, where the addresses in the cache of the data units included in each of the K groups are contiguous; the computing node determines K metadata, where the K metadata correspond one-to-one to the K groups of data units, and each of the K metadata includes the position information of each data unit in the corresponding group and the length of the data unit; and the computing node sends the K metadata and the K groups of data units retrieved from the cache to the target storage node.
According to this technical solution, a plurality of small data units are aggregated into one large data unit, so that data units that originally required a plurality of write operations can be sent to the storage node through one write operation. This effectively reduces the data read/write overhead. In addition, aggregating a plurality of metadata into one metadata also effectively reduces the read/write overhead of the metadata.
With reference to the first aspect, in a possible implementation manner of the first aspect, the method further includes: the computing node obtains a read instruction, where the read instruction instructs the computing node to read the K metadata and the K groups of data units from the target storage node; the computing node reads the K metadata and the K groups of data units from the target storage node; and the computing node determines, according to the K metadata, the position in the target file of each data unit included in the K groups of data units.
In a second aspect, the invention provides a computer device comprising means for implementing the first aspect or any one of the possible implementations of the first aspect.
In a third aspect, the present invention provides a computer device comprising a processor for coupling with a memory, reading and executing instructions and/or program code in the memory to perform the first aspect or any of the possible implementations of the first aspect.
In a fourth aspect, the present invention provides a chip system comprising logic circuitry for coupling with an input/output interface through which data is transferred for performing the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, the present invention provides a computer-readable storage medium storing program code which, when run on a computer, causes the computer to perform the first aspect or any one of the possible implementations of the first aspect.
In a sixth aspect, the present invention provides a computer program product, the computer program product comprising computer program code which, when run on a computer, causes the computer to perform the first aspect or any one of the possible implementations of the first aspect.
Drawings
FIG. 1 is a schematic diagram of an N-N model, an N-1 segment model, and an N-1 stride model.
Fig. 2 is a schematic block diagram of a computing cluster according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the relationship of applications, processes, and compute nodes.
Fig. 4 is a schematic flow chart of a method of storing data provided in accordance with an embodiment of the present invention.
FIG. 5 is a schematic diagram of the correspondence between the target file, processes, and compute nodes.
Fig. 6 is a schematic diagram of a target file.
Fig. 7 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention will be described below with reference to the accompanying drawings.
Fig. 2 is a schematic block diagram of a computing cluster according to an embodiment of the present invention.
As shown in fig. 2, the computing cluster provided in this embodiment includes a computing node sub-cluster and a storage node sub-cluster.
The computing node sub-cluster includes one or more computing nodes 210 (three computing nodes 210 are shown in fig. 2, but the sub-cluster is not limited to three computing nodes 210), and the computing nodes 210 may communicate with each other.
The computing node 210 is a computing device such as a server, a desktop computer, or a controller of a storage array. In hardware, as shown in fig. 2, the computing node 210 includes at least a processor 212, a memory 213, and a network card 214. The processor 212 is a central processing unit (central processing unit, CPU) for processing data access requests from outside the computing node 210 or requests generated inside the computing node 210. Illustratively, when the processor 212 receives write data requests sent by a user, the data in these write data requests is temporarily stored in the memory 213. When the total amount of data in the memory 213 reaches a certain threshold, the processor 212 sends the data stored in the memory 213 to the storage node 200 for persistent storage. In addition, the processor 212 is configured to perform calculations or processing on data, such as metadata management, deduplication, data compression, storage-space virtualization, address translation, and the like. Only one processor 212 is shown in fig. 2; in practical applications, there are often a plurality of processors 212, and one processor 212 may have one or more CPU cores. This embodiment does not limit the number of CPUs or the number of CPU cores.
The memory 213 is an internal memory that exchanges data directly with the processor; it can read and write data at any time at high speed, and serves as temporary data storage for an operating system or other running programs. The memory includes at least two types of memory; for example, the memory may be a random access memory (random access memory, RAM) or a read-only memory (read only memory, ROM). For example, the random access memory may be a dynamic random access memory (dynamic random access memory, DRAM) or a storage class memory (storage class memory, SCM). DRAM is a semiconductor memory and, like most random access memories, is a volatile memory device. SCM is a composite storage technology that combines the characteristics of both traditional storage devices and memory; a storage class memory provides faster read and write speeds than a hard disk, but is slower than DRAM in access speed and cheaper than DRAM in cost. However, DRAM and SCM are only examples in this embodiment; the memory may also include other random access memories, such as a static random access memory (static random access memory, SRAM). The read-only memory may be, for example, a programmable read-only memory (programmable read only memory, PROM) or an erasable programmable read-only memory (erasable programmable read only memory, EPROM). In addition, the memory 213 may be a dual in-line memory module (dual in-line memory module, DIMM), that is, a module composed of dynamic random access memory (DRAM), or a solid state disk (solid state disk, SSD). In practical applications, a plurality of memories 213, possibly of different types, may be configured in the computing node 210; this embodiment does not limit the number or type of the memories 213. In addition, the memory 213 may be configured to have a power-loss protection function, which means that the data stored in the memory 213 is not lost when the system is powered down and powered up again. A memory having the power-loss protection function is called a nonvolatile memory.
The network card 214 is used to communicate with the storage node 200. For example, when the total amount of data in the memory 213 reaches a threshold, the computing node 210 may send a request to the storage node 200 via the network card 214 to persist the data. In addition, the computing node 210 may also include a bus for communication among the components within the computing node 210. Functionally, because the primary function of the computing node 210 in FIG. 2 is computation, it can rely on remote storage for persistence and therefore has less local storage than a conventional server, saving cost and space. This does not mean, however, that the computing node 210 cannot have local storage: in an actual implementation, the computing node 210 may have a small number of built-in or external hard disks.
Any one of the computing nodes 210 may access any one of the storage nodes 200 in the storage node sub-cluster via the network. The storage node subset includes a plurality of storage nodes 200 (three storage nodes 200 are shown in fig. 2, but are not limited to three storage nodes 200). A storage node 200 includes one or more controllers 201, network cards 204, and a plurality of hard disks 205. The network card 204 is used to communicate with the computing node 210. The hard disk 205 is used to store data, and may be a magnetic disk or other type of storage medium, such as a solid state disk or shingled magnetic recording disk. The controller 201 is configured to write data to the hard disk 205 or read data from the hard disk 205 according to a read/write data request sent by the computing node 210. In the process of reading and writing data, the controller 201 needs to convert an address carried in a read/write data request into an address identifiable by the hard disk. It follows that the controller 201 also has some simple computing functions.
In practical applications, the controller 201 may take various forms. In one case, the controller 201 includes a CPU and a memory. The CPU is used to perform operations such as address translation and data reading and writing. The memory is used to temporarily store data to be written to the hard disk 205, or data read from the hard disk 205 that is to be sent to the computing node 210. In another case, the controller 201 is a programmable electronic component, such as a data processing unit (data processing unit, DPU). A DPU has the versatility and programmability of a CPU but is more specialized, and can run efficiently on network packets, storage requests, or analysis requests. The DPU is distinguished from the CPU by its large degree of parallelism (it needs to handle a large number of requests). Alternatively, the DPU may be replaced by a graphics processing unit (graphics processing unit, GPU), an embedded neural-network processing unit (NPU), or the like. In general, the number of controllers 201 may be one, two, or more. When the storage node 200 includes at least two controllers 201, there may be an ownership relationship between the hard disks 205 and the controllers 201. When such an ownership relationship exists, each controller can access only the hard disks that belong to it, which often involves forwarding read/write data requests between controllers 201 and results in a longer data access path. In addition, if the storage space is insufficient and a new hard disk 205 is added to the storage node 200, the ownership relationship between the hard disks 205 and the controllers 201 needs to be re-established, which is a complex operation and results in poor expansibility of the storage space.
Thus, in another embodiment, the functions of the controller 201 may be offloaded onto the network card 204. In other words, the storage node 200 has no controller 201 inside; instead, the network card 204 performs data reading and writing, address translation, and other computing functions. In this case, the network card 204 is an intelligent network card. It may contain a CPU and a memory. The CPU is used to perform operations such as address translation and data reading and writing. The memory is used to temporarily store data to be written to the hard disk 205, or data read from the hard disk 205 that is to be sent to the computing node 210. Alternatively, the network card 204 may be a programmable electronic component, such as a data processing unit (data processing unit, DPU). A DPU has the versatility and programmability of a CPU but is more specialized, and can run efficiently on network packets, storage requests, or analysis requests. The DPU is distinguished from the CPU by its large degree of parallelism (it needs to handle a large number of requests). Alternatively, the DPU may be replaced by a graphics processing unit (graphics processing unit, GPU), an embedded neural-network processing unit (NPU), or the like. There is no ownership relationship between the network card 204 and the hard disks 205 in the storage node 200, so the network card 204 can access any hard disk 205 in the storage node 200, which makes it convenient to expand hard disks when the storage space is insufficient.
The storage node sub-cluster stores, through a parallel file system, the data that the computing nodes need to access. A parallel file system requires a unified namespace and can support parallel access by multiple clients/processes. An application may access the parallel file system through a portable operating system interface (portable operating system interface, POSIX) or a message passing interface (message passing interface, MPI). When POSIX is used to read and write data in the parallel file system, the parallel file system can provide data consistency for mutually exclusive file access by the application. When MPI is used to read and write data in the parallel file system, the task slices running in the processes on each node can communicate with each other through MPI and negotiate to ensure that their accesses to the file do not overlap, so the file system is not required to provide data consistency protection.
FIG. 3 is a schematic diagram of the relationship of applications, processes and compute nodes.
An application (application) of a computing cluster may be made up of multiple processes. The application shown in fig. 3 consists of process 0 to process 5. Different processes may be run by different computing nodes. As shown in fig. 3, process 0 and process 3 are run by compute node 0, process 1 and process 4 are run by compute node 1, and process 2 and process 5 are run by compute node 2.
A data unit in the embodiments of the invention refers to a piece of data in a file. The file may be retrieved from the storage node sub-cluster through the parallel file system. A process acquires the data to be processed from the file at the granularity of one data unit. In other words, a data unit may be understood as the data that a process obtains from a file through one read operation, and the size of a data unit is the amount of data that the read operation needs to read.
Fig. 4 is a schematic flow chart of a method of storing data provided in accordance with an embodiment of the present invention.
401: A computing node obtains all data units corresponding to a target process in a target file, where the target file includes a plurality of data units, each data unit in the plurality of data units corresponds to one of M processes, each of the M processes has at least two corresponding data units in the target file, the target process is one of the M processes, the target process is run by the computing node, and M is a positive integer greater than or equal to 2.
402: The computing node writes all data units corresponding to the target process to a target storage node, where the target storage node is one of a plurality of available storage nodes of the computing node.
The solution shown in fig. 4 writes the data units corresponding to one process to the same storage node. When writing data, the data units corresponding to one process therefore need to be sent to only one storage node rather than to several storage nodes separately. Furthermore, because the data units belonging to one process are all written to the same storage node, the data units of that process can be read from that single storage node, without searching all storage nodes for the data units to be read. Thus, when the data units of one process need to be read or written, the read/write operations touch only the single storage node that stores that process's data units, which effectively reduces read/write overhead.
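As a hedged illustration of steps 401 and 402, the sketch below groups data units by owning process and writes each group to a single storage node. The tuple layout, function names, and the modulo-based node selection are assumptions for illustration, not prescribed by the patent:

```python
from collections import defaultdict

def write_by_process(data_units, num_storage_nodes):
    """Step 401: group the target file's data units by owning process.
    Step 402: write each process's units to a single storage node."""
    per_process = defaultdict(list)
    for process_id, offset, payload in data_units:   # assumed record layout
        per_process[process_id].append((offset, payload))
    for process_id, units in per_process.items():
        node_id = process_id % num_storage_nodes      # one possible selection rule
        send_to_storage_node(node_id, process_id, units)

def send_to_storage_node(node_id, process_id, units):
    # Placeholder for the actual transport to the storage node.
    print(f"storage node {node_id} <- process {process_id}: {len(units)} unit(s)")

# Example: the 18 units of Table 1 (unit i belongs to process i % 6), 3 nodes.
example = [(i % 6, i * 4096, b"") for i in range(18)]
write_by_process(example, num_storage_nodes=3)
```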
The method shown in fig. 4 is described below in connection with fig. 5.
As shown in fig. 5, the application is composed of processes 0 to 5, processes 0 and 3 are run by computing node 0, processes 1 and 4 are run by computing node 1, and processes 2 and 5 are run by computing node 2.
The target file includes 18 data units, which may be referred to as data unit 0 through data unit 17.
Table 1 shows the correspondence between 18 data units and 6 processes.
TABLE 1

Process    Data units
0          0, 6, 12
1          1, 7, 13
2          2, 8, 14
3          3, 9, 15
4          4, 10, 16
5          5, 11, 17
As shown in Table 1, the data units corresponding to process 0 are data unit 0, data unit 6, and data unit 12, and the data units corresponding to process 3 are data unit 3, data unit 9, and data unit 15.
In some embodiments, each process has an identification (for example, a task identification (taskID)), and different processes have different identifications. The process identification of a data unit is the identification of the process to which the data unit corresponds. Taking Table 1 as an example, if the identification of process 0 is 0, then the process identifications of data unit 0, data unit 6, and data unit 12 are 0; if the identification of process 4 is 4, then the process identifications of data unit 4, data unit 10, and data unit 16 are 4. In this case, whether data units correspond to the same process can be determined from the process identifications of the data units.
In the case of accessing a parallel storage system using POSIX, the process identification information of each data unit may be obtained by intercepting (hijacking) the IO code. Through such interception, it can be determined which process issued each data unit, and thus the process identification information of each data unit.
When an application running in an MPI framework uses MPI-IO to access data, the interface carries the IO information of the application's accesses. In this case, which process acquired each data unit can be determined directly from the IO information, and the process identification information of each data unit can thereby be determined.
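A minimal sketch of the interception idea, under the assumption that writes can be wrapped so each data unit is tagged with the identification of the issuing process (the wrapper and its signature are hypothetical, not an API defined by the patent):

```python
import os

io_log = []  # records of (task_id, offset, length), one per data unit

def tagged_pwrite(fd: int, data: bytes, offset: int, task_id: int) -> int:
    """Hypothetical wrapper around the POSIX write path: record which
    process/task produced each data unit before forwarding the write,
    so that units can later be grouped by task_id."""
    io_log.append((task_id, offset, len(data)))
    return os.pwrite(fd, data, offset)   # available on POSIX systems
```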
In other embodiments, whether data units correspond to the same process may be determined according to the relative position of each data unit. The processes in the application read the target file in order to obtain data units, so the data units in the target file follow a regular pattern. The data units in the target file may be divided into a plurality of groups, and each group may include the same number of data units as the total number of processes. For example, if the target file is handled by M processes, each group of data units includes M data units, which are handled by the M processes respectively. The position of the data unit handled by a given process is the same in any two groups of data units.
Fig. 6 is a schematic diagram of a target file. In the target file shown in fig. 6, the data units corresponding to different processes in the scenario of fig. 5 are drawn with different fills. The target file may be divided into three groups of data units, each group including 6 data units that correspond one-to-one to processes 0 to 5. The data units in the target file can be regarded as repeating in the order of the pattern shown in fig. 6, so the positions in the pattern of the data units corresponding to the same process are all the same. For example, the data units corresponding to process 0 are all the first data unit in the pattern, and the data units corresponding to process 1 are all the second data unit. Thus, whether data units correspond to the same process can be determined by checking whether their positions in the pattern (which may be referred to as relative positions) are the same. The relative position of a data unit may be determined according to the following formula:
LOC_r(i) = offset(i) % StrideSize    (Equation 1)

where LOC_r(i) denotes the relative position of data unit i, offset(i) denotes the offset of data unit i in the target file, StrideSize is the product of the total number of processes and the read/write length (i.e., the length of each data unit), and % denotes the remainder operation.
The offset of data unit i (which may also be referred to as the location of data unit i) may be specified by the application. The offsets of the data units specified by different applications may or may not be the same.
In the case of accessing a parallel storage system with MPI, when an application running in the MPI framework accesses data using MPI-IO, the interface carries the IO information of the application's accesses, such as the data type and count of each IO. Therefore, when the application accesses the target file through MPI, the computing node can acquire collective IO information in the abstract device IO layer (abstract device I/O, ADIO), where the collective IO information includes the size of each IO and the total number of processes. The size of each IO is the length of a single data unit. Thus, the StrideSize in Equation 1 can be obtained as the product of the IO size and the total number of processes.
Taking fig. 6 as an example again, assume that the length of each data unit is x; then StrideSize = 6x. The relative positions of the 18 data units in fig. 6 are shown in Table 2.
TABLE 2

Data units     Relative position (in units of x)
0, 6, 12       0
1, 7, 13       1
2, 8, 14       2
3, 9, 15       3
4, 10, 16      4
5, 11, 17      5
It can be seen from table 2 that data units 0, 6 and 12 are data units corresponding to the same process.
The process corresponding to each data unit can then be determined from the correspondence between relative positions and processes. Assume that the correspondence between relative positions and processes is as shown in Table 3.
TABLE 3

Relative position (in units of x)    Process
0                                    0
1                                    1
2                                    2
3                                    3
4                                    4
5                                    5
Then, it can be determined from table 3 that data units 0, 6 and 12 correspond to process 0.
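Equation 1 and the grouping of Tables 2 and 3 can be sketched as follows (illustrative Python; the variable names and the example unit size are assumptions):

```python
def relative_position(offset: int, num_processes: int, unit_size: int) -> int:
    """Equation 1: LOC_r(i) = offset(i) % StrideSize."""
    stride_size = num_processes * unit_size
    return offset % stride_size

x, M = 4096, 6                      # assumed unit length and process count (fig. 6)
groups = {}
for i in range(18):                 # data units 0..17; unit i starts at offset i*x
    rel_units = relative_position(i * x, M, x) // x   # relative position in units of x
    groups.setdefault(rel_units, []).append(i)
print(groups)  # {0: [0, 6, 12], 1: [1, 7, 13], ..., 5: [5, 11, 17]}
```

The printed grouping reproduces Table 2, and applying Table 3 to the keys recovers the owning process of each group.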
After determining the data units belonging to the same process, the computing node may send the data units belonging to the same process to the same storage node.
Each process has a corresponding storage node. The computing node may determine a storage node for storing the data unit corresponding to the target process according to the correspondence of the process and the node.
In some embodiments, the correspondence between a process and a node may be determined according to the identification information of the process. In other words, the computing node may determine, according to the identification information of a process, the storage node corresponding to that process. Taking the target process as an example again, the computing node may determine the target storage node for storing the data units of the target process according to the identification information of the target process and the total number of available storage nodes. The target storage node is one of the plurality of available storage nodes of the computing node.
For ease of description, the following assumes that the number of available storage nodes is N_SN, where N_SN is a positive integer greater than or equal to 2.
In some embodiments, the identification information of the process corresponding to the data unit (hereinafter referred to as the first identification information) may be taken modulo N_SN, and the storage node for storing the data unit is determined from the remainder. The first identification information is an integer greater than or equal to 0, so the remainder of the first identification information modulo N_SN is an integer greater than or equal to 0 and less than N_SN. The N_SN available storage nodes may be numbered available storage node 0, available storage node 1, ..., available storage node N_SN - 1. If the remainder of the first identification information modulo N_SN is n_SN (where n_SN is an integer greater than or equal to 0 and less than N_SN), all data units corresponding to the first identification information are sent to available storage node n_SN. For example, if the remainder of the first identification information modulo N_SN is 0, all data units corresponding to the first identification information are sent to available storage node 0; if the remainder is 1, all data units corresponding to the first identification information are sent to available storage node 1; and so on.
Taking the scenario shown in fig. 5 as an example again, assume that the total number of storage nodes is 3 (referred to as storage node 0, storage node 1, and storage node 2, respectively), and that the identification information of processes 0 to 5 is 0 to 5, respectively. Then, according to the above solution, the data units of process 0 and process 3 are stored to storage node 0, the data units of process 1 and process 4 are stored to storage node 1, and the data units of process 2 and process 5 are stored to storage node 2.
In other embodiments, the identification information of the computing node (hereinafter referred to as the second identification information) may be taken modulo N_SN, and the storage node for storing the data unit is determined from the remainder. The second identification information is an integer greater than or equal to 0, so the remainder of the second identification information modulo N_SN is an integer greater than or equal to 0 and less than N_SN. The specific storage method is the same as the method, described above, of taking the identification information of the process corresponding to the data unit modulo N_SN.
in other embodiments, the correspondence between a process and a node may be determined according to the relative location of the data units corresponding to the process. In other words, the computing node may determine the storage node corresponding to the process according to the relative location of the data unit corresponding to the process. Taking a target process as an example, the computing node determines a storage node corresponding to a target location from a plurality of storage nodes as the target storage node, where the target location is a location where a data unit corresponding to the target process in each group of data units is located. The target storage node is one of a plurality of available storage nodes of the computing node. The computing node may calculate the relative position and N of the data units SN And taking the remainder, and determining a storage node for storing the data unit according to the result of taking the remainder. The relative position is a positive integer greater than or equal to 0, N SN And is also a positive integer greater than or equal to 1. Thus, the relative position is relative to N SN The remainder of (2) is greater than or equal to 0 and less than N SN Is an integer of (a). Thus, relative position information pair N SN The remainder of (2) is greater than or equal to 0 and less than N SN Is an integer of (a). N (N) SN The available storage nodes may be divided into available storage node 0, available storage node 1, … …, available storage node N SN -1. If the relative position is to N SN The result after the remainder of (a) is n SN (n SN Is greater than or equal to 0 and less than N SN An integer of (a) then sends the data unit of the relative position to the available storage node n SN . For example, if the relative position is to N SN If the result after the remainder is 0, sending the data unit of the relative position to the available storage node 0; if the relative position is to N SN The result of the remainder of (1) is that the data unit of the relative position is sent to the available storage node 1, and so on.
In some embodiments, if the value of N_SN is 2, it is possible to simply determine whether the first identification information, the second identification information, or the relative position is odd or even. Taking the first identification information as an example, data units whose first identification information is odd are stored in one of the two available storage nodes, and data units whose first identification information is even are stored in the other of the two available storage nodes.
In other embodiments, the correspondence between computing nodes and storage nodes is preset and stored in the computing node. By querying this correspondence, the computing node may directly send the data of the processes running on it to the corresponding storage node. Assume that the number of computing nodes is 9 and the number of storage nodes is 3; a computing node may then determine the storage node holding its data units according to the correspondence shown in Table 4.
TABLE 4

Computing nodes    Storage node
0, 1, 2            0
3, 4, 5            1
6, 7, 8            2
As shown in table 4, computing node 0 may save all data units of processes running in computing node 0 to storage node 0, and computing node 3 may save all data units of processes running in computing node 3 to storage node 1.
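The three node-selection rules described above (process identification modulo N_SN, relative position modulo N_SN, and a preset mapping table) might be sketched as follows; N_SN = 3 and the table contents follow the examples and Table 4 above, while the function names are assumptions:

```python
N_SN = 3  # number of available storage nodes (assumed, matching the examples)

def node_by_process_id(first_identification: int) -> int:
    """Select a node from the process's identification information."""
    return first_identification % N_SN      # e.g. processes 0 and 3 -> node 0

def node_by_relative_position(rel_pos_in_units: int) -> int:
    """Select a node from the relative position (in units of x, as in Table 3)."""
    return rel_pos_in_units % N_SN

# Preset computing-node -> storage-node correspondence (Table 4):
NODE_TABLE = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2}

def node_by_table(computing_node_id: int) -> int:
    return NODE_TABLE[computing_node_id]
```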
In some embodiments, the computing node may directly send the data units belonging to the same process and the metadata corresponding to the data units to the determined storage node.
In other embodiments, a computing node may aggregate a plurality of data units corresponding to the same process before sending them to the determined storage node. Specifically, the computing node may allocate a length of cache in memory (e.g., the memory 213 in fig. 2) for each process. The plurality of data units that need to be sent to a storage node may be written to the cache allocated for their process, with the storage addresses of the plurality of data units in the cache being contiguous. The plurality of data units also correspond respectively to a plurality of metadata, each of which includes the identification information of the process corresponding to the data unit and the storage address of the data unit in the cache. When the data units are aggregated, the plurality of metadata corresponding to the plurality of data units may also be aggregated. The aggregated metadata may include the position information of the data units and the length of the data units, and may also include the total number of processes. After aggregating the data units and metadata, the computing node may send the aggregated data unit and metadata to the storage node. Accordingly, when a read instruction is obtained, the computing node may read the aggregated data unit and metadata from the corresponding storage node, restore the aggregated data unit to the data units before aggregation according to the aggregated metadata, and determine the position of each pre-aggregation data unit in the target file.
Optionally, in some embodiments, an aggregation threshold may be set, and if the number of data units acquired by the computing node corresponding to the same process is greater than the aggregation threshold, the acquired data units may be divided into a plurality of data unit sets, where each data unit set in the plurality of data unit sets includes a number of data units less than or equal to the aggregation threshold. Multiple data units belonging to the same set of data units may be aggregated into one data unit.
By aggregating the data units, the aggregated data unit can be sent to the storage node in one operation, according to its size. Taking the scenario shown in fig. 5 as an example again, if no data unit aggregation is performed, computing node 0 needs three write operations to write data unit 0, data unit 6, and data unit 12 to the storage node. However, if data unit 0, data unit 6, and data unit 12 are aggregated into one large data unit, computing node 0 can write the three data units to the storage node with only one write operation. In other words, through aggregation, a computing node can merge multiple small write operations into one large write operation. Similarly, if no aggregation is performed and computing node 0 needs to acquire data unit 0, data unit 6, and data unit 12 from the storage node, it may need three read operations; with aggregation, computing node 0 can read the three data units from the storage node with only one read operation. It can be seen that aggregation effectively reduces the number of IOs. In addition, aggregation reduces the number of metadata from three to one, which effectively reduces the read/write overhead of the metadata. Note also that the data units are aggregated together in access order.
In addition, during data reading, the computing node can intelligently identify the positions that will be accessed next and trigger read-ahead accordingly; if the number of reading processes and the IO size are the same as those used when writing, the data is read back almost entirely sequentially.
Taking the scenario shown in fig. 5 as an example again, assume that when data units 0 to 17 stored in the storage node are read, the application is still implemented by processes 0 to 5, and the size of a single data unit is the same as in fig. 5. Further assume that data unit 0, data unit 6, and data unit 12 were aggregated and stored in the storage node as data unit A1. When reading data, process 0 can read data unit 0, data unit 6, and data unit 12 from data unit A1 in sequence, directly according to the address order and the size of the individual data units.
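A minimal sketch of the aggregation path, assuming equal-length data units and a simple metadata record (the buffer layout and field names are assumptions consistent with, but not prescribed by, the description above):

```python
def aggregate(units):
    """units: list of (file_offset, payload) for one process, all payloads
    the same length. Returns one large buffer plus one metadata record,
    replacing many small writes with a single write."""
    unit_len = len(units[0][1])
    buf = b"".join(payload for _, payload in units)
    meta = {"offsets": [off for off, _ in units], "unit_len": unit_len}
    return buf, meta

def restore(buf: bytes, meta):
    """Split an aggregated buffer back into (file_offset, payload) units,
    recovering each unit's position in the target file from the metadata."""
    n = meta["unit_len"]
    return [(off, buf[k * n:(k + 1) * n]) for k, off in enumerate(meta["offsets"])]

# Three units of one process (offsets are illustrative):
buf, meta = aggregate([(0, b"aaaa"), (24, b"bbbb"), (48, b"cccc")])
assert restore(buf, meta) == [(0, b"aaaa"), (24, b"bbbb"), (48, b"cccc")]
```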
In some embodiments, the computing node may periodically send the data units in the cache to the determined storage node.
In other embodiments, the compute node sends the data units in the cache to the determined storage node according to an instruction (e.g., fsync).
In other embodiments, the computing node may also determine when to send the data units in the cache to the determined storage node based on the ratio of the occupied cache space to the total cache space. If the computing node determines that this ratio for the cache corresponding to a process is greater than or equal to a preset threshold, it may send the data units in that cache to the determined storage node.
In other embodiments, the computing node may also determine when to send the data in the cache to the determined storage node based on several of the period, the instruction, and the cache occupancy (i.e., occupied cache space / total cache space) simultaneously. For example, if a computing node receives a send instruction within a period, it sends the data in the cache to the storage node even if the period has not ended. For another example, if no send instruction has been received and the period has not ended, but the occupied proportion of the cache space corresponding to the process is greater than the preset threshold, the computing node may send the data units in the cache to the determined storage node.
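The three flush triggers (period, explicit instruction such as fsync, and cache occupancy) might be combined as in the following sketch; the period length and threshold are illustrative assumptions:

```python
import time

FLUSH_PERIOD_S = 5.0        # assumed period length
OCCUPANCY_THRESHOLD = 0.8   # assumed preset cache-occupancy threshold

def should_flush(last_flush: float, used_bytes: int, total_bytes: int,
                 fsync_requested: bool) -> bool:
    """Flush the per-process cache when any trigger fires: an explicit
    instruction (e.g. fsync), expiry of the period, or cache occupancy
    reaching the preset threshold."""
    return (fsync_requested
            or time.monotonic() - last_flush >= FLUSH_PERIOD_S
            or used_bytes / total_bytes >= OCCUPANCY_THRESHOLD)
```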
Fig. 7 is a block diagram of a computer device according to an embodiment of the present invention. The computer device 700 as shown in fig. 7 includes a processing unit 701 and a transmitting unit 702.
The processing unit 701 is configured to obtain all data units corresponding to a target process in a target file, where the target file includes a plurality of data units, each data unit in the plurality of data units corresponds to one process in M processes, and each process in the M processes has at least two corresponding data units in the target file, the target process is one process in the M processes, and the target process is executed by the computer device 700, and M is a positive integer greater than or equal to 2.
A sending unit 702, configured to write all data units corresponding to the target process to a target storage node, where the target storage node is one of a plurality of available storage nodes of the computer device 700.
In some embodiments, the computer device 700 may also include a cache unit 703. The processing unit 701 is further configured to divide all data units corresponding to the target process into K groups of data units, where each group of data units in the K groups includes a plurality of data units corresponding to the target process, and K is a positive integer greater than or equal to 1. The processing unit 701 is further configured to write the K groups of data units into the cache unit 703, where the addresses in the cache unit 703 of the data units included in each of the K groups are contiguous. The processing unit 701 is further configured to determine K metadata, where the K metadata correspond one-to-one to the K groups of data units, and each of the K metadata includes the position information of each data unit in the corresponding group and the length of the data unit. The sending unit 702 is specifically configured to send the K metadata and the K groups of data units acquired from the cache unit 703 to the target storage node.
In some embodiments, the computer device 700 also includes a receiving unit 704. The processing unit 701 is further configured to obtain a read instruction, where the read instruction instructs the device to read the K metadata and the K groups of data units from the target storage node. The receiving unit 704 is configured to read the K metadata and the K groups of data units from the target storage node according to the read instruction obtained by the processing unit. The processing unit 701 is further configured to determine, according to the K metadata, the position in the target file of each data unit included in the K groups of data units.
The specific functions and beneficial effects of the processing unit 701, the transmitting unit 702, the buffering unit 703 and the receiving unit 704 may be referred to the above embodiments, and will not be described herein for brevity.
The computer device 700 shown in fig. 7 may be the computing node 210 shown in fig. 2. The processing unit 701 may be the CPU 212 shown in fig. 2, the cache unit may be the memory 213 shown in fig. 2, and the transmitting unit 702 and the receiving unit 704 may be the network card 214 shown in fig. 2.
The present invention also provides a computer device comprising a processor for coupling with a memory, reading and executing instructions and/or program code in the memory to perform the methods of the above embodiments.
Embodiments of the present invention provide a chip system including a logic circuit for coupling with an input/output interface through which data is transmitted to perform the method of the above embodiments.
An embodiment of the present invention provides a computer-readable storage medium storing program code that, when run on a computer, causes the computer to perform a method as in the above embodiment.
An embodiment of the present invention provides a computer program product comprising: computer program code which, when run on a computer, causes the computer to perform the method as in the embodiments described above.
It should be appreciated that the processor in embodiments of the present invention may be an integrated circuit chip with the capability to process signals. In implementation, the steps of the above method embodiments may be implemented by integrated logic circuits of hardware in a processor or instructions in software form. The processor may be a central processing unit (central processing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), field programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in the embodiment of the invention can be directly embodied in a hardware encoding processor for execution or in a combination of hardware and software modules in the encoding processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The memory in embodiments of the present invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM) which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), and direct memory bus RAM (DRRAM).
It should be noted that when the processor is a general purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, the memory (storage module) may be integrated into the processor.
It should also be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention essentially, or the part thereof contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The foregoing is merely a specific implementation of the present invention, and the protection scope of the present invention is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (19)

1. A method of storing data, the method comprising:
a computing node obtains all data units corresponding to a target process in a target file, wherein the target file comprises a plurality of data units, each of the plurality of data units corresponds to one of M processes, each of the M processes has at least two corresponding data units in the target file, the target process is one of the M processes and is run by the computing node, and M is a positive integer greater than or equal to 2;
the computing node writes all data units corresponding to the target process to a target storage node, wherein the target storage node is one of a plurality of available storage nodes of the computing node.
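By way of example and not limitation, the following Python sketch illustrates the write path of claim 1 under added assumptions: the target file interleaves data units round-robin across the M processes (consistent with claim 2), storage nodes are modeled as in-memory dictionaries, and M, UNIT_LEN, and the function names are hypothetical rather than taken from the embodiments.

```python
M = 4          # assumed number of processes sharing the target file
UNIT_LEN = 8   # assumed length of one data unit, in bytes

def units_of_process(target_file: bytes, target_proc: int) -> list[bytes]:
    """Collect every data unit of the target process from the target file,
    assuming unit i belongs to process i % M (round-robin interleaving)."""
    units = [target_file[off:off + UNIT_LEN]
             for off in range(0, len(target_file), UNIT_LEN)]
    return [u for i, u in enumerate(units) if i % M == target_proc]

def write_to_target_node(nodes: list[dict], target_proc: int,
                         units: list[bytes]) -> None:
    """Write all data units of one process to a single target storage node,
    one of the available storage nodes (modulo selection is an assumption)."""
    target_node = nodes[target_proc % len(nodes)]
    target_node.setdefault(target_proc, []).extend(units)

# usage: every data unit of process 1 ends up on one node, in file order
file_bytes = bytes(range(64))          # 8 units of 8 bytes, M = 4 processes
nodes = [{} for _ in range(3)]         # three available storage nodes
write_to_target_node(nodes, 1, units_of_process(file_bytes, 1))
```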
2. The method of claim 1, wherein the processes corresponding to any two adjacent data units in the plurality of data units are different.
3. The method according to claim 1 or 2, wherein the obtaining, by the computing node, all data units corresponding to the target process in the target file comprises:
the computing node determines, in the target file, all data units corresponding to identification information of the target process.
4. The method according to claim 3, wherein before the computing node writes all data units corresponding to the target process to the target storage node, the method further comprises: the computing node determines, from a plurality of storage nodes, a storage node corresponding to the identification information of the target process as the target storage node.
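By way of example and not limitation, one plausible realization of the node selection in claim 4 maps the identification information of the target process to a storage node with a stable hash, so that every writer resolves the same identifier to the same node; the patent does not prescribe any particular mapping, and node_for_process is a hypothetical name.

```python
import hashlib

def node_for_process(proc_id: str, nodes: list[str]) -> str:
    """Deterministically pick the storage node corresponding to the
    identification information (here a string ID) of a process."""
    digest = hashlib.sha256(proc_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]

# usage: "rank-2" always resolves to the same one of the three nodes
print(node_for_process("rank-2", ["node-a", "node-b", "node-c"]))
```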
5. The method of claim 2, wherein the target file comprises a plurality of groups of data units, each group of data units in the plurality of groups comprises M data units, the M data units respectively correspond to the M processes, and the positions of the data units corresponding to a same process are identical in any two groups of data units in the plurality of groups,
the computing node obtains all data units corresponding to the target process in the target file, including:
the computing node determines, according to the length of a data unit, the position of each of the plurality of data units in the target file, and the number of processes M, the data unit located at the position corresponding to the target process in each group of data units.
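By way of example and not limitation, the position arithmetic of claim 5 can be sketched as follows, assuming every data unit has the same length: each group of M units starts at a multiple of M times the unit length, and the target process's unit sits at a fixed in-group position. unit_offsets is a hypothetical helper, not a name from the embodiments.

```python
def unit_offsets(file_len: int, unit_len: int, m: int, pos: int) -> list[int]:
    """Byte offset of the target process's data unit in every group of M
    units: group g starts at g * m * unit_len; the unit sits pos units in."""
    group_len = m * unit_len
    return [g * group_len + pos * unit_len
            for g in range(file_len // group_len)]

# usage: 2 groups, M = 4 processes, 8-byte units, in-group position 2
print(unit_offsets(file_len=64, unit_len=8, m=4, pos=2))   # [16, 48]
```

Because the in-group position is fixed across all groups, the same value can also index a list of storage nodes, which is one way to read the node selection of claim 6.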
6. The method of claim 5, wherein before the computing node writes all data units corresponding to the target process to a target storage node, the method further comprises: the computing node determines a storage node corresponding to a target position from a plurality of storage nodes as the target storage node, wherein the target position is the position of a data unit corresponding to the target process in each group of data units.
7. The method of any of claims 1 to 6, wherein the writing, by the computing node, all data units corresponding to the target process to the target storage node comprises:
the computing node divides all data units corresponding to the target process into K groups of data units, each group of data units in the K groups of data units comprises a plurality of data units corresponding to the target process, and K is a positive integer greater than or equal to 1;
the computing node writes the K groups of data units into a cache of the computing node, and addresses of a plurality of data units included in each group of data units in the K groups of data units in the cache are continuous;
the computing node determines K metadata, wherein the K metadata corresponds to the K groups of data units one by one, and each piece of metadata in the K metadata comprises position information of each data unit in a corresponding group of data units and the length of the data unit;
the computing node sends the K metadata and the K groups of data units obtained from the cache to the target storage node.
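By way of example and not limitation, the grouping and metadata of claim 7 can be sketched as below, with each group's units concatenated into one contiguous buffer standing in for consecutive cache addresses, and one metadata record per group carrying each unit's position in the target file plus the unit length; GroupMeta and pack_groups are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class GroupMeta:
    """Per-group metadata: the position of each data unit of the group in
    the target file, plus the (shared) length of a data unit."""
    file_offsets: list[int]
    unit_len: int

def pack_groups(units: list[bytes], offsets: list[int], k: int,
                unit_len: int) -> tuple[list[bytes], list[GroupMeta]]:
    """Divide one process's data units into K groups, concatenate each
    group into one contiguous buffer, and build K one-to-one metadata."""
    per_group = -(-len(units) // k)              # ceiling division
    buffers, metas = [], []
    for g in range(k):
        lo, hi = g * per_group, (g + 1) * per_group
        buffers.append(b"".join(units[lo:hi]))   # consecutive addresses
        metas.append(GroupMeta(offsets[lo:hi], unit_len))
    return buffers, metas
```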
8. The method of claim 7, wherein the method further comprises:
the computing node obtains a read instruction, wherein the read instruction is used to instruct reading of the K metadata and the K groups of data units from the target storage node;
the computing node reads the K metadata and the K groups of data units from the target storage node;
the computing node determines, according to the K metadata, the position in the target file of each data unit included in the K groups of data units.
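Continuing the sketch above (and reusing the hypothetical GroupMeta, pack_groups, and unit_offsets), the read path of claim 8 can be illustrated as the inverse of the packing: the K metadata records suffice to place every data unit read back from the target storage node at its original position in the target file.

```python
def restore_positions(buffers: list[bytes],
                      metas: list[GroupMeta]) -> dict[int, bytes]:
    """Map each data unit read from the target storage node back to its
    byte offset in the target file, using the K metadata records."""
    placed: dict[int, bytes] = {}
    for buf, meta in zip(buffers, metas):
        for i, off in enumerate(meta.file_offsets):
            placed[off] = buf[i * meta.unit_len:(i + 1) * meta.unit_len]
    return placed

# usage: round-trip four 8-byte units through K = 2 groups
units = [bytes([n]) * 8 for n in range(4)]
bufs, metas = pack_groups(units, unit_offsets(128, 8, 4, 2), k=2, unit_len=8)
assert restore_positions(bufs, metas)[16] == units[0]
```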
9. A computer device, the computer device comprising:
a processing unit, configured to obtain all data units corresponding to a target process in a target file, wherein the target file comprises a plurality of data units, each of the plurality of data units corresponds to one of M processes, each of the M processes has at least two corresponding data units in the target file, the target process is one of the M processes and is run by the computer device, and M is a positive integer greater than or equal to 2; and
a sending unit, configured to write all data units corresponding to the target process to a target storage node, wherein the target storage node is one of a plurality of available storage nodes of the computer device.
10. The computer device of claim 9, wherein the processes corresponding to any two adjacent data units in the plurality of data units are different.
11. The computer device according to claim 9 or 10, wherein the processing unit is specifically configured to determine all data units in the target file corresponding to the identification information of the target process.
12. The computer device according to claim 11, wherein the processing unit is further configured to determine, from a plurality of storage nodes, a storage node corresponding to the identification information of the target process as the target storage node before the sending unit writes all data units corresponding to the target process to the target storage node.
13. The computer device according to claim 10, wherein the target file includes a plurality of groups of data units, each group of data units in the plurality of groups of data units includes M data units, the M data units respectively correspond to the M processes, positions of data units corresponding to a same process in the target file in any two groups of data units in the plurality of groups of data units are the same, and the processing unit is specifically configured to determine, according to a length of the data unit, a position of each data unit in the plurality of data units in the target file, and a process number M, a data unit in a position corresponding to the target process in each group of data units.
14. The computer device of claim 13, wherein the processing unit is further configured to determine, from a plurality of storage nodes, a storage node corresponding to a target location as the target storage node, the target location being a location of the data unit corresponding to the target process in each group of data units, before the sending unit writes all data units corresponding to the target process to the target storage node.
15. The computer device according to any one of claims 9 to 14, further comprising a cache unit,
the processing unit is further configured to divide all data units corresponding to the target process into K groups of data units, where each group of data units in the K groups of data units includes a plurality of data units corresponding to the target process, and K is a positive integer greater than or equal to 1;
the processing unit is further configured to write the K groups of data units into the cache unit, where addresses of a plurality of data units included in each of the K groups of data units in the cache unit are consecutive;
the processing unit is further configured to determine K metadata, where the K metadata corresponds to the K groups of data units one by one, and each piece of metadata in the K metadata includes location information of each data unit in a corresponding group of data units and a length of the data unit;
the sending unit is specifically configured to send the K metadata and the K groups of data units obtained from the cache unit to the target storage node.
16. The computer device of claim 15, wherein the processing unit is further configured to obtain a read instruction, the read instruction being used to instruct reading of the K metadata and the K groups of data units from the target storage node;
the computer device further comprises a receiving unit, wherein the receiving unit is configured to read the K metadata and the K groups of data units from the target storage node according to the read instruction obtained by the processing unit;
the processing unit is further configured to determine, according to the K metadata, the position in the target file of each data unit included in the K groups of data units.
17. A computer device, comprising: a processor for coupling with a memory, reading and executing instructions and/or program code in the memory to perform the method of any of claims 1-8.
18. A chip system, comprising: logic circuitry, configured to be coupled to an input/output interface and to transmit data through the input/output interface, so as to perform the method of any of claims 1-8.
19. A computer readable medium, characterized in that the computer readable medium stores a program code which, when run on a computer, causes the computer to perform the method according to any of claims 1-8.
CN202210114211.2A 2022-01-30 2022-01-30 Method for storing data and related device Pending CN116560560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210114211.2A CN116560560A (en) 2022-01-30 2022-01-30 Method for storing data and related device

Publications (1)

Publication Number Publication Date
CN116560560A true CN116560560A (en) 2023-08-08

Family

ID=87498818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210114211.2A Pending CN116560560A (en) 2022-01-30 2022-01-30 Method for storing data and related device

Country Status (1)

Country Link
CN (1) CN116560560A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312326A (en) * 2023-11-28 2023-12-29 深圳市移卡科技有限公司 Data storage method based on Yun Yuansheng database and related equipment

Similar Documents

Publication Publication Date Title
CN113360093B (en) Memory system and device
US20240086315A1 (en) Memory access statistics monitoring
US20240201880A1 (en) Resource Allocation Method and Apparatus
JP6232936B2 (en) Information processing apparatus, storage device control circuit, and storage device control method
CN114489475B (en) Distributed storage system and data storage method thereof
CN116560560A (en) Method for storing data and related device
EP4174630A1 (en) Stripe management method, storage system, stripe management apparatus, and storage medium
US20240070120A1 (en) Data processing method and apparatus
WO2020024113A1 (en) Memory interleaving method and device
US20240086113A1 (en) Synchronous write method and device, storage system and electronic device
CN115079936A (en) Data writing method and device
CN115793957A (en) Method and device for writing data and computer storage medium
US10061725B2 (en) Scanning memory for de-duplication using RDMA
CN115878308A (en) Resource scheduling method and device
US20240176539A1 (en) Novel data cache scheme for high performance flash memories
CN110865768A (en) Write cache resource allocation method, device, equipment and storage medium
US20240086110A1 (en) Data storage method, storage apparatus and host
CN117389483B (en) Memory management method and device, electronic equipment and readable storage medium
US20230376427A1 (en) Memory system and computing system including the same
US11960735B2 (en) Memory channel controller operation based on data types
WO2024001863A1 (en) Data processing method and related device
US20240256159A1 (en) Electronic device and operating method thereof
CN117539609A (en) IO flow control method, system and equipment
CN115237854A (en) Log management method and device
CN116931813A (en) Burst access coping method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination