CN117762880A - Data access method and computing device - Google Patents
- Publication number: CN117762880A (application number CN202311562213.9A)
- Authority: CN (China)
- Prior art keywords: file, process group, file system, data block, accessed
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present application provide a data access method and a computing device. The method includes: acquiring position information of the file data block to be accessed by each process in a process group, where the position information indicates the position of the file data block in a target file, the process group includes N processes, the positions of the file data blocks to be accessed by any two processes do not overlap, the file data block to be accessed by each process belongs to the target file, and the target file is stored in a file system; applying to the file system for a distributed lock for the process group based on the position information of the file data block to be accessed by each process in the process group, where the locking area of the distributed lock is the set of positions of the file data blocks to be accessed by the processes in the process group; and, based on the acquired distributed lock, running each process in the process group to perform an access operation on its file data block. The embodiments of the present application can improve data access efficiency.
Description
Technical Field
The present disclosure relates to the field of computing devices, and in particular, to a data access method and a computing device.
Background
With the development of computer technology, the volume of data to be processed keeps growing, and the performance requirements on software and hardware keep rising. When multiple processes access the same file in a file system simultaneously, each process must apply for a distributed lock before accessing the file, so that the data stays consistent and is not overwritten by other processes. When several processes apply for the distributed lock at the same time, only one of them can acquire it; the others can acquire the lock and access the file only after that process finishes accessing the file and releases the lock. This serialized access pattern results in low data access efficiency.
Disclosure of Invention
The embodiments of the present application provide a data access method and a computing device that can improve data access efficiency.
In a first aspect, an embodiment of the present application provides a data access method, where the method includes:
acquiring position information of the file data block to be accessed by each process in a process group, where the file data block to be accessed by each process belongs to a target file stored in a file system, the position information indicates the position of the file data block in the target file, the process group includes N processes, the positions of the file data blocks to be accessed by any two processes do not overlap, and N is an integer greater than 1;
applying to the file system for a distributed lock for the process group based on the position information of the file data block to be accessed by each process in the process group, where the locking area of the distributed lock is the set of positions of the file data blocks to be accessed by the processes in the process group; and
running, based on the acquired distributed lock, each process in the process group to perform an access operation on its file data block, where the access operation includes a read operation and/or a write operation.
In this implementation, the distributed lock is applied for per process group, so the locking area of the acquired lock is the set of positions of the file data blocks to be accessed by all processes in the group. All processes in the group can therefore access the file data blocks at their respective positions directly under this single lock, instead of each process separately applying for a lock on its own access range. This reduces contention for the distributed lock and improves data access efficiency.
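The difference in lock traffic between per-process locking and the group scheme above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Extent` type, the half-open `[start, end)` convention and both helper functions are hypothetical names introduced here, and a real system would send these requests to a file system rather than build lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Half-open byte range [start, end) accessed by one process (assumed convention)."""
    rank: int
    start: int
    end: int


def lock_requests_per_process(extents):
    # Conventional scheme: each process applies for its own lock, so N requests
    # reach the file system and contend with one another.
    return [(e.start, e.end) for e in extents]


def lock_requests_per_group(extents):
    # Group scheme: one request whose locking area is the set of all
    # positions that the processes of the group will access.
    return [tuple(sorted((e.start, e.end) for e in extents))]


# Position information gathered from a group of N = 3 processes.
extents = [Extent(0, 0, 20), Extent(1, 20, 40), Extent(2, 40, 60)]
print(len(lock_requests_per_process(extents)))  # 3
print(len(lock_requests_per_group(extents)))    # 1
```

The point of the sketch is only the count: with N processes the file system sees N competing lock requests in the first case and a single request in the second.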
In some implementations, running, based on the acquired distributed lock, each process in the process group to perform an access operation on the file data block includes:
running each process based on the acquired distributed lock, where a running process sends a data access request to the file system; the data access request carries the identifier of the process group and requests, based on that identifier, an access operation on the file data block at the corresponding position.
In this implementation, because the request carries the identifier of the process group, the file system can determine from the identifier that the process belongs to the process group, and can therefore allow the process to access the file data block at its position within the region locked by the distributed lock.
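A file-system-side admission check along these lines might look like the sketch below. It assumes a hypothetical rule: a request is admitted when it carries the lock owner's group identifier and lies entirely inside the locked half-open region. `GroupLock`, `AccessRequest` and `admit` are illustrative names, not an API of any real file system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupLock:
    group_id: str  # identifier of the process group that owns the lock
    start: int     # locked region is [start, end) (assumed convention)
    end: int


@dataclass(frozen=True)
class AccessRequest:
    group_id: str  # identifier of the process group carried by the request
    offset: int
    length: int


def admit(request: AccessRequest, lock: GroupLock) -> bool:
    """Admit the request only if it carries the lock owner's group identifier
    and falls entirely inside the locked region."""
    inside = lock.start <= request.offset and request.offset + request.length <= lock.end
    return request.group_id == lock.group_id and inside


lock = GroupLock("grp-7", 0, 60)
print(admit(AccessRequest("grp-7", 20, 20), lock))  # True
print(admit(AccessRequest("grp-9", 20, 20), lock))  # False: different group
print(admit(AccessRequest("grp-7", 50, 20), lock))  # False: runs past the region
```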
In some implementations, the method further includes:
passing the identifier of the process group into the file system, so that the file system checks the validity of the identifier and returns a check result;
receiving the check result returned by the file system; and
if the check result indicates that the identifier of the process group is valid, performing the step of acquiring the position information of the file data block to be accessed by each process in the process group.
In this implementation, passing the identifier of the process group into the file system allows its validity to be checked, which ensures that processes carrying the identifier of the process group can access the file data blocks in the file system.
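The source does not say what makes an identifier valid. The sketch below assumes, purely for illustration, that a valid identifier is a well-formed UUID not already registered by another active group; `GroupIdRegistry` is a hypothetical name. Only when `check` returns `True` would the method go on to gather position information.

```python
import uuid


class GroupIdRegistry:
    """Hypothetical file-system-side validity check for process-group identifiers.
    Assumed rule: a valid identifier is a well-formed UUID not already in use."""

    def __init__(self):
        self._active = set()

    def check(self, group_id: str) -> bool:
        try:
            uuid.UUID(group_id)
        except ValueError:
            return False              # malformed identifier
        if group_id in self._active:
            return False              # already used by another active group
        self._active.add(group_id)
        return True


registry = GroupIdRegistry()
gid = str(uuid.uuid4())
print(registry.check(gid))            # True: first registration
print(registry.check(gid))            # False: duplicate
print(registry.check("not-a-uuid"))   # False: malformed
```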
In some implementations, the method further includes:
generating an identifier for the process group; and
broadcasting the identifier of the process group to all processes in the process group.
In this implementation, the identifier of the process group is broadcast to all processes in the group, so that after the distributed lock is acquired, each process can access its file data blocks while carrying the group identifier.
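Generating the identifier on one rank and broadcasting it can be simulated in a single Python process as below. The per-rank list stands in for separate MPI processes, and choosing a random UUID as the identifier is an assumption; `generate_and_broadcast` is a hypothetical helper.

```python
import uuid


def generate_and_broadcast(nprocs: int, root: int = 0) -> list:
    # Before the broadcast only the root rank holds an identifier.
    local = [None] * nprocs
    local[root] = str(uuid.uuid4())
    # Simulated broadcast: copy the root's value into every rank's slot,
    # which is the effect MPI_Bcast has across real processes.
    return [local[root] for _ in range(nprocs)]


ids = generate_and_broadcast(4)
print(len(set(ids)))  # 1: all four ranks share one group identifier
```

With mpi4py, the same step would roughly be `gid = comm.bcast(str(uuid.uuid4()) if comm.Get_rank() == 0 else None, root=0)`.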
In some implementations, passing the identifier of the process group into the file system includes:
passing the identifier of the process group to the file system through a read/write interface provided by the file system adaptation layer of the message passing interface.
In some implementations, acquiring the position information of the file data block to be accessed by each process in a process group includes:
calling a collective communication interface of the message passing interface to acquire the position information of the file data block to be accessed by each process in the process group, where the position information includes at least two of the following: a start position, an end position, and data block length information.
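Because each process may report any two of start position, end position and length, the gathered entries need to be normalized before the lock region can be computed. A minimal sketch, assuming half-open ranges with `end = start + length`; `to_extent` is a hypothetical helper and the list comprehension stands in for an all-gather.

```python
from typing import Optional


def to_extent(start: Optional[int] = None, end: Optional[int] = None,
              length: Optional[int] = None) -> tuple:
    """Return (start, end) for a half-open range [start, end), given any two
    of start, end and length (assumed convention: end = start + length)."""
    if start is not None and end is not None:
        s, e = start, end
    elif start is not None and length is not None:
        s, e = start, start + length
    elif end is not None and length is not None:
        s, e = end - length, end
    else:
        raise ValueError("need at least two of start, end, length")
    if s < 0 or e < s:
        raise ValueError(f"invalid range [{s}, {e})")
    return s, e


# Simulated all-gather: each rank contributes its own description and every
# rank receives the full list (what MPI_Allgather provides in real MPI code).
contributions = [dict(start=0, length=20), dict(start=20, end=40), dict(end=60, length=20)]
gathered = [to_extent(**c) for c in contributions]
print(gathered)  # [(0, 20), (20, 40), (40, 60)]
```

In mpi4py the gathering step itself would be `comm.allgather(my_position_info)`.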
In some implementations, applying to the file system for the distributed lock for the process group based on the position information of the file data block to be accessed by each process in the process group includes:
determining, from the position information, the start position and the end position of the file data block to be accessed by each process;
determining the minimum start position and the maximum end position in the process group, where the minimum start position is the smallest of all start positions and the maximum end position is the largest of all end positions; and
applying to the file system for the distributed lock for the process group based on the minimum start position and the maximum end position;
where the locking area of the distributed lock is the data block region, between the minimum start position and the maximum end position, that the processes in the process group access.
In this implementation, the minimum start position and the maximum end position of the file data blocks to be accessed are determined from the position information of each process, so that the acquired distributed lock covers the data block region between them that the processes in the group access.
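The minimum-start/maximum-end computation can be sketched as below, assuming half-open `(start, end)` tuples. The second helper, which is not part of the source, shows a consequence of locking one contiguous range: sub-ranges that no process accesses are locked as well.

```python
def group_lock_region(extents):
    """Locking area as described: from the minimum start to the maximum end
    over all processes in the group (half-open [start, end) assumed)."""
    return min(s for s, _ in extents), max(e for _, e in extents)


def uncovered_gaps(extents):
    """Sub-ranges inside the lock region that no process accesses; the single
    [min start, max end) lock still covers them."""
    gaps, cursor = [], None
    for s, e in sorted(extents):
        if cursor is not None and s > cursor:
            gaps.append((cursor, s))
        cursor = e if cursor is None else max(cursor, e)
    return gaps


extents = [(0, 20), (40, 60), (20, 30)]
print(group_lock_region(extents))  # (0, 60)
print(uncovered_gaps(extents))     # [(30, 40)]
```

A single contiguous range keeps the lock request small, at the cost of also blocking other groups from any gaps inside it.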
In some implementations, the method further includes:
after the distributed lock is obtained from the file system, calling a collective communication interface of the Message Passing Interface (MPI) standard to broadcast the successful lock acquisition to all processes in the process group, where the broadcast result instructs each process to access the file data block at its position in the file system under the distributed lock, carrying the identifier of the process group.
In this implementation, broadcasting the successful lock acquisition to the processes in the group lets each process access its file data block in the file system under the distributed lock, which improves data access efficiency.
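The ordering guarantee of the broadcast can be illustrated with threads in one Python process standing in for the processes of a group, and a `threading.Event` standing in for the broadcast lock-acquisition result. This is a sketch of the ordering only, under those stated substitutions; `run_group` is a hypothetical name.

```python
import threading


def run_group(nprocs: int) -> list:
    granted = threading.Event()      # stands in for the broadcast "lock acquired" result
    log, log_lock = [], threading.Lock()

    def worker(rank: int) -> None:
        granted.wait()               # no rank touches the file before the grant arrives
        with log_lock:
            log.append(f"rank {rank} accessed its block")

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(nprocs)]
    for t in threads:
        t.start()
    # The representative rank applies for the group lock once; on success it broadcasts.
    with log_lock:
        log.append("lock granted")
    granted.set()
    for t in threads:
        t.join()
    return log


log = run_group(3)
print(log[0])    # lock granted
print(len(log))  # 4
```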
In some implementations, the method further includes:
releasing the distributed lock when every process in the process group, run under the acquired distributed lock, has completed its access operation on its file data block.
In this implementation, releasing the distributed lock once all processes have completed their access operations at their positions allows processes outside the process group to apply for the lock and access the file data blocks.
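"Release only after every process has finished" can be sketched with a completion counter. `GroupLockHolder` and its `release` callback are hypothetical; in real MPI code a comparable effect is usually obtained with `MPI_Barrier` followed by a release issued from one rank.

```python
import threading


class GroupLockHolder:
    """Hypothetical holder that releases the group's distributed lock only after
    every process in the group reports completion."""

    def __init__(self, nprocs: int, release):
        self._remaining = nprocs
        self._mutex = threading.Lock()
        self._release = release      # callback that returns the lock to the file system

    def done(self) -> None:
        with self._mutex:
            if self._remaining <= 0:
                raise RuntimeError("more completions than processes in the group")
            self._remaining -= 1
            last = self._remaining == 0
        if last:
            self._release()          # exactly one caller observes zero


released = []
holder = GroupLockHolder(3, release=lambda: released.append("released"))
workers = [threading.Thread(target=holder.done) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(released)  # ['released']
```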
In some implementations, the file system has a corresponding file system client, and applying to the file system for the distributed lock for the process group based on the minimum start position and the maximum end position includes:
sending a lock acquisition request to the file system client through a read/write interface provided by the file system adaptation layer of the message passing interface, where the lock acquisition request asks the file system client to apply to the file system for the distributed lock covering the range from the minimum start position to the maximum end position carried in the request, and to return the acquired lock.
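The request/response shape of this exchange might be sketched as follows. The message fields mirror the text (group identifier, minimum start, maximum end), but `LockAcquireRequest`, `LockGrant`, `FileSystemClient` and the recording stand-in for the file system are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockAcquireRequest:
    """Hypothetical message sent through the adaptation layer's read/write path."""
    group_id: str
    min_start: int
    max_end: int


@dataclass(frozen=True)
class LockGrant:
    group_id: str
    start: int
    end: int


class RecordingFileSystem:
    """Stand-in for the file system side: just records which locks were granted."""

    def __init__(self):
        self.granted = []

    def lock(self, group_id: str, start: int, end: int) -> None:
        self.granted.append((group_id, start, end))


class FileSystemClient:
    """Hypothetical client: applies to the file system for the lock described
    by the request and returns the resulting grant to the caller."""

    def __init__(self, file_system):
        self._fs = file_system

    def acquire(self, req: LockAcquireRequest) -> LockGrant:
        if req.max_end <= req.min_start:
            raise ValueError("lock region must be non-empty")
        self._fs.lock(req.group_id, req.min_start, req.max_end)
        return LockGrant(req.group_id, req.min_start, req.max_end)


fs = RecordingFileSystem()
client = FileSystemClient(fs)
grant = client.acquire(LockAcquireRequest("grp-7", 0, 60))
print(grant)       # LockGrant(group_id='grp-7', start=0, end=60)
print(fs.granted)  # [('grp-7', 0, 60)]
```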
In some implementations, the file system includes a distributed file system, and the target file is a distributed file in the distributed file system.
In a second aspect, an embodiment of the present application provides a data access apparatus, including:
an acquisition unit, configured to acquire position information of the file data block to be accessed by each process in a process group, where the file data block to be accessed by each process belongs to a target file stored in a file system, the position information indicates the position of the file data block in the target file, the process group includes N processes, the positions of the file data blocks to be accessed by any two processes do not overlap, and N is an integer greater than 1;
a processing unit, configured to apply to the file system for a distributed lock for the process group based on the position information of the file data block to be accessed by each process in the process group, where the locking area of the distributed lock is the set of positions of the file data blocks to be accessed by the processes in the process group;
where the processing unit is further configured to run, based on the acquired distributed lock, each process in the process group to perform an access operation on its file data block, the access operation including a read operation and/or a write operation.
In some implementations, the processing unit is specifically configured to:
run each process based on the acquired distributed lock, where a running process sends a data access request to the file system; the data access request carries the identifier of the process group and requests, based on that identifier, an access operation on the file data block at the corresponding position.
In some implementations, the processing unit is further configured to:
pass the identifier of the process group into the file system, so that the file system checks the validity of the identifier and returns a check result;
receive the check result returned by the file system; and
if the check result indicates that the identifier of the process group is valid, perform the step of acquiring the position information of the file data block to be accessed by each process in the process group.
In this implementation, passing the identifier of the process group into the file system allows its validity to be checked, which ensures that processes carrying the identifier of the process group can access the file data blocks in the file system.
In some implementations, the processing unit is further configured to:
generate an identifier for the process group; and
broadcast the identifier of the process group to all processes in the process group.
In some implementations, the processing unit is specifically configured to:
pass the identifier of the process group to the file system through a read/write interface provided by the file system adaptation layer of the message passing interface.
In some implementations, the acquisition unit is specifically configured to:
call a collective communication interface of the message passing interface to acquire the position information of the file data block to be accessed by each process in the process group, where the position information includes at least two of the following: a start position, an end position, and data block length information.
In some implementations, the processing unit is specifically configured to:
determine, from the position information, the start position and the end position of the file data block to be accessed by each process;
determine the minimum start position and the maximum end position in the process group, where the minimum start position is the smallest of all start positions and the maximum end position is the largest of all end positions; and
apply to the file system for the distributed lock for the process group based on the minimum start position and the maximum end position;
where the locking area of the distributed lock is the data block region, between the minimum start position and the maximum end position, that the processes in the process group access.
In some implementations, the processing unit is further configured to:
after the distributed lock is obtained from the file system, call a collective communication interface of the Message Passing Interface (MPI) standard to broadcast the successful lock acquisition to all processes in the process group, where the broadcast result instructs each process to access the file data block at its position in the file system under the distributed lock, carrying the identifier of the process group.
In this implementation, broadcasting the successful lock acquisition to the processes in the group lets each process access its file data block in the file system under the distributed lock, which improves data access efficiency.
In some implementations, the processing unit is further configured to:
release the distributed lock when every process in the process group, run under the acquired distributed lock, has completed its access operation on its file data block.
In some implementations, the file system has a corresponding file system client, and the processing unit is further configured to:
send a lock acquisition request to the file system client through a read/write interface provided by the file system adaptation layer of the message passing interface, where the lock acquisition request asks the file system client to apply to the file system for the distributed lock covering the range from the minimum start position to the maximum end position carried in the request, and to return the acquired lock.
In some implementations, the file system includes a distributed file system, and the target file is a distributed file in the distributed file system.
In a third aspect, embodiments of the present application provide a computing device comprising:
a processor adapted to execute a computer program;
a computer-readable storage medium storing a computer program that, when executed by the processor, implements the data access method described above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when loaded and executed by a processor, performs the data access method described above.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program stored in a computer readable storage medium. A processor of a computing device reads the computer program from a computer-readable storage medium, and the processor executes the computer program so that the computing device performs the above-described data access method.
Drawings
FIG. 1 is a block diagram of a data access system according to an embodiment of the present application;
FIG. 2a is a block diagram of another data access system according to an embodiment of the present application;
FIG. 2b is a block diagram of yet another data access system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a data access flow according to an embodiment of the present application;
FIG. 4 is a flowchart of a data access method according to an embodiment of the present application;
FIG. 5 is a flowchart of another data access method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of accessing a file in a two-dimensional layout according to an embodiment of the present application;
FIG. 7 is a schematic diagram of multiple processes accessing file data blocks in a file system in parallel according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a data access apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a data access system according to an embodiment of the present application. The data access system includes a first server cluster 11, at least one second server node 12, and a terminal device 13. The first server cluster 11 includes one or more first server nodes. The embodiments of the present application do not limit the number or form of the terminal devices, the first server nodes in the first server cluster, or the second server nodes. The terminal device 13 can communicate with the second server node 12, and a first server node in the first server cluster 11 can communicate with the second server node 12. The first server cluster 11, the second server node 12, and the terminal device 13 are described below.
The first server nodes in the first server cluster 11 can run a file system and store the files of that file system, specifically in the memory of the first server nodes. When the first server cluster 11 contains multiple first server nodes, each first server node may store all files in the file system, or only some of them; alternatively, a file in the file system may be divided into multiple file data blocks, with each first server node storing some of that file's data blocks. The embodiments of the present application do not limit this.
The file system may include, but is not limited to, a distributed file system (DFS), a network file system (NFS), a journaling file system such as XFS, and a shared parallel file system such as the General Parallel File System (GPFS). A distributed file system is one in which the physical storage resources it manages are not necessarily attached directly to the local first server node but are connected to other first server nodes over a computer network. That is, when the first server cluster 11 contains multiple first server nodes, each with its own memory, the memories of these nodes are connected over a computer network to form one distributed file system.
One or more file system clients of the file system are installed on the second server node 12; a file system client provides the file operation interfaces of the file system. An object (for example, a user) can log on to the file system client through a login page (for example, a web page) and perform file operations on files in the file system, including but not limited to create, delete, and access, where access includes read operations and/or write operations. The second server nodes 12 in the data access system may be independent of one another rather than forming a server cluster. In some implementations, the second server nodes 12 may be managed by a management node and communicate with one another to share data resources, in which case they form a second server cluster. The present application does not limit this.
At least one application program is installed and runs on the terminal device 13, and any of them may access files in the file system. When an application program accesses a file in the distributed file system, it may run one or more processes to access the file concurrently, each process accessing the file data block at the corresponding position of the file. The application program here may be a high-performance parallel computing application, for example a simulation application or a weather forecast application; the embodiments of the present application do not limit this.
For ease of understanding, a distributed file system is taken as an example of the file system. Referring to FIG. 2a, FIG. 2a is a schematic diagram of another data access system according to an embodiment of the present application. In FIG. 2a, a distributed file system runs in the first server cluster 11, and distributed file system clients run on the second server nodes 12. In the example of FIG. 2a, the data access system includes three second server nodes 12, each running one distributed file system client. An object (for example, a user) can log on to a distributed file system client through a login page (for example, a web page) and perform file operations on files in the distributed file system, including but not limited to create, delete, and access, where access includes read operations and/or write operations.
At least one application program (for example, application 101, application 102, …, application n in FIG. 2a) may run on the terminal device 13. Application 101 is used as the example below. Application 101, which may also be called a client, can access a file in the distributed file system. When application 101 accesses the file, it may run one or more processes to access the file concurrently, each process accessing the file data block at its corresponding position in the file.
Application 101 on the terminal device 13 can perform access operations on files in the distributed file system through the MPI (Message Passing Interface)-IO interface. The MPI-IO interface provides interfaces for concurrent file operations, data access, and the like. It has several implementations, one of which is ROMIO; ROMIO may include an MPI-IO interface library and a file system adaptation layer. As shown in FIG. 2a, the file system adaptation layer sits between the distributed file system and the MPI-IO interface library. In the embodiments of the present application, various file systems can be docked to the file system adaptation layer, such as a distributed file system, a network file system, a journaling file system, or a shared file system, and an object can adapt the docked file system for targeted performance optimization as required. Any application program can carry out the data access and computation required by its service scenario by calling the MPI-IO interface. MPI is a set of specifications or interfaces for passing messages between the CPUs or server nodes that take part in a computation, and it provides a unified standard for parallel application interfaces. MPI includes the MPI-IO interface, which is the I/O part of the Message Passing Interface standard and provides standards for file operations and data access, as well as a collective communication interface.
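The layering described above (interface library on top, one adapter per docked file system underneath) can be sketched as a small dispatch table. In ROMIO this adaptation layer is known as ADIO; the classes below are hypothetical and only loosely modeled on that idea, and the `"prefix:path"` naming scheme is an assumption of this sketch.

```python
from abc import ABC, abstractmethod


class FsDriver(ABC):
    """Operations the adaptation layer expects from each docked file system
    (hypothetical interface)."""

    @abstractmethod
    def read(self, path: str, offset: int, length: int, group_id: str) -> bytes: ...


class InMemoryDriver(FsDriver):
    """Toy driver that keeps whole files in a dict."""

    def __init__(self):
        self.files = {}

    def read(self, path, offset, length, group_id):
        return self.files[path][offset:offset + length]


class AdaptationLayer:
    def __init__(self):
        self._drivers = {}

    def register(self, prefix: str, driver: FsDriver) -> None:
        self._drivers[prefix] = driver

    def read(self, uri: str, offset: int, length: int, group_id: str) -> bytes:
        prefix, _, path = uri.partition(":")   # e.g. "mem:/a" -> ("mem", "/a")
        return self._drivers[prefix].read(path, offset, length, group_id)


layer = AdaptationLayer()
driver = InMemoryDriver()
driver.files["/a"] = b"hello world"
layer.register("mem", driver)
print(layer.read("mem:/a", 6, 5, group_id="grp-7"))  # b'world'
```

The group identifier is threaded through the read path so that a docked file system could apply checks like those described for data access requests.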
When application 101 accesses a file in the file system, it runs multiple processes in parallel, each accessing its corresponding file data blocks, so that together they access the entire file. However, access conflicts may arise when an application program runs multiple processes in parallel that access the distributed file system through the distributed file system clients running on the second server nodes 12. Therefore, in the embodiments of the present application, when an application program starts (or runs) N processes in parallel and the N processes access the same file in the file system in parallel, the N processes may form a process group. For example, if the application program runs process 1 and process 2 in parallel and both access file A in the file system in parallel, process 1 and process 2 form a process group. A process group is the set of all processes that participate in accessing a given file; processes in the same group can share information with each other, and a process can belong to only one process group. Each process within a group is uniquely identified by a process number (rank). In one implementation, ranks are assigned in increasing order starting from 0: if the process group contains N processes, N being an integer greater than 1, their ranks are 0, 1, 2, …, N-1.
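One simple way for ranks 0 to N-1 to obtain non-overlapping blocks is an even contiguous split of the file. The source does not prescribe a layout (FIG. 6 mentions a two-dimensional one), so this even split and the `block_for_rank` helper are assumptions for illustration.

```python
def block_for_rank(rank: int, nprocs: int, file_size: int) -> tuple:
    """Contiguous half-open block [start, end) for one rank when a file of
    file_size bytes is split as evenly as possible among nprocs ranks
    (one possible layout, assumed for illustration)."""
    if not 0 <= rank < nprocs:
        raise ValueError("rank must be in 0..nprocs-1")
    base, extra = divmod(file_size, nprocs)
    # The first `extra` ranks each take one additional byte.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


print([block_for_rank(r, 3, 10) for r in range(3)])  # [(0, 4), (4, 7), (7, 10)]
```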
In addition, in the embodiments of the present application, the positions of the file data blocks to be accessed by the processes in one process group do not overlap. For example, suppose a process group includes process 1 and process 2, process 1 accesses positions [0, 20M] and (40M, 60M] of file A, and process 2 accesses positions (20M, 40M] of file A; then the positions accessed by process 1 and process 2 do not overlap.
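The non-overlap condition can be checked by sorting the ranges and comparing neighbors. This sketch maps the example's ranges onto half-open byte ranges and reads "M" as MiB; both are assumptions made for illustration.

```python
MiB = 1 << 20


def disjoint(extents) -> bool:
    """True if no two half-open ranges [start, end) overlap."""
    ordered = sorted(extents)
    return all(prev_end <= start
               for (_, prev_end), (start, _) in zip(ordered, ordered[1:]))


# Example from the text, written as half-open ranges (assumed mapping):
# process 1 -> [0, 20M) and [40M, 60M); process 2 -> [20M, 40M).
process_1 = [(0, 20 * MiB), (40 * MiB, 60 * MiB)]
process_2 = [(20 * MiB, 40 * MiB)]
print(disjoint(process_1 + process_2))  # True
```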
Further, the application program on the terminal device runs the processes of the process group and, through the MPI-IO interface and the file system client, applies to the file system for a distributed lock using the identifier of the process group. A distributed lock is a lock implementation that coordinates how different processes access shared resources in a file system. Multiple processes may access the same resource (such as a file) at the same time, and without proper synchronization the data may become inconsistent. A distributed lock orders the accesses that different processes make to the same resource, which keeps the data consistent and correct: once a process holds a distributed lock on a position in a file, other processes cannot access the data at that position. In the embodiments of the present application, the identifier of the process group allows the locking area of the acquired lock to be the set of positions of the file data blocks to be accessed by all processes in the group, so that the processes in the group can all access their file data blocks in the same file.
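The conflict rule described here, where a range held by one owner blocks other owners from overlapping positions, can be illustrated with a toy lock table whose owners are process-group identifiers. `RangeLockManager` is hypothetical, covers a single file, and is not how any particular distributed file system implements its locks.

```python
class RangeLockManager:
    """Toy lock table for one file: grants a half-open byte range [start, end)
    to an owner (here a process-group identifier) unless another owner holds
    an overlapping range."""

    def __init__(self):
        self._held = []  # list of (owner, start, end)

    def try_acquire(self, owner: str, start: int, end: int) -> bool:
        for other, s, e in self._held:
            if other != owner and s < end and start < e:
                return False           # overlaps a range held by another owner
        self._held.append((owner, start, end))
        return True

    def release(self, owner: str) -> None:
        self._held = [h for h in self._held if h[0] != owner]


mgr = RangeLockManager()
print(mgr.try_acquire("group-A", 0, 60))   # True
print(mgr.try_acquire("group-B", 50, 80))  # False: overlaps [50, 60)
mgr.release("group-A")
print(mgr.try_acquire("group-B", 50, 80))  # True
```

Because the whole group holds one entry, the processes of group A never contend with each other; only a different group is refused.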
Referring to fig. 2b, an architecture diagram of another data access system according to an embodiment of the present application is provided. In this embodiment, the file system is a distributed file system, so the corresponding file system client is a distributed file system client, and the application program runs in the terminal device 13. In fig. 2b, the distributed file system runs in the first server cluster 11. The data access system shown in fig. 2b may further include three second server nodes 12, each running a distributed file system client. Each distributed file system client may include a cache of the distributed file system: the client can acquire file data blocks of a file from the distributed file system running on the first server nodes in the first server cluster 11 and store them in the cache, that is, the cache is used to cache file data blocks of the distributed file system.
As shown in fig. 2b, the application program in the terminal device 13 may call the file operation interfaces provided by the file system through a file system adaptation layer in the MPI-IO interface to operate on files in the distributed file system. The file operation interfaces may include a file open interface (open), a view setting interface (set_view), an access interface (including read and write interfaces), a file close interface (close), and the like. The interfaces referred to in fig. 2b are described as follows:
The view setting interface is used by a process to define its view information. The view information indicates the overall layout of the locations, in the file system, of the file data blocks to be accessed by the process, namely the start position, end position, and length of each of those file data blocks. For example, the view information may indicate that the process will access the file data block at [0, 20M] in file A. In addition, in the embodiments of the present application, the view setting interface may also generate the identifier of the process group, which is used to apply for the distributed lock under which each process in the process group accesses its file data blocks. When a process in the application program calls the view setting interface, it can set the view information and the identifier of the process group, and the view information may be sent to one or more file system clients so that each client can determine the locations of the file data blocks the process will access.
The access interface may be used to pass the identifier of the process group to the file system client, which forwards it to the file system so that the file system can check its validity. Specifically, when a process calls the access interface to access the file data block at the corresponding location, the identifier of the process group is passed to the file system client; if the file system verifies that the identifier is valid, one designated process running in the application program (for example, the target process) applies to the file system for the distributed lock on behalf of all processes in the process group.
Further, when the target process in the process group successfully obtains the distributed lock, it broadcasts the success to the other processes in its process group. After receiving this result, every process directly accesses the file data block at its corresponding location in the file system under the distributed lock, carrying the identifier of the process group. In this way, the processes in the process group no longer check the distributed lock individually, which reduces contention overhead for the distributed lock.
The overall data access flow is described next, taking processes in an application program accessing a file system as an example. The flow applies to a parallel IO scenario, in which different processes of an application program access the same file in the file system in parallel through the message passing interface. Referring to fig. 3, a schematic diagram of a data access flow provided in an embodiment of the present application, the flow includes the following four steps:
(1) Process opens the file (MPI_File_open): all processes in the application program call the MPI-IO interface to open the target file in the file system and obtain its file handle from the file system; each process then uses the file handle to perform subsequent operations on the target file (such as creating, reading and writing data, closing, and deleting). Specifically, as shown in fig. 2b, each process in the application program sends a data acquisition request through the file system adaptation layer in the MPI-IO interface. On receiving these requests, the file system adaptation layer determines which file system to interface with and forwards each request to the file system client through the file open interface among the file operation interfaces. The file system client obtains a file handle from the file system based on the request and returns it to each process. A file handle is an abstract reference or identifier indicating an opened file; each process can use it to read or write the file or perform other file operations.
(2) Process sets the view (MPI_File_set_view): when an application program needs to access a file in a file system, it can do so through multiple processes it runs. In this case, the start position, end position, length, and so on of the file data blocks each process will access must be set in advance, so that the processes access different file data blocks of the file and the application program thereby accesses the entire file. For example, as shown in fig. 7, the client runs four processes, process 1 to process 4: process 1 accesses the file data blocks at 0M-10M and 40M-50M in file A, process 2 accesses those at 10M-20M and 50M-60M, process 3 accesses those at 20M-30M and 60M-70M, and process 4 accesses those at 30M-40M and 70M-80M. Together, these four processes access the whole of file A.
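The layout in fig. 7 follows a round-robin pattern: consecutive 10M blocks are dealt to processes 1 to 4 in turn. A minimal Python sketch that reproduces this layout, assuming half-open ranges in units of M and a hypothetical helper `round_robin_layout` (the description only gives the example, not a general assignment rule):

```python
def round_robin_layout(num_processes, block_len, file_len):
    """Deal consecutive blocks of block_len to processes in round-robin order.

    Returns {process_number: [(start, end), ...]}, with process numbers
    1..num_processes as in fig. 7 and half-open ranges in units of M.
    """
    layout = {p: [] for p in range(1, num_processes + 1)}
    for i, start in enumerate(range(0, file_len, block_len)):
        owner = i % num_processes + 1
        layout[owner].append((start, min(start + block_len, file_len)))
    return layout

layout = round_robin_layout(num_processes=4, block_len=10, file_len=80)
print(layout[1])  # [(0, 10), (40, 50)]
print(layout[4])  # [(30, 40), (70, 80)]
```

Interleaving the blocks this way keeps each process's share of the file equal while every byte of the 80M file is assigned to exactly one process.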
In the embodiments of the present application, after obtaining the file handle, each process calls the view setting interface to set its view information, which indicates the overall layout of the file locations in the file system that the process will access. For example, process 1 calls the view setting interface to set view information indicating the layout of the locations it will access, and process 2 does the same for its own locations. Here, overall layout means the start position, end position, and length of the file data blocks; during access operations, a process can access only these file data blocks. For example, in fig. 7, the overall layout for process 2 is: the file data blocks at 10M-20M and 50M-60M in file A, each with a length of 10M.
It should be understood that setting a view defines the view of a process, that is, the file data blocks visible to it, where a visible file data block is one the process can access. In other words, calling the view setting interface sets the file data blocks a process can access; a file data block not covered by the view information is invisible to the process and cannot be accessed by it.
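View information and the visibility rule can be modeled as a small record. This is a sketch only: `ViewInfo`, its fields, and `can_access` are hypothetical names, offsets are in units of M, and blocks are treated as half-open ranges [start, end).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ViewInfo:
    """Hypothetical container for one process's view information."""
    process_rank: int
    blocks: tuple  # tuple of (start, end) pairs, half-open, in units of M

    def lengths(self):
        # Length information of each visible file data block.
        return [end - start for start, end in self.blocks]

    def can_access(self, offset):
        # An offset outside every block in the view is invisible to the process.
        return any(start <= offset < end for start, end in self.blocks)

view = ViewInfo(process_rank=0, blocks=((0, 20),))
print(view.lengths())       # [20]
print(view.can_access(25))  # False
```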
In addition, in the embodiments of the present application, an application program may run one or more processes in parallel, and the processes running in the same application program access the same file, so they can form a process group. The following steps are added to the implementation of the view setting interface to generate the identifier of the process group: 1) For each process calling the view setting interface, determine whether its process number within its process group is rank 0. If the target process (a particular process in the application program) has rank 0, it generates, within the view setting interface, the identifier (ID) of its process group. It then determines whether the identifier was generated successfully, that is, whether the generated identifier is valid. If the identifier is not valid, the identifier of the process group is set to an invalid value, and this invalid value is broadcast to the other processes in the process group. If the identifier is valid, it is considered successfully generated, and step 2) is executed. 2) After the identifier of the process group has been successfully generated, the target process broadcasts it to all processes in its process group.
In one implementation, the target process broadcasts the identifier of the process group to all processes in its process group through the aggregate (collective) communication interface in MPI, so that once the process group has finished setting view information, every process in the group holds the identifier. The aggregate communication interface is used for communication among all processes in a process group; through it, data can be sent from one process in the communication domain to all processes in the process group. A communication domain organizes and manages inter-process communication: it fully describes the communication relationships among processes, and processes in the same communication domain can send messages to each other.
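The view-setting extension in steps 1) and 2) can be simulated in a single Python process. In this sketch, `generate_group_id`, `INVALID_GROUP_ID = -1`, and the counter-based ID source are all assumptions; the broadcast is modeled by copying rank 0's value to every rank, whereas an MPI program would use a collective such as `MPI_Bcast`.

```python
import itertools

INVALID_GROUP_ID = -1              # assumed invalid (non-positive) value
_next_id = itertools.count(1)      # hypothetical ID source: 1, 2, 3, ...

def generate_group_id():
    """Hypothetical generator, called only by the rank-0 process."""
    return next(_next_id)

def set_view_collective(num_processes):
    """Simulate the extra steps added to the view setting call.

    Rank 0 generates the group ID; if it is not valid (> 0), the invalid
    value is broadcast instead. The broadcast is modeled as giving every
    rank a copy of rank 0's value.
    """
    group_id = generate_group_id()     # executed by rank 0 only
    if group_id <= 0:
        group_id = INVALID_GROUP_ID
    return {rank: group_id for rank in range(num_processes)}

ids = set_view_collective(3)
print(ids)  # {0: 1, 1: 1, 2: 1}
```

Only rank 0 touches the ID source, matching the requirement that other processes do not regenerate the identifier.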
It should be understood that setting a view is a collective operation: all processes in the process group call the view setting interface to set their view information, so that every process knows the overall layout of the file locations it will access in the distributed file system. Note, however, that only the target process generates the identifier of the process group during view setting; the other processes in the group do not generate it again.
(3) Process performs data access on the file: after all processes have set their view information, they can access a file in the file system in parallel through MPI-IO, that is, perform read and write operations on that file in parallel.
In step (3), when all processes in the process group access a file in the file system in parallel, the target process may apply to the file system for a distributed lock on behalf of the whole group, using the identifier of the process group. This may include the following steps: (1) The target process passes the identifier of the process group, obtained when the view information was set, to the file system through MPI-IO. Specifically, during access, the target process calls the access interface through the file system adaptation layer in MPI-IO to pass the identifier to the file system client, which forwards it to the file system. (2) The file system determines whether the identifier of the process group passed in during access is valid. If it is invalid, each process in the process group must apply for a distributed lock on its own access location before accessing the corresponding file data blocks, and waits if the application fails. One way for the file system to decide validity is to check the value of the identifier: a valid value is a positive number (greater than 0), and an invalid value is a number less than or equal to 0.
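The validity rule in (2), together with the fallback to per-process locking, can be sketched as two small Python functions. The function names and the string return values are hypothetical; only the rule that a valid identifier is greater than 0 comes from the description.

```python
def group_id_is_valid(group_id):
    """Rule from the description: an identifier is valid only if it is > 0."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return (isinstance(group_id, int) and not isinstance(group_id, bool)
            and group_id > 0)

def lock_mode(group_id):
    """Choose how distributed locks are applied for, based on the check."""
    if group_id_is_valid(group_id):
        return "group"        # target process locks one range for the group
    return "per-process"      # each process locks its own blocks, waiting on failure

print(lock_mode(3))  # group
print(lock_mode(0))  # per-process
```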
In another implementation, the target process sends the identifier of the process group to the file system when setting the view information; the file system compares this stored identifier with the one passed in during access, determining that the identifier is valid if they match and invalid otherwise. (3) If the identifier of the process group is valid, the following steps are performed. Step A, determine the minimum start position and the maximum end position: if the target process has rank 0 in its process group, it calls the aggregate communication interface in MPI to obtain the start and end positions of the file data blocks to be accessed by all processes in its process group, and computes the minimum start position (offset_min) among the start positions and the maximum end position (offset_max) among the end positions. Step B, apply to the file system for a distributed lock based on the minimum start position and the maximum end position: the lock area of the distributed lock is the data block area between them, i.e., [offset_min, offset_max], within which the processes in the process group access their file data blocks.
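Step A reduces to a minimum over start positions and a maximum over end positions. A minimal sketch, assuming the collective gather has already delivered every rank's (start, end) pairs to rank 0 as a dictionary; `lock_range` is a hypothetical name, and the sample data mirrors a three-process layout of file A (0M-30M, 30M-50M, 60M-80M):

```python
def lock_range(view_ranges):
    """Step A: compute (offset_min, offset_max) over all processes' blocks.

    view_ranges maps rank -> list of (start, end) pairs in units of M,
    standing in for what rank 0 would collect through a collective call.
    """
    starts = [s for blocks in view_ranges.values() for s, _ in blocks]
    ends = [e for blocks in view_ranges.values() for _, e in blocks]
    return min(starts), max(ends)

gathered = {0: [(0, 30)], 1: [(30, 50)], 2: [(60, 80)]}
print(lock_range(gathered))  # (0, 80)
```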
For example, a process group includes three processes: process 1 accesses the file data block at [0M, 30M] in file A, process 2 accesses the block at [30M, 50M], and process 3 accesses the block at [60M, 80M]. The minimum start position is then 0M, the maximum end position is 80M, and the lock area of the distributed lock is the data block area [0M, 80M]. In some implementations, applying for the distributed lock based on the minimum start position and the maximum end position may include: sending a lock acquisition request containing the minimum start position and the maximum end position to the file system client; the file system client applies to the file system for the distributed lock over that range, and the distributed lock is returned to the target process. Step C, after the distributed lock has been obtained from the file system, the target process broadcasts the success to all processes in the process group. Step D, after receiving this result, the processes in the process group no longer check the distributed lock; each reads and writes the file data blocks at its corresponding locations in the file system based on the identifier of the process group carried by the distributed lock. Step E, after all processes in the process group have completed their read and write operations, the distributed lock is released.
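Steps B to E can be illustrated with a toy range-lock service. This is not the file system's real lock protocol, which the description does not specify; `RangeLockServer`, its one-lock-per-owner bookkeeping, and the use of the process group ID as the lock owner are assumptions made for the sketch.

```python
class RangeLockServer:
    """Toy stand-in for the file system's distributed lock service.

    Grants a lock on [start, end) unless it overlaps a lock held by a
    different owner. Owners are process-group IDs. Hypothetical, not a real API.
    """
    def __init__(self):
        self.held = {}  # owner -> (start, end)

    def acquire(self, owner, start, end):
        for other, (s, e) in self.held.items():
            if other != owner and start < e and s < end:
                return False          # conflict: the caller must wait and retry
        self.held[owner] = (start, end)
        return True

    def release(self, owner):
        self.held.pop(owner, None)

server = RangeLockServer()
granted = server.acquire(owner=7, start=0, end=80)   # step B for group 7
results = {rank: granted for rank in range(3)}       # step C: broadcast result
print(server.acquire(owner=8, start=70, end=90))     # False, range is held
server.release(owner=7)                              # step E
print(server.acquire(owner=8, start=70, end=90))     # True
```

One lock request per group replaces three per-process requests, which is the contention saving the description claims.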
(4) Close the file (MPI_File_close): after all processes have finished reading and writing the file in the file system, each process calls the file close interface to close the target file and release the file handle and related resources.
It should be understood that steps (1)-(4) are performed whenever different processes in the application program access the same file in the file system in parallel, but generating the process group identifier and applying for the distributed lock are performed only by a designated process (such as the process with RANK 0 in the process group). In addition, all processes in the process group take part in determining the minimum start position and the maximum end position.
With this scheme, once the distributed lock has been obtained for the process group, the application program can run the processes to access the file data blocks at their corresponding locations in the file system directly, without additionally checking the distributed lock for each process's access locations. This reduces contention overhead for the distributed lock and improves data access efficiency.
The data access method provided in the embodiment of the present application is explained in the following.
Fig. 4 is a schematic flow chart of a data access method according to an embodiment of the present application. The data access method may be performed by a computing device, which may be the terminal device described above or a server node. The data access method may be performed by an application running in the computing device. The data access method may include the following steps S401 to S403:
S401, acquiring position information of a file data block to be accessed by each process in the process group.
The process group may include N processes, where N is an integer greater than 1, and the locations of the file data blocks accessed by any two processes do not intersect; the file data blocks to be accessed by each process belong to a target file stored in the file system. File systems in the embodiments of the present application may include, but are not limited to, distributed file systems and network file systems. For example, when the file system is a distributed file system, the file data blocks to be accessed by each process come from a distributed file in the distributed file system, that is, the target file to which they belong is a distributed file.
The location information of a file data block indicates the position of the file data block in the target file. This position can be understood as an offset relative to a reference point in the target file. For example, if the target file is 80M in size and its reference point is defined as 0M, a location of [0M, 20M] for file data block 1 means that file data block 1 is the data between offsets 0M and 20M relative to the reference point.
In some implementations, N processes may run in parallel in the computing device, where the N processes access the target file in parallel, and the N processes may be grouped into a process group.
The location information includes at least two of the following: the start position, the end position, and the length of the file data block to be accessed in the target file. In one implementation, obtaining the location information of the file data block to be accessed by each process in the process group may include calling the aggregate communication interface of the message passing interface to collect the location information of the file data block to be accessed by each process in the process group.
S402, applying a distributed lock to a file system based on the position information of the file data block to be accessed by each process.
The lock area of the distributed lock is the set of locations of the file data blocks to be accessed by the processes in the process group. For example, if the three processes in a process group access file data blocks at [0M, 10M], [10M, 20M], and [20M, 30M] respectively, the lock area of the distributed lock is the set of these three locations. The distributed lock is used to control the processes in the process group so that they access the file data blocks at their corresponding locations in the file system in parallel.
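The set of locations in the lock area can be computed by merging the processes' ranges. The sketch below (hypothetical helper `merge_ranges`, half-open ranges in units of M) shows that the three blocks above collapse to the single range [0M, 30M]. When the set has gaps, as with blocks at 0M-30M, 30M-50M, and 60M-80M, the union is two ranges, while the single [offset_min, offset_max] range used in S13 also covers the 50M-60M gap.

```python
def merge_ranges(ranges):
    """Merge half-open [start, end) ranges into sorted, disjoint ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Touching or overlapping: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(merge_ranges([(0, 10), (10, 20), (20, 30)]))  # [(0, 30)]
print(merge_ranges([(0, 30), (30, 50), (60, 80)]))  # [(0, 50), (60, 80)]
```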
As one implementation, the specific implementation of step S402 may include S11-S13:
S11, determine the start position and the end position of the file data block to be accessed by each process according to its location information. For example, if the location information for process a gives a start position of 20M (an offset of 20M from the reference point 0M of the target file) and a length of 40M, the end position of the file data block for process a in the target file is 20M + 40M = 60M.
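Because the location information contains at least two of start position, end position, and length, S11 can always recover the missing value. A minimal sketch with a hypothetical helper `block_bounds`, offsets in units of M:

```python
def block_bounds(start=None, end=None, length=None):
    """S11: return (start, end) from any two of start, end and length."""
    if start is not None and end is not None:
        return start, end                 # length, if given, is not needed
    if start is not None and length is not None:
        return start, start + length
    if end is not None and length is not None:
        return end - length, end
    raise ValueError("need at least two of start, end, length")

print(block_bounds(start=20, length=40))  # (20, 60)
```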
S12, determine the minimum start position and the maximum end position in the process group based on the start and end positions of the file data blocks to be accessed by each process. Specifically, the minimum start position (offset_min) is the smallest of the start positions of the file data blocks to be accessed by all processes in the process group, and the maximum end position (offset_max) is the largest of their end positions. For example, if process 1 accesses a file data block from 20M to 40M and process 2 accesses one from 40M to 50M, the minimum start position is 20M (from process 1) and the maximum end position is 50M (from process 2).
S13, apply to the file system for a distributed lock based on the minimum start position and the maximum end position. The lock area of the distributed lock is the data block area between the minimum start position and the maximum end position of the file data blocks accessed by the processes in the process group. In the example above, the minimum start position is 20M and the maximum end position is 50M, so the lock area of the distributed lock is [20M, 50M]. With this distributed lock, the processes in the process group can access the file data blocks between the minimum start position and the maximum end position in the file system.
Applying for the distributed lock based on the minimum start position and the maximum end position may include: generating a lock acquisition request containing the minimum start position and the maximum end position, and sending it to the file system client through the access interface provided by the file system adaptation layer in the message passing interface. The lock acquisition request asks the file system client to apply to the file system for the distributed lock over that range and to return the distributed lock.
When the distributed lock returned by the file system client is received, the application has succeeded. The aggregate communication interface in the message passing interface can then be called to broadcast the success to all processes in the process group; this result instructs each process to access the file data block at its corresponding location in the file system based on the identifier of the process group carried by the distributed lock. Because the processes in the process group do not each need to apply for a distributed lock, the overhead of lock applications is reduced to some extent and data access performance is improved.
S403, run each process in the process group to access its file data blocks based on the obtained distributed lock.
In a specific implementation, each process in the process group is run to perform the access operation on the file data block at its corresponding location, based on the obtained distributed lock. For example, if process 1 in the process group accesses the file data block at [0, 20M] and process 2 accesses the block at (20M, 40M], then, under the obtained distributed lock, process 1 is run to access the file data block at [0, 20M] and process 2 is run to access the block at (20M, 40M].
Further, after every process in the process group has completed its access operation on its file data blocks under the obtained distributed lock, the distributed lock may be released.
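The description does not say how completion is detected before the lock is released. One possible approach, sketched here under that assumption, is for the target process to track which ranks have finished and release the lock once none remain; in a real MPI program a barrier such as `MPI_Barrier` could serve the same purpose. `CompletionTracker` and `release_lock` are hypothetical names.

```python
class CompletionTracker:
    """Hypothetical helper: release the group lock once every rank is done."""
    def __init__(self, ranks, release_lock):
        self.pending = set(ranks)
        self.release_lock = release_lock
        self.released = False

    def mark_done(self, rank):
        self.pending.discard(rank)
        # Release exactly once, after the last rank reports completion.
        if not self.pending and not self.released:
            self.release_lock()
            self.released = True

events = []
tracker = CompletionTracker(range(3), lambda: events.append("released"))
for rank in (2, 0, 1):
    tracker.mark_done(rank)
print(events)  # ['released']
```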
In the embodiments of the present application, the location information of the file data block to be accessed by each process in the process group is obtained. The process group includes N processes, where N is an integer greater than 1, and the locations of the file data blocks accessed by any two processes do not intersect; the file data blocks to be accessed by each process belong to a target file stored in a file system. A distributed lock is applied for from the file system based on this location information; the lock area of the distributed lock is the set of locations of the file data blocks to be accessed by the processes in the process group. Each process in the process group then performs access operations, namely read and/or write operations, on its file data blocks based on the obtained distributed lock. Because the distributed lock is applied for per process group, its lock area covers the locations of all file data blocks to be accessed by the processes in the group, so all of these processes can directly access the file data blocks at their corresponding locations under the lock. No process needs to apply separately for a distributed lock on its own access locations, which reduces distributed lock contention overhead and improves data access efficiency.
Fig. 5 is a schematic flow chart of another data access method according to an embodiment of the present application. The data access method may be performed by a computing device, which may be the terminal device described above or a server node. In particular, the application program is run in the computing device, and the data access method may be performed by the application program running in the computing device. The data access method may include the following steps S501 to S505:
S501, pass the identifier of the process group to the file system, so that the file system checks the validity of the identifier and returns a verification result.
The identifier of the process group uniquely identifies the process group and may be, for example, a character string or a number. In one implementation, the identifier of the process group is passed to the file system through the access interface (e.g., the read/write interface) provided by the file system adaptation layer in the message passing interface. Specifically, when the target file in the file system is to be accessed, the identifier is passed to the file system client through that access interface, and the file system client passes it on to the file system.
In some implementations, when accessing a target file in a file system, an application program in the computing device may run N processes in parallel to access the target file. The N processes form a process group, an identifier is generated for the process group, and the identifier is broadcast to all processes in the process group. Broadcasting the identifier allows each process in the group to carry it when accessing file data blocks after the distributed lock has been obtained. In one implementation, generating the identifier for the process group may include calling the view setting interface to generate it.
S502: Receive the verification result returned by the file system.
The verification result indicates either that the identifier of the process group is valid or that it is invalid. If the identifier is invalid, each process in the process group must separately apply to the file system for a distributed lock on the file data blocks it accesses. If the identifier is valid, S503 is executed.
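The branch on the verification result can be illustrated with a toy model. Everything here is hypothetical: `MockFileSystem`, its `check_group_id` method, and the rule that an identifier is valid if it was registered beforehand are assumptions made for the sketch, since the source does not say how the file system validates the identifier. Extents are `(start, end)` offsets in the target file.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Extent = Tuple[int, int]  # (start, end) offsets within the target file


@dataclass
class MockFileSystem:
    """Hypothetical file system that knows which process-group IDs are valid."""
    known_groups: Set[str] = field(default_factory=set)

    def check_group_id(self, group_id: str) -> bool:
        # Assumed validity rule for this sketch: the ID was registered beforehand.
        return group_id in self.known_groups


def plan_lock_requests(fs: MockFileSystem, group_id: str,
                       extents: List[Extent]) -> List[Tuple[str, List[Extent]]]:
    """Return (lock owner, locked extents) pairs to request from the file system."""
    if fs.check_group_id(group_id):
        # Valid ID (S503/S504): one request covering every process's extent.
        return [(group_id, list(extents))]
    # Invalid ID: each process applies for a lock on its own extent.
    return [(f"rank-{rank}", [ext]) for rank, ext in enumerate(extents)]


fs = MockFileSystem(known_groups={"g1"})
extents = [(0, 20), (20, 30), (30, 40)]
print(plan_lock_requests(fs, "g1", extents))  # one group-wide request
print(plan_lock_requests(fs, "g9", extents))  # three per-process requests
```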
S503: If the verification result indicates that the identifier of the process group is valid, acquire the location information of the file data block to be accessed by each process in the process group. The location information of a file data block indicates the position of that block in the target file. The process group comprises N processes, N being an integer greater than 1, and the positions of the file data blocks accessed by any two processes do not intersect. The file data blocks to be accessed by each process belong to the target file, which is stored in the file system.
S504: Apply to the file system for a distributed lock based on the location information of the file data block to be accessed by each process. The locking area of the distributed lock is the set of positions of the file data blocks to be accessed by all processes in the process group.
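The precondition of S503 and the locking area of S504 can be checked with a few lines of code. This sketch assumes half-open extents `[start, end)`, so that adjacent ranges such as [0M, 20M] and [20M, 30M] in the example of this embodiment count as non-intersecting; the source writes closed brackets and does not state the convention. The `gathered` dictionary is a hypothetical stand-in for the result of collecting every process's location information.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # half-open [start, end), an assumption of this sketch


def overlaps(a: Extent, b: Extent) -> bool:
    """Two half-open extents intersect iff each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def extents_disjoint(per_process: Dict[int, List[Extent]]) -> bool:
    """True if no extent of one process intersects an extent of another process."""
    for r1, r2 in combinations(sorted(per_process), 2):
        for a in per_process[r1]:
            for b in per_process[r2]:
                if overlaps(a, b):
                    return False
    return True


def lock_area(per_process: Dict[int, List[Extent]]) -> List[Extent]:
    """Locking area of S504: the collection of all extents in the group, sorted."""
    return sorted(ext for exts in per_process.values() for ext in exts)


gathered = {1: [(0, 20)], 2: [(20, 30)], 3: [(30, 40)]}  # offsets in MB
print(extents_disjoint(gathered), lock_area(gathered))
```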
In some implementations, the process group includes a target process, and the computing device performs steps S501 to S504 through the target process. The target process may be any process in the process group; for example, it may be the process with process number 0.
S505: Run each process based on the acquired distributed lock. When running, a process sends a data access request to the file system. The data access request carries the identifier of the process group and requests an access operation on the file data block at the corresponding position based on that identifier.
In some implementations, a running process sends its data access request to the file system through the file system client.
The corresponding position in S505 is the position of the file data block to be accessed by the running process. For example, suppose the process group includes three processes, process 1, process 2 and process 3, which access the file data blocks at [0M, 20M], [20M, 30M] and [30M, 40M] of file A, respectively. When running, process 1 sends the file system a data access request asking, based on the identifier of the process group, for an access operation on the file data block at [0M, 20M] of file A; process 2 sends a request for the block at [20M, 30M] of file A; and process 3 sends a request for the block at [30M, 40M] of file A, each likewise based on the identifier of the process group.
The identifier of the process group is carried in each data access request because it lets the file system determine that the requesting process belongs to the process group, so that the file system can allow the process to access the file data block at the corresponding position within the data block area locked by the distributed lock.
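The admission check implied here can be sketched as follows, using the three-process example of file A. The record types `GroupLock` and `AccessRequest` and the rule in `admit` are hypothetical: the source says only that the file system recognizes the process group by its identifier and lets the process access its position inside the locked area; it does not define the request format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GroupLock:
    """Hypothetical lock record whose holder is a process group, not one process."""
    file: str
    group_id: str
    start: int
    end: int  # exclusive


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical data access request that carries the process-group ID (S505)."""
    file: str
    group_id: str
    start: int
    end: int  # exclusive
    op: str   # "read" or "write"


def admit(req: AccessRequest, lock: Optional[GroupLock]) -> bool:
    """Allow the request if it carries the lock holder's group ID and lies
    entirely inside the locked area of the same file."""
    if lock is None or req.file != lock.file or req.group_id != lock.group_id:
        return False
    return lock.start <= req.start and req.end <= lock.end


lock = GroupLock(file="A", group_id="g1", start=0, end=40)  # offsets in MB
requests = [AccessRequest("A", "g1", 0, 20, "write"),
            AccessRequest("A", "g1", 20, 30, "write"),
            AccessRequest("A", "g1", 30, 40, "write")]
print([admit(r, lock) for r in requests])
```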
In this embodiment of the application, the identifier of the process group is passed into the file system, which checks its validity and returns a verification result. If the verification result indicates that the identifier is valid, the location information of the file data block to be accessed by each process in the process group is acquired, and a distributed lock is applied for from the file system based on that information; the locking area of the distributed lock is the set of positions of the file data blocks to be accessed by all processes in the group. Each process is then run based on the acquired distributed lock and, when running, sends the file system a data access request that carries the identifier of the process group and requests an access operation on the file data block at the corresponding position. With this scheme, the distributed lock is applied for once, on behalf of the whole group, using the identifier of the process group. After the lock is granted, every process in the group can run under it and, by carrying the identifier of the process group, directly access the file data blocks at its corresponding positions, instead of each process separately applying for a distributed lock on its own access positions. This reduces contention overhead for distributed locks and improves data access efficiency.
For a better understanding of the data access procedure provided in the embodiments of the present application, a specific example follows, in which the file system is a distributed file system.
Assume that application program 1 runs on the computing device with process 1, process 2, process 3 and process 4, which belong to the same process group and all access file A (i.e., the target file) in the distributed file system in parallel. File A has a two-dimensional layout; the target file in the embodiments of the present application may equally have a three-dimensional or other multidimensional layout, which is not limited here. Taking the two-dimensional layout as an example, refer to FIG. 6, which is a schematic diagram of accessing a file with a two-dimensional layout according to an embodiment of the present application. In FIG. 6, when processes 1 to 4 access the file data blocks of file A, process 1 accesses the file data blocks in columns 0 to C1, process 2 accesses those in columns C1 to C2, process 3 accesses those in columns C2 to C3, and process 4 accesses those in columns C4 to C5. Each process's blocks cover all rows of file A, and the positions accessed by different processes do not intersect.
FIG. 7 is a schematic diagram of multiple processes accessing file data blocks in a distributed file system in parallel according to an embodiment of the present application. In FIG. 7, file A is stored in the distributed file system in row-major order as a one-dimensional sequence, so the ranges spanned by the file data blocks that the processes access in file A cross over one another. As shown in FIG. 7, process 1 accesses two file data blocks of file A, at [0M, 10M] and [40M, 50M], through the distributed file system client on server node 1 (i.e., corresponding to the second server node), and process 2 accesses two file data blocks at [10M, 20M] and [50M, 60M] through the same client. If process 1 must apply for a distributed lock on [0M, 50M] to access its two blocks, and process 2 must apply for a distributed lock on [10M, 60M] to access its two blocks, the two locks conflict: one of process 1 and process 2 must wait until the other has completed its access and released its distributed lock. Similarly, if process 3 must apply for a distributed lock on [20M, 70M] to access its two blocks of file A through the distributed file system client on server node 2 (i.e., corresponding to the second server node), and process 4 must apply for a distributed lock on [30M, 80M] through the same client, then the locks on [20M, 70M] and [30M, 80M] conflict, and one of process 3 and process 4 must wait for the other to finish its access and release its lock, which again causes waiting.
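The arithmetic behind these conflicts can be reproduced in a short sketch. It assumes that a process applying for its own lock must lock the single range covering all of its blocks, which is how the ranges [0M, 50M] and [10M, 60M] above arise. The block positions of processes 3 and 4 are not listed in the text; the values below are inferred from their lock ranges [20M, 70M] and [30M, 80M] and the row layout, and are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # half-open [start, end) in MB (assumed convention)


def covering_range(extents: List[Extent]) -> Extent:
    """Smallest single range that contains all of a process's extents."""
    return min(s for s, _ in extents), max(e for _, e in extents)


def overlaps(a: Extent, b: Extent) -> bool:
    return a[0] < b[1] and b[0] < a[1]


# Blocks per process in the FIG. 7 layout (processes 3 and 4 are assumed values).
blocks: Dict[int, List[Extent]] = {
    1: [(0, 10), (40, 50)],
    2: [(10, 20), (50, 60)],
    3: [(20, 30), (60, 70)],
    4: [(30, 40), (70, 80)],
}
per_process_locks = {p: covering_range(ext) for p, ext in blocks.items()}
conflicts = [(p, q) for p, q in combinations(sorted(per_process_locks), 2)
             if overlaps(per_process_locks[p], per_process_locks[q])]
# Check whether the blocks themselves intersect across processes.
block_clash = any(overlaps(a, b)
                  for p, q in combinations(sorted(blocks), 2)
                  for a in blocks[p] for b in blocks[q])
print(per_process_locks, conflicts, block_clash)
```

Running it shows that the four covering ranges overlap pairwise, including the two pairs named above, while `block_clash` is false: the contention comes from locking covering ranges, not from the data the processes actually touch.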
To address this situation with the data access method provided in the embodiments of the present application: since processes 1 to 4 belong to the same process group, the computing device, while setting the view information of process 1 through process 1, may generate the identifier of the process group and broadcast it to processes 2 to 4. When the computing device runs processes 1 to 4 to access the file data blocks of file A in the distributed file system, it first passes the identifier of the process group to the distributed file system through the file system client, and the distributed file system checks the validity of the identifier. If the identifier is valid, the computing device, through process 1, acquires the start and end positions of the file data blocks accessed by processes 2 to 4 and computes the minimum start position (0M) and the maximum end position (80M) over the blocks accessed by processes 1 to 4. The computing device then, through process 1, applies to the distributed file system for a distributed lock on file A whose locking area is the data block area [0M, 80M].
After the distributed lock is granted, the computing device broadcasts the successful result to processes 2 to 4 through process 1, so processes 2 to 4 no longer need to apply for distributed locks at their own positions. The computing device runs processes 1 to 4 directly: under the distributed lock, each process carries the identifier of the process group and directly accesses the file data blocks at its corresponding positions within [0M, 80M]. This reduces waiting caused by distributed lock conflicts, lowers distributed lock overhead, and thereby improves data access performance.
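The lock-area computation in this example reduces to a minimum over start positions and a maximum over end positions. A minimal sketch with the FIG. 7 positions, where the blocks of processes 3 and 4 are again assumed values inferred from their lock ranges:

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # half-open [start, end) in MB (assumed convention)


def group_lock_range(blocks: Dict[int, List[Extent]]) -> Extent:
    """Minimum start and maximum end over every block of every process."""
    starts = [s for extents in blocks.values() for s, _ in extents]
    ends = [e for extents in blocks.values() for _, e in extents]
    return min(starts), max(ends)


# Processes 3 and 4 are assumed values, inferred from their lock ranges.
blocks = {
    1: [(0, 10), (40, 50)],
    2: [(10, 20), (50, 60)],
    3: [(20, 30), (60, 70)],
    4: [(30, 40), (70, 80)],
}
lock = group_lock_range(blocks)
print(lock)  # (0, 80): one lock request instead of four conflicting ones
```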
The data access device provided in the embodiments of the present application is described below.
Fig. 8 is a schematic structural diagram of a data access device according to an embodiment of the present application. The data access device may be a computer program (including program code) running on the computing device; for example, it may be application software on the computing device. The data access device may be configured to perform some or all of the steps of the method embodiments shown in Fig. 4 and Fig. 5. Referring to Fig. 8, the data access device includes the following units:
an obtaining unit 801, configured to obtain the location information of the file data block to be accessed by each process in a process group, where the file data blocks to be accessed by each process belong to a target file stored in a file system; the location information of a file data block indicates its position in the target file; the process group comprises N processes, N being an integer greater than 1; and the positions of the file data blocks accessed by any two processes do not intersect;
a processing unit 802, configured to apply to the file system for a distributed lock of the process group based on the location information of the file data block to be accessed by each process in the process group, where the locking area of the distributed lock is the set of positions of the file data blocks to be accessed by all processes in the process group;
The processing unit 802 is further configured to run each process in the process group based on the acquired distributed lock to perform an access operation on the file data blocks; the access operation includes a read and/or write operation.
In some implementations, the processing unit 802 is specifically configured to:
run each process based on the acquired distributed lock, where, when running, a process sends a data access request to the file system; the data access request carries the identifier of the process group and requests an access operation on the file data block at the corresponding position based on that identifier.
In some implementations, the processing unit 802 is further configured to:
pass the identifier of the process group into the file system, so that the file system checks the validity of the identifier and returns a verification result;
receive the verification result returned by the file system; and
if the verification result indicates that the identifier of the process group is valid, perform the step of acquiring the location information of the file data block to be accessed by each process in the process group.
In some implementations, the processing unit 802 is further configured to:
generate an identifier for the process group; and
broadcast the identifier of the process group to all processes in the process group.
In some implementations, the processing unit 802 is specifically configured to:
pass the identifier of the process group to the file system through a read-write interface provided by the file system adaptation layer in the message passing interface.
In some implementations, the obtaining unit 801 is specifically configured to:
call a collective communication interface of the message passing interface to acquire the location information of the file data block to be accessed by each process in the process group, where the location information includes at least two of the following: a start position, an end position, and data block length information.
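Since the location information may consist of any two of start position, end position and length, each process's entry has to be normalized before the minimum and maximum can be taken. A sketch of that normalization, assuming `end = start + length` (a half-open convention the source does not state); the function name `to_extent` is hypothetical. In an MPI program, the per-process pairs could then be exchanged with a collective such as `MPI_Allgather`.

```python
from typing import Optional, Tuple


def to_extent(start: Optional[int] = None, end: Optional[int] = None,
              length: Optional[int] = None) -> Tuple[int, int]:
    """Hypothetical helper: normalize any two of start/end/length to (start, end),
    assuming end = start + length."""
    if sum(v is not None for v in (start, end, length)) < 2:
        raise ValueError("need at least two of start, end, length")
    if start is None:
        start = end - length          # end and length are both known here
    elif end is None:
        end = start + length          # start and length are both known here
    if length is not None and end - start != length:
        raise ValueError("start, end and length are inconsistent")
    if end < start:
        raise ValueError("end precedes start")
    return start, end


print(to_extent(start=0, length=20), to_extent(end=30, length=10),
      to_extent(start=30, end=40))
```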
In some implementations, the processing unit 802 is specifically configured to:
determine the start position and the end position of the file data block to be accessed by each process from its location information;
determine the minimum start position and the maximum end position in the process group from those start and end positions, where the minimum start position is the smallest of all the start positions and the maximum end position is the largest of all the end positions; and
apply to the file system for a distributed lock of the process group based on the minimum start position and the maximum end position;
where the locking area of the distributed lock is the data block area between the minimum start position and the maximum end position of the file data blocks accessed by the processes in the process group.
In some implementations, the processing unit 802 is further configured to:
after the distributed lock is obtained from the file system, call a collective communication interface of the message passing interface standard to broadcast the successful lock application result to all processes in the process group, where the result instructs each process to access, under the distributed lock, the file data block at its corresponding position in the file system while carrying the identifier of the process group.
In some implementations, the processing unit 802 is further configured to:
release the distributed lock after every process in the process group, run based on the acquired distributed lock, has completed its access operation on the file data blocks.
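The lifecycle described in the last few items (one process applies for the lock, the success result is broadcast, all processes access their blocks, and the lock is released once all of them have finished) can be simulated with threads standing in for processes. `MockLockServer` is hypothetical, a `threading.Event` plays the role of the broadcast, and a `threading.Barrier` ensures the release happens only after every process has finished; a real implementation would instead use MPI collectives and the file system's own lock calls.

```python
import threading
from typing import Dict, List, Tuple

Extent = Tuple[int, int]


class MockLockServer:
    """Hypothetical lock service that records acquire/release calls in order."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, Extent]] = []
        self._mu = threading.Lock()

    def acquire(self, group_id: str, region: Extent) -> None:
        with self._mu:
            self.calls.append(("acquire", group_id, region))

    def release(self, group_id: str, region: Extent) -> None:
        with self._mu:
            self.calls.append(("release", group_id, region))


def run_group(server: MockLockServer, group_id: str,
              extents: Dict[int, Extent]) -> List[int]:
    """Run ranks 0..N-1 (rank 0 must exist); rank 0 is the target process."""
    region = (min(s for s, _ in extents.values()),
              max(e for _, e in extents.values()))
    granted = threading.Event()            # stands in for broadcasting success
    finished = threading.Barrier(len(extents))
    accessed: List[int] = []
    acc_mu = threading.Lock()

    def worker(rank: int) -> None:
        if rank == 0:
            server.acquire(group_id, region)   # only the target process applies
            granted.set()
        granted.wait()
        with acc_mu:                       # placeholder for reading/writing extents[rank]
            accessed.append(rank)
        if finished.wait() == 0:           # exactly one thread gets index 0
            server.release(group_id, region)

    threads = [threading.Thread(target=worker, args=(r,)) for r in extents]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(accessed)


server = MockLockServer()
print(run_group(server, "g1", {0: (0, 20), 1: (20, 30), 2: (30, 40)}))
print(server.calls)
```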
In some implementations, the file system has a corresponding file system client, and the processing unit 802 is further configured to:
send a lock acquisition request to the file system client through a read-write interface provided by the file system adaptation layer in the message passing interface, where the lock acquisition request carries the minimum start position and the maximum end position and requests the file system client to apply to the file system for the distributed lock on that basis and to return the distributed lock.
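The lock acquisition request can be modeled as a small message carrying the minimum start and maximum end positions, forwarded by the client to the file system. `LockAcquireRequest`, `FileSystemClient`, and the grant rule in `MockFileSystem.grant` (refuse a range that overlaps a lock held by a different owner) are hypothetical; the source does not describe the file system's lock manager.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class LockAcquireRequest:
    """Hypothetical request built from the group's min start / max end positions."""
    file: str
    group_id: str
    min_start: int
    max_end: int  # exclusive


@dataclass
class MockFileSystem:
    """Hypothetical lock manager: grants a range unless another owner overlaps it."""
    held: List[LockAcquireRequest] = field(default_factory=list)

    def grant(self, req: LockAcquireRequest) -> bool:
        for h in self.held:
            if (h.file == req.file and h.group_id != req.group_id
                    and req.min_start < h.max_end and h.min_start < req.max_end):
                return False
        self.held.append(req)
        return True


class FileSystemClient:
    """Hypothetical client: applies to the file system and returns the lock if granted."""

    def __init__(self, fs: MockFileSystem) -> None:
        self.fs = fs

    def acquire(self, req: LockAcquireRequest) -> Optional[LockAcquireRequest]:
        return req if self.fs.grant(req) else None


client = FileSystemClient(MockFileSystem())
print(client.acquire(LockAcquireRequest("A", "g1", 0, 80)))   # granted
print(client.acquire(LockAcquireRequest("A", "g2", 70, 90)))  # None: overlaps g1
```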
In some implementations, the file system includes a distributed file system, and the target file is a distributed file in the distributed file system.
In this embodiment of the application, the location information of the file data block to be accessed by each process in the process group is acquired, where the process group comprises N processes (N being an integer greater than 1), the positions of the file data blocks accessed by any two processes do not intersect, and the file data blocks belong to a target file stored in a file system. A distributed lock is applied for from the file system based on this location information; its locking area is the set of positions of the file data blocks to be accessed by all processes in the process group. Each process in the process group is then run, based on the acquired distributed lock, to perform access operations (reads and/or writes) on the file data blocks. Because the distributed lock is applied for on behalf of the process group and its locking area covers the positions of the file data blocks accessed by all processes in the group, every process in the group can directly access the file data blocks at its corresponding positions under that lock, without each process separately applying for a distributed lock on its own access positions. This reduces contention overhead for distributed locks and improves data access efficiency.
The computing device provided in the embodiments of the present application is described below.
Further, an embodiment of the application provides a computing device, whose schematic structural diagram is shown in Fig. 9. The computing device may include a processor 901, an input device 902, an output device 903, and a memory 904, which are connected via a bus. The memory 904 stores a computer program comprising program instructions, and the processor 901 executes the program instructions stored in the memory 904.
In this embodiment of the application, the processor 901 executes the executable program code in the memory 904 to perform the following operations:
acquiring the location information of the file data block to be accessed by each process in a process group, where the file data blocks to be accessed by each process belong to a target file stored in a file system; the location information of a file data block indicates its position in the target file; the process group comprises N processes, N being an integer greater than 1; and the positions of the file data blocks accessed by any two processes do not intersect;
applying to the file system for a distributed lock of the process group based on the location information of the file data block to be accessed by each process in the process group, where the locking area of the distributed lock is the set of positions of the file data blocks to be accessed by all processes in the process group; and
running each process in the process group, based on the acquired distributed lock, to perform an access operation on the file data blocks, the access operation including a read and/or write operation.
In some implementations, when running each process in the process group to perform an access operation on the file data blocks based on the acquired distributed lock, the processor 901 may specifically perform the following operation:
running each process based on the acquired distributed lock, where, when running, a process sends a data access request to the file system; the data access request carries the identifier of the process group and requests an access operation on the file data block at the corresponding position based on that identifier.
In some implementations, the processor 901 may also perform the following operations:
passing the identifier of the process group into the file system, so that the file system checks the validity of the identifier and returns a verification result;
receiving the verification result returned by the file system; and
if the verification result indicates that the identifier of the process group is valid, performing the step of acquiring the location information of the file data block to be accessed by each process in the process group.
In some implementations, the processor 901 may also perform the following operations:
generating an identifier for the process group; and
broadcasting the identifier of the process group to all processes in the process group.
In some implementations, when passing the identifier of the process group into the file system, the processor 901 may specifically perform the following operation:
passing the identifier of the process group to the file system through a read-write interface provided by the file system adaptation layer in the message passing interface.
In some implementations, when acquiring the location information of the file data block to be accessed by each process in the process group, the processor 901 may specifically perform the following operation:
calling a collective communication interface of the message passing interface to acquire the location information of the file data block to be accessed by each process in the process group, where the location information includes at least two of the following: a start position, an end position, and data block length information.
In some implementations, when applying to the file system for the distributed lock of the process group based on the location information of the file data block to be accessed by each process in the process group, the processor 901 may specifically perform the following operations:
determining the start position and the end position of the file data block to be accessed by each process from its location information;
determining the minimum start position and the maximum end position in the process group from those start and end positions, where the minimum start position is the smallest of all the start positions and the maximum end position is the largest of all the end positions; and
applying to the file system for a distributed lock of the process group based on the minimum start position and the maximum end position;
where the locking area of the distributed lock is the data block area between the minimum start position and the maximum end position of the file data blocks accessed by the processes in the process group.
In some implementations, the processor 901 may also perform the following operations:
after the distributed lock is obtained from the file system, calling a collective communication interface of the message passing interface standard to broadcast the successful lock application result to all processes in the process group, where the result instructs each process to access, under the distributed lock, the file data block at its corresponding position in the file system while carrying the identifier of the process group.
In some implementations, the processor 901 may also perform the following operations:
releasing the distributed lock after every process in the process group, run based on the acquired distributed lock, has completed its access operation on the file data blocks.
In some implementations, where the file system has a corresponding file system client, when applying to the file system for the distributed lock of the process group based on the minimum start position and the maximum end position, the processor 901 may specifically perform the following operation:
sending a lock acquisition request to the file system client through a read-write interface provided by the file system adaptation layer in the message passing interface, where the lock acquisition request carries the minimum start position and the maximum end position and requests the file system client to apply to the file system for the distributed lock on that basis and to return the distributed lock.
In some implementations, the file system includes a distributed file system, and the target file is a distributed file in the distributed file system.
In this embodiment of the application, the location information of the file data block to be accessed by each process in the process group is acquired, where the process group comprises N processes (N being an integer greater than 1), the positions of the file data blocks accessed by any two processes do not intersect, and the file data blocks belong to a target file stored in a file system. A distributed lock is applied for from the file system based on this location information; its locking area is the set of positions of the file data blocks to be accessed by all processes in the process group. Each process in the process group is then run, based on the acquired distributed lock, to perform access operations (reads and/or writes) on the file data blocks. Because the distributed lock is applied for on behalf of the process group and its locking area covers the positions of the file data blocks accessed by all processes in the group, every process in the group can directly access the file data blocks at its corresponding positions under that lock, without each process separately applying for a distributed lock on its own access positions. This reduces contention overhead for distributed locks and improves data access efficiency.
In addition, an embodiment of the application provides a computer-readable storage medium storing a computer program. The computer program includes program instructions which, when executed by a processor, perform the method in the embodiments corresponding to Fig. 4 and Fig. 5. For technical details and advantages not disclosed in the computer-readable storage medium embodiments of the present application, refer to the description of the method embodiments; they are not repeated here. As an example, the computer program may be deployed and executed on one computing device, or on multiple computing devices located at one site or distributed across multiple sites and interconnected by a communication network.
According to one aspect of the present application, a computer program product is provided, comprising a computer program stored in a computer-readable storage medium. A processor of the computing device reads the computer program from the computer-readable storage medium and executes it, so that the computing device performs the method in the embodiments corresponding to Fig. 4 and Fig. 5. For technical details and advantages not disclosed in the computer program product embodiments of the present application, refer to the description of the method embodiments; they are not repeated here.
Claims (10)
1. A method of data access, comprising:
acquiring location information of a file data block to be accessed by each process in a process group, wherein the file data blocks to be accessed by each process belong to a target file, and the target file is stored in a file system; the location information of the file data block indicates the position of the file data block in the target file; the process group comprises N processes; the positions of the file data blocks to be accessed by any two processes do not intersect; and N is an integer greater than 1;
applying to the file system for a distributed lock of the process group based on the location information of the file data block to be accessed by each process in the process group, wherein a locking area of the distributed lock is a set of positions of the file data blocks to be accessed by the processes in the process group; and
running, based on the acquired distributed lock, each process in the process group to perform an access operation on the file data blocks, wherein the access operation comprises a read and/or write operation.
2. The method according to claim 1, wherein the running, based on the acquired distributed lock, each process in the process group to perform an access operation on the file data blocks comprises:
running each process based on the acquired distributed lock, and sending, when the process runs, a data access request to the file system, wherein the data access request carries an identifier of the process group and is used for requesting an access operation on the file data block at the corresponding position based on the identifier of the process group.
3. The method according to claim 1, wherein the method further comprises:
passing an identifier of the process group into the file system, so that the file system performs a validity check on the identifier of the process group and returns a verification result;
receiving the verification result returned by the file system; and
if the verification result indicates that the identifier of the process group is valid, performing the step of acquiring the location information of the file data block to be accessed by each process in the process group.
4. The method of claim 2 or 3, further comprising:
generating an identifier for the process group; and
broadcasting the identifier of the process group to all processes in the process group.
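A minimal sketch of claim 4, assuming the identifier is a random UUID with a `pg-` prefix (the claims do not fix a format). The broadcast is simulated by handing every rank a copy of the root's value; `generate_group_id` and `broadcast` are hypothetical helpers.

```python
import uuid


def generate_group_id() -> str:
    """Generate a process-group identifier; a random UUID is one assumed choice."""
    return f"pg-{uuid.uuid4().hex}"


def broadcast(value, n_ranks: int, root: int = 0):
    """Simulate a broadcast from `root`: every rank ends up holding the same value."""
    if not 0 <= root < n_ranks:
        raise ValueError("root must be a valid rank")
    return [value for _ in range(n_ranks)]
```

Under a real MPI binding such as mpi4py, the same step would be a single collective call on every rank, e.g. `gid = comm.bcast(generate_group_id() if comm.Get_rank() == 0 else None, root=0)`.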
5. The method of claim 3, wherein passing the identifier of the process group into the file system comprises:
passing the identifier of the process group to the file system through a read-write interface provided by a file system adaptation layer in the message passing interface.
6. The method of claim 1, wherein acquiring the position information of the file data block to be accessed by each process in the process group comprises:
calling a collective communication interface included in a message passing interface to acquire the position information of the file data block to be accessed by each process in the process group, wherein the position information comprises at least two of the following: a start position, an end position, and data block length information.
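Since claim 6 allows any two of start position, end position and length, each rank has to normalize its position information before the group can compare ranges. The sketch below assumes an exclusive end (so `end == start + length`) and simulates the collective gather; `normalize` and `allgather` are hypothetical names. In MPI this gather would typically be `MPI_Allgather` (`comm.allgather` in mpi4py).

```python
from typing import Optional, Tuple


def normalize(start: Optional[int] = None, end: Optional[int] = None,
              length: Optional[int] = None) -> Tuple[int, int]:
    """Turn any two of (start, end, length) into a half-open (start, end) pair."""
    given = sum(v is not None for v in (start, end, length))
    if given < 2:
        raise ValueError("need at least two of start, end, length")
    if start is None:
        start = end - length
    elif end is None:
        end = start + length
    elif length is not None and end - start != length:
        raise ValueError("start, end and length are inconsistent")
    return start, end


def allgather(local_values):
    """Simulate an allgather: every rank receives the list of all ranks' values."""
    return [list(local_values) for _ in local_values]
```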
7. The method of claim 6, wherein applying to the file system for the distributed lock of the process group based on the position information of the file data block to be accessed by each process in the process group comprises:
determining, from the position information of the file data block to be accessed by each process, the start position and the end position of that file data block;
determining a minimum start position and a maximum end position for the process group based on the start positions and end positions of the file data blocks to be accessed by the processes, the minimum start position being the smallest of all the start positions and the maximum end position being the largest of all the end positions; and
applying to the file system for the distributed lock of the process group based on the minimum start position and the maximum end position;
wherein the locking area of the distributed lock is the data block region extending from the minimum start position to the maximum end position, within which the processes in the process group access their file data blocks.
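The bounding range of claim 7 is simply the minimum of the start positions and the maximum of the end positions. The sketch below also counts the bytes that fall inside the locked span but are accessed by no process, which shows the trade-off of locking one contiguous region instead of the exact set of blocks. Both function names are hypothetical, and `gap_bytes` assumes disjoint half-open blocks.

```python
def bounding_lock_range(blocks):
    """Return (minimum start, maximum end) over all (start, end) blocks in the group."""
    if not blocks:
        raise ValueError("process group has no blocks")
    min_start = min(start for start, _ in blocks)
    max_end = max(end for _, end in blocks)
    return min_start, max_end


def gap_bytes(blocks):
    """Bytes inside the bounding range that no process accesses (blocks assumed disjoint)."""
    lo, hi = bounding_lock_range(blocks)
    return (hi - lo) - sum(end - start for start, end in blocks)
```

For blocks `[(16, 24), (0, 8), (40, 48)]` the lock covers `[0, 48)`, of which 24 bytes lie between blocks and are locked without being accessed.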
8. The method of claim 1, further comprising:
after the distributed lock is obtained from the file system, calling a collective communication interface in the Message Passing Interface standard to broadcast a lock-acquisition success result to all processes in the process group, wherein the success result instructs each process to access the file data block at its corresponding position in the file system based on the identifier of the process group carried by the distributed lock.
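A minimal sketch of claim 8, assuming the root rank requests the lock and then shares the outcome with every rank, so that ranks start their accesses only on success. `LockResult`, `acquire_and_broadcast` and the injected `try_lock` callable are hypothetical; the list copy stands in for an MPI broadcast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockResult:
    ok: bool
    group_id: str  # identifier carried by the lock, used by each rank's requests
    start: int
    end: int


def acquire_and_broadcast(try_lock, group_id, start, end, n_ranks):
    """Root requests the lock, then every rank receives the same result."""
    ok = try_lock(group_id, start, end)
    result = LockResult(ok, group_id, start, end)
    return [result] * n_ranks  # stands in for a broadcast from the root


def ranks_allowed_to_access(results):
    """Ranks that may begin their read/write operations."""
    return [rank for rank, r in enumerate(results) if r.ok]
```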
9. The method of claim 7, wherein the file system corresponds to a file system client, and applying to the file system for the distributed lock of the process group based on the minimum start position and the maximum end position comprises:
sending a lock acquisition request to the file system client through a read-write interface provided by a file system adaptation layer in a message passing interface, wherein the lock acquisition request requests the file system client to apply to the file system for a distributed lock based on the minimum start position and the maximum end position carried in the request, and to return the distributed lock.
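The request path of claim 9 (adaptation layer, then file system client, then file system) can be sketched with three small classes. The conflict rule assumed here is that a range is granted unless another group already holds an overlapping range. All class names are hypothetical; in MPI-IO implementations such as ROMIO, the file-system adaptation role is plausibly comparable to the ADIO layer, but the patent does not say so.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockRequest:
    group_id: str
    min_start: int
    max_end: int  # exclusive


class LockServer:
    """Hypothetical file-system lock service: grants unless another group overlaps."""

    def __init__(self):
        self.held = []

    def grant(self, req: LockRequest):
        for h in self.held:
            overlap = req.min_start < h.max_end and h.min_start < req.max_end
            if overlap and h.group_id != req.group_id:
                return None
        self.held.append(req)
        return req


class FileSystemClient:
    def __init__(self, server: LockServer):
        self.server = server

    def acquire(self, req: LockRequest):
        # Apply to the file system for the lock and return it (None if refused).
        return self.server.grant(req)


class AdaptationLayer:
    """Stands in for the file-system adaptation layer inside the MPI library."""

    def __init__(self, client: FileSystemClient):
        self.client = client

    def request_group_lock(self, group_id, min_start, max_end):
        return self.client.acquire(LockRequest(group_id, min_start, max_end))
```

For example, after group `grp-1` locks `[0, 48)`, a request from `grp-2` for `[40, 64)` is refused, while one for `[48, 64)` is granted because half-open ranges that only touch do not overlap.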
10. A computing device, comprising:
a processor adapted to execute a computer program; and
a computer-readable storage medium storing a computer program which, when executed by the processor, performs the data access method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311562213.9A CN117762880A (en) | 2023-11-20 | 2023-11-20 | Data access method and computing device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117762880A true CN117762880A (en) | 2024-03-26 |
Family ID: 90320981
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311562213.9A Pending CN117762880A (en) | 2023-11-20 | 2023-11-20 | Data access method and computing device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117762880A (en) |
Similar Documents
Publication | Title |
---|---|
CN106712981B (en) | Node change notification method and device | |
US9654582B2 (en) | Enhanced shared memory based communication driver for improved performance and scalability | |
CN111400777B (en) | Network storage system, user authentication method, device and equipment | |
CN113900810A (en) | Distributed graph processing method, system and storage medium | |
CN111949856B (en) | Web-based object storage query method and device | |
CN110659303A (en) | Read-write control method and device for database nodes | |
CN112256457A (en) | Data loading acceleration method and device based on shared memory, electronic equipment and storage medium | |
CN110851853B (en) | Data isolation method, device, computer equipment and storage medium | |
CN110798358B (en) | Distributed service identification method and device, computer readable medium and electronic equipment | |
US7543300B2 (en) | Interface for application components | |
CN117762880A (en) | Data access method and computing device | |
CN116775712A (en) | Method, device, electronic equipment, distributed system and storage medium for inquiring linked list | |
CN110019057B (en) | Request processing method and device | |
CN112764897B (en) | Task request processing method, device and system and computer readable storage medium | |
KR102202645B1 (en) | Data Sharing Method for Relational Edge Servers | |
WO2021232860A1 (en) | Communication method, apparatus and system | |
CN114760260A (en) | Message pushing system and method, storage medium and electronic equipment | |
CN109309583B (en) | Information acquisition method and device based on distributed system, electronic equipment and medium | |
CN113253944A (en) | Disk array access method, system and storage medium | |
CN113032820A (en) | File storage method, access method, device, equipment and storage medium | |
CN112395316A (en) | Data query method and device | |
CN112631996A (en) | Log searching method and device | |
CN110019113B (en) | Database service processing method and database server | |
CN117806836B (en) | Method, device and equipment for managing naming space of distributed file system | |
US10313438B1 (en) | Partitioned key-value store with one-sided communications for secondary global key lookup by range-knowledgeable clients |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||