CN116225308A - Data reading and writing method and device, storage medium and chip - Google Patents

Data reading and writing method and device, storage medium and chip

Info

Publication number
CN116225308A
CN116225308A (application CN202211463058.0A)
Authority
CN
China
Prior art keywords
read
data
sub
write
shared memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211463058.0A
Other languages
Chinese (zh)
Other versions
CN116225308B (en)
Inventor
金鑫
马金钢
陈焕盛
吴剑斌
余芬芬
范凡
张稳定
王文丁
秦东明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
3Clear Technology Co Ltd
Original Assignee
3Clear Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 3Clear Technology Co Ltd filed Critical 3Clear Technology Co Ltd
Priority to CN202211463058.0A priority Critical patent/CN116225308B/en
Publication of CN116225308A publication Critical patent/CN116225308A/en
Application granted granted Critical
Publication of CN116225308B publication Critical patent/CN116225308B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0674Disk device
    • G06F3/0676Magnetic disk device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Multi Processors (AREA)

Abstract

The disclosure relates to a data reading and writing method and device, a storage medium and a chip, and relates to the technical field of computers. The method is applied to a main board which runs a computing process and a sub read-write process, and comprises the following steps: sending, through the sub read-write process, first data written into a first shared memory area by the computing process to a total read-write process, the total read-write process being used for integrating a plurality of first data sent by a plurality of sub read-write processes and then storing the integrated first data into a memory; and writing, through the sub read-write process, second data sent by the total read-write process into a second shared memory area, so as to instruct the computing process to acquire the second data from the second shared memory area. With the data read-write method provided by the disclosure, the communication between the computing process and the sub read-write process is completed within the main board node, which reduces the communication overhead caused by network communication.

Description

Data reading and writing method and device, storage medium and chip
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data reading and writing method, device, storage medium, and chip.
Background
I/O read-write refers to input/output read-write. When data in user space is written to a disk, the data is first temporarily stored in a buffer area and then taken out of the buffer area and written to the disk; conversely, when data on the disk is read into user space, the data is first temporarily stored in the buffer area and then taken out of the buffer area and written into user space.
In related I/O read-write technology, a computing process and an IO process are provided; when the computing process needs to read or write a file, it sends the data to the IO process, and the IO process writes the data to the disk. When the number of computing processes is large, the same IO process needs to communicate with all of these computing processes, resulting in a large communication overhead between the IO process and the computing processes.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a data read-write method, apparatus, storage medium and chip.
According to a first aspect of an embodiment of the present disclosure, there is provided a data read-write method applied to a motherboard, where the motherboard is configured to run a computing process and a sub-read-write process, including:
sending, through the sub read-write process, the first data written into the first shared memory area by the computing process to a total read-write process, the total read-write process being used for integrating a plurality of first data sent by a plurality of sub read-write processes and then storing the integrated first data into a memory;
And writing second data sent by the total read-write process into a second shared memory area through the sub read-write process so as to instruct the computing process to acquire the second data from the second shared memory area.
Optionally, the sending, through the sub read-write process, the first data written into the first shared memory area by the computing process to the total read-write process, the total read-write process being configured to integrate a plurality of first data sent by a plurality of sub read-write processes and then store the integrated first data into a memory, includes:
writing the first data into the first shared memory area through the computing process;
the first data read from the first shared memory area is sent to the total read-write process through the sub read-write process which is located on the same main board as the computing process;
and integrating the plurality of first data sent by the sub-read-write processes on the plurality of main boards through the total read-write process, and storing the integrated first data into a memory.
Optionally, the method further comprises:
the sub read-write process accesses the first shared memory area through the pointer position of the first shared memory area; and/or
And the computing process accesses the second shared memory area through the pointer position of the second shared memory area.
Optionally, the method further comprises:
calling a first MPI function to create a communication domain;
for a plurality of processes in the communication domain, determining at least one process as the sub-read-write process, and the rest of the plurality of processes as the computing process.
Optionally, the sending, by the sub-read-write process, the first data written into the first shared memory area by the computing process to the total read-write process includes:
and when the sub read-write process determines that the plurality of computing processes have all stored the first data in the first shared memory areas, sending the first data respectively written by the plurality of computing processes into the plurality of first shared memory areas to the total read-write process.
Optionally, the method further comprises:
the computing process creates the first shared memory area through a second MPI function;
and the sub read-write process creates the second shared memory area through the second MPI function.
Optionally, the writing, by the sub-read-write process, the second data sent by the total read-write process into a second shared memory area to instruct the computing process to acquire the second data from the second shared memory area includes:
Sending the disassembled second data to the sub-read-write process through the total read-write process;
writing the second data into the second shared memory area through the sub read-write process;
and reading a plurality of second data from the second shared memory area through the computing process which is positioned on the same main board as the sub read-write process.
Optionally, the method further comprises:
and under the condition that the sub-read-write process receives a read request of a target computing process, sending second data represented by the field starting position and the field length to the target computing process according to the field starting position and the field length corresponding to the target computing process.
According to a second aspect of embodiments of the present disclosure, there is provided a data read-write apparatus, the apparatus including a first sub-read-write module and/or a second sub-read-write module;
the first sub-read-write module is configured to send the first data written into the first shared memory area by the computing process to a total read-write process through the sub-read-write process, wherein the total read-write process is used for integrating a plurality of first data sent by a plurality of sub-read-write processes and then storing the integrated first data into a memory;
And the second sub read-write module is configured to write second data sent by the total read-write process into a second shared memory area through the sub read-write process so as to instruct the computing process to acquire the second data from the second shared memory area.
According to a third aspect of the embodiments of the present disclosure, there is provided a data read-write apparatus, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the steps of the data read-write method provided by the first aspect of the embodiments of the present disclosure when executing the executable instructions.
According to a fourth aspect of the disclosed embodiments, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the data read-write method provided by the first aspect of the disclosed embodiments.
According to a fifth aspect of embodiments of the present disclosure, there is provided a chip comprising a processor and an interface; the processor is configured to read instructions to perform the steps of the data read-write method provided in the first aspect of the embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
Because the computing process and the sub-read-write process are positioned on the same main board node, the communication between the computing process and the sub-read-write process belongs to the communication in the main board node, and the communication between the computing process and the sub-read-write process is not needed by a network, so that the communication expense caused by the communication is avoided.
In order to realize inter-process communication in a main board node, the present disclosure proposes that both a computing process and a sub-IO process in the main board node may access a second shared memory area created by the sub-IO process and a first shared memory area created by the computing process. Therefore, through the access to the first shared memory area and the second shared memory area, the data interaction between the computing process and the sub IO process in the main board node can be realized, and the I/O read-write operation of the memory can be realized.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart illustrating a data read-write method according to an exemplary embodiment.
FIG. 2 is a schematic diagram illustrating a computing process interacting with an IO process, according to an example embodiment.
FIG. 3 is a schematic diagram illustrating a computing process within the same node interacting with IO processes, according to an example embodiment.
FIG. 4 is a schematic diagram illustrating an overall IO process interacting with multiple sub-IO processes, according to an example embodiment.
Fig. 5 is a flowchart illustrating a data read-write method according to an exemplary embodiment.
Fig. 6 is a block diagram illustrating a data read-write apparatus according to an exemplary embodiment.
Fig. 7 is a block diagram illustrating a data read-write apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, all actions for acquiring signals, information or data in the present application are performed under the condition of conforming to the corresponding data protection rule policy of the country of the location and obtaining the authorization given by the owner of the corresponding device.
Some of the technical content of the present application will be described first to facilitate the reader's understanding of the present solution.
An air quality model is an effective tool for simulating the transport of atmospheric pollution; it can simulate atmospheric processes such as advection, diffusion, dry deposition, wet deposition, gas-phase chemistry, liquid-phase chemistry and aerosol chemistry. The grid data output by an air quality model contains a large number of grid areas at a high grid resolution. A single process on a single-core CPU (Central Processing Unit) cannot sequentially calculate the sub-data of every grid area in the grid data in acceptable time; the calculation efficiency can only be improved by using multiple processes on a multi-core CPU to calculate the grid areas in parallel. However, calculating the grid areas in parallel with many processes noticeably increases the I/O performance bottleneck, for example through repeated memory copies in the IO processes, continuous preemption of CPU resources by the computing processes, and increased system overhead.
To address the I/O performance bottleneck, the following four parallel I/O schemes have been developed in the related art:
scheme 1: multiple computing processes split the whole file into multiple subfiles for computation and then write the generated subfiles to the disk. The frequent reading and writing of a massive number of subfiles causes heavy wear on the disk.
Scheme 2: in order to avoid frequent reading and writing of the disk, one computing process is used as a master process, and the other computing processes are used as slave processes. When the files in the disk are read, the master process reads all data required by calculation at one time, and then segments the data into a plurality of data to be distributed to a plurality of slave processes; when writing data into the disk, the plurality of slave processes send the plurality of data to the master process, and the master process merges the data into one file to be written into the disk. The master process needs to perform MPI (information transfer interface) handshake operation with all the slave processes, and when the number of slave processes is large, handshake overhead and network communication time caused by the handshake operation between the master process and the slave processes with large number increase performance bottleneck of I/O.
Scheme 3: in order to reduce the I/O performance bottleneck caused by frequent "handshake" operations, the MPI is used as a message transfer interface function library, when multiple processes in a communication domain initiate read/write requests to the same File in a disk, the multiple processes call the collective I/O functions of the MPI-I/O, such as the mpi_file_write_all function and the mpi_file_read_all function, and integrate the multiple read/write requests into relatively fewer read/write requests, thereby reducing the number of "handshake" times between a master process and a slave process and reducing the I/O performance bottleneck. However, when the number of slave processes increases and a plurality of slave processes and a master process are located on different motherboard nodes, the plurality of slave processes also need to communicate with the master process across the motherboard nodes, and the master process also needs to communicate with the slave processes across the motherboard nodes. In this process, the slave process needs to wait for the master process to write data into the disk before performing a new round of computation, so that a large number of slave processes are occupied by I/O read/write operations, resulting in a large network communication overhead.
Scheme 4: to reduce the communication overhead between nodes, a two-step approach-asynchronous I/O may be used, with one part of the process being selected as the IO process and the other part of the process being the computing process. When the computing process needs to read and write the file, after the data is sent to the IO process, the computing can be started for a new round, and the computing is not started until the IO process writes the data into the disk, so that the time occupied by the I/O read-write operation of the slave process/computing process is reduced. However, the computing processes and the IO processes are located on different motherboard nodes, and network communication needs to be performed between the different motherboard nodes in a wired and/or wireless manner, which also results in increased network communication overhead when the number of computing processes is huge.
Therefore, with any of the four schemes above, once the number of computing processes increases, the computing processes and the IO processes are located on different motherboards, the network communication overhead between the huge number of computing processes and the IO processes becomes large, and the I/O performance bottleneck grows.
Illustratively, as air quality models evolve, the grid area and grid resolution of the model output grow, and so does the number of computing processes required. An air quality model with a 3 km resolution typically requires 5000 computing processes, so every I/O operation of the model requires the IO process to collect data from, or distribute data to, 5000 different computing processes, which causes a significant network communication overhead.
Among air quality models, WRFchem (Weather Research and Forecasting model coupled to Chemistry) uses the following I/O communication scheme. Referring to fig. 2, the left side of fig. 2 shows the computing processes and the right side shows the IO processes: the computing processes are organized into groups of 4, and each IO process on the right interacts with one group of 4 computing processes. When IO process M has written the data sent by its current group of computing processes to the disk and becomes idle, the idle IO process M then writes the data sent by the next group of computing processes to the disk; IO process N works on the same principle.
When the grid data output by the air quality model needs to be written to the disk, the IO processes on the right are initially in a suspended waiting state. The 4 computing processes in each group on the left compute the data of several grid areas of the grid data in parallel; once a computing process finishes, it sends its grid area data to the IO process and starts a new round of computation, while the IO process writes the received grid area data to the disk. In this scheme, if there are 500 computing processes and 2 IO processes, each IO process has to exchange data with 250 computing processes across motherboard nodes; and if the number of IO processes is increased to lighten the communication burden of each IO process, resources for the computing processes become scarce and the IO processes waste resources during their idle periods. The WRFchem model therefore also suffers the communication overhead caused by communication across motherboard nodes.
To eliminate the communication overhead caused by communication across motherboard nodes, the present disclosure places a computing process and an IO process on the same motherboard node and lets the computing process and the IO process exchange data through a shared memory area. Because the computing process and the IO process are located on the same motherboard, no network communication is required between them, and the communication overhead caused by cross-node communication is avoided.
Fig. 1 is a flowchart of a data read-write method according to an exemplary embodiment. As shown in fig. 1, the data read-write method is used in a motherboard located in a server and includes at least one of steps S11 and S12.
In step S11, the sub-read-write process is used to send the first data written into the first shared memory area by the computing process to a total read-write process, where the total read-write process is used to integrate the plurality of first data sent by the plurality of sub-read-write processes and then store the integrated first data into the memory.
The main board is located in the server, and may be referred to as a main board node, a node, etc., and cross-main board node communication may be understood as cross-server communication; the sub-read-write process may be a sub-IO process, and the total read-write process may be a total IO process.
The first shared memory area is a memory area for storing data that the computing process creates in its own local memory through a second MPI function, and the second MPI function may be the MPI_Win_allocate_shared function.
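Illustratively, creating the first shared memory area may look like the following C sketch. It is only a sketch under assumptions: node_comm stands for the shared memory communication domain of the motherboard node described later, and REGION_BYTES is an arbitrary illustrative capacity.

#include <mpi.h>

#define REGION_BYTES (1 << 20)   /* capacity per computing process, illustrative */

/* A computing process allocates its own first shared memory area inside the
 * node-local communicator; *base points at the segment this process owns. */
MPI_Win create_first_region(MPI_Comm node_comm, double **base) {
    MPI_Win win;
    MPI_Win_allocate_shared(REGION_BYTES, sizeof(double), MPI_INFO_NULL,
                            node_comm, base, &win);
    return win;
}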
The memory may be a magnetic disk, magnetic tape, optical disk, etc.
Illustratively, step S11 is a process of writing first data to a disk, including: (1) The computing process writes the first data into the first shared memory area; (2) The sub read-write process is positioned on the same main board as the computing process, and the first data read from the first shared memory area is sent to the total read-write process; (3) And integrating the plurality of first data sent by the sub-read-write processes on the plurality of mainboards by the total read-write process, and storing the integrated first data into the memory.
In (1), the number of computing processes on the same motherboard is plural, and the first shared memory area is a shared memory area created by the computing process, so that the computing process can directly access the first shared memory area created by itself. For example, referring to fig. 3, a motherboard node has a computing process 1, a computing process 2, a computing process 3, and sub IO processes. The computing process 1 creates a first shared memory area under the computing process 1, the computing process 2 creates a first shared memory area under the computing process 2, and the computing process 3 creates a first shared memory area under the computing process 3. Therefore, the computing process 1 can directly access the first shared memory area under the computing process 1, and perform read-write operation on the first shared memory area under the computing process 1; the computing process 2 can directly access the first shared memory area under the computing process 2, and perform read-write operation on the first shared memory area under the computing process 2; the computing process 3 may directly access the first shared memory area under the computing process 3, and perform a read/write operation on the first shared memory area under the computing process 3.
In step (2), there is at least one sub IO process on the same motherboard. Because the first shared memory area is a shared memory area created by a computing process, a sub IO process located on the same motherboard as the computing process cannot access it directly; it can only access the first shared memory area created by the computing process through the pointer position of that area. Likewise, a computing process can only directly access the first shared memory area it created itself; if it needs to access the first shared memory areas created by other computing processes, it must also do so through the pointer positions of those areas.
Different first shared memory areas correspond to different pointer positions, and a pointer position identifies the address of a first shared memory area. When the sub IO process accesses the first shared memory area under a computing process through the pointer position, it can query the pointer position of any first shared memory area through a third MPI function (such as the MPI_Win_shared_query function) and then use that pointer position to access the corresponding first shared memory area.
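Illustratively, querying a pointer position may look like the C sketch below; the function and variable names (query_region, compute_rank) are illustrative assumptions, while MPI_Win_shared_query is the third MPI function named above.

#include <mpi.h>

/* The sub read-write process obtains the local address through which it can
 * read the first shared memory area that the computing process with rank
 * compute_rank allocated in the same shared window. */
double *query_region(MPI_Win win, int compute_rank, MPI_Aint *bytes) {
    int disp_unit;
    double *remote_base = NULL;
    MPI_Win_shared_query(win, compute_rank, bytes, &disp_unit, &remote_base);
    return remote_base;
}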
Taking the example shown in fig. 3 as an example for the step (2), the sub IO processes located on the same motherboard as the computing process respectively use the pointer positions of the first shared memory area under the computing process 1, read the first data from the first shared memory area under the computing process 1, use the pointer positions of the first shared memory area under the computing process 2, read the first data from the first shared memory area under the computing process 2, use the pointer positions of the first shared memory area under the computing process 3, and read the first data from the first shared memory area under the computing process 3; and finally, the sub IO process reads out the three first data and sends the three first data to the total IO process.
The sub IO process on each motherboard node may send the first data of its own motherboard node to the total IO process responsible for writing into the memory through a fourth MPI function, which may be the MPI_Scatterv function or the MPI_Gatherv function.
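Illustratively, the gathering step may be sketched as follows, assuming io_comm is a communicator containing only the sub read-write processes and the total read-write process (rank io_root); counts and displs describe where each node's block lands in the merged buffer and only need to be valid on the total read-write process.

#include <mpi.h>

/* Each sub read-write process contributes its node's first data; the total
 * read-write process receives all blocks, already placed at their offsets. */
void gather_first_data(MPI_Comm io_comm, int io_root,
                       const double *node_data, int node_count,
                       double *merged, const int *counts, const int *displs) {
    MPI_Gatherv(node_data, node_count, MPI_DOUBLE,
                merged, counts, displs, MPI_DOUBLE, io_root, io_comm);
}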
In the step (3), the total IO process may be one process selected from processes running on any of the motherboard nodes, or may be one process located on the rest of the motherboard nodes. Referring to the four motherboard nodes shown in fig. 4, each motherboard node has 4 computing processes and 1 sub IO process thereon. When the total IO process is one process selected from the target main board node, the total IO process is communicated with the sub IO processes on the target main board node in the main board node, and the total IO process is communicated with the rest 3 sub IO processes through a network; and when the total IO process is a newly added mainboard node, network communication is performed between the total IO process and the 4 sub IO processes.
The total IO process shown in fig. 4 is exemplified as running on a newly added motherboard node: it receives the first data output by the plurality of sub IO processes through inter-node communication, integrates and splices the plurality of first data in sequence to obtain the complete grid data, and then writes the grid data to the disk.
The grid data output by the air quality model is high-resolution atmospheric pollutant concentration data. To write the grid data to the memory quickly, the grid data is first split, in order, into data of a plurality of grid areas, and a plurality of computing processes on a plurality of motherboard nodes calculate the data of the grid areas in parallel; after the calculation finishes, the computing processes write the data of the grid areas into their respective first shared memory areas; the sub IO processes take the data of the grid areas out of the first shared memory areas and send it to the total IO process; and the total IO process combines the data of the grid areas in sequence to obtain the grid data and writes it into the memory.
In step S12, the second data sent by the total read-write process is written into a second shared memory area through the sub read-write process, so as to instruct the computing process to acquire the second data from the second shared memory area.
The second shared memory area is a memory area for storing data that the sub IO process creates in its own local memory through the second MPI function, and the second MPI function may be the MPI_Win_allocate_shared function. All processes on the same motherboard node can call the second MPI function to create, in their respective local memory, a memory area for storing data.
Illustratively, step S12 is a process of reading second data from the disk, including: (4) The total read-write process sends the disassembled second data to the sub-read-write process; (5) The sub read-write process writes the second data into the second shared memory area; (6) And the computing process which is positioned on the same main board as the sub read-write process reads a plurality of second data from the second shared memory area.
In the step (4), after the total IO process obtains the complete data from the memory, the complete data is disassembled into a plurality of second data, and the plurality of second data are distributed to sub IO processes in a plurality of main board nodes.
When the total IO process is a process randomly selected from the processes of a target motherboard node among the plurality of motherboard nodes, the total IO process sends second data to the sub IO process within its own motherboard node through intra-node communication, and sends the remaining second data to the sub IO processes on the other motherboard nodes through network communication. When the total IO process is a process on a newly added motherboard node, the total IO process sends the second data to the sub IO processes in the motherboard nodes through network communication.
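Illustratively, the distribution in step (4) may be sketched as follows, again assuming an io_comm communicator that contains the total read-write process (rank io_root) and the sub read-write processes; counts and displs give each node's block size and offset inside the complete data.

#include <mpi.h>

/* The total read-write process scatters one disassembled block of second
 * data to every sub read-write process; the other ranks only receive. */
void scatter_second_data(MPI_Comm io_comm, int io_root,
                         const double *whole, const int *counts, const int *displs,
                         double *node_block, int node_count) {
    MPI_Scatterv(whole, counts, displs, MPI_DOUBLE,
                 node_block, node_count, MPI_DOUBLE, io_root, io_comm);
}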
In the step (5), the second shared memory area is a shared area created by the sub IO process, and the sub IO process may directly access the second shared memory area, and the computing process located on the same motherboard node as the sub IO process needs to access the second shared memory area through a pointer location of the second shared memory area. Of course, in the case that there are multiple sub IO processes in the same motherboard node, the sub IO process can only directly access the second shared memory area created by itself, and when accessing the second shared memory area created by the remaining sub IO processes, the pointer position is also required to be used to access the second shared memory area created by the remaining sub IO processes.
Under the condition that the number of sub IO processes in one main board node is 1, since the 1 sub IO process needs to send the data acquired from the main IO process to a plurality of computing processes, the capacity of a second shared memory area under the 1 sub IO process is larger than or equal to the capacity of a first shared memory area under the plurality of computing processes in the main board node; in the case that the number of sub-IO processes in the motherboard node is plural, the capacity of the second shared memory area of each sub-IO process needs to be larger than the capacity of the first shared memory area of each computing process.
The sub IO process in a motherboard node acts as an overall scheduling process: it knows the capacity of the first shared memory area of each computing process in the motherboard node as well as the field start position and field length corresponding to the grid area calculated by each computing process. Therefore, when the sub IO process receives a read/write request from a target computing process for the second shared memory area, it sends the second data identified by the field start position and field length corresponding to the target computing process to the target computing process.
For example, referring to fig. 3, after the sub IO process receives the grid area data ABCDEFGHIJKL, it writes ABCDEFGHIJKL into the second shared memory area. The sub IO process knows that computing process 1 calculates the second data with field start position A and field length 4, computing process 2 calculates the second data with field start position E and field length 4, and computing process 3 calculates the second data with field start position I and field length 4. When the sub IO process receives a read/write request from computing process 1 for the second shared memory area, it sends the second data ABCD in the second shared memory area to computing process 1; when it receives a read/write request from computing process 2, it sends the second data EFGH to computing process 2; and when it receives a read/write request from computing process 3, it sends the second data IJKL to computing process 3.
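Illustratively, the dispatch rule of this example can be written as the following C sketch; the field table simply restates the start positions and lengths from the example above, and the indices 0 to 2 stand for computing processes 1 to 3.

#include <string.h>

typedef struct { int start; int length; } field_desc;

/* Field start position and field length per computing process (example values). */
static const field_desc fields[3] = { {0, 4}, {4, 4}, {8, 4} };

/* On a read request from computing process `target`, copy out the slice of the
 * second shared memory area that belongs to it: "ABCD", "EFGH" or "IJKL". */
void serve_read_request(const char *second_region, int target, char *out) {
    memcpy(out, second_region + fields[target].start, fields[target].length);
}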
The calculation amount and the calculation starting position of the calculation process in each motherboard node are usually fixed, so that the sub IO process can determine the field starting position and the field length of each calculation process.
In the step (6), the second shared memory area is a shared memory area created by the sub IO process, so that the computing process located on the same main board as the sub IO process cannot directly access the second shared memory area created by the sub IO process, and the plurality of computing processes can access the second shared memory area only by passing through the pointer position of the second shared memory area.
The different second shared memory areas correspond to different pointer positions, and the pointer positions are used for indicating addresses of the second shared memory areas. When the computing process accesses the second shared memory area through the pointer position, the pointer position of any second shared memory area can be queried through the third MPI function, and then the second shared memory area corresponding to the pointer position is accessed through the pointer position.
Step (6) is illustrated by the example of fig. 3: the computing process 1, the computing process 2 and the computing process 3 on the same motherboard as the sub IO process respectively use the pointer position of the second shared memory area to read the second data from the second shared memory area.
In the above steps, the first data and the second data may be data of a grid area output by the air quality model; the data of a grid area is the sub-data obtained after the grid data is divided into a plurality of areas, and concentration data of pollutants and the like is recorded at each grid point.
The applicant found that an alternative design of the shared memory area, namely designing one sub shared memory area per motherboard that is shared by the sub IO process and the computing processes on that motherboard, and connecting the sub shared memory areas of multiple motherboards into one total shared memory area for I/O read/write, causes a large amount of data redundancy. The reason is as follows: when data is written to the disk, the computing processes on a motherboard must write the data into the sub shared memory area corresponding to the motherboard according to the pointer position of that area, and the sub IO process must also read the data written into the sub shared memory area according to that pointer position and send it to the total IO process; when data is read from the disk, the total IO process disassembles the data and distributes it to the sub IO processes, the sub IO processes must each send the data to the sub shared memory area on their own motherboard according to the pointer position of that area, and the computing processes read the data from the sub shared memory area according to the pointer position. In this process, both the computing processes and the sub IO processes must call the third MPI function frequently and use a large number of pointer positions to access the sub shared memory area, which results in a large amount of data redundancy.
To reduce this data redundancy, in the present disclosure each process in a motherboard node creates its own shared memory area: a computing process creates its own first shared memory area and a sub IO process creates its own second shared memory area. The computing process does not need a pointer position to access the first shared memory area it created itself, and the sub IO process does not need a pointer position to access the second shared memory area it created itself, which reduces the use of pointer positions, reduces the number of calls to the third MPI function, and reduces data redundancy.
With the above technical solution, the computing process and the sub read-write process are located on the same motherboard node, so communication between the computing process and the sub read-write process is intra-node communication; no inter-node network is needed, and the communication overhead caused by network communication is avoided. Here, communication overhead refers to the fact that when different processes communicate, the two processes must perform a three-way or four-way handshake operation; the handshake requires frequent communication between processes on different motherboard nodes, and this frequent communication brings additional overhead.
In order to realize inter-process communication in a main board node, the present disclosure proposes that a computing process and a sub-IO process in the main board node can access a second shared memory area created by the sub-IO process and a first shared memory area created by the computing process, so that data interaction between the computing process and the sub-IO process in the main board node can be realized by accessing the first shared memory area and the second shared memory area, and I/O read-write operation of a memory can be realized.
In one possible implementation, referring to fig. 5, the computing process and the sub-read-write process are determined by:
in step S21, a first MPI function is called, creating a communication domain.
Processes within each motherboard node may call a first MPI function to create a communication domain. The first MPI function may be the MPI_Comm_split_type function, and the communication domain may be a shared memory communication domain. Different motherboard nodes correspond to different shared memory communication domains; the domains are isolated from each other, and each shared memory communication domain contains computing processes and a sub IO process. The shared memory communication domain is a communication domain created for the shared memory areas.
In step S22, for a plurality of processes in the communication domain, at least one process is determined as the sub-read-write process, and the remaining processes of the plurality of processes are determined as the calculation process.
After all the processes in the main board node create a communication domain, at least one process can be determined from the multiple processes to be used as a sub IO process, and the rest processes are used as computing processes.
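Illustratively, steps S21 and S22 may be sketched in C as follows. The rule of taking the lowest rank in each communication domain as the sub read-write process is only an assumption made for the sketch; the disclosure merely requires that at least one process per domain take that role.

#include <mpi.h>

void split_roles(int *is_sub_io, MPI_Comm *node_comm) {
    /* S21: one shared memory communication domain per motherboard node. */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, node_comm);
    /* S22: rank 0 of the node communicator acts as the sub read-write
     * process; the remaining ranks act as computing processes. */
    int node_rank;
    MPI_Comm_rank(*node_comm, &node_rank);
    *is_sub_io = (node_rank == 0);
}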
When the sub IO process reads the first data from the plurality of first shared memory areas, if not all of the first data has yet been stored in those areas, the data the sub IO process sends to the total IO process may be incomplete, and the total IO process cannot splice the incomplete data into a complete file.
Therefore, only when the sub IO process determines that all the computing processes have stored their first data in the first shared memory areas does it send the first data written by the plurality of computing processes into the plurality of first shared memory areas to the total IO process.
In other words, before reading, the sub IO process determines whether every computing process has completed its data storage operation in the first shared memory area.
Symmetrically, the plurality of computing processes should only read the second data from the second shared memory area after the plurality of sub IO processes have stored the second data there. For example, a computing process invokes the MPI_Win_fence function to determine that all sub IO processes have completed the data storage operation.
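Illustratively, the fence-based synchronization may be sketched as follows; my_region is assumed to be the pointer obtained when the shared memory area was created, and the same pattern applies in the opposite direction when the sub read-write process publishes second data.

#include <mpi.h>

/* A computing process stores its first data and uses two fences so that the
 * stores only become visible to the sub read-write process after they are all
 * complete. */
void write_then_publish(MPI_Win win, double *my_region,
                        const double *grid_data, int count) {
    MPI_Win_fence(0, win);                 /* open the access epoch            */
    for (int i = 0; i < count; i++)
        my_region[i] = grid_data[i];       /* store first data into the region */
    MPI_Win_fence(0, win);                 /* all stores now visible node-wide */
}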
Fig. 6 is a block diagram illustrating a data read-write apparatus according to an exemplary embodiment. Referring to fig. 6, the data read-write apparatus 120 includes a first sub-read-write module 121 and/or a second sub-read-write module 122.
The first sub-read-write module 121 is configured to send, through the sub-read-write process, the first data written into the first shared memory area by the computing process to a total read-write process, where the total read-write process is used to integrate the plurality of first data sent by the plurality of sub-read-write processes and store the integrated first data into a memory;
the second sub-read-write module 122 is configured to write, through the sub-read-write process, second data sent by the total read-write process into a second shared memory area, so as to instruct the computing process to acquire the second data from the second shared memory area.
Optionally, the first sub read-write module 121 includes:
The first data writing module is configured to write the first data into the first shared memory area through the computing process;
the first data sending module is configured to send the first data read from the first shared memory area to the total read-write process through the sub-read-write process which is located on the same main board as the computing process;
the first data integration module is configured to integrate a plurality of first data sent by the sub-read-write processes on a plurality of main boards through the total read-write process and then store the integrated first data into a memory.
Optionally, the data read/write device 120 includes: the first shared memory area access module and/or the second shared memory area access module;
the first shared memory area access module is configured so that the sub read-write process accesses the first shared memory area through the pointer position of the first shared memory area;
and the second shared memory area access module is configured to access the second shared memory area through the pointer position of the second shared memory area by the computing process.
Optionally, the data read/write device 120 includes:
the communication domain creation module is configured to call a first MPI function to create a communication domain;
And the process determining module is configured to determine at least one process as the sub-read-write process for a plurality of processes in the communication domain, and the rest processes in the plurality of processes are used as the computing processes.
Optionally, the first data sending module includes:
and the storage determining module is configured to, when the sub read-write process determines that the plurality of computing processes have all stored the first data in the first shared memory areas, send the first data respectively written by the plurality of computing processes into the plurality of first shared memory areas to the total read-write process.
Optionally, the data read/write device 120 includes:
the first shared memory area creating module is configured to create the first shared memory area through a second MPI function by the computing process;
and the second shared memory area creation module is configured to create the second shared memory area through the second MPI function by the sub read-write process.
Optionally, the second sub read-write module 122 includes:
the second data sending module is configured to send the disassembled second data to the sub-read-write process through the total read-write process;
The second data writing module is configured to write the second data into the second shared memory area through the sub-read-write process;
and the second data reading module is configured to read a plurality of second data from the second shared memory area through the computing process which is positioned on the same main board as the sub read-write process.
Optionally, the data read/write device 120 includes:
and the data configuration module is configured to send second data represented by the field starting position and the field length to the target computing process according to the field starting position and the field length corresponding to the target computing process under the condition that the sub-read-write process receives the read request of the target computing process.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
Fig. 7 is a block diagram illustrating an apparatus 1900 for data reading and writing, according to an example embodiment. For example, the apparatus 1900 may be provided as a server. Referring to fig. 7, the apparatus 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by the processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the data read-write methods described above.
The apparatus 1900 may further comprise a power component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output interface 1958. The device 1900 may operate based on an operating system stored in the memory 1932.
The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the data read-write method provided by the present disclosure.
The apparatus may be a stand-alone electronic device or may be part of a stand-alone electronic device, for example, in one embodiment, the apparatus may be an integrated circuit (Integrated Circuit, IC) or a chip, where the integrated circuit may be an IC or may be a collection of ICs; the chip may include, but is not limited to, the following: GPU (Graphics Processing Unit, graphics processor), CPU (Central Processing Unit ), FPGA (Field Programmable Gate Array, programmable logic array), DSP (Digital Signal Processor ), ASIC (Application Specific Integrated Circuit, application specific integrated circuit), SOC (System on Chip, SOC, system on Chip or System on Chip), etc. The integrated circuit or the chip may be used to execute executable instructions (or codes) to implement the data read-write method. The executable instructions may be stored on the integrated circuit or chip or may be retrieved from another device or apparatus, such as the integrated circuit or chip including a processor, memory, and interface for communicating with other devices. The executable instructions may be stored in the memory, which when executed by the processor implement the data read-write method described above; alternatively, the integrated circuit or chip may receive the executable instructions through the interface and transmit the executable instructions to the processor for execution, so as to implement the data read/write method described above.
In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described data read-write method when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. The data read-write method is characterized in that the method is applied to a main board, and the main board is used for running a computing process and a sub read-write process, and comprises the following steps:
sending, through the sub read-write process, the first data written into the first shared memory area by the computing process to a total read-write process, the total read-write process being used for integrating a plurality of first data sent by a plurality of sub read-write processes and then storing the integrated first data into a memory;
and writing second data sent by the total read-write process into a second shared memory area through the sub read-write process so as to instruct the computing process to acquire the second data from the second shared memory area.
2. The method according to claim 1, wherein the sending, through the sub read-write process, the first data written into the first shared memory area by the computing process to the total read-write process, the total read-write process being configured to integrate the plurality of first data sent by the plurality of sub read-write processes and then store the integrated first data into the memory, comprises:
writing the first data into the first shared memory area through the computing process;
the first data read from the first shared memory area is sent to the total read-write process through the sub read-write process which is located on the same main board as the computing process;
And integrating the plurality of first data sent by the sub-read-write processes on the plurality of main boards through the total read-write process, and storing the integrated first data into a memory.
3. The method according to claim 1, wherein the method further comprises:
the sub read-write process accesses the first shared memory area through the pointer position of the first shared memory area; and/or
and the computing process accesses the second shared memory area through the pointer position of the second shared memory area.
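If the shared memory areas are MPI shared-memory windows, the "pointer position" in claim 3 corresponds to the base pointer that MPI_Win_shared_query returns for a given owner rank; this is an interpretation for illustration, not something the claim states explicitly.

#include <mpi.h>

/* Obtain the pointer position of the segment owned by owner_rank inside a
   shared-memory window, so the caller can read or write that segment
   directly through ordinary loads and stores. */
static void *segment_pointer(MPI_Win win, int owner_rank) {
    MPI_Aint size;
    int disp_unit;
    void *ptr = NULL;
    MPI_Win_shared_query(win, owner_rank, &size, &disp_unit, &ptr);
    return ptr;
}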
4. The method according to claim 1, wherein the method further comprises:
calling a first MPI function to create a communication domain;
for a plurality of processes in the communication domain, determining at least one process as the sub read-write process, and determining the rest of the plurality of processes as the computing process.
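Claim 4 does not name the "first MPI function". A plausible sketch uses MPI_Comm_split_type with MPI_COMM_TYPE_SHARED to form one communication domain per main board, after which the local rank 0 is (arbitrarily, for illustration) designated as the sub read-write process and the remaining local ranks act as computing processes.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* One communication domain per main board (shared-memory node). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* At least one process per domain becomes the sub read-write process;
       the rest are computing processes. */
    if (node_rank == 0) {
        printf("acting as sub read-write process\n");
    } else {
        printf("acting as computing process\n");
    }

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}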
5. The method of claim 2, wherein the sending, by the sub read-write process, the first data written by the computing process into the first shared memory area to the total read-write process comprises:
in a case where the sub read-write process determines that the plurality of computing processes have stored the first data in the first shared memory areas, sending the first data respectively written by the plurality of computing processes into the plurality of first shared memory areas to the total read-write process.
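One way to realize the condition in claim 5 that all computing processes have stored their first data is a window fence (or an equivalent barrier) on the per-board communicator; the claim itself does not prescribe a synchronization primitive, so the call below is only a sketch.

#include <mpi.h>

/* Every process on the main board calls the fence; once it returns, the sub
   read-write process may safely read each computing process's first data
   from the first shared memory areas and forward it. */
static void wait_until_first_data_stored(MPI_Win first_win) {
    MPI_Win_fence(0, first_win);
}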
6. The method according to claim 1, wherein the method further comprises:
the computing process creates the first shared memory area through a second MPI function;
and the sub read-write process creates the second shared memory area through the second MPI function.
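Claim 6 likewise leaves the "second MPI function" unnamed; MPI_Win_allocate_shared is the standard routine for creating a shared memory area visible to all processes on one main board, so the helper below uses it as an assumed stand-in. The element type and sizes are illustrative.

#include <mpi.h>
#include <stddef.h>

/* Hypothetical helper: the owning process contributes elems doubles to a
   shared memory area on the per-board communicator, while the other
   participants contribute zero bytes but still join the collective call so
   they can later query the segment's pointer position. */
static double *create_shared_area(MPI_Comm node_comm, int i_am_owner,
                                  size_t elems, MPI_Win *win) {
    double *base = NULL;
    MPI_Aint bytes = i_am_owner ? (MPI_Aint)(elems * sizeof(double)) : 0;
    MPI_Win_allocate_shared(bytes, (int)sizeof(double), MPI_INFO_NULL,
                            node_comm, &base, win);
    return base;
}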
7. The method of claim 1, wherein the writing, by the sub read-write process, second data sent by the total read-write process into a second shared memory area to instruct the computing process to obtain the second data from the second shared memory area comprises:
sending the disassembled second data to the sub read-write process through the total read-write process;
writing the second data into the second shared memory area through the sub read-write process;
and reading a plurality of second data from the second shared memory area through the computing process which is located on the same main board as the sub read-write process.
8. The method of claim 7, wherein the method further comprises:
and in a case where the sub read-write process receives a read request from a target computing process, sending, according to a field starting position and a field length corresponding to the target computing process, the second data indicated by the field starting position and the field length to the target computing process.
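The per-process addressing in claim 8 can be pictured as an offset/length table held by the sub read-write process; the structure and element type below are hypothetical, chosen only to show how such a read request would be answered from the second shared memory area.

#include <stddef.h>
#include <string.h>

/* Hypothetical layout table: where each target computing process's field
   starts inside the second shared memory area and how many elements it
   spans. */
typedef struct {
    size_t field_start;   /* field starting position, in elements */
    size_t field_length;  /* field length, in elements */
} FieldSlice;

/* Answer a read request from computing process proc by copying exactly the
   second data indicated by its field starting position and field length. */
static void serve_read_request(const double *second_shared_area,
                               const FieldSlice *layout, int proc,
                               double *out) {
    memcpy(out, second_shared_area + layout[proc].field_start,
           layout[proc].field_length * sizeof(double));
}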
9. A data read-write device, characterized by comprising a first sub read-write module and/or a second sub read-write module;
the first sub read-write module is configured to send the first data written into the first shared memory area by the computing process to a total read-write process through the sub read-write process, wherein the total read-write process is used for integrating a plurality of first data sent by a plurality of sub read-write processes and then storing the integrated first data into a memory;
and the second sub read-write module is configured to write second data sent by the total read-write process into a second shared memory area through the sub read-write process so as to instruct the computing process to acquire the second data from the second shared memory area.
10. A data reading and writing apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the steps of the method of any one of claims 1 to 8 when executing the executable instructions.
11. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of any of claims 1 to 8.
12. A chip, comprising a processor and an interface; the processor is configured to read instructions to perform the method of any one of claims 1 to 8.
CN202211463058.0A 2022-11-21 2022-11-21 Data reading and writing method and device, storage medium and chip Active CN116225308B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211463058.0A CN116225308B (en) 2022-11-21 2022-11-21 Data reading and writing method and device, storage medium and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211463058.0A CN116225308B (en) 2022-11-21 2022-11-21 Data reading and writing method and device, storage medium and chip

Publications (2)

Publication Number Publication Date
CN116225308A true CN116225308A (en) 2023-06-06
CN116225308B CN116225308B (en) 2023-12-08

Family

ID=86575612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211463058.0A Active CN116225308B (en) 2022-11-21 2022-11-21 Data reading and writing method and device, storage medium and chip

Country Status (1)

Country Link
CN (1) CN116225308B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130326180A1 (en) * 2012-05-31 2013-12-05 International Business Machines Corporation Mechanism for optimized intra-die inter-nodelet messaging communication
US20190332318A1 (en) * 2018-04-26 2019-10-31 International Business Machines Corporation Accelerating shared file checkpoint with local burst buffers
CN109271344A (en) * 2018-08-07 2019-01-25 浙江大学 The data preprocessing method read based on Shen prestige chip architecture parallel file
CN111651286A (en) * 2020-05-27 2020-09-11 泰康保险集团股份有限公司 Data communication method, device, computing equipment and storage medium
CN114138381A (en) * 2022-01-30 2022-03-04 北京卡普拉科技有限公司 Processing system of numerical program

Also Published As

Publication number Publication date
CN116225308B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
Wulf et al. C. mmp: A multi-mini-processor
JP2644780B2 (en) Parallel computer with processing request function
KR101400286B1 (en) Method and apparatus for migrating task in multi-processor system
CN108170612B (en) Automatic testing method and device and server
CN111324610A (en) Data synchronization method and device
CN112632069B (en) Hash table data storage management method, device, medium and electronic equipment
CN116541227B (en) Fault diagnosis method and device, storage medium, electronic device and BMC chip
CN101876954A (en) Virtual machine control system and working method thereof
CN113821332B (en) Method, device, equipment and medium for optimizing efficiency of automatic machine learning system
JP2006236123A (en) Job distribution program, job distribution method and job distribution device
CN116225308B (en) Data reading and writing method and device, storage medium and chip
Cornebize et al. Emulating high performance linpack on a commodity server at the scale of a supercomputer
CN117056123A (en) Data recovery method, device, medium and electronic equipment
CN103678244A (en) Intelligent device without application processor
CN114925078A (en) Data updating method, system, electronic device and storage medium
CN103064723A (en) Method and computer system for identifying virtual machine memory
CN115809015A (en) Method for data processing in distributed system and related system
CN114020454A (en) Memory management method, device, equipment and medium
CN113392052A (en) BIOS system, method and computer readable storage medium based on four-way server
KR100978083B1 (en) Procedure calling method in shared memory multiprocessor and computer-redable recording medium recorded procedure calling program
Enberg et al. Transcending POSIX: The End of an Era?
Civera et al. The μ Project: An Experience with a Multimicroprocessor System.
JPH09319653A (en) Information processor, information processing system and control method for the same
JPH02245864A (en) Multiprocessor system
Obaidat Performance evaluation of the IMPS multiprocessor system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant