CN112433869B - Software supernode-oriented OpenCL programming framework construction method and device - Google Patents

Publication number
CN112433869B
Authority
CN
China
Prior art keywords
communication port
opencl
node
daemon
software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011398621.1A
Other languages
Chinese (zh)
Other versions
CN112433869A (en)
Inventor
唐滔
崔英博
黄春
彭林
杨灿群
方建滨
张鹏
左克
于恒彪
范小康
易昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202011398621.1A
Publication of CN112433869A
Application granted
Publication of CN112433869B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/542 Event management; Broadcasting; Multicasting; Notifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/547 Messaging middleware

Abstract

The application relates to a software supernode-oriented OpenCL programming framework construction method and device, computer equipment and a storage medium. The method comprises the following steps: an OpenCL main process is started on the master node of a software supernode and a first communication port is initialized; the daemon process on the master node broadcasts the first communication port address to the slave nodes of the software supernode; after the master node and the slave nodes have all acquired the first communication port address, an OpenCL-handler process is started on every physical node and its second communication port is initialized; each physical node sends its local second communication port address to the OpenCL main process according to the first communication port address; and the OpenCL main process packages the first communication port address together with all received second communication port addresses and sends the resulting global address information to all physical nodes, which completes the construction of the software supernode OpenCL programming environment.

Description

Software supernode-oriented OpenCL programming framework construction method and device
Technical Field
The present application relates to the technical field of heterogeneous computer systems, and in particular, to a software supernode-oriented OpenCL programming framework construction method, apparatus, computer device, and storage medium.
Background
Heterogeneous computing continues to develop in the field of high-performance computing because of its high performance and high energy efficiency, and a growing number of special-purpose accelerators have emerged, such as the GPU (graphics processing unit), FPGA (field-programmable gate array), DSP (digital signal processor) and AI (artificial intelligence) accelerator. Generally, an accelerator is attached to a host-side CPU as a PCIe (Peripheral Component Interconnect Express) peripheral device to form a tightly coupled heterogeneous computing node, and a parallel system is then built from a plurality of such nodes. This approach is computationally efficient but does not scale easily: when a new accelerator appears, it is difficult to add it to a high-performance computing system that has already been built.
To address these problems, an array heterogeneous computer system has been designed and built. At the hardware level the accelerators form an independent array that is interconnected with the existing system through a high-speed interconnection network; when accelerators need to be replaced or a new type of accelerator appears, nodes only need to be added to or removed from the accelerator array, so the system is easy to expand. A software supernode environment has also been built for the array heterogeneous computer system. The software supernode environment organizes a plurality of physically separate nodes into one virtual heterogeneous node. A supernode comprises a master node and a plurality of slave nodes, and each physical node runs a daemon that resides in the background and manages the internal information of the supernode, which mainly includes the number of physical nodes in the supernode, the identity of each node (master or slave), the serial number of each node within the supernode, and the communication port numbers of all nodes in the supernode. The daemon maintains a shared memory through which it can communicate with other processes on the same node, and the daemon also supports communication between any two nodes in the supernode.
However, the prior art provides no corresponding programming framework through which computing resources inside a supernode and across supernodes can be accessed.
Disclosure of Invention
In view of the above, it is necessary to provide a software supernode-oriented OpenCL programming framework construction method, apparatus, computer device and storage medium capable of accessing computing resources inside and across a supernode.
A software supernode-oriented OpenCL programming framework construction method comprises the following steps:
acquiring a physical node list of an array heterogeneous computer system, and constructing a software supernode of the array heterogeneous computer system; the software supernode comprises a master node and a plurality of slave nodes; a daemon process runs on each physical node in the physical node list;
starting an OpenCL main process on the master node, initializing a first communication port on the OpenCL main process, and broadcasting a first communication port address of the first communication port to the daemon processes of the slave nodes through the daemon process of the master node;
starting an OpenCL-handler process on the master node and on each slave node that has received the first communication port address, and initializing a second communication port of the OpenCL-handler process, so that each physical node sends a second communication port address of the second communication port to the OpenCL main process according to the first communication port address;
and packaging all the received second communication port addresses and the first communication port addresses through the OpenCL main process, and sending the global address information obtained by packaging to all the physical nodes, so that the OpenCL-handler process on the physical nodes receives the global address information, and the construction of the software supernode OpenCL programming environment is completed.
In one embodiment, the daemon process of the master node comprises a master node shared memory space; the first communication port is a Tianhe high-speed communication port;
after the OpenCL main process is started on the master node and the first communication port on the OpenCL main process is initialized, the method further includes:
writing the first communication port address into the master node shared memory space.
In one embodiment, the method further comprises the following steps: reading information in the master node shared memory space through the daemon process of the master node to obtain the first communication port address;
and acquiring the slave node process numbers of the daemon processes of the slave nodes in the software supernode, so that the daemon process of the master node broadcasts the first communication port address to the daemon processes of the slave nodes through a Tianhe high-speed communication library interface according to the slave node process numbers.
In one embodiment, the method further comprises the following steps: after the first communication port address is broadcast to the daemon process of the slave node through the daemon process of the master node, a message for starting an OpenCL-handler process is sent to the daemon process of the slave node through the daemon process of the master node.
In one embodiment, the method further comprises the following steps: the daemon process of each slave node comprises a slave node shared memory space;
on each slave node that has received the first communication port address, writing the first communication port address into the slave node shared memory space through the daemon process of the slave node; the first communication port address is received by the slave node through a Tianhe high-speed communication library interface;
receiving, by the daemon process of the slave node through a Tianhe high-speed communication library interface, the message for starting an OpenCL-handler process sent by the daemon process of the master node;
starting an OpenCL-handler process on the master node and on each slave node, and initializing a second communication port corresponding to the OpenCL-handler process; the second communication port is a Tianhe high-speed communication port;
reading, through the OpenCL-handler processes of the master node and the slave nodes, the information in the respective shared memory spaces of the master node and the slave nodes to obtain the first communication port address;
enabling the OpenCL-handler processes of the master node and the slave nodes to send the second communication port address of the second communication port to the OpenCL main process according to the first communication port address; the second communication port address is sent by the master node and the slave nodes through a Tianhe high-speed communication library interface.
In one embodiment, the method further comprises the following steps: the global address information is a global array; all the received second communication port addresses and the first communication port address are packaged through the OpenCL main process, and the global array obtained by packaging is sent to all the physical nodes, so that the OpenCL-handler processes on the physical nodes receive the global array, and the construction of the software supernode OpenCL programming environment is completed.
In one embodiment, the method further comprises the following steps: packaging all the received second communication port addresses and the first communication port address through the OpenCL main process to obtain the global address information, so that the OpenCL main process sends the global address information to all the physical nodes through a Tianhe high-speed communication library interface;
and enabling the OpenCL-handler process on the physical node to receive the global address information through a Tianhe high-speed communication library interface, and completing the construction of the software supernode OpenCL programming environment.
In one embodiment, the method further comprises the following steps: the global address information is a global array.
An OpenCL programming framework building device for software supernodes, the device comprising:
a software supernode construction module, configured to acquire a physical node list of the array heterogeneous computer system and construct a software supernode of the array heterogeneous computer system; the software supernode comprises a master node and a plurality of slave nodes; a daemon process runs on each physical node;
a first communication port address broadcasting module, configured to start an OpenCL main process on the master node, initialize a first communication port on the OpenCL main process, and broadcast a first communication port address of the first communication port to the daemon processes of the slave nodes through the daemon process of the master node;
a second communication port address sending module, configured to start an OpenCL-handler process on the master node and on each slave node that has received the first communication port address, and initialize a second communication port of the OpenCL-handler process, so that each physical node sends a second communication port address of the second communication port to the OpenCL main process according to the first communication port address;
and a global address information establishing module, configured to package all the received second communication port addresses and the first communication port address through the OpenCL main process, and send the global address information obtained by packaging to all the physical nodes, so that the OpenCL-handler processes on the physical nodes receive the global address information, and the construction of the software supernode OpenCL programming environment is completed.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a physical node list of an array heterogeneous computer system, and constructing a software supernode of the array heterogeneous computer system; the software supernode comprises a master node and a plurality of slave nodes; a daemon process runs on each physical node in the physical node list;
starting an OpenCL main process on the master node, initializing a first communication port on the OpenCL main process, and broadcasting a first communication port address of the first communication port to the daemon processes of the slave nodes through the daemon process of the master node;
starting an OpenCL-handler process on the master node and on each slave node that has received the first communication port address, and initializing a second communication port of the OpenCL-handler process, so that each physical node sends a second communication port address of the second communication port to the OpenCL main process according to the first communication port address;
and packaging all the received second communication port addresses and the first communication port addresses through the OpenCL main process, and sending the global address information obtained by packaging to all the physical nodes, so that the OpenCL-handler process on the physical nodes receives the global address information, and the construction of the software supernode OpenCL programming environment is completed.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a physical node list of an array heterogeneous computer system, and constructing a software supernode of the array heterogeneous computer system; the software supernode comprises a master node and a plurality of slave nodes; a daemon process runs on each physical node in the physical node list;
starting an OpenCL main process on the master node, initializing a first communication port on the OpenCL main process, and broadcasting a first communication port address of the first communication port to the daemon processes of the slave nodes through the daemon process of the master node;
starting an OpenCL-handler process on the master node and on each slave node that has received the first communication port address, and initializing a second communication port of the OpenCL-handler process, so that each physical node sends a second communication port address of the second communication port to the OpenCL main process according to the first communication port address;
and packaging all the received second communication port addresses and the first communication port addresses through the OpenCL main process, and sending the global address information obtained by packaging to all the physical nodes, so that the OpenCL-handler process on the physical nodes receives the global address information, and the construction of the software supernode OpenCL programming environment is completed.
According to the above software supernode-oriented OpenCL programming framework construction method, device, computer equipment and storage medium, an OpenCL main process is started on the master node of the software supernode and a first communication port on the OpenCL main process is initialized. The daemon process on the master node broadcasts the first communication port address to the slave nodes of the software supernode. After the master node and the slave nodes have acquired the first communication port address, an OpenCL-handler process is started on every physical node and its second communication port is initialized. Each physical node sends its local second communication port address to the OpenCL main process according to the first communication port address. The OpenCL main process packages all the received second communication port addresses together with the first communication port address and sends the resulting global address information to all physical nodes, so that the OpenCL-handler processes on the physical nodes receive the global address information and the construction of the software supernode OpenCL programming environment is completed. With this method, resources on other computing nodes in the array heterogeneous computer system can be accessed through the OpenCL programming interface, which makes the software supernode usable.
Drawings
FIG. 1 is an application scenario diagram of an OpenCL programming framework construction method for software supernodes in an embodiment;
FIG. 2 is a flow chart illustrating a method for constructing an OpenCL programming framework for software supernodes according to an embodiment;
FIG. 3 is a structural block diagram of an OpenCL programming framework building device oriented to software supernodes in one embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The software supernode-oriented OpenCL programming framework construction method can be applied to the application environment shown in FIG. 1. The software supernode comprises a master node and a plurality of slave nodes; the master node runs a daemon process, an OpenCL main process and an OpenCL-handler process, and each slave node runs a daemon process and an OpenCL-handler process. The master node and the slave nodes in the software supernode can communicate with each other, and the processes within a node can communicate with each other.
In an embodiment, as shown in fig. 2, a method for constructing an OpenCL programming framework oriented to a software supernode is provided, which is described by taking the application of the method to the software supernode in fig. 1 as an example, and includes the following steps:
Step 202, acquiring a physical node list of the array heterogeneous computer system, and constructing a software supernode of the array heterogeneous computer system.
The software supernode is composed of a plurality of physical nodes in the array heterogeneous computer system. A physical node may be a general-purpose processor or one of various special-purpose accelerators, such as a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array), a DSP (Digital Signal Processor) or an AI (Artificial Intelligence) accelerator. At the hardware level, the physical nodes form an independent array that is interconnected with the existing system through a high-speed interconnection network. The software supernode comprises a master node and a plurality of slave nodes, and each physical node in the physical node list runs a daemon that resides in the background of the software supernode. The shared memory space maintained by the master node stores the communication port information of all slave node daemon processes, and the shared memory space maintained by each slave node stores the communication port information of the master node daemon process; that is, the master node and the slave nodes in the software supernode can communicate with each other.
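The node bookkeeping described for step 202 can be illustrated with a minimal Python sketch. Everything named here (`NodeInfo`, `Supernode`, `build_supernode`, `daemon_shared_view`, the `daemon-port-N` strings, and the choice of the first listed node as master) is a hypothetical illustration and is not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeInfo:
    """Per-node record a daemon might keep (field names are hypothetical)."""
    hostname: str
    index: int          # serial number of the node inside the supernode
    is_master: bool
    daemon_port: str    # placeholder for the daemon's communication port


@dataclass
class Supernode:
    nodes: List[NodeInfo] = field(default_factory=list)

    @property
    def master(self) -> NodeInfo:
        return next(n for n in self.nodes if n.is_master)

    @property
    def slaves(self) -> List[NodeInfo]:
        return [n for n in self.nodes if not n.is_master]


def build_supernode(physical_nodes: List[str]) -> Supernode:
    """Group a physical node list into one supernode.

    Assumption: the first listed node becomes the master; the patent does
    not say how the master is chosen.
    """
    if not physical_nodes:
        raise ValueError("a supernode needs at least one physical node")
    nodes = [
        NodeInfo(hostname=name, index=i, is_master=(i == 0),
                 daemon_port=f"daemon-port-{i}")
        for i, name in enumerate(physical_nodes)
    ]
    return Supernode(nodes)


def daemon_shared_view(sn: Supernode, node: NodeInfo) -> Dict[int, str]:
    """Daemon ports held in a node's shared memory: the master keeps every
    slave daemon's port, each slave keeps the master daemon's port."""
    if node.is_master:
        return {n.index: n.daemon_port for n in sn.slaves}
    return {sn.master.index: sn.master.daemon_port}
```

For example, `build_supernode(["cpu0", "gpu0", "fpga0"])` yields one master and two slaves, and `daemon_shared_view` returns the port table each daemon would hold under these assumptions.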
Step 204, starting the OpenCL main process on the master node, initializing the first communication port on the OpenCL main process, and broadcasting the first communication port address of the first communication port to the daemon processes of the slave nodes through the daemon process of the master node.
The OpenCL runtime environment mainly consists of three parts: a main thread, a command scheduler and a command processor. The main thread receives OpenCL commands and stores them in a command queue; the command scheduler takes commands from the command queue and sends them to the command processor; the command processor dispatches the commands to different OpenCL devices for execution. The OpenCL main process started on the master node is responsible for command scheduling in the OpenCL programming framework. Its first communication port is initialized, and the first communication port address is used for communication between the OpenCL main process and the other processes on the master node and on the slave nodes.
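The three-part runtime described above (main thread, command scheduler, command processor) can be sketched with standard Python queues and threads. This is only an in-process illustration under assumed names (`run_pipeline`, the `(device, command)` tuple format); it does not call any real OpenCL API.

```python
import queue
import threading
from typing import Dict, List, Tuple

Command = Tuple[str, str]  # (target device name, command text); hypothetical format


def run_pipeline(commands: List[Command]) -> Dict[str, List[str]]:
    """Push commands through main thread -> scheduler -> processor."""
    command_queue: queue.Queue = queue.Queue()   # filled by the main thread
    dispatch_queue: queue.Queue = queue.Queue()  # scheduler -> processor
    executed: Dict[str, List[str]] = {}          # per-device execution log
    stop = None                                  # sentinel that ends both workers

    def scheduler() -> None:
        # Takes commands from the command queue and forwards them.
        while True:
            cmd = command_queue.get()
            dispatch_queue.put(cmd)
            if cmd is stop:
                break

    def processor() -> None:
        # Dispatches each command to the device it targets.
        while True:
            cmd = dispatch_queue.get()
            if cmd is stop:
                break
            device, text = cmd
            executed.setdefault(device, []).append(text)

    workers = [threading.Thread(target=scheduler),
               threading.Thread(target=processor)]
    for worker in workers:
        worker.start()
    for cmd in commands:          # the main thread enqueues incoming commands
        command_queue.put(cmd)
    command_queue.put(stop)
    for worker in workers:
        worker.join()
    return executed
```

Because both queues are FIFO and each stage is a single thread, commands reach each device in the order in which they were enqueued.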
Step 206, starting the OpenCL-handler process on the master node and on each slave node that has received the first communication port address, and initializing a second communication port of the OpenCL-handler process, so that each physical node sends a second communication port address of the second communication port to the OpenCL main process according to the first communication port address.
An OpenCL-handler process is launched on each physical node; the OpenCL-handler process is responsible for command processing in the OpenCL programming framework. Its second communication port is initialized, and the second communication port address is used for communication between the slave nodes and the master node. The master node and each slave node then send their local second communication port address to the OpenCL main process according to the first communication port address.
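The registration performed by each OpenCL-handler process in step 206 can be sketched as follows. Ports are modeled as in-memory queues and the shared memory as a plain dictionary; `handler_register`, the `first_port` key and the `handler-port-N` address format are assumptions made for illustration only.

```python
import queue
from typing import Dict


def handler_register(node_index: int,
                     shared_memory: Dict[str, str],
                     ports: Dict[str, queue.Queue]) -> str:
    """Initialise a second port and report its address to the main process."""
    second_port = f"handler-port-{node_index}"   # hypothetical address format
    ports[second_port] = queue.Queue()           # stands in for port initialisation
    first_port = shared_memory["first_port"]     # published earlier by the daemon
    # Send (node index, second port address) to the OpenCL main process.
    ports[first_port].put((node_index, second_port))
    return second_port
```

Calling `handler_register` once per node leaves one `(index, address)` pair per node waiting on the main process's port.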
Step 208, packaging all the received second communication port addresses and the first communication port address through the OpenCL main process, and sending the global address information obtained by packaging to all the physical nodes, so that the OpenCL-handler processes on the physical nodes receive the global address information, and the construction of the software supernode OpenCL programming environment is completed.
After receiving the second communication port addresses sent by all the physical nodes, the OpenCL main process packages its local first communication port address together with all the received second communication port addresses and sends the global address information obtained by packaging to all the physical nodes, so that the OpenCL-handler process on each physical node receives the global address information.
In the above software supernode-oriented OpenCL programming framework construction method, an OpenCL main process is started on the master node of the software supernode and a first communication port on the OpenCL main process is initialized. The daemon process on the master node broadcasts the first communication port address to the slave nodes of the software supernode. After the master node and the slave nodes have acquired the first communication port address, an OpenCL-handler process is started on every physical node and its second communication port is initialized. Each physical node sends its local second communication port address to the OpenCL main process according to the first communication port address. The OpenCL main process packages all the received second communication port addresses together with the first communication port address and sends the resulting global address information to all physical nodes, so that the OpenCL-handler processes on the physical nodes receive the global address information and the construction of the software supernode OpenCL programming environment is completed. With this method, resources on other computing nodes in the array heterogeneous computer system can be accessed through the OpenCL programming interface, which makes the software supernode usable.
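The overall handshake summarized above can be walked through in a single-process simulation. This is a sketch under stated assumptions: node 0 plays the master, ports are in-memory queues, each node's shared memory is a dictionary, and the global address information is laid out as the first port address followed by the second port addresses in node order. None of these choices is specified by the patent.

```python
import queue
from typing import Dict, List


def build_environment(node_count: int) -> Dict[int, List[str]]:
    """Simulate the four construction steps; returns what each handler receives."""
    if node_count < 1:
        raise ValueError("need at least the master node")
    ports: Dict[str, queue.Queue] = {}
    shared: List[Dict[str, str]] = [{} for _ in range(node_count)]  # per-node shared memory

    # Main process on the master initialises the first port and publishes it.
    first_port = "main-port"                      # hypothetical address
    ports[first_port] = queue.Queue()
    shared[0]["first_port"] = first_port

    # Master daemon broadcasts the address; each slave daemon stores it.
    for i in range(1, node_count):
        shared[i]["first_port"] = shared[0]["first_port"]

    # Every node starts a handler, initialises a second port and reports it.
    for i in range(node_count):
        second_port = f"handler-port-{i}"         # hypothetical address
        ports[second_port] = queue.Queue()
        ports[shared[i]["first_port"]].put((i, second_port))

    # Main process collects all reports, packs them and sends them back.
    collected: Dict[int, str] = {}
    while len(collected) < node_count:
        index, address = ports[first_port].get_nowait()
        collected[index] = address
    global_info = [first_port] + [collected[i] for i in range(node_count)]
    for i in range(node_count):
        ports[collected[i]].put(list(global_info))

    # Each handler receives its copy of the global address information.
    return {i: ports[collected[i]].get_nowait() for i in range(node_count)}
```

After the call, every node holds the same address list, which is the condition the text describes as completing the construction of the programming environment.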
In one embodiment, the daemon process of the master node comprises a master node shared memory space; the first communication port is a Tianhe high-speed communication port; after the OpenCL main process is started on the master node and the first communication port on the OpenCL main process is initialized, the method further includes: writing the first communication port address into the master node shared memory space.
When an array heterogeneous computer system contains a plurality of software supernodes, message passing between the software supernodes is implemented through the MPI communication programming interface. If message passing inside a software supernode also used the MPI communication programming interface, the two layers of MPI environments would interfere with each other and messages could not be delivered correctly. Therefore, in this embodiment, the Tianhe high-speed communication interface is used for message passing inside the software supernode.
The daemon process on the master node maintains a shared memory space. After the OpenCL main process is started on the master node and its first communication port is initialized, the first communication port address is written into the master node shared memory space, so that the daemon process on the master node can read the first communication port address of the OpenCL main process.
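Publishing the port address through shared memory can be sketched with Python's standard `multiprocessing.shared_memory` module. The length-prefixed layout and the helper names `write_address` and `read_address` are assumptions for illustration; the patent does not describe the memory layout used by the daemon.

```python
import struct
from multiprocessing import shared_memory

HEADER = struct.Struct("<I")  # 4-byte little-endian length prefix (assumed layout)


def write_address(shm: shared_memory.SharedMemory, address: bytes) -> None:
    """Writer side (the OpenCL main process in the text)."""
    if HEADER.size + len(address) > shm.size:
        raise ValueError("address does not fit in the shared segment")
    HEADER.pack_into(shm.buf, 0, len(address))
    shm.buf[HEADER.size:HEADER.size + len(address)] = address


def read_address(shm: shared_memory.SharedMemory) -> bytes:
    """Reader side (the daemon or a handler process in the text)."""
    (length,) = HEADER.unpack_from(shm.buf, 0)
    return bytes(shm.buf[HEADER.size:HEADER.size + length])
```

A reader attaches to the same segment by name, for example `shared_memory.SharedMemory(name=writer.name)`, and calls `read_address`; the owner is responsible for calling `close()` and `unlink()` when the segment is no longer needed.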
In one embodiment, the information in the master node shared memory space is read through the daemon process of the master node to obtain the first communication port address; and the slave node process numbers of the daemon processes of the slave nodes in the software supernode are acquired, so that the daemon process of the master node broadcasts the first communication port address to the daemon processes of the slave nodes through the Tianhe high-speed communication library interface according to the slave node process numbers.
The shared memory space maintained by the master node in the software supernode stores the communication port information of all slave node daemon processes, and the shared memory space maintained by each slave node stores the communication port information of the master node daemon process; that is, the master node and the slave nodes in the software supernode can communicate with each other.
Communication between the master node daemon process and the slave node daemon processes in the software supernode is implemented through the Tianhe high-speed communication library interface, which prevents conflicts with the outer-layer MPI environment when computing resources are accessed across software supernodes.
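The broadcast keyed by slave process numbers can be illustrated as below. The Tianhe high-speed communication library is not modeled; each slave daemon is represented by an in-memory mailbox, and `broadcast_first_port` and the `FIRST_PORT` message tag are hypothetical names.

```python
import queue
from typing import Dict, Iterable


def broadcast_first_port(first_port: str,
                         slave_process_numbers: Iterable[int],
                         mailboxes: Dict[int, queue.Queue]) -> int:
    """Deliver the first port address to every listed slave daemon.

    Returns the number of daemons the address was delivered to.
    """
    delivered = 0
    for number in slave_process_numbers:
        if number not in mailboxes:
            raise KeyError(f"no daemon registered under process number {number}")
        mailboxes[number].put(("FIRST_PORT", first_port))
        delivered += 1
    return delivered
```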
In one embodiment, after broadcasting the first communication port address to the daemon process of the slave node through the daemon process of the master node, the method further includes: and sending a message for starting the OpenCL-handler process to the daemon process of the slave node through the daemon process of the master node.
The daemon process of the master node sends a message for starting the OpenCL-handler process to the daemon process of each slave node, so that each slave node launches its OpenCL-handler process after receiving the message.
In one embodiment, the daemon process of the slave node comprises the step that the slave node shares a memory space; writing the first communication port address into the shared memory space of each slave node through the daemon process of the slave node on each slave node receiving the first communication port address; the first communication port address is received from the node through the interface of the Tianhe high-speed communication library; receiving a message for starting an OpenCL-handler process sent by a daemon process of a main node through the daemon process of a slave node; starting an OpenCL-handler process on the master node and the slave node, and initializing a second communication port corresponding to the OpenCL-handler process; the second communication port is a high-speed communication port of the sky river; reading information in a memory space shared by the master node and the slave node through OpenCL-handler processes of the master node and the slave node to obtain a first communication port address; enabling the OpenCL-handler processes of the master node and the slave node to send a second communication port address of a second communication port to the OpenCL master process according to the first communication port address; the second communication port address is sent by the main node and the slave node through the interface of the sky-river high-speed communication library.
The daemon process of the slave node writes the received first communication port address into the shared memory space that it maintains. After the slave node launches the OpenCL-handler process, the OpenCL-handler process reads the first communication port address from the shared memory space, uses it to communicate with the OpenCL main process, and sends its own second communication port address to the OpenCL main process. Because communication between the OpenCL-handler process and the OpenCL main process goes through the Tianhe high-speed communication library interface, conflicts with the outer-layer MPI environment are avoided when computing resources are accessed across software supernodes.
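The shared-memory handoff between a daemon and its OpenCL-handler might look like the sketch below. It uses Python's standard `multiprocessing.shared_memory` module purely as an illustration; the length-prefixed layout, the function names, and the example address `b"first-port@node0"` are assumptions, since the text does not specify how the address is stored. For brevity both roles run in one process here, whereas in the described system the writer and reader are separate processes on the same node.

```python
import struct
from multiprocessing import shared_memory

# Assumed layout: a 4-byte little-endian length prefix followed by the address bytes.
HEADER = struct.Struct("<I")


def daemon_write_address(shm: shared_memory.SharedMemory, addr: bytes) -> None:
    """Daemon side: store the first communication port address in shared memory."""
    HEADER.pack_into(shm.buf, 0, len(addr))
    shm.buf[HEADER.size:HEADER.size + len(addr)] = addr


def handler_read_address(shm_name: str) -> bytes:
    """Handler side: attach to the daemon's segment by name and read the address."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        (length,) = HEADER.unpack_from(shm.buf, 0)
        return bytes(shm.buf[HEADER.size:HEADER.size + length])
    finally:
        shm.close()


segment = shared_memory.SharedMemory(create=True, size=256)
try:
    daemon_write_address(segment, b"first-port@node0")
    received = handler_read_address(segment.name)
finally:
    segment.close()
    segment.unlink()
```

The handler only needs the segment name to find the address, which is why the daemon can launch it without passing the address over the network a second time.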
In one embodiment, the global address information is a global array. All the received second communication port addresses and the first communication port address are packaged through the OpenCL main process to obtain the global array, and the OpenCL main process sends the global array to all physical nodes through the Tianhe high-speed communication library interface; the OpenCL-handler process on each physical node receives the global array through the Tianhe high-speed communication library interface, completing the construction of the software supernode OpenCL programming environment.
The global array comprises communication port addresses of all physical nodes in the software supernode.
When the global address information is sent, communication between the OpenCL main process and the OpenCL-handler process likewise goes through the Tianhe high-speed communication library interface, which avoids conflicts with the outer-layer MPI environment when computing resources are accessed across software supernodes.
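One way to picture the packaging of the global array is the sketch below. The array layout (index 0 for the OpenCL main process port, index r + 1 for the handler on node r), the function name, and the string addresses are assumptions made for illustration; the text only states that the array contains the communication port addresses of all physical nodes.

```python
def build_global_array(first_port_addr: str, second_port_addrs: dict) -> list:
    """Pack the main-process port and every handler port into one ordered array.

    second_port_addrs maps a node index (0..n-1) to the second communication
    port address reported by that node's OpenCL-handler process.
    """
    expected = set(range(len(second_port_addrs)))
    if set(second_port_addrs) != expected:
        # Refuse to publish an array with missing or unexpected node indices.
        raise ValueError("handler addresses must cover node indices 0..n-1")
    ordered = [second_port_addrs[i] for i in sorted(second_port_addrs)]
    return [first_port_addr] + ordered


# Handlers may report in any order; the packed array is ordered by node index.
global_array = build_global_array("host-port", {1: "handler-1", 0: "handler-0"})
```

Ordering by node index lets every process that receives the array look up any peer's address by position rather than by searching.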
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, there is provided an OpenCL programming framework building apparatus for software supernodes, including: a software supernode module 302, a first communication port address broadcasting module 304, a second communication port address sending module 306 and a global address information establishing module 308, wherein:
a software supernode module 302, configured to obtain a physical node list of the array heterogeneous computer system, and construct a software supernode of the array heterogeneous computer system; the software supernode comprises a master node and a plurality of slave nodes; a daemon process runs on each physical node;
a first communication port address broadcasting module 304, configured to start an OpenCL host process at a master node, initialize a first communication port on the OpenCL host process, and broadcast a first communication port address of the first communication port to a daemon process of a slave node through a daemon process of the master node;
a second communication port address sending module 306, configured to start an OpenCL-handler process on each slave node and master node that receive the first communication port address, and initialize a second communication port of the OpenCL-handler process, so that the physical node sends a second communication port address of the second communication port to the OpenCL master process according to the first communication port address;
the global address information establishing module 308 is configured to package all the received second communication port addresses and first communication port addresses through an OpenCL host process, and send the global address information obtained by the packaging to all the physical nodes, so that the OpenCL-handler process on the physical nodes receives the global address information, and the construction of the software supernode OpenCL programming environment is completed.
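The data flow through the four modules can be traced with a small single-process simulation. Everything here is a sketch under stated assumptions: the first node in the list is taken as the master, plain dictionaries stand in for each node's shared memory, direct function calls stand in for Tianhe library messages, and the `host@...` and `handler@...` address strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    shared_memory: dict = field(default_factory=dict)  # maintained by the node's daemon
    global_array: tuple = ()


def build_supernode(node_names):
    """Module 302: form a supernode; the first listed node is assumed to be the master."""
    nodes = [Node(name) for name in node_names]
    return nodes[0], nodes[1:]


def broadcast_first_port(master, slaves):
    """Module 304: publish the main process port address on every node."""
    first_port_addr = f"host@{master.name}"
    for node in [master, *slaves]:
        node.shared_memory["first_port_addr"] = first_port_addr
    return first_port_addr


def collect_second_ports(master, slaves):
    """Module 306: each handler reads the first port address and replies to it."""
    replies = []
    for node in [master, *slaves]:
        destination = node.shared_memory["first_port_addr"]
        replies.append((destination, f"handler@{node.name}"))
    return replies


def publish_global_addresses(first_port_addr, replies, master, slaves):
    """Module 308: package all addresses and deliver the result to every node."""
    global_array = (first_port_addr, *(addr for _, addr in replies))
    for node in [master, *slaves]:
        node.global_array = global_array
    return global_array


master, slaves = build_supernode(["n0", "n1", "n2"])
first = broadcast_first_port(master, slaves)
replies = collect_second_ports(master, slaves)
result = publish_global_addresses(first, replies, master, slaves)
```

After the last call every node holds the same address table, which corresponds to the point at which the text considers the programming environment constructed.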
The first communication port address broadcasting module 304 is further configured to start an OpenCL host process at the master node, and write a first communication port address into the shared memory space of the master node after initializing the first communication port on the OpenCL host process.
The first communication port address broadcasting module 304 is further configured to read information in the master node shared memory space through a daemon process of the master node to obtain a first communication port address; and acquire the slave node process number of the daemon process of the slave node in the software supernode, so that the daemon process of the master node broadcasts the first communication port address to the daemon process of the slave node through the Tianhe high-speed communication library interface according to the slave node process number.
The first communication port address broadcasting module 304 is further configured to send a message for starting the OpenCL-handler process to the daemon process of the slave node through the daemon process of the master node after broadcasting the first communication port address to the daemon process of the slave node through the daemon process of the master node.
The second communication port address sending module 306 is further configured to, on each slave node that receives the first communication port address, write the first communication port address into the shared memory space of the slave node through a daemon process of the slave node; the first communication port address is received by the slave node through the Tianhe high-speed communication library interface; receive, through the daemon process of the slave node, a message for starting an OpenCL-handler process sent by the daemon process of the master node using the Tianhe high-speed communication library interface; start an OpenCL-handler process on the master node and the slave node, and initialize a second communication port corresponding to the OpenCL-handler process; the second communication port is a Tianhe high-speed communication port; read information in the shared memory spaces of the master node and the slave node through the OpenCL-handler processes of the master node and the slave node to obtain a first communication port address; enable the OpenCL-handler processes of the master node and the slave node to send a second communication port address of the second communication port to the OpenCL master process according to the first communication port address; the second communication port address is sent by the master node and the slave node through the Tianhe high-speed communication library interface.
The global address information establishing module 308 is further configured to package all the received second communication port addresses and first communication port addresses through the OpenCL host process to obtain the global address information, so that the OpenCL host process sends the global address information to all the physical nodes through the Tianhe high-speed communication library interface; and enable the OpenCL-handler process on the physical node to receive the global address information through the Tianhe high-speed communication library interface, completing construction of the software supernode OpenCL programming environment.
For specific limitations of the OpenCL programming framework construction device for the software supernode, refer to the above limitations on the OpenCL programming framework construction method for the software supernode, which are not described herein again. The modules in the OpenCL programming framework construction device for the software supernode can be wholly or partially implemented by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to realize the OpenCL programming framework construction method facing the software supernode. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the method embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A software supernode-oriented OpenCL programming framework construction method is characterized by comprising the following steps:
acquiring a physical node list of an array heterogeneous computer system, and constructing a software supernode of the array heterogeneous computer system; the software supernode comprises a master node and a plurality of slave nodes; a daemon process runs on each physical node in the physical node list;
starting an OpenCL host process on the master node, initializing a first communication port on the OpenCL host process, and broadcasting a first communication port address of the first communication port to a daemon process of the slave node through the daemon process of the master node;
starting an OpenCL-handler process on each slave node and each master node which receive the first communication port address, and initializing a second communication port of the OpenCL-handler process, so that the physical node sends a second communication port address of the second communication port to the OpenCL master process according to the first communication port address;
and packaging all the received second communication port addresses and the first communication port addresses through the OpenCL main process, and sending the global address information obtained by packaging to all the physical nodes, so that the OpenCL-handler process on the physical nodes receives the global address information, and the construction of the software supernode OpenCL programming environment is completed.
2. The method according to claim 1, wherein the daemon process of the master node comprises a master node shared memory space; the first communication port is a Tianhe high-speed communication port;
after an OpenCL host process is started at the master node and a first communication port on the OpenCL host process is initialized, the method further includes:
writing the first communication port address into the master node shared memory space.
3. The method of claim 2, wherein broadcasting the first communication port address to the daemon of the slave node by the daemon of the master node comprises:
reading information in the master node shared memory space through the daemon process of the master node to obtain the first communication port address;
and acquiring the slave node process number of the daemon process of the slave node in the software supernode, so that the daemon process of the master node broadcasts the first communication port address to the daemon process of the slave node through a Tianhe high-speed communication library interface according to the slave node process number.
4. The method of claim 3, further comprising, after broadcasting the first communication port address to the daemon of the slave node by the daemon of the master node:
and sending a message for starting an OpenCL-handler process to the daemon process of the slave node through the daemon process of the master node.
5. The method of claim 4, wherein the daemon process of the slave node comprises a slave node shared memory space;
starting an OpenCL-handler process on each slave node and the master node that receive the first communication port address, and initializing a second communication port of the OpenCL-handler process, so that the physical node sends a second communication port address of the second communication port to the OpenCL master process according to the first communication port address, including:
writing the first communication port address into a shared memory space of the slave node through a daemon process of the slave node on each slave node which receives the first communication port address; the first communication port address is received by the slave node through a Tianhe high-speed communication library interface;
receiving, by the daemon process of the slave node, a message for starting an OpenCL-handler process sent by the daemon process of the master node through a Tianhe high-speed communication library interface;
starting an OpenCL-handler process on the master node and the slave node, and initializing a second communication port corresponding to the OpenCL-handler process; the second communication port is a Tianhe high-speed communication port;
reading information in the shared memory spaces of the master node and the slave node through OpenCL-handler processes of the master node and the slave node to obtain the first communication port address;
enabling OpenCL-handler processes of the master node and the slave node to send a second communication port address of the second communication port to the OpenCL master process according to the first communication port address; and the second communication port address is sent by the master node and the slave node through a Tianhe high-speed communication library interface.
6. The method according to claim 5, wherein the OpenCL host process packages all the received second communication port addresses and the first communication port addresses, and sends global address information obtained by the packaging to all the physical nodes, so that the OpenCL-handler process on the physical nodes receives the global address information, and completes construction of the software supernode OpenCL programming environment, including:
packaging all the received second communication port addresses and the first communication port addresses through the OpenCL host process to obtain the global address information, so that the OpenCL host process sends the global address information to all the physical nodes through a Tianhe high-speed communication library interface;
and enabling the OpenCL-handler process on the physical node to receive the global address information through a Tianhe high-speed communication library interface, and completing the construction of the software supernode OpenCL programming environment.
7. The method according to any one of claims 1 to 6, wherein the global address information is a global array.
8. An OpenCL programming framework building device for software supernodes, which is characterized by comprising:
the software supernode module is used for acquiring a physical node list of the array heterogeneous computer system and constructing a software supernode of the array heterogeneous computer system; the software supernode comprises a master node and a plurality of slave nodes; a daemon process runs on each physical node;
a first communication port address broadcasting module, configured to start an OpenCL host process at the master node, initialize a first communication port on the OpenCL host process, and broadcast a first communication port address of the first communication port to a daemon process of the slave node through the daemon process of the master node;
a second communication port address sending module, configured to start an OpenCL-handler process on each slave node and the master node that receive the first communication port address, and initialize a second communication port of the OpenCL-handler process, so that the physical node sends a second communication port address of the second communication port to the OpenCL master process according to the first communication port address;
and the global address information establishing module is used for packaging all the received second communication port addresses and the first communication port addresses through the OpenCL master process, and sending the global address information obtained by packaging to all the physical nodes, so that the OpenCL-handler process on the physical nodes receives the global address information, and the construction of the software supernode OpenCL programming environment is completed.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202011398621.1A 2020-12-04 2020-12-04 Software supernode-oriented OpenCL programming framework construction method and device Active CN112433869B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011398621.1A CN112433869B (en) 2020-12-04 2020-12-04 Software supernode-oriented OpenCL programming framework construction method and device

Publications (2)

Publication Number Publication Date
CN112433869A CN112433869A (en) 2021-03-02
CN112433869B true CN112433869B (en) 2021-07-09

Family

ID=74691821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011398621.1A Active CN112433869B (en) 2020-12-04 2020-12-04 Software supernode-oriented OpenCL programming framework construction method and device

Country Status (1)

Country Link
CN (1) CN112433869B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459871A (en) * 2020-04-01 2020-07-28 济南浪潮高新科技投资发展有限公司 FPGA heterogeneous computation based block chain acceleration system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101332840B1 (en) * 2012-01-05 2013-11-27 서울대학교산학협력단 Cluster system, Host node, Computing node, and application execution method based on parallel computing framework

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《programming for scientific computing on peta-scale heterogeneous parallel systems》;Yang Can-qun等;《Journal of Central South University》;20130531;1189-1203 *
《异构系统编程方法综述》;唐滔,杨学军;《计算机工程与科学》;20120331;第34卷(第3期);29-34 *

Also Published As

Publication number Publication date
CN112433869A (en) 2021-03-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant