CN113391917A - Multi-machine heterogeneous parallel computing method and device for geophysical prospecting application - Google Patents

Info

Publication number
CN113391917A
Authority
CN
China
Prior art keywords
computing
heterogeneous
task
node
machine
Legal status
Granted
Application number
CN202010173738.3A
Other languages
Chinese (zh)
Other versions
CN113391917B (en)
Inventor
潘英杰
何宝庆
何永清
罗开云
杜清波
皮红梅
Current Assignee
China National Petroleum Corp
BGP Inc
Original Assignee
China National Petroleum Corp
BGP Inc
Application filed by China National Petroleum Corp and BGP Inc
Priority to CN202010173738.3A
Publication of CN113391917A
Application granted
Publication of CN113391917B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/54 Interprogram communication
    • G06F9/542 Event management; Broadcasting; Multicasting; Notifications
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Forestry; Mining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Economics (AREA)
  • Animal Husbandry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mining & Mineral Resources (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Agronomy & Crop Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Geophysics And Detection Of Objects (AREA)

Abstract

The invention provides a multi-machine heterogeneous parallel computing method and device for geophysical prospecting applications. The method comprises the following steps: a user node sends a heterogeneous computing resource search command, containing search parameter information, to a management node; the management node broadcasts the command to each computing node; each computing node generates and starts a plurality of computing task heterogeneous execution ends according to the heterogeneous computing resources it finds and the search parameter information, and feeds back resource information to the management node; the management node forwards the fed-back resource information to the user node; the user node screens the fed-back resource information and sends the selected resource information to the management node; the management node sends a selection confirmation message to each computing node according to the selected resource information; and each computing node determines, according to the confirmation message, which computing task heterogeneous execution ends perform the multi-machine parallel computing. The technical scheme makes full use of all heterogeneous computing resources in a computing node and achieves load balancing and high-performance computing of jobs.

Description

Multi-machine heterogeneous parallel computing method and device for geophysical prospecting application
Technical Field
The invention relates to the technical field of petroleum geophysical prospecting, in particular to a multi-machine heterogeneous parallel computing method and device for geophysical prospecting application.
Background
Petroleum seismic exploration involves many business requirements and algorithms that need high-performance computing, such as two-dimensional model forward modeling and two-dimensional model illumination. To make full use of the computing performance of a single machine, geophysical prospecting applications usually compute shot by shot on the CPU and the GPU separately, using technologies such as multithreading and OpenCL to exploit all hardware resources. For ease of use, entire business algorithms or computing processes such as forward modeling and illumination are generally encapsulated into independent dynamic libraries or independent processes, and an external program calls the library interfaces or launches the processes to perform the computation. As survey areas grow larger, the number of source points (that is, shots) in a work area keeps increasing, and performing model forward modeling and model illumination for an entire work area on a single machine becomes increasingly difficult. The key to multi-machine parallel computing is therefore how to use existing multi-machine heterogeneous resources, and how to call existing programs or algorithm libraries conveniently and quickly, to complete model illumination, model forward modeling and similar computations for the whole work area.
In achieving single-machine high-performance computing, different algorithms and applications use different hardware, methods and strategies: some use non-cooperative heterogeneous CPU-GPU computing, while others use cooperative heterogeneous CPU-GPU computing. As a result, the high-performance computing requirements of geophysical prospecting jobs cannot be met.
In current multi-machine parallel computing, one physical machine is usually treated as one computing node, and all of its computing resources are given to the computing program. When multi-machine heterogeneous parallel computing is performed this way, two problems arise. If tasks are allocated in the minimum task unit, several heterogeneous devices cannot compute cooperatively on one task, so only one computing device can be used; the performance of the node's computing resources is not fully exploited and the heterogeneous resources are used unreasonably. If several shots are allocated together as one computing task unit, the difference in computing performance between devices means the GPU finishes quickly and then waits for the CPU, leaving device loads unbalanced. Moreover, it is difficult for a scheduling algorithm to divide an appropriate number of shots accurately among different computing nodes so as to eliminate the load imbalance and idle waiting of the devices within a node.
In view of the above technical problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the invention provide a multi-machine heterogeneous parallel computing method for geophysical prospecting applications, which makes full use of all heterogeneous computing resources in a computing node and achieves load balancing and high-performance computing of jobs. The method comprises the following steps:
before seismic exploration multi-machine parallel computation is carried out, a user node sends a heterogeneous computing resource searching command to a management node; the heterogeneous computing resource searching command comprises searching parameter information;
the management node broadcasts a heterogeneous computing resource searching command to each computing node;
after each computing node receives a heterogeneous computing resource searching command, hardware scanning is carried out on a physical machine where the computing node is located, a plurality of computing task heterogeneous execution ends are generated and started according to the searched heterogeneous computing resource condition and the searching parameter information, and after the plurality of computing task heterogeneous execution ends are started, heterogeneous computing resource information is fed back to a management node; each computing task heterogeneous execution end corresponds to one type of heterogeneous computing resource;
the management node sends heterogeneous computing resource information fed back by each computing node to the user node;
the user node performs screening processing on heterogeneous computing resource information fed back by each computing node by using preset screening conditions to obtain selected heterogeneous computing resource information, and sends the selected heterogeneous computing resource information to the management node;
the management node sends a heterogeneous computing resource use confirmation message to each computing node according to the selected heterogeneous computing resource information;
and each computing node determines a computing task heterogeneous execution end participating in the multi-machine parallel computing according to the heterogeneous computing resource use confirmation message, and the determined computing task heterogeneous execution end performs the seismic exploration multi-machine parallel computing.
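The message exchange in the steps above can be pictured with a small in-process simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the names `ComputeNode`, `ManagementNode` and `user_node_run` are hypothetical, network messages are replaced by direct method calls, the search parameters and screening conditions are modelled as plain predicates, and one execution end is created per matching device for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical: search parameters and screening conditions are plain predicates.
Accept = Callable[[str], bool]


@dataclass
class ComputeNode:
    name: str
    devices: List[str]                      # stands in for the result of a hardware scan
    executors: List[str] = field(default_factory=list)
    active: List[str] = field(default_factory=list)

    def handle_find(self, accept: Accept) -> List[str]:
        # Step 103: create one execution end per device that satisfies the search
        # parameters and report the resulting resource information.
        self.executors = [f"{self.name}:{dev}" for dev in self.devices if accept(dev)]
        return self.executors

    def handle_confirm(self, selected: List[str]) -> None:
        # Step 107: only confirmed execution ends take part in the parallel job.
        self.active = [e for e in self.executors if e in selected]


class ManagementNode:
    def __init__(self, nodes: List[ComputeNode]) -> None:
        self.nodes = nodes

    def broadcast_find(self, accept: Accept) -> Dict[str, List[str]]:
        # Steps 102 and 104: broadcast the search command and relay each node's feedback.
        return {node.name: node.handle_find(accept) for node in self.nodes}

    def confirm(self, selected: List[str]) -> None:
        # Step 106: send the use confirmation message to every compute node.
        for node in self.nodes:
            node.handle_confirm(selected)


def user_node_run(mgmt: ManagementNode, accept: Accept, screen: Accept) -> List[str]:
    # Step 101: issue the search; step 105: screen the feedback and send the selection.
    feedback = mgmt.broadcast_find(accept)
    selected = [e for execs in feedback.values() for e in execs if screen(e)]
    mgmt.confirm(selected)
    return selected


if __name__ == "__main__":
    cluster = [ComputeNode("node1", ["cpu", "gpu0", "gpu1"]), ComputeNode("node2", ["cpu", "gpu0"])]
    chosen = user_node_run(ManagementNode(cluster), accept=lambda d: True,
                           screen=lambda e: e != "node2:cpu")
    print(chosen)
```

Keeping the management node as a pure relay, as sketched here, leaves the selection policy entirely on the user node, which matches the division of roles described in the steps.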
The embodiments of the invention also provide a multi-machine heterogeneous parallel computing device for geophysical prospecting applications, which makes full use of all heterogeneous computing resources in a computing node and achieves load balancing and high-performance computing of jobs. The device comprises:
the user node is used for sending a heterogeneous computing resource searching command to the management node before seismic exploration multi-machine parallel computing is carried out; the heterogeneous computing resource searching command comprises searching parameter information; screening heterogeneous computing resource information fed back by each computing node by using preset screening conditions to obtain selected heterogeneous computing resource information, and sending the selected heterogeneous computing resource information to a management node;
the management node is used for broadcasting the heterogeneous computing resource searching command to each computing node; sending heterogeneous computing resource information fed back by each computing node to the user node; sending a heterogeneous computing resource use confirmation message to each computing node according to the selected heterogeneous computing resource information;
each computing node is used to perform a hardware scan of the physical machine on which it is located after receiving a heterogeneous computing resource search command, to generate and start a plurality of computing task heterogeneous execution ends according to the heterogeneous computing resources found and the search parameter information, and to feed back heterogeneous computing resource information to the management node after the plurality of computing task heterogeneous execution ends are started, each computing task heterogeneous execution end corresponding to one type of heterogeneous computing resource; and to determine, according to the heterogeneous computing resource use confirmation message, the computing task heterogeneous execution ends participating in the multi-machine parallel computing, the determined execution ends then performing the seismic exploration multi-machine parallel computing.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the multi-machine heterogeneous parallel computing method for geophysical prospecting application when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program for executing the multi-machine heterogeneous parallel computing method for geophysical prospecting application.
In the technical solution provided by the embodiments of the invention, the user node, the management node and the computing nodes carry out the steps above: the resource search command is broadcast, each computing node scans its hardware and generates and starts computing task heterogeneous execution ends, the resource information is fed back and screened by the user node, and the selected execution ends are confirmed and perform the seismic exploration multi-machine parallel computing. Multiple computing task heterogeneous execution ends targeting different computing devices are thus generated automatically within a single machine, and each heterogeneous computing device is used separately for multi-machine heterogeneous parallel task computing. The scheme meets the requirements of both cooperative and non-cooperative heterogeneous computing within a single machine, makes full use of all heterogeneous computing resources in a computing node, and achieves load balancing and high-performance job computing.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a multi-machine heterogeneous parallel computing method for geophysical application according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating selected heterogeneous computing resource information in an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a compute node in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a compute node in an embodiment of the invention;
FIG. 5 is a flowchart illustrating a multi-machine heterogeneous job based on autonomous execution of computing tasks according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a multi-machine heterogeneous parallel computing device for geophysical prospecting applications according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In view of the differing multi-machine parallel requirements of current geophysical applications, this scheme reuses existing geophysical algorithm programs or algorithm libraries wherever possible to support multi-machine parallel computing of different algorithm flows, to exploit the performance of heterogeneous computing resources, and to achieve load balancing and high-performance job computing. The scheme is described in detail below.
Fig. 1 is a schematic flow chart of a multi-machine heterogeneous parallel computing method for geophysical application in the embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:
step 101: before seismic exploration multi-machine parallel computation is carried out, a user node sends a heterogeneous computing resource searching command to a management node; the heterogeneous computing resource searching command comprises searching parameter information;
step 102: the management node broadcasts a heterogeneous computing resource searching command to each computing node;
step 103: after each computing node receives a heterogeneous computing resource searching command, hardware scanning is carried out on a physical machine where the computing node is located, a plurality of computing task heterogeneous execution ends are generated and started according to the searched heterogeneous computing resource condition and the searching parameter information, and after the plurality of computing task heterogeneous execution ends are started, heterogeneous computing resource information is fed back to a management node; each computing task heterogeneous execution end corresponds to one type of heterogeneous computing resource;
step 104: the management node sends heterogeneous computing resource information fed back by each computing node to the user node;
step 105: the user node performs screening processing on heterogeneous computing resource information fed back by each computing node by using preset screening conditions to obtain selected heterogeneous computing resource information, and sends the selected heterogeneous computing resource information to the management node;
step 106: the management node sends a heterogeneous computing resource use confirmation message to each computing node according to the selected heterogeneous computing resource information;
step 107: and each computing node determines a computing task heterogeneous execution end participating in the multi-machine parallel computing according to the heterogeneous computing resource use confirmation message, and the determined computing task heterogeneous execution end performs the seismic exploration multi-machine parallel computing.
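Step 105 leaves the "preset screening conditions" open. The sketch below shows one plausible shape for them; the record fields (`node`, `executor_id`, `device_type`, `mem_gb`, `driver`) and the condition set are assumptions chosen to mirror the GPU-memory and driver-version parameters in the example search command, not fields defined by the patent.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ResourceInfo:
    node: str          # computing node that reported the execution end
    executor_id: str   # hypothetical identifier of a computing task heterogeneous execution end
    device_type: str   # "CPU" or "GPU"
    mem_gb: float      # device memory in GB
    driver: float      # driver version (0.0 for CPUs)


@dataclass(frozen=True)
class ScreeningConditions:
    # Hypothetical preset screening conditions held by the user node.
    allowed_types: frozenset = frozenset({"CPU", "GPU"})
    min_gpu_mem_gb: float = 0.0
    min_driver: float = 0.0
    excluded_nodes: frozenset = frozenset()


def screen(resources: Iterable[ResourceInfo], cond: ScreeningConditions) -> List[ResourceInfo]:
    """Return the fed-back resources that pass every preset condition, in input order."""
    selected = []
    for r in resources:
        if r.node in cond.excluded_nodes or r.device_type not in cond.allowed_types:
            continue
        # Memory and driver limits are applied to GPU execution ends only.
        if r.device_type == "GPU" and (r.mem_gb < cond.min_gpu_mem_gb or r.driver < cond.min_driver):
            continue
        selected.append(r)
    return selected
```

The selected list is what the user node would send back to the management node in step 105.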
The technical scheme provided by the embodiment of the invention realizes the automatic generation of a plurality of computing task heterogeneous execution ends aiming at different computing devices in a single computer, and the multi-computer heterogeneous parallel task computing is carried out by respectively utilizing different heterogeneous computing devices.
In a specific embodiment, the command in the embodiment of the present invention may include a message command, a plug-in command, and the like. Wherein:
the message command is mainly used for message communication between nodes and consists of a command name, command parameter items and parameter values; an example is the heterogeneous computing resource search command FindExecutror-GPUClient-true-DriverVersion > -1.0-GPUMem > -1G. Here FindExecutror is the command name, and each "-" introduces a command parameter item, followed by the specific parameter content.
The plug-in command encapsulates a function: the command is declared in the plug-in's configuration file, and its function is implemented in the plug-in itself. For example, in cmd_calculated $TASKDATA, cmd_calculated is the command of a plug-in computing function and the parameter $TASKDATA represents the task data; the statement is declared in the plug-in configuration file and the function is implemented in the corresponding plug-in. The parallel platform provides common basic plug-in commands, such as data transmission and compression, while business-specific parallel computing plug-ins provide plug-in commands for particular applications, such as illumination computation and task result processing. These plug-in commands can be called in algorithm order within the task-flow configuration that automatically runs the parallel heterogeneous computation, which allows complex geophysical algorithm flows to be customized, as described in the embodiments below.
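The two command kinds can be illustrated with a small parser and dispatcher. This is a sketch under assumptions: the patent's example command string is ambiguous after extraction, so the sketch assumes a space-separated form such as `FindExecutror -GPUClient true -DriverVersion >=1.0 -GPUMem >=1G` (the spacing and the `>=` operators are guesses), and `PluginRegistry`, `run_flow` and the `cmd_transfer` command used in the tests are hypothetical. Only `cmd_calculated` and `$TASKDATA` come from the text; `$VAR` expansion uses Python's standard `string.Template`.

```python
import string
from typing import Callable, Dict, List, Tuple


def parse_message_command(text: str) -> Tuple[str, Dict[str, str]]:
    """Split 'Name -Item value -Item value ...' into (name, {item: value})."""
    tokens = text.split()
    name, params, key = tokens[0], {}, None
    for tok in tokens[1:]:
        if tok.startswith("-") and tok[1:2].isalpha():
            key = tok[1:]                    # a new command parameter item
            params[key] = ""
        elif key is not None:
            params[key] = (params[key] + " " + tok).strip()  # its parameter content
    return name, params


PluginFn = Callable[[str], str]


class PluginRegistry:
    """Hypothetical registry mapping plug-in command names to their functions."""

    def __init__(self) -> None:
        self._commands: Dict[str, PluginFn] = {}

    def register(self, name: str, fn: PluginFn) -> None:
        self._commands[name] = fn

    def run(self, statement: str, variables: Dict[str, str]) -> str:
        # Expand $VARS such as $TASKDATA, then dispatch on the command name.
        expanded = string.Template(statement).substitute(variables)
        name, _, arg = expanded.partition(" ")
        return self._commands[name](arg)

    def run_flow(self, statements: List[str], variables: Dict[str, str]) -> List[str]:
        # Run plug-in commands in algorithm order, as a task-flow configuration might.
        return [self.run(s, variables) for s in statements]
```

In this sketch each statement in a flow sees the same variables; a real task flow would probably also pass intermediate results between steps.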
The steps involved in the embodiments of the present invention are described in detail below with reference to fig. 2 to 5.
First, the multi-machine heterogeneous parallel computing method provided in the embodiment of the present invention is a way of using multi-machine heterogeneous resources for single-machine programs that compute on multiple heterogeneous devices, either cooperatively or non-cooperatively.
The inventors identified the following technical problem. In model forward modeling (the model being the geological model used in geophysical model forward modeling calculations), the computation is performed shot by shot on different heterogeneous devices, the shot being the minimum division unit of a computing task: on the CPU, forward modeling is performed shot by shot with multithreading, and on the GPU it is performed shot by shot with OpenCL or CUDA. The independent computing devices do not compute cooperatively; that is, the CPU and the GPU never jointly complete the forward modeling of one shot. Because the computing performance differs greatly between the CPU and the GPU, and between different GPU models, a GPU may complete several to dozens of shots in the time the CPU completes the forward modeling of one shot.
When multi-machine parallel computing is carried out, one physical machine is usually treated as one computing node, and all of its computing resources are given to the computing program. For multi-machine heterogeneous parallel computing on such a machine, however, two problems arise. If tasks are allocated in the minimum task unit, the shot, the heterogeneous devices cannot compute cooperatively on it, so only one computing device can be used; the performance of the node's computing resources is not fully exploited and the heterogeneous resources are used unreasonably. If several shots are allocated together as one computing task unit, the performance difference between devices causes the GPU to finish quickly and then wait for the CPU, so device loads are uneven. At the same time, it is difficult for a scheduling algorithm to divide an appropriate number of shots accurately among different computing nodes in order to eliminate uneven load and idle waiting among the devices within a node.
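A quick calculation makes the imbalance concrete. The sketch below assumes hypothetical timings (a CPU taking 10 time units per shot and a GPU taking 1, within the "several to dozens" ratio described above) and compares two policies: splitting the shots evenly between devices up front, versus treating each device as its own execution end that takes the next single shot as soon as it becomes free. The second policy is an illustrative greedy list scheduler, not a scheduler specified by the patent.

```python
import heapq
from typing import Dict


def static_split_makespan(n_shots: int, sec_per_shot: Dict[str, float]) -> float:
    """Give every device the same number of shots up front (remainder to the first devices)."""
    devices = list(sec_per_shot)
    base, extra = divmod(n_shots, len(devices))
    finish = [(base + (1 if i < extra else 0)) * sec_per_shot[d] for i, d in enumerate(devices)]
    return max(finish)


def dynamic_pull_makespan(n_shots: int, sec_per_shot: Dict[str, float]) -> float:
    """Each device acts as an independent execution end and pulls one shot whenever it is idle."""
    heap = [(0.0, i, d) for i, d in enumerate(sec_per_shot)]  # (time device becomes free, tie-break, name)
    heapq.heapify(heap)
    end = 0.0
    for _ in range(n_shots):
        free_at, i, dev = heapq.heappop(heap)   # earliest-free device takes the next shot
        done = free_at + sec_per_shot[dev]
        end = max(end, done)
        heapq.heappush(heap, (done, i, dev))
    return end


if __name__ == "__main__":
    timings = {"CPU": 10.0, "GPU": 1.0}         # hypothetical per-shot times
    print(static_split_makespan(22, timings), dynamic_pull_makespan(22, timings))
```

With 22 shots, the even split leaves the GPU idle for most of the 110-unit run, whereas pulling one shot at a time finishes in 20 units with both devices busy, which is the motivation for giving each device its own execution end.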
In addition, in multi-machine parallel computing, a default computing task execution end must be installed on each physical machine participating in the computation so that the machine can join as a computing node; the execution end provides basic network communication, data transmission, computing resource management, the task runtime environment and so on. Using the resources of a whole machine as one unit in this way cannot effectively support users' multi-machine heterogeneous computing requirements.
The invention provides a multi-machine heterogeneous parallel computing method aiming at geophysical prospecting application, which automatically generates a plurality of computing task heterogeneous execution ends aiming at different computing devices in a single machine according to requirements and respectively utilizes different heterogeneous devices to perform task computing.
In one implementation, after receiving the heterogeneous computing resource search command, each computing node performs hardware scanning on the physical machine where the computing node is located, generates and starts a plurality of computing task heterogeneous execution ends according to the searched heterogeneous computing resource condition and the search parameter information, and after the plurality of computing task heterogeneous execution ends are started, feeds back heterogeneous computing resource information to the management node, which may include:
after receiving the heterogeneous computing resource search command, each computing node performs a hardware scan of the physical machine on which it is located; then, according to the heterogeneous computing resources found and the search parameter information, it copies its default computing task heterogeneous execution end program into directories at the same level as the directory containing that program, thereby obtaining a plurality of computing task heterogeneous execution ends;
the default computing task heterogeneous execution end of each computing node respectively starts each computing task heterogeneous execution end obtained by copying;
and after the copied computing task heterogeneous execution ends are started, each computing node feeds back heterogeneous computing resource information to the management node.
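The three sub-steps above (copy into sibling directories, start each copy from the default execution end, then report back) can be sketched with the standard library. Assumptions: the directory naming scheme `<default>_<device>_<index>`, the feedback dictionary fields, and the injected `start` callback are all hypothetical; a real execution end would launch a process where this sketch only calls the callback.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, List


def spawn_execution_ends(default_dir: Path, devices: List[str],
                         start: Callable[[Path, str], None]) -> List[Dict[str, str]]:
    """Copy the default execution end next to itself once per device, start each copy,
    and return the resource information to feed back to the management node."""
    feedback = []
    for idx, device in enumerate(devices):
        # Hypothetical naming: sibling directory of the default execution end.
        copy_dir = default_dir.parent / f"{default_dir.name}_{device}_{idx}"
        if not copy_dir.exists():
            shutil.copytree(default_dir, copy_dir)
        start(copy_dir, device)  # the default execution end starts each copy
        feedback.append({"executor": copy_dir.name, "device": device})
    return feedback
```

Passing `start` in as a callback keeps the copy logic testable and leaves the choice of launcher (subprocess, service manager, cluster agent) open.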
In specific implementation, a plurality of computing task heterogeneous execution ends aiming at different computing devices are automatically generated in a single computer, and one computing task heterogeneous execution end corresponds to one type of heterogeneous device, so that an original physical computer is divided into a plurality of abstract computing nodes according to the type and the number of the computing devices, and different heterogeneous computing devices are respectively utilized to perform multi-computer heterogeneous parallel task computing, the requirements of heterogeneous cooperative computing and heterogeneous non-cooperative computing in the single computer are met, the performance of all heterogeneous computing resources in the computing nodes is fully exerted, and load balancing and high-performance computing of operation are realized.
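Dividing one physical machine into several abstract computing nodes starts from the hardware scan and the search parameters. The sketch below is one possible reading: devices that fail the search parameters are dropped, and the rest are grouped so that each group (keyed by device kind and model) would receive one computing task heterogeneous execution end. The `Device` fields and the grouping key are assumptions; grouping per individual device would be an equally valid variant.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Device:
    kind: str       # "CPU" or "GPU"
    model: str      # hypothetical model string from the hardware scan
    mem_gb: float
    driver: float   # 0.0 for CPUs


def plan_execution_ends(scan: List[Device], min_gpu_mem_gb: float,
                        min_driver: float) -> Dict[Tuple[str, str], List[Device]]:
    """Group usable devices by (kind, model); each group maps to one execution end."""
    plan: Dict[Tuple[str, str], List[Device]] = {}
    for dev in scan:
        if dev.kind == "GPU" and (dev.mem_gb < min_gpu_mem_gb or dev.driver < min_driver):
            continue  # does not meet the search parameter information
        plan.setdefault((dev.kind, dev.model), []).append(dev)
    return plan
```

Because Python dictionaries keep insertion order, the resulting plan lists execution ends in the order their devices were scanned.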
In an embodiment, the multi-machine heterogeneous parallel computing method for geophysical prospecting applications may further include: when a computing node determines from the heterogeneous computing resource use confirmation message that a copied computing task heterogeneous execution end has not been selected, it shuts that copied execution end down; when it determines that the default computing task heterogeneous execution end has not been selected, it sets the default execution end to an idle state.
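The confirmation handling above reduces to a small state assignment. In this sketch the `State` names and identifiers are hypothetical; the rule itself follows the text: unselected copies are closed, while an unselected default execution end stays running but is marked idle.

```python
from enum import Enum
from typing import Dict, Set


class State(Enum):
    SELECTED = "selected"   # takes part in the multi-machine parallel computation
    IDLE = "idle"           # default execution end kept alive but unused
    CLOSED = "closed"       # copied execution end shut down


def apply_confirmation(default_id: str, copies: Set[str], selected: Set[str]) -> Dict[str, State]:
    """Decide the state of every execution end on one computing node."""
    states: Dict[str, State] = {}
    for cid in copies:
        states[cid] = State.SELECTED if cid in selected else State.CLOSED
    states[default_id] = State.SELECTED if default_id in selected else State.IDLE
    return states
```

Keeping the default execution end alive, rather than closing it, preserves the node's basic communication and management functions for later jobs.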
In specific implementation, the processing flow of each computing node after receiving the resource use confirmation message ensures stable multi-machine heterogeneous parallel computing for geophysical prospecting application.
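The decision each computing node makes on receiving the confirmation message reduces to three cases, sketched below under the assumption that execution ends are identified by hypothetical names and that the message carries the set of selected names.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RUNNING = "running"  # selected: takes part in the job
    IDLE = "idle"        # unselected default execution end, kept for later jobs
    CLOSED = "closed"    # unselected copied execution end, shut down


@dataclass
class ExecutionEnd:
    name: str
    is_default: bool
    state: State = State.IDLE


def apply_confirmation(ends: list, selected: set) -> None:
    """Update every execution end on one node from the confirmation message."""
    for end in ends:
        if end.name in selected:
            end.state = State.RUNNING
        elif end.is_default:
            end.state = State.IDLE
        else:
            end.state = State.CLOSED  # a real node would also stop the process
```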
Specifically, to make full use of all heterogeneous computing resources in a computing node and to achieve load balancing and high-performance processing of jobs, the embodiment of the present invention follows the flow shown in fig. 2:
1. When a user performs multi-machine parallel computing from a user node, the user node must first find and screen the available heterogeneous computing resources. To do so, it sends the management node a heterogeneous computing resource searching command carrying additional search parameter information.
2. On receiving the heterogeneous computing resource searching command sent by the user, the management node broadcasts it to every computing node.
3. A computing node is generally a physical machine or cluster node on which a computing task execution end is installed (the computing task execution end in fig. 2, also called the default computing task heterogeneous execution end; its structure is shown in fig. 3). The default execution end represents the physical machine or physical node on which it resides. After receiving the heterogeneous computing resource searching command, it detects heterogeneous devices with its heterogeneous device detection module (fig. 3). When available computing resources such as GPUs are detected, its heterogeneous execution end generation and start module (fig. 3) performs the following: according to the parameter requirements (the search parameter information), it makes several copies of the computing task execution end program in directories at the same level as the default computing task heterogeneous execution end program, producing the copied computing task heterogeneous execution ends (the computing task heterogeneous execution ends in fig. 2). Each copy corresponds to a different heterogeneous device, and identifying marks that distinguish the copies are added to their configuration files and start parameters.
Once the computing task heterogeneous execution end programs have been generated, the default computing task execution end starts each of them. Because each computing task heterogeneous execution end corresponds to one type of heterogeneous device, the original physical computer is split into several abstract computing nodes according to the type and number of its computing devices. After each computing task execution end has started, it sends a registration message to the management node.
4. The management node collects the information from the computing nodes and forwards it to the user node.
5. The user node parses the heterogeneous computing resource information fed back by each computing node and screens it with the screening conditions. Because GPU heterogeneous computing places many requirements on hardware resources, driver versions and the like, the screening conditions are used to remove resources that do not meet the requirements or will not be used; the remaining resources form a computing resource list, which is sent to the management node.
6. After receiving the computing resource list selected by the user, the management node sends a confirmation message to each computing node.
7. After receiving the confirmation message, each computing node acts on its contents. If one of its computing task heterogeneous execution ends (either the default computing task execution end or a copied computing task heterogeneous execution end) has been selected by the user, that execution end starts taking part in the seismic exploration job computation. A copied computing task heterogeneous execution end that has been removed is closed; a default computing task execution end (default computing task heterogeneous execution end) that has been removed is set to an idle state.
Secondly, the processing method for autonomous execution of computing tasks in the multi-machine heterogeneous parallel computing method provided by the embodiment of the invention is introduced.
In an embodiment, the multi-machine heterogeneous parallel computing method for geophysical prospecting applications may further include:
the user node divides the job into computing tasks according to the selected heterogeneous computing resource information, sets up a computing task flow configuration file and, once the file has been set, submits and starts the job; the computing task flow configuration file contains a computing task processing flow;
after receiving the trigger instruction for job submission and start, the management node sends the computing task flow configuration file to each computing node;
the step in which each computing node determines, according to the heterogeneous computing resource use confirmation message, the computing task heterogeneous execution ends participating in multi-machine parallel computing, and the determined execution ends perform the seismic exploration multi-machine parallel computation, comprises:
the computing task heterogeneous execution ends determined in each computing node process the seismic exploration computing tasks according to the computing task processing flow.
In a specific implementation, once the user node has selected heterogeneous computing resources, seismic exploration computing tasks are processed automatically according to the preset computing task flow configuration file, which improves the efficiency of multi-machine heterogeneous parallel computing for geophysical prospecting applications.
In one embodiment, the processing of seismic exploration computing tasks by the determined computing task heterogeneous execution ends according to the computing task processing flow may include:
the computing task heterogeneous execution end loads the corresponding computing task plug-in according to the computing task processing flow, the computing task processing flow comprising predefined computing task plug-ins together with their calling sequence and calling mode;
and the computing task heterogeneous execution end, through the loaded computing task plug-in, calls the corresponding geophysical prospecting algorithm from a pre-established geophysical prospecting service algorithm library to complete the processing of the seismic exploration computing task.
In a specific implementation, the computing task processing flow determines which computing task plug-in is loaded, and the plug-in calls the corresponding geophysical prospecting algorithm from the pre-established geophysical prospecting service algorithm library to complete the seismic exploration computing task. Because existing geophysical prospecting algorithms or programs are reused as they are, the labor and material cost of developing system software and the cost of modifying the system are reduced.
The processing method for autonomous execution of the computation task is described in detail below with reference to fig. 4 and 5.
In a specific implementation, forward modeling is implemented as a standalone console program and model illumination by calling a standalone algorithm library. These programs and algorithms (specific geophysical algorithms such as forward-model illumination calculation, elastic-wave illumination calculation and Gaussian illumination calculation, each belonging to a different geophysical exploration business) were all developed for a single machine. They follow similar processing procedures, such as preparing data (the various data each algorithm needs, including model file data, observation system file data and parameter file data), setting calculation parameters (parameters required by the specific algorithm, such as the model range, the range of shots to be calculated and the target-layer depth), computing shot by shot, and processing results, but they differ in calling details, calling order and so on. One problem the present invention addresses is how to achieve multi-machine parallel computation while changing the existing programs and algorithm libraries as little as possible; this is done by the computing task implementation unit in fig. 4.
The invention provides a method by which computing nodes process tasks autonomously. It uses computing task plug-ins and task flow (computing task processing flow) configuration to call existing libraries and programs, realizing multi-machine parallel computing for different geophysical applications: the computing task plug-ins call the different algorithm interfaces and program functions, and the task flow configuration customizes and adjusts the processing procedure of each business algorithm.
In a specific implementation, as shown in fig. 4, a computing task plug-in (called a geophysical algorithm plug-in in fig. 4) calls different algorithm interfaces and application functions through plug-in command interfaces. Function interface names are defined in the plug-in's configuration file, and command parsing and interface calling are performed by the command parsing module and the plug-in running module of the computing task execution end. With this define-then-parse-and-call approach, function interfaces with different names and different parameters can be invoked; by defining and implementing different plug-in interfaces, functions such as data preparation, parameter setting, calculation and result processing in an algorithm library (the geophysical algorithm library in fig. 4) or an application (the geophysical algorithm program in fig. 4) can be encapsulated and called. The command parsing module of fig. 4 may also be used to parse heterogeneous computing resource searching commands.
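A minimal sketch of the define-then-parse-and-call approach, assuming a plug-in configuration that maps plug-in interface names to function names in an existing library, and commands of the form `interface key=value ...`. The library below is a stand-in built from lambdas; every function and interface name is hypothetical.

```python
import shlex
from types import SimpleNamespace

# Stand-in for an existing single-machine algorithm library (hypothetical).
algorithm_library = SimpleNamespace(
    load_model=lambda path: f"model<{path}>",
    set_params=lambda **kw: sorted(kw),
    run_shot=lambda shot: f"shot {shot} done",
)

# Plug-in configuration: plug-in interface name -> library function name.
PLUGIN_CONFIG = {
    "prepare_data": "load_model",
    "set_parameters": "set_params",
    "compute": "run_shot",
}


def parse_command(line: str):
    """Split 'interface key=value ...' into the interface name and kwargs."""
    name, *pairs = shlex.split(line)
    kwargs = dict(pair.split("=", 1) for pair in pairs)
    return name, kwargs


def call_plugin(line: str, library=algorithm_library, config=PLUGIN_CONFIG):
    """Resolve a plug-in command through the config and call the library."""
    name, kwargs = parse_command(line)
    if name not in config:
        raise KeyError(f"interface {name!r} not defined in plug-in config")
    return getattr(library, config[name])(**kwargs)
```

Because the mapping lives in configuration rather than code, wrapping a different algorithm library only requires a new configuration entry per interface, which matches the aim of leaving the original library unchanged.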
In a specific implementation, a computing node carries out geophysical prospecting algorithms such as forward modeling or model illumination according to an algorithm flow (data preparation, calculation, result feedback and so on). The processing steps of different types of geophysical prospecting algorithms differ considerably, as do their data and parameters, and so do their implementations: some call an algorithm library, some call a process, and some call other plug-ins. How to carry out the calling procedures of these different algorithms directly on an existing multi-machine parallel platform is one of the inventive points. The solution is to configure the task flow (computing task processing flow) of the computing node with predefined computing task plug-ins and their calling sequence and calling mode: the plug-in interfaces and their parameters are arranged in the plug-in configuration files in a specified format and order, and the computation of each algorithm is controlled by defining the calling sequence and calling mode of the plug-in interfaces in the task flow configuration.
In a specific implementation, as shown in fig. 4, the computing task execution end (computing task heterogeneous execution end) contains a plug-in running module and a task flow analysis and execution module. The plug-in running module handles plug-in loading, plug-in interface parsing and plug-in function calls; the task flow analysis and execution module parses and executes task flows. To support complex process control, basic control primitives such as conditionals, loops and synchronization are provided in the task flow configuration. Common multi-machine parallel functions such as file transfer, compression and decompression are built in as internal plug-in commands that can be called directly. Together, the two modules implement a computing task execution method in which the plug-in interface is the calling unit, the task flow is the execution sequence, and plug-ins and their functions are the implementation vehicle. In addition, the network communication module in fig. 3 may be used by the computing node to communicate with the management node, and the flow analysis module in fig. 3 corresponds to the task flow analysis and execution module in fig. 4.
In a specific implementation, as shown in fig. 5, to call different algorithms or programs the computing task execution end first reads, parses and executes the task flow with the task flow analysis and execution module. The task flow must be configured separately for the algorithm characteristics of each application and generally includes steps such as fetching input data and application plug-ins, loading the application plug-in, preprocessing data, requesting computing tasks, executing the computation, returning task results and unloading the application plug-in; each step calls the relevant plug-in interface command and passes the corresponding interface parameters. In this way, computing task plug-ins can call the functions of different business algorithms such as two-dimensional model forward modeling and model illumination, or the two-dimensional model illumination program can be invoked directly as a process. Changes to the original algorithms and programs are kept to a minimum, the algorithms and programs can be ported to the multi-machine platform largely as-is for multi-machine heterogeneous parallel computation, and different algorithm tasks are executed automatically by calling computing task plug-ins according to the preset computing task flow.
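The task flow steps listed above, together with a loop primitive, can be sketched as a small interpreter. The flow format (a list of `(step, arguments)` pairs with a `("loop", body)` entry), the step names and the local deque standing in for the management node's task pool are all hypothetical; the patent does not specify a configuration syntax.

```python
from collections import deque

# Hypothetical task flow. A ("loop", body) entry is a control primitive that
# repeats its body while tasks remain in the pool.
TASK_FLOW = [
    ("fetch_inputs", {}),
    ("load_plugin", {"name": "model_illumination"}),
    ("preprocess", {}),
    ("loop", [
        ("request_task", {}),
        ("compute", {}),
        ("return_result", {}),
    ]),
    ("unload_plugin", {}),
]


class FlowExecutor:
    """Executes a task flow step by step; each step name maps to a method."""

    def __init__(self, tasks):
        self.pool = deque(tasks)  # stands in for tasks handed out by the management node
        self.plugin = None
        self.current = None
        self.results = []
        self.log = []

    def run(self, flow):
        for step, arg in flow:
            if step == "loop":
                while self.pool:
                    self.run(arg)
            else:
                self.log.append(step)
                getattr(self, step)(**arg)

    def fetch_inputs(self):
        pass  # would copy input data and plug-ins to a local directory

    def load_plugin(self, name):
        self.plugin = name

    def preprocess(self):
        pass  # would call the plug-in's data preprocessing interface

    def request_task(self):
        self.current = self.pool.popleft()

    def compute(self):
        self.current = f"{self.plugin}:{self.current}"

    def return_result(self):
        self.results.append(self.current)

    def unload_plugin(self):
        self.plugin = None
```

Swapping in another algorithm then means editing `TASK_FLOW`, not the interpreter, which is the separation between task flow and plug-in the text describes.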
Thirdly, an exception handling method in the multi-machine heterogeneous parallel computing process provided by the embodiment of the invention is introduced.
In an embodiment, the multi-machine heterogeneous parallel computing method for geophysical prospecting applications may further include:
when an abnormality of the management node is detected, the user node checks the computing task results it has received, merges the completed results and produces a list of uncompleted computing tasks.
In a specific implementation, handling a management node failure in this way keeps seismic exploration multi-machine heterogeneous parallel computing stable and ensures that the results are accurate.
In an embodiment, the multi-machine heterogeneous parallel computing method for geophysical prospecting applications may further include:
when an abnormality of the user node is detected, the user node program is restarted, checks the contents of the job result directory, merges the completed computing task results and produces a list of uncompleted computing tasks.
In a specific implementation, handling a user node failure in this way keeps seismic exploration multi-machine heterogeneous parallel computing stable and ensures that the results are accurate.
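The result-directory check performed by a restarted user node can be sketched as follows. The sketch assumes, hypothetically, that each finished task has written a file named `task_<id>.out` into the job result directory; the same merge-and-list logic would apply to results already received when the management node fails.

```python
from pathlib import Path


def recover_job(result_dir: Path, all_tasks: list):
    """Scan the job result directory after a restart.

    Returns the merged text of completed tasks (in task-id order) and the
    ids of tasks that still have to be computed.
    """
    done = {}
    for path in result_dir.glob("task_*.out"):
        task_id = int(path.stem.split("_", 1)[1])
        done[task_id] = path.read_text()
    merged = "".join(done[t] for t in sorted(done) if t in all_tasks)
    pending = [t for t in all_tasks if t not in done]
    return merged, pending
```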
In an embodiment, the multi-machine heterogeneous parallel computing method for geophysical prospecting applications may further include:
when an abnormality of a computing node is detected, the management node reassigns the computing tasks of the abnormal computing node to other computing nodes.
In a specific implementation, handling a computing node failure in this way keeps seismic exploration multi-machine heterogeneous parallel computing stable and ensures that the results are accurate.
In a specific implementation, various abnormalities can occur during multi-machine parallel computing, such as computing node, management node and user node failures, and each is handled differently. Abnormalities are detected through the network connections between nodes and a heartbeat thread. If the management node fails, the user node checks the task results it has received, merges the completed computing task results and produces a list of uncompleted computing tasks. If the user node fails, the user node program is restarted, automatically checks the contents of the job result directory, merges the completed computing task results and produces a list of uncompleted computing tasks. If a computing node fails, the management node reassigns the tasks that node was computing to other nodes, keeping multi-machine heterogeneous parallel computing for geophysical prospecting applications running stably.
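Heartbeat-based detection of a failed computing node and reassignment of its tasks can be sketched on the management node side as below. The timeout value, the class and method names, and the injectable clock (used so the logic can be exercised without real waiting) are assumptions; the heartbeat-sending thread on each computing node is omitted.

```python
import time


class ManagementNode:
    """Tracks heartbeats and task ownership; requeues tasks of silent nodes."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_beat = {}  # node -> time of its last heartbeat
        self.assigned = {}   # node -> set of task ids it is computing
        self.pending = []    # tasks waiting to be handed out again

    def heartbeat(self, node: str) -> None:
        self.last_beat[node] = self.clock()

    def assign(self, node: str, task: int) -> None:
        self.assigned.setdefault(node, set()).add(task)

    def check_nodes(self) -> list:
        """Treat nodes silent for longer than the timeout as failed."""
        now = self.clock()
        failed = [n for n, t in self.last_beat.items() if now - t > self.timeout_s]
        for node in failed:
            del self.last_beat[node]
            # Return the failed node's tasks to the pool for other nodes.
            self.pending.extend(sorted(self.assigned.pop(node, set())))
        return failed
```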
Fourthly, the multi-machine heterogeneous parallel computing method provided by the embodiment of the invention is introduced in a whole.
In a specific implementation, to achieve multi-machine heterogeneous parallel computation for different geophysical applications such as forward modeling, the embodiment of the present invention provides a multi-machine heterogeneous parallel computing flow and method based on autonomous execution of computing tasks, which mainly includes:
(1) Job data preparation: prepare the computing task flow configuration file, the computing task plug-ins and their configuration files, the dynamic library programs (the geophysical prospecting service algorithm library, which may contain geophysical prospecting algorithms and geophysical prospecting algorithm programs) and so on, as well as the input data required by multi-machine jobs such as forward modeling and model illumination. The computing task flow configuration file must match the computing task plug-ins and their configuration files; following the algorithm flow, it defines the calling sequence of the plug-in function interfaces, that is, the execution order and steps of the computing task.
(2) Heterogeneous computing resource screening: the required computing resources must be screened before multi-machine heterogeneous parallel computing. A heterogeneous computing resource searching command is sent to the management node; after the default computing task execution end on each computing node receives it, it scans the hardware of the physical machine or physical node on which it resides and, according to the heterogeneous resources found, generates and starts several dedicated computing task execution ends (computing task heterogeneous execution ends), one per heterogeneous resource. Once started, these execution ends register with the management node as available computing resources. The user screens the available resources according to the computing requirements and submits the final screening result to the management node, which acts on it: each selected computing task heterogeneous execution end takes part in the computation, an unselected default computing task execution end is restored to an idle state, and unselected copied computing task heterogeneous execution ends are closed.
(3) Job submission and start: after the heterogeneous computing resources have been selected, the job is divided into tasks according to the data range to be computed and the computing task flow configuration file information is set; once the job information is complete, the job is submitted and started.
(4) Job execution: after the user submits and starts the job, the management node first sends the computing task flow configuration file information to each computing node. Each computing node uses this information to copy the configuration file from the specified location to local storage, loads and parses it, and then processes tasks autonomously according to the task processing flow it defines, for example remotely reading computing task plug-ins or programs and program input data into a local directory on the node, loading the computing task plug-ins, requesting computing tasks, computing tasks and returning task results. Each computing node thus runs an autonomous, customized task processing flow, and different geophysical prospecting algorithm libraries or computing processes are called through the plug-in function interfaces.
(5) Job completion: when the job finishes, the user processes the computing task results and then notifies the management node that the job is complete. The management node sends a job completion command to every computing node that took part; after the corresponding computing task execution end receives the job completion message, it completes all unloading and cleanup operations. A default computing task execution end (default computing task heterogeneous execution end) then resets itself to an idle state, while a copied computing task heterogeneous execution end exits.
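The division of a job into tasks "according to the data range to be computed" can be illustrated with shot ranges, since the algorithms described here compute shot by shot. The fixed number of shots per task is an assumed policy for illustration; the method does not prescribe how task sizes are chosen.

```python
def divide_job(first_shot: int, last_shot: int, shots_per_task: int) -> list:
    """Split an inclusive shot range into consecutive (start, end) tasks."""
    if shots_per_task < 1:
        raise ValueError("shots_per_task must be positive")
    tasks = []
    start = first_shot
    while start <= last_shot:
        end = min(start + shots_per_task - 1, last_shot)
        tasks.append((start, end))
        start = end + 1
    return tasks
```

For example, shots 1 to 10 with four shots per task yield the tasks (1, 4), (5, 8) and (9, 10); the final task is simply shorter.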
The following example further illustrates how the invention may be practiced.
The invention provides a multi-machine heterogeneous parallel method in which tasks are executed autonomously and which is applicable to computing nodes running different geophysical prospecting algorithms. The implementation mainly comprises the following steps:
1) Computing task plug-in development
Develop computing task plug-ins that call the geophysical prospecting service algorithm library (the business algorithm library in fig. 5), which may include a geophysical algorithm library (fig. 4), a geophysical algorithm program (fig. 4) or other algorithm plug-ins.
Develop the configuration file of each computing task plug-in, defining its plug-in interfaces.
Design the computing task processing flow: according to the processing steps of the geophysical algorithm, specify the calling sequence of the computing task plug-in interfaces and their parameter settings (the calling sequence and calling mode).
2) Multi-machine heterogeneous parallel computing job
(1) Heterogeneous computing resource screening (a method for using and processing multi-machine heterogeneous resources that is compatible with single-machine programs performing cooperative or non-cooperative computing across multiple heterogeneous devices)
The user sends a heterogeneous computing resource searching command to the management node, which instructs idle computing nodes to search for heterogeneous computing resources. After receiving the command, the default computing task execution end (default computing task heterogeneous execution end) detects the available heterogeneous computing resources in its physical machine, generates and starts computing task heterogeneous execution end programs representing those resources, and registers them with the management node. In other words, it copies and configures, at the same directory level as itself, several dedicated computing task execution ends (computing task heterogeneous execution ends) that each compute with a different heterogeneous device, starts them, and registers them with the management node for use. After receiving the registration information of each computing task heterogeneous execution end, the management node feeds it back to the user node, and the user screens the resources. The selected computing task execution ends (default or copied) take part in the parallel job, unselected default computing task execution ends are set to idle, and unselected copied computing task heterogeneous execution ends are closed.
(2) Job preparation and job submission (a computing node autonomous execution method based on computing task plug-ins and computing task flow design)
Before a multi-machine parallel job is run, the data it needs must be prepared: job input data, task input data and application data (including computing task plug-ins, plug-in configuration files, computing task flows, and the geophysical prospecting service algorithm libraries they depend on, also called dynamic libraries). Once the job input data are ready, the basic job information and task flow settings are submitted and the job is started.
Computing task plug-ins are used to call function interfaces in geophysical algorithm libraries, geophysical algorithm processes and geophysical algorithm plug-ins. The plug-in configuration file defines the plug-in's function interfaces, and the computing task flow defines their calling sequence and parameter settings, so that the parallel computing tasks of common geophysical algorithms are encapsulated. The plug-in running module and the task flow analysis and execution module of the computing task execution end (fig. 4) load the plug-ins according to the pre-designed task flow and call their functions to carry out the geophysical computing task algorithm.
(3) Job execution
After the job starts, the management node first distributes the task flow setting information (computing task processing flow) to the computing nodes. Once the computing task execution end of each computing node has read it, the execution end loads, parses and executes it, carrying out the computing tasks automatically according to the task flow settings (computing task flow configuration file): reading task data, reading computing task plug-ins and programs, requesting computing tasks, loading plug-ins, computing tasks, returning task results and so on. The computing nodes run automatically according to the preset flow configuration, and the management node is responsible for the overall progress of the job.
(4) Job completion
When the job is complete, the user node processes the task results and the management node notifies each computing node that the job is finished; the default computing task execution end is set to idle to wait for the next job, and the copied computing task heterogeneous execution ends are closed.
Based on the same inventive concept, embodiments of the present invention further provide a multi-machine heterogeneous parallel computing apparatus for geophysical prospecting applications, as described in the following embodiments. Because the apparatus solves the problem on a principle similar to that of the multi-machine heterogeneous parallel computing method for geophysical prospecting applications, its implementation can refer to the implementation of the method, and repeated details are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 6 is a schematic structural diagram of the multi-machine heterogeneous parallel computing apparatus for geophysical prospecting applications in an embodiment of the present invention. As shown in fig. 6, the apparatus includes a user node 01, a management node 02 and a plurality of computing nodes 03, wherein:
the user node 01 is used for sending a heterogeneous computing resource searching command to the management node before seismic exploration multi-machine parallel computing; the heterogeneous computing resource searching command comprises searching parameter information; screening heterogeneous computing resource information fed back by each computing node by using preset screening conditions to obtain selected heterogeneous computing resource information, and sending the selected heterogeneous computing resource information to a management node;
the management node 02 is configured to broadcast the heterogeneous computing resource searching command to each computing node; to forward the heterogeneous computing resource information fed back by each computing node to the user node; and to send a heterogeneous computing resource use confirmation message to each computing node according to the selected heterogeneous computing resource information;
each computing node 03 is configured to, after receiving the heterogeneous computing resource searching command, perform a hardware scan of the physical machine on which it resides; generate and start a plurality of computing task heterogeneous execution ends according to the heterogeneous computing resources found and the searching parameter information, each computing task heterogeneous execution end corresponding to one type of heterogeneous computing resource; feed back heterogeneous computing resource information to the management node once these execution ends have started; and determine, according to the heterogeneous computing resource use confirmation message, which computing task heterogeneous execution ends participate in the multi-machine parallel computation, the determined execution ends then performing the seismic exploration multi-machine parallel computation.
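The search, screening and confirmation exchange among the three kinds of nodes can be illustrated with a minimal in-process sketch in Python. It assumes direct method calls in place of network messages and a precomputed list standing in for the hardware scan; the resource labels ("cpu", "gpu0", ...), the "resource_types" search parameter and the "keep" screening predicate are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ExecEnd:
    """One computing task heterogeneous execution end, bound to one resource."""
    node: str
    resource: str              # hypothetical label such as "cpu" or "gpu0"
    selected: bool = False


@dataclass
class ComputeNode:
    name: str
    hardware: list             # stands in for the result of the hardware scan
    ends: list = field(default_factory=list)

    def on_search(self, params):
        # Start one execution end per scanned resource whose type was requested.
        wanted = params["resource_types"]
        self.ends = [ExecEnd(self.name, r) for r in self.hardware
                     if r.rstrip("0123456789") in wanted]
        return self.ends

    def on_confirm(self, chosen):
        # Mark the execution ends named in the use confirmation message.
        for end in self.ends:
            end.selected = (end.node, end.resource) in chosen


class ManagementNode:
    def __init__(self, nodes):
        self.nodes = nodes

    def broadcast_search(self, params):
        # Broadcast the searching command and collect every node's feedback.
        return [end for node in self.nodes for end in node.on_search(params)]

    def send_confirmation(self, chosen):
        for node in self.nodes:
            node.on_confirm(chosen)


def user_search_and_select(mgmt, params, keep):
    """User node: request a search, screen the feedback, confirm the selection."""
    found = mgmt.broadcast_search(params)
    chosen = {(e.node, e.resource) for e in found if keep(e)}  # preset screening condition
    mgmt.send_confirmation(chosen)
    return chosen
```

In this sketch each execution end carries a single resource label, mirroring the statement that each computing task heterogeneous execution end corresponds to one type of heterogeneous computing resource.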
In a specific implementation, there may be multiple user nodes.
In one embodiment, the user node may be further configured to divide the job into computing tasks according to the selected heterogeneous computing resource information, set up a computing task flow configuration file, and, once that file has been set up, submit and start the job; the computing task flow configuration file contains the computing task processing flow;
the management node may be further configured to send the computing task flow configuration file to each computing node after receiving the trigger instruction for job submission and start-up;
each computing node may be specifically configured such that the determined computing task heterogeneous execution end processes the seismic exploration computing task according to the computing task processing flow.
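As a rough illustration of job division and the flow configuration file, the sketch below assumes the job is a list of shot identifiers split into roughly equal contiguous tasks and that the configuration is stored as JSON. The field names ("job", "tasks", "flow", "plugin", "mode", "command") and the default of two tasks per execution end are hypothetical; the patent does not specify a file format.

```python
import json


def divide_job(shot_ids, ends, tasks_per_end=2):
    """Split a job's shot IDs into roughly equal computing tasks for the selected ends."""
    if not shot_ids:
        return []
    n_tasks = max(1, tasks_per_end * len(ends))
    size = -(-len(shot_ids) // n_tasks)           # ceiling division
    return [shot_ids[i:i + size] for i in range(0, len(shot_ids), size)]


def make_flow_config(job, tasks, flow):
    """Serialize the tasks and the computing task processing flow as JSON."""
    return json.dumps({
        "job": job,
        "tasks": [{"id": i, "shots": t} for i, t in enumerate(tasks)],
        "flow": flow,          # ordered plug-in steps, each with its call mode
    }, indent=2)
```

For ten shots and two selected execution ends, the default yields four tasks of sizes 3, 3, 3 and 1.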
In one embodiment, the determined computing task heterogeneous execution end may be specifically configured for:
loading the corresponding computing task plug-ins according to the computing task processing flow, where the computing task processing flow specifies the predefined computing task plug-ins together with their calling order and calling mode;
and calling a corresponding geophysical prospecting algorithm from a pre-established geophysical prospecting service algorithm library through the loaded computing task plug-in to complete the processing of the seismic exploration computing task.
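The plug-in mechanism can be sketched as a registry standing in for the geophysical prospecting service algorithm library, with each flow step naming a plug-in and a call mode. This is an illustration under stated assumptions: the "library" mode calls a registered function, the "process" mode runs an external program through subprocess.run, and the toy "scale" and "clip" algorithms are hypothetical placeholders for real geophysical algorithms.

```python
import subprocess

# Hypothetical stand-in for the geophysical prospecting service algorithm library.
ALGORITHM_LIBRARY = {}


def algorithm(name):
    """Register a function in the library under a plug-in name."""
    def register(fn):
        ALGORITHM_LIBRARY[name] = fn
        return fn
    return register


@algorithm("scale")
def scale(data):
    return {**data, "traces": [x * data.get("gain", 1) for x in data["traces"]]}


@algorithm("clip")
def clip(data):
    lim = data.get("limit", 1.0)
    return {**data, "traces": [max(-lim, min(lim, x)) for x in data["traces"]]}


def run_plugin(step, data):
    if step["mode"] == "library":
        return ALGORITHM_LIBRARY[step["plugin"]](data)
    if step["mode"] == "process":
        # Delegate to an existing executable; assumed to read/write its own files.
        subprocess.run(step["command"], check=True)
        return data
    raise ValueError(f"unknown call mode {step['mode']!r}")


def run_flow(flow, data):
    for step in flow:          # list order is the calling order
        data = run_plugin(step, data)
    return data
```

Executing the steps in list order is how this sketch interprets the calling order recorded in the computing task processing flow.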
In one embodiment, each compute node may be specifically configured to:
after receiving a heterogeneous computing resource searching command, performing a hardware scan of the physical machine on which the computing node resides;
copying, according to the heterogeneous computing resources found and the searching parameter information, a plurality of computing task heterogeneous execution ends into the same-level directory in which the default computing task heterogeneous execution end program of the computing node is located;
starting, by the default computing task heterogeneous execution end, each of the copied computing task heterogeneous execution ends;
and, after all of the copied computing task heterogeneous execution ends have started, feeding back heterogeneous computing resource information to the management node.
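The copy-and-start step can be sketched with the Python standard library, assuming the hardware scan has already produced a mapping from resource type to device count and that the default execution end is a single executable file. The naming scheme for the copies and the "--resource" command-line flag are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def copy_exec_ends(default_program, scanned, wanted):
    """Copy the default execution end program once per requested device.

    scanned: resource type -> device count from the hardware scan (hypothetical shape).
    wanted:  resource types named in the searching parameter information.
    """
    default_program = Path(default_program)
    copies = []
    for rtype, count in scanned.items():
        if rtype not in wanted:
            continue
        for i in range(count):
            label = f"{rtype}{i}"
            target = default_program.with_name(
                f"{default_program.stem}_{label}{default_program.suffix}")
            shutil.copy2(default_program, target)   # same-level directory as the default
            copies.append((label, target))
    return copies


def start_copies(copies):
    """Default execution end starts each copy; the --resource flag is hypothetical."""
    return [subprocess.Popen([str(path), "--resource", label]) for label, path in copies]
```

Because shutil.copy2 preserves permission bits, each copy of an executable default program remains executable and can be launched by start_copies.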
In one embodiment, each computing node may be further configured to: according to the heterogeneous computing resource use confirmation message, close any copied computing task heterogeneous execution end that is determined not to be selected, and set the default computing task heterogeneous execution end to the idle state when it is determined not to be selected.
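The state changes triggered by the confirmation message, together with the end-of-job behavior, can be summarized as a small state table. The sketch below tracks states only and is a hedged illustration: in a real system closing an execution end would also terminate its process (for example with Popen.terminate()), and the state names are assumptions.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    IDLE = "idle"
    CLOSED = "closed"


def apply_confirmation(ends, default, selected):
    """Apply a use confirmation message to the execution ends of one computing node.

    ends: execution end name -> State; default: name of the default end;
    selected: names listed as selected in the confirmation message.
    """
    new = {}
    for name in ends:
        if name in selected:
            new[name] = State.RUNNING
        elif name == default:
            new[name] = State.IDLE      # default end stays alive, waiting for a job
        else:
            new[name] = State.CLOSED    # unselected copy is shut down
    return new


def finish_job(ends, default):
    """At job completion: default end returns to idle, every copy is closed."""
    return {name: State.IDLE if name == default else State.CLOSED for name in ends}
```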
In one embodiment, the user node may comprise a first exception handling unit for:
when the management node is detected to be abnormal, checking the computing task results already received, merging the completed computing task results, and providing a list of uncompleted computing tasks.
In one embodiment, the user node may comprise a second exception handling unit for:
when the user node is detected to be abnormal, restarting the program, checking the contents of the job result directory, merging the completed computing task results, and providing a list of uncompleted computing tasks.
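Both user-node recovery paths reduce to the same operation: inspect what has already been produced, merge the completed results and report the rest. A minimal sketch, assuming each finished task writes a file named task_<id>.out into the job result directory and treating merging as byte concatenation in task order; both the naming convention and the merge rule are hypothetical, and real seismic results might instead be combined by stacking.

```python
from pathlib import Path


def recover_job(result_dir, task_ids, merged_file):
    """Merge the results of completed tasks and return the IDs of uncompleted ones."""
    result_dir = Path(result_dir)
    done = [t for t in task_ids if (result_dir / f"task_{t}.out").exists()]
    done_set = set(done)
    pending = [t for t in task_ids if t not in done_set]
    with open(merged_file, "wb") as out:
        for t in done:                       # placeholder merge: concatenate in order
            out.write((result_dir / f"task_{t}.out").read_bytes())
    return pending
```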
In one embodiment, the management node may comprise a third exception handling unit for:
when a computing node is detected to be abnormal, redistributing the computing tasks of the abnormal computing node to other computing nodes.
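The patent does not state how the orphaned tasks are assigned. As one plausible policy, the sketch below gives each task of the abnormal node to the currently least-loaded healthy node; the policy and the data layout (node name mapped to a list of task IDs) are assumptions.

```python
def redistribute(assignments, failed):
    """Reassign the tasks of an abnormal computing node to the healthy ones.

    assignments: node name -> list of task IDs (hypothetical layout).
    Each orphaned task goes to the node with the fewest tasks at that moment.
    """
    orphaned = assignments.get(failed, [])
    healthy = {node: list(tasks) for node, tasks in assignments.items() if node != failed}
    if not healthy:
        raise RuntimeError("no healthy computing node left to take over the tasks")
    for task in orphaned:
        target = min(healthy, key=lambda node: len(healthy[node]))
        healthy[target].append(task)
    return healthy
```

Ties are broken by dictionary order, so the result is deterministic for a given input.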
An embodiment of the invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above multi-machine heterogeneous parallel computing method for geophysical prospecting applications when executing the computer program.
An embodiment of the invention further provides a computer-readable storage medium storing a computer program for executing the above multi-machine heterogeneous parallel computing method for geophysical prospecting applications.
The technical solution provided by the embodiments of the invention has the following beneficial effects:
Based on the different ways in which multi-machine heterogeneous computing uses computing resources in geophysical exploration applications, the invention derives a multi-machine parallel computing method suited to both single-machine heterogeneous cooperative computing and heterogeneous non-cooperative computing. The method can meet the multi-machine heterogeneous parallel computing requirements of different applications and makes maximal use of existing computing resources for load-balanced heterogeneous parallel computing, thereby achieving efficient multi-machine heterogeneous parallelism.
Meanwhile, to accommodate the processing flows of different geophysical prospecting algorithms and to support different ways of implementing those algorithms, a multi-machine parallel computing method based on computing task plug-ins and computing task flows is realized: the plug-in function interface performs function calls by invoking either an algorithm library function or a process interface, the computing task flow configuration makes the job run according to a specified algorithm flow, and computing tasks are executed autonomously on the computing nodes.
Building on these two points, the invention provides a multi-machine heterogeneous parallel processing flow and method with a configurable computing task flow: heterogeneous computing resources are screened, submission of the computing task flow is a prerequisite for job submission, a job processing mode based on distributing and parsing the computing task flow forms the core, computing tasks are processed autonomously according to the preset flow, and the management node monitors and manages the overall job at a higher, macroscopic level.
The invention can make maximal use of existing multi-machine heterogeneous computing resources and directly reuse existing geophysical prospecting algorithms to realize multi-machine heterogeneous computing.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (18)

1. A multi-machine heterogeneous parallel computing method for geophysical exploration applications, characterized by comprising the following steps:
before seismic exploration multi-machine parallel computation is carried out, a user node sends a heterogeneous computing resource searching command to a management node; the heterogeneous computing resource searching command comprises searching parameter information;
the management node broadcasts a heterogeneous computing resource searching command to each computing node;
after each computing node receives a heterogeneous computing resource searching command, hardware scanning is carried out on a physical machine where the computing node is located, a plurality of computing task heterogeneous execution ends are generated and started according to the searched heterogeneous computing resource condition and the searching parameter information, and after the plurality of computing task heterogeneous execution ends are started, heterogeneous computing resource information is fed back to a management node; each computing task heterogeneous execution end corresponds to one type of heterogeneous computing resource;
the management node sends heterogeneous computing resource information fed back by each computing node to the user node;
the user node performs screening processing on heterogeneous computing resource information fed back by each computing node by using preset screening conditions to obtain selected heterogeneous computing resource information, and sends the selected heterogeneous computing resource information to the management node;
the management node sends a heterogeneous computing resource use confirmation message to each computing node according to the selected heterogeneous computing resource information;
and each computing node determines a computing task heterogeneous execution end participating in the multi-machine parallel computing according to the heterogeneous computing resource use confirmation message, and the determined computing task heterogeneous execution end performs the seismic exploration multi-machine parallel computing.
2. The multi-machine heterogeneous parallel computing method for geophysical applications according to claim 1, further comprising:
the user node divides the job into computing tasks according to the selected heterogeneous computing resource information, sets up a computing task flow configuration file, and submits and starts the job after the computing task flow configuration file has been set up; the computing task flow configuration file contains the computing task processing flow;
after receiving the trigger instruction for job submission and start-up, the management node sends the computing task flow configuration file to each computing node;
wherein the step in which each computing node determines, according to the heterogeneous computing resource use confirmation message, the computing task heterogeneous execution ends participating in the multi-machine parallel computation, and the determined computing task heterogeneous execution ends perform the seismic exploration multi-machine parallel computation, comprises:
the computing task heterogeneous execution end determined in each computing node processes the seismic exploration computing task according to the computing task processing flow.
3. The multi-machine heterogeneous parallel computing method for geophysical applications according to claim 2, wherein the processing of the seismic exploration computing task by the computing task heterogeneous execution end determined in each computing node according to the computing task processing flow comprises:
the computing task heterogeneous execution end loads the corresponding computing task plug-ins according to the computing task processing flow, where the computing task processing flow specifies the predefined computing task plug-ins together with their calling order and calling mode;
and the computing task heterogeneous execution end calls the corresponding geophysical prospecting algorithm from a pre-established geophysical prospecting service algorithm library through the loaded computing task plug-ins to complete the processing of the seismic exploration computing task.
4. The multi-machine heterogeneous parallel computing method for geophysical applications according to claim 1, wherein the step in which each computing node, after receiving the heterogeneous computing resource searching command, performs a hardware scan of the physical machine on which it resides, generates and starts a plurality of computing task heterogeneous execution ends according to the heterogeneous computing resources found and the searching parameter information, and feeds back heterogeneous computing resource information to the management node after the plurality of computing task heterogeneous execution ends have started, comprises:
after receiving the heterogeneous computing resource searching command, each computing node performs a hardware scan of the physical machine on which it resides; according to the heterogeneous computing resources found and the searching parameter information, each computing node copies a plurality of computing task heterogeneous execution ends into the same-level directory in which its default computing task heterogeneous execution end program is located;
the default computing task heterogeneous execution end of each computing node starts each of the copied computing task heterogeneous execution ends;
and after the copied computing task heterogeneous execution ends are started, each computing node feeds back heterogeneous computing resource information to the management node.
5. The multi-machine heterogeneous parallel computing method for geophysical applications according to claim 4, further comprising:
each computing node, according to the heterogeneous computing resource use confirmation message, closes any copied computing task heterogeneous execution end that is determined not to be selected, and sets the default computing task heterogeneous execution end to the idle state when it is determined not to be selected.
6. The multi-machine heterogeneous parallel computing method for geophysical applications according to claim 1, further comprising:
when the management node is detected to be abnormal, the user node checks the computing task results already received, merges the completed computing task results, and provides a list of uncompleted computing tasks.
7. The multi-machine heterogeneous parallel computing method for geophysical applications according to claim 1, further comprising:
when the user node is detected to be abnormal, the user node restarts the program, checks the contents of the job result directory, merges the completed computing task results, and provides a list of uncompleted computing tasks.
8. The multi-machine heterogeneous parallel computing method for geophysical applications according to claim 1, further comprising:
when a computing node is detected to be abnormal, the management node redistributes the computing tasks of the abnormal computing node to other computing nodes.
9. A multi-machine heterogeneous parallel computing device for geophysical applications, comprising:
the user node is used for sending a heterogeneous computing resource searching command to the management node before seismic exploration multi-machine parallel computing is carried out; the heterogeneous computing resource searching command comprises searching parameter information; screening heterogeneous computing resource information fed back by each computing node by using preset screening conditions to obtain selected heterogeneous computing resource information, and sending the selected heterogeneous computing resource information to a management node;
the management node is used for broadcasting the heterogeneous computing resource searching command to each computing node; sending heterogeneous computing resource information fed back by each computing node to the user node; sending a heterogeneous computing resource use confirmation message to each computing node according to the selected heterogeneous computing resource information;
each computing node is used for, after receiving the heterogeneous computing resource searching command, performing a hardware scan of the physical machine on which it resides, generating and starting a plurality of computing task heterogeneous execution ends according to the heterogeneous computing resources found and the searching parameter information, and feeding back heterogeneous computing resource information to the management node after the plurality of computing task heterogeneous execution ends have started, each computing task heterogeneous execution end corresponding to one type of heterogeneous computing resource; and for determining, according to the heterogeneous computing resource use confirmation message, the computing task heterogeneous execution ends participating in the multi-machine parallel computation, the determined computing task heterogeneous execution ends performing the seismic exploration multi-machine parallel computation.
10. The multi-machine heterogeneous parallel computing device for geophysical applications of claim 9 wherein,
the user node is further used for dividing the job into computing tasks according to the selected heterogeneous computing resource information, setting up a computing task flow configuration file, and submitting and starting the job after the computing task flow configuration file has been set up; the computing task flow configuration file contains the computing task processing flow;
the management node is further used for sending the computing task flow configuration file to each computing node after receiving the trigger instruction for job submission and start-up;
each computing node is specifically used for processing the seismic exploration computing task by the determined computing task heterogeneous execution end according to the computing task processing flow.
11. The multi-machine heterogeneous parallel computing device for geophysical applications of claim 10, wherein the determined computing task heterogeneous execution end is specifically configured for:
loading the corresponding computing task plug-ins according to the computing task processing flow, where the computing task processing flow specifies the predefined computing task plug-ins together with their calling order and calling mode;
and calling a corresponding geophysical prospecting algorithm from a pre-established geophysical prospecting service algorithm library through the loaded computing task plug-in to complete the processing of the seismic exploration computing task.
12. The multi-machine heterogeneous parallel computing device for geophysical applications of claim 9 wherein each compute node is specifically configured to:
after receiving a heterogeneous computing resource searching command, performing a hardware scan of the physical machine on which the computing node resides;
copying, according to the heterogeneous computing resources found and the searching parameter information, a plurality of computing task heterogeneous execution ends into the same-level directory in which the default computing task heterogeneous execution end program of the computing node is located;
starting, by the default computing task heterogeneous execution end, each of the copied computing task heterogeneous execution ends;
and, after all of the copied computing task heterogeneous execution ends have started, feeding back heterogeneous computing resource information to the management node.
13. The multi-machine heterogeneous parallel computing device for geophysical applications of claim 12, wherein each computing node is further configured to: according to the heterogeneous computing resource use confirmation message, close any copied computing task heterogeneous execution end that is determined not to be selected, and set the default computing task heterogeneous execution end to the idle state when it is determined not to be selected.
14. The multi-machine heterogeneous parallel computing device for geophysical applications of claim 9 wherein the user node comprises a first exception handling unit to:
when the management node is detected to be abnormal, checking the computing task results already received, merging the completed computing task results, and providing a list of uncompleted computing tasks.
15. The multi-machine heterogeneous parallel computing device for geophysical applications of claim 9 wherein the user node comprises a second exception handling unit to:
when the user node is detected to be abnormal, restarting the program, checking the contents of the job result directory, merging the completed computing task results, and providing a list of uncompleted computing tasks.
16. The multi-machine heterogeneous parallel computing device for geophysical applications of claim 9, wherein the management node comprises a third exception handling unit for:
when a computing node is detected to be abnormal, redistributing the computing tasks of the abnormal computing node to other computing nodes.
17. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 8 when executing the computer program.
18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 8.
CN202010173738.3A 2020-03-13 2020-03-13 Multi-machine heterogeneous parallel computing method and device for geophysical prospecting application Active CN113391917B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010173738.3A CN113391917B (en) 2020-03-13 2020-03-13 Multi-machine heterogeneous parallel computing method and device for geophysical prospecting application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010173738.3A CN113391917B (en) 2020-03-13 2020-03-13 Multi-machine heterogeneous parallel computing method and device for geophysical prospecting application

Publications (2)

Publication Number Publication Date
CN113391917A 2021-09-14
CN113391917B 2024-04-30

Family

ID=77615835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010173738.3A Active CN113391917B (en) 2020-03-13 2020-03-13 Multi-machine heterogeneous parallel computing method and device for geophysical prospecting application

Country Status (1)

Country Link
CN (1) CN113391917B (en)



Also Published As

Publication number Publication date
CN113391917B (en) 2024-04-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant