CN116755938B - Computing restarting method and device, storage medium and electronic equipment - Google Patents

Computing restarting method and device, storage medium and electronic equipment

Info

Publication number
CN116755938B
Authority
CN
China
Prior art keywords
initial
target
processing object
information set
partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311015486.1A
Other languages
Chinese (zh)
Other versions
CN116755938A (en)
Inventor
余芬芬
陈焕盛
马金钢
张稳定
王文丁
吴剑斌
秦东明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Sanqing Environmental Technology Co ltd
3Clear Technology Co Ltd
Original Assignee
Beijing Zhongke Sanqing Environmental Technology Co ltd
3Clear Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Sanqing Environmental Technology Co ltd, 3Clear Technology Co Ltd filed Critical Beijing Zhongke Sanqing Environmental Technology Co ltd
Priority to CN202311015486.1A priority Critical patent/CN116755938B/en
Publication of CN116755938A publication Critical patent/CN116755938A/en
Application granted granted Critical
Publication of CN116755938B publication Critical patent/CN116755938B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution

Abstract

The invention provides a computing restarting method, a computing restarting device, a storage medium and electronic equipment, wherein the method comprises the following steps: acquiring an initial partition information set and M initial restart files corresponding to the initial partition information set, wherein the initial partition information set comprises initial partition information corresponding to each initial process in the M initial processes; when the calculation is required to be restarted, determining a target partition information set, wherein the target partition information set comprises target partition information corresponding to each target process in N target processes; and respectively acquiring the operation data of each target process from M initial restart files according to the initial partition information set and the target partition information set, and respectively adding the acquired operation data into the data variables of the corresponding target processes so as to perform calculation restart based on the data variables of each target process. The embodiment of the invention can conveniently determine the data variable of each target process, thereby effectively ensuring the calculation continuity.

Description

Computing restarting method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a computing restart method, a computing restart device, a storage medium, and an electronic device.
Background
At present, computing restart technology is widely applied in various fields (such as numerical computation with air quality modes). The restart file is one of the key data items for guaranteeing long-term continuous operation of a service: it protects the results of earlier computation, it serves as the cold-start file required to continue subsequent computation when system hardware fails (for example, a sudden power-off) or when underlying software such as the operating system behaves abnormally, and it is also required when the number of processes is adjusted. Accordingly, in order to continue the computation while minimizing the error introduced by the restart, the mode system needs to store the information for the designated time before the restart in the restart file as completely as possible, so the restart file is much larger than an ordinary mode result output file. In the prior art, when parallel computing is performed, the restart files of the individual processes are usually merged into one restart file, or the operation data of every process is stored directly in one restart file, so that the computing restart is performed based on the single restart file thus generated. However, because of system memory limits and limits on the size of a single file, the required single restart file may be too large to generate, which makes the computing restart difficult. Based on this, how to conveniently determine the data variable of each target process, so that the computing restart can be performed based on those data variables, has become a research hotspot.
Disclosure of Invention
In view of this, the embodiment of the invention provides a method, a device, a storage medium and an electronic apparatus for computing restart, so as to solve the problem that a single restart file required for computing restart cannot be generated due to oversized file, and thus is difficult to perform computing restart.
According to an aspect of the present invention, there is provided a computing restart method, the method including:
acquiring an initial partition information set and M initial restart files corresponding to the initial partition information set, wherein the initial partition information set comprises initial partition information corresponding to each initial process in M initial processes, one initial restart file is used for storing operation data of the corresponding initial process, one partition information comprises a processing object range corresponding to the operation data of the corresponding process, one operation data corresponds to one processing object, and M is a positive integer;
when the calculation is required to be restarted, determining a target partition information set, wherein the target partition information set comprises target partition information corresponding to each target process in N target processes, and N is a positive integer;
respectively acquiring the operation data of each target process from the M initial restart files according to the initial partition information set and the target partition information set, and respectively adding the acquired operation data to the data variables of the corresponding target processes, so as to perform the computing restart based on the data variable of each target process.
According to another aspect of the present invention, there is provided a computing restart apparatus, the apparatus including:
an acquisition unit, configured to acquire an initial partition information set and M initial restart files corresponding to the initial partition information set, wherein the initial partition information set comprises initial partition information corresponding to each initial process in M initial processes, one initial restart file is used for storing operation data of the corresponding initial process, one partition information comprises a processing object range corresponding to the operation data of the corresponding process, one operation data corresponds to one processing object, and M is a positive integer;
a processing unit, configured to determine a target partition information set when the calculation is required to be restarted, wherein the target partition information set comprises target partition information corresponding to each target process in N target processes, and N is a positive integer;
the processing unit being further configured to obtain, from the M initial restart files, the operation data of each target process according to the initial partition information set and the target partition information set, and to add the obtained operation data to the data variables of the corresponding target processes, respectively, so as to perform the computing restart based on the data variable of each target process.
According to another aspect of the invention there is provided an electronic device comprising a processor, and a memory storing a program, wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the above mentioned method.
According to another aspect of the present invention there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above mentioned method.
The embodiment of the invention can acquire an initial partition information set and M initial restart files corresponding to the initial partition information set, wherein one initial restart file is used for storing the operation data of a corresponding initial process, and M is a positive integer; then, when the computing is required to restart, a target partition information set can be determined, wherein the target partition information set comprises target partition information corresponding to each target process in N target processes, and N is a positive integer. Based on the above, the operation data of each target process can be obtained from M initial restart files according to the initial partition information set and the target partition information set, and the obtained operation data are added to the data variables of the corresponding target processes, respectively, so as to perform calculation restart based on the data variables of each target process. Therefore, the embodiment of the invention can conveniently acquire the running data of each target process from the M initial restart files through the initial partition information set and the target partition information set without generating a single restart file corresponding to the M initial restart files, and can avoid the situation that the single restart file is too large to be generated, thereby effectively ensuring the calculation continuity.
Drawings
Further details, features and advantages of the invention are disclosed in the following description of exemplary embodiments with reference to the following drawings, in which:
FIG. 1 illustrates a flow diagram of a method of computing a restart in accordance with an exemplary embodiment of the present invention;
FIG. 2a shows a schematic diagram of merging single restart files according to an exemplary embodiment of the present invention;
FIG. 2b illustrates a schematic diagram of a direct generation of a single restart file in accordance with an exemplary embodiment of the present invention;
FIG. 3 illustrates a flow diagram of another method of computing reboot in accordance with an exemplary embodiment of the invention;
FIG. 4 illustrates a schematic diagram of file naming in accordance with an exemplary embodiment of the present invention;
FIG. 5 illustrates a schematic diagram of a data read in accordance with an exemplary embodiment of the present invention;
FIG. 6 shows a schematic diagram of a mesh model according to an exemplary embodiment of the present invention;
FIG. 7 illustrates a flow diagram of yet another method of computing a restart in accordance with an exemplary embodiment of the present invention;
FIG. 8a shows a schematic diagram of an initial set of partition information according to an exemplary embodiment of the present invention;
FIG. 8b shows a schematic diagram of an initial set of process object information according to an exemplary embodiment of the present invention;
FIG. 8c illustrates a schematic diagram of a target partition information set in accordance with an exemplary embodiment of the present invention;
FIG. 8d shows a schematic diagram of a target process object information set according to an exemplary embodiment of the present invention;
FIG. 8e illustrates a schematic diagram of a partition object mapping information set in accordance with an exemplary embodiment of the present invention;
FIG. 9 shows a schematic block diagram of a computing restart apparatus according to an exemplary embodiment of the present invention;
FIG. 10 shows a block diagram of an exemplary electronic device that can be used to implement an embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the invention are shown in the drawings, it is to be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the invention. It should be understood that the drawings and embodiments of the invention are for illustration purposes only and are not intended to limit the scope of the present invention.
It should be understood that the various steps recited in the method embodiments of the present invention may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like herein are merely used for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by such devices, modules, or units.
It should be noted that the modifiers "a", "an" and "a plurality of" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will appreciate that they should be construed as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the devices in the embodiments of the present invention are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It should be noted that the execution body of the computing restart method provided by the embodiment of the present invention may be one or more electronic devices, which is not limited by the present invention. The electronic device may be a terminal (i.e. a client) or a server. When the execution body includes a plurality of electronic devices, among them at least one terminal and at least one server, the computing restart method may be executed jointly by the terminal and the server; for example, the terminal may acquire the data required for the computing restart and send it to the server, so that the server performs the computing restart with the received data; for another example, the terminal may obtain a computing restart instruction and send it to the server, so that the server performs the computing restart according to the instruction, and so on. Accordingly, the terminals referred to herein may include, but are not limited to: smart phones, tablet computers, notebook computers, desktop computers, intelligent voice interaction devices, supercomputers, and the like. The server mentioned herein may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, and so on.
Based on the above description, an embodiment of the present invention proposes a calculation restart method that can be executed by the above-mentioned electronic device (terminal or server); alternatively, the calculation restart method may be performed by the terminal and the server together. For convenience of explanation, the electronic device will be used to execute the calculation restarting method in the following description; as shown in fig. 1, the calculation restart method may include the following steps S101 to S103:
S101, acquiring an initial partition information set and M initial restart files corresponding to the initial partition information set, wherein the initial partition information set comprises initial partition information corresponding to each initial process in the M initial processes, one initial restart file is used for storing operation data of the corresponding initial process, one partition information comprises a processing object range corresponding to the operation data of the corresponding process, one operation data corresponds to one processing object, and M is a positive integer.
Here, one piece of operation data may refer to the processing result of the data indicated by the corresponding processing object, or may refer to both the data indicated by the corresponding processing object and its processing result, and so on; the invention is not limited in this regard. Accordingly, one processing object may be used to indicate the data to be processed in one calculation. Based on this, when data is collected separately for each grid in the target area, so that the data collected within each grid serves as the data to be processed in one calculation, one processing object may be one grid in the grid model, where the grid model is the result of dividing the target area into a plurality of grids, and one processing object may be used to indicate the data collected in the corresponding grid (i.e. the grid data). A grid may also be referred to as a grid point; the target area may refer to any research area, such as a whole country or the area where a province is located, which is not limited by the present invention.
Accordingly, the initial partition information set and the M initial restart files may be obtained by, but not limited to, the following methods:
The first acquisition mode is as follows: the electronic device may acquire the initial partition information set and the download links of the M initial restart files through a remote transmission manner, and download the initial partition information set and the M initial restart files according to the download links, thereby using the downloaded initial partition information set and the M initial restart files as the acquired initial partition information set and the M initial restart files.
The second acquisition mode is as follows: the electronic device stores a plurality of partition information sets and at least one initial restart file corresponding to each partition information set in the plurality of partition information sets, so that the electronic device can select one partition information set from the plurality of partition information sets, take the selected partition information set as the initial partition information set, and take at least one initial restart file corresponding to the selected partition information set as M initial restart files corresponding to the initial partition information set.
The third acquisition mode is as follows: the electronic device may have a generating component, and when performing parallel computation using M initial processes, the electronic device may establish initial partition information corresponding to each initial process of the M initial processes through the generating component to generate an initial partition information set, and generate initial restart files corresponding to each initial process, so as to add operation data of the initial process to the corresponding initial restart files, to obtain M initial restart files corresponding to the initial partition information set, and so on.
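As an illustration of the third acquisition mode, the following minimal sketch writes one restart file per initial process together with the old partition table. It is a sketch under stated assumptions, not the patented implementation: the function name `write_restart_files`, the JSON format, and the `restart_0001.json` naming pattern are hypothetical choices made only for this example, and a real mode system would more likely use a binary or self-describing scientific format.

```python
import json
import tempfile
from pathlib import Path


def write_restart_files(out_dir, partition_table, run_data):
    """Write one restart file per initial process, plus the partition table.

    partition_table: list of dicts, one per initial process (hypothetical layout).
    run_data: maps a process id to that process's operation data (a list here).
    """
    out_dir = Path(out_dir)
    paths = []
    for info in partition_table:
        # Hypothetical naming scheme: one file per process id.
        path = out_dir / f"restart_{info['pid']:04d}.json"
        path.write_text(json.dumps(run_data[info["pid"]]))
        paths.append(path)
    # Store the old partition table next to the restart files.
    (out_dir / "partition_table.json").write_text(json.dumps(partition_table))
    return paths


# Two initial processes (M = 2), each owning two grid columns along X.
partition_table = [
    {"pid": 1, "xstart": 1, "xend": 2},
    {"pid": 2, "xstart": 3, "xend": 4},
]
run_data = {1: [0.1, 0.2], 2: [0.3, 0.4]}

with tempfile.TemporaryDirectory() as tmp:
    written = write_restart_files(tmp, partition_table, run_data)
    file_names = [p.name for p in written]
    first_file = json.loads(written[0].read_text())
```

Because every process writes only its own partition, no single file has to hold the data of the whole target area.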
It should be understood that a process may correspond to a CPU (Central Processing Unit) core, or simply a core, and that the initial partition information set may be stored in the form of a data table or an array, which is not limited by the present invention. Accordingly, when the initial partition information set is stored in the form of a data table, it may also be referred to as the old partition table, that is, the partition table used during the run before the computing restart, and one piece of initial partition information is one row or one column of data in the old partition table. The initial partition information set may be three-dimensional partition data (e.g., a three-dimensional partition table) or two-dimensional partition data (e.g., a two-dimensional partition table), which is not limited in the present invention.
In one embodiment, when a processing object is a grid in the grid model and the target area is divided according to the X direction, the Y direction, and the Z direction, the initial partition information set may be three-dimensional partition data, where an initial partition information includes, but is not limited to: initial process identification, X-direction starting grid point identification, X-direction ending grid point identification, Y-direction starting grid point identification, Y-direction ending grid point identification, Z-direction starting grid point identification, Z-direction ending grid point identification, and the like; the invention is not limited in this regard. Wherein, each direction start lattice point identification and each direction end lattice point identification in one partition information are used for indicating the processing object range in the corresponding partition information. It should be noted that, the process identifier may be a process number (i.e. a numerical identifier such as 1 or 2, etc.), or may be an alphabetical identifier (such as a or b, etc.), which is not limited in the present invention; accordingly, the lattice point identifier may be a lattice point number (i.e. a numerical identifier) or an alphabetical identifier, which is not limited in the present invention.
Alternatively, the embodiment of the present invention may use pid_pre to represent the initial process identifier, and xstart_pre, xend_pre, ystart_pre, yend_pre, zstart_pre, and zend_pre to represent the X-direction start lattice point identifier, the X-direction end lattice point identifier, the Y-direction start lattice point identifier, the Y-direction end lattice point identifier, the Z-direction start lattice point identifier, and the Z-direction end lattice point identifier in the initial partition information, respectively. Based on this, when the initial partition information set is three-dimensional partition data, the initial partition information may include, but is not limited to: pid_pre, xstart_pre, xend_pre, ystart_pre, yend_pre, zstart_pre, zend_pre, etc. Optionally, embodiments of the present invention may also use nproc_pre to represent the number of initial processes (i.e., the core number), that is, nproc_pre is equal to M; based on this, the electronic device may initiate nproc_pre cores to perform the computation.
In another embodiment, when a processing object is a grid in the grid model and the target area is divided according to the X direction and the Y direction, the initial partition information set may be two-dimensional partition data, where an initial partition information includes, but is not limited to: initial process identification, X-direction start grid point identification, X-direction end grid point identification, Y-direction start grid point identification, Y-direction end grid point identification, and so forth; the invention is not limited in this regard.
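The partition information described above can be pictured as one record per process. The sketch below assumes inclusive, 1-based grid point numbers; the class name `PartitionInfo` and the helper `num_objects` are hypothetical, while the field names follow the `pid`, `xstart`, `xend`, ... identifiers used in the text (without the `_pre` suffix). A two-dimensional variant would simply omit the Z fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionInfo:
    """One row of a three-dimensional partition table (inclusive ranges)."""
    pid: int      # process identifier
    xstart: int   # X-direction start grid point
    xend: int     # X-direction end grid point
    ystart: int
    yend: int
    zstart: int
    zend: int

    def num_objects(self) -> int:
        """Number of processing objects (grids) inside this partition."""
        return ((self.xend - self.xstart + 1)
                * (self.yend - self.ystart + 1)
                * (self.zend - self.zstart + 1))


# Hypothetical old partition table: a 4 x 4 x 2 grid split along X
# between two initial processes.
old_partition_table = [
    PartitionInfo(pid=1, xstart=1, xend=2, ystart=1, yend=4, zstart=1, zend=2),
    PartitionInfo(pid=2, xstart=3, xend=4, ystart=1, yend=4, zstart=1, zend=2),
]
nproc_pre = len(old_partition_table)  # number of initial processes, M
```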
S102, when the calculation is required to be restarted, determining a target partition information set, wherein the target partition information set comprises target partition information corresponding to each target process in N target processes, and N is a positive integer.
It should be noted that the value of N may be the same as the value of M, or may be different from the value of M, which is not limited in the present invention; that is, the number of processes before the restart (i.e., the initial number of processes) may be the same as the number of processes after the restart (i.e., the target number of processes), or may be different from the number of processes after the restart, which is not limited in the present invention.
The target partition information set may be stored in a data table or an array, which is not limited in the present invention; accordingly, when the target partition information set is stored in the form of a data table, the target partition information set may also be referred to as a new partition table, where the target partition information set is a partition table in the running process after the computing restart. The target partition information set may be three-dimensional partition data (e.g., a three-dimensional partition table) or two-dimensional partition data (e.g., a two-dimensional partition table), which is not limited in the present invention. Similarly, when a processing object is a grid in the grid model and the target partition information set is three-dimensional partition data, the target partition information includes, but is not limited to: target process identification, X-direction starting grid point identification, X-direction ending grid point identification, Y-direction starting grid point identification, Y-direction ending grid point identification, Z-direction starting grid point identification, Z-direction ending grid point identification, and the like; the invention is not limited in this regard; alternatively, when a processing object is a grid in the grid model and the target partition information set is two-dimensional partition data, one target partition information includes, but is not limited to: target process identification, X-direction start grid point identification, X-direction end grid point identification, Y-direction start grid point identification, Y-direction end grid point identification, and so forth; the invention is not limited in this regard.
Optionally, in the embodiment of the present invention, pid_cur may be used to represent the target process identifier, and xstart_cur, xend_cur, ystart_cur, yend_cur, zstart_cur, and zend_cur may be used to represent the X-direction start grid point identifier, the X-direction end grid point identifier, the Y-direction start grid point identifier, the Y-direction end grid point identifier, the Z-direction start grid point identifier, and the Z-direction end grid point identifier in the target partition information, respectively. Based on this, when the target partition information set is three-dimensional partition data, the target partition information may include, but is not limited to: pid_cur, xstart_cur, xend_cur, ystart_cur, yend_cur, zstart_cur, zend_cur, and the like.
In one embodiment, when a task is resubmitted for computation (i.e., a computing restart), the electronic device may receive a partition setting instruction (which may also be referred to as a computing restart instruction) and establish the target partition information set according to the number of CPU cores indicated by the partition setting instruction (i.e., the target number of processes N). Specifically, the electronic device may perform range division according to the target number of processes, where the processing object range to be divided, as indicated by the partition setting instruction, is equal to the union of the processing object ranges in the initial partition information; each processing object range in the division result is then added to the corresponding target partition information to determine the target partition information set.
In another embodiment, when the task is re-submitted for calculation, the electronic device may receive a task submitting instruction, where the task submitting instruction may carry a partition information set, and then the electronic device may use the partition information set carried by the task submitting instruction as a target partition information set, so as to determine the target partition information set, and so on.
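The range division used to build a target partition information set from a target number of processes can be sketched as follows. This is only one possible division strategy, chosen for illustration: it assumes a rectangular two-dimensional domain (so the union of the initial ranges equals their bounding box) and splits it into N contiguous slabs along X. The function names `split_range` and `build_target_partitions` are hypothetical.

```python
def split_range(start, end, n):
    """Split the inclusive range [start, end] into n contiguous inclusive chunks."""
    total = end - start + 1
    if not 1 <= n <= total:
        raise ValueError("n must be between 1 and the number of grid points")
    base, extra = divmod(total, n)
    chunks = []
    lo = start
    for i in range(n):
        # The first `extra` chunks receive one additional grid point.
        size = base + (1 if i < extra else 0)
        chunks.append((lo, lo + size - 1))
        lo += size
    return chunks


def build_target_partitions(initial_set, n_target):
    """Divide the overall range covered by initial_set among n_target processes."""
    # Assumption: the domain is rectangular, so its extent is the bounding box.
    xmin = min(p["xstart"] for p in initial_set)
    xmax = max(p["xend"] for p in initial_set)
    ymin = min(p["ystart"] for p in initial_set)
    ymax = max(p["yend"] for p in initial_set)
    return [
        {"pid": i + 1, "xstart": xs, "xend": xe, "ystart": ymin, "yend": ymax}
        for i, (xs, xe) in enumerate(split_range(xmin, xmax, n_target))
    ]


# Old partition table with M = 2 processes; new table with N = 3 processes.
initial_set = [
    {"pid": 1, "xstart": 1, "xend": 5, "ystart": 1, "yend": 4},
    {"pid": 2, "xstart": 6, "xend": 10, "ystart": 1, "yend": 4},
]
target_set = build_target_partitions(initial_set, 3)
```

Here the ten grid columns are divided into slabs of four, three and three columns, so the target ranges together cover exactly the range covered by the initial partitions.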
S103, according to the initial partition information set and the target partition information set, respectively acquiring the operation data of each target process from M initial restart files, and respectively adding the acquired operation data into the data variables of the corresponding target processes to perform calculation restart based on the data variables of each target process.
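One way to picture step S103 is to intersect each target partition with each initial partition and copy only the overlapping processing objects from the matching initial restart file. The sketch below is an assumption-laden illustration rather than the procedure claimed in the patent: it works in two dimensions, uses in-memory dictionaries as stand-ins for the M initial restart files, and the names `overlap` and `load_target_data` are hypothetical.

```python
def overlap(a, b):
    """Return the inclusive 2D intersection (xs, xe, ys, ye) of two partitions, or None."""
    xs, xe = max(a["xstart"], b["xstart"]), min(a["xend"], b["xend"])
    ys, ye = max(a["ystart"], b["ystart"]), min(a["yend"], b["yend"])
    if xs > xe or ys > ye:
        return None
    return xs, xe, ys, ye


def load_target_data(initial_set, target_set, restart_files):
    """Build the data variable of every target process.

    restart_files maps an initial pid to {(x, y): value}; each dict stands in
    for the contents of one initial restart file.
    """
    data_vars = {}
    for tgt in target_set:
        var = {}
        for ini in initial_set:
            box = overlap(ini, tgt)
            if box is None:
                continue  # this restart file holds nothing for the target process
            xs, xe, ys, ye = box
            source = restart_files[ini["pid"]]
            for x in range(xs, xe + 1):
                for y in range(ys, ye + 1):
                    var[(x, y)] = source[(x, y)]
        data_vars[tgt["pid"]] = var
    return data_vars


# M = 2 initial processes, N = 4 target processes on a 4 x 2 grid.
initial_set = [
    {"pid": 1, "xstart": 1, "xend": 2, "ystart": 1, "yend": 2},
    {"pid": 2, "xstart": 3, "xend": 4, "ystart": 1, "yend": 2},
]
target_set = [
    {"pid": i, "xstart": i, "xend": i, "ystart": 1, "yend": 2} for i in range(1, 5)
]
# Fake operation data: the value at grid (x, y) is 10 * x + y.
restart_files = {
    p["pid"]: {
        (x, y): 10 * x + y
        for x in range(p["xstart"], p["xend"] + 1)
        for y in range(p["ystart"], p["yend"] + 1)
    }
    for p in initial_set
}
data_vars = load_target_data(initial_set, target_set, restart_files)
```

Each target process touches only the restart files whose ranges overlap its own, so no merged single restart file is ever needed.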
It should be understood that the computing restart method mentioned in the embodiments of the present invention may be applied to various scenarios, such as a numerical mode processing scenario and an image processing scenario, and the present invention is not limited thereto. Numerical mode processing scenarios include, but are not limited to, air quality mode scenarios, pollutant source analysis scenarios, and the like; the processing objects in a numerical mode processing scenario are in one-to-one correspondence with the grids in the corresponding grid model. Based on this, when the computing restart method is applied to an image processing scenario, the above-described M initial processes are used to run the image processing before the computing restart, and the N target processes are used to run the image processing after the computing restart.
Accordingly, when the computing restart method is applied to the numerical mode processing scenario, the above-mentioned M initial processes are used for: before the calculation is restarted, running numerical mode processing; the N target processes are used for: after the calculation is restarted, the numerical mode processing is run. Wherein, a processing object is a grid in a grid model, and the grid model is a division result of a target area divided into a plurality of grids; when the numerical mode processing is pollutant source analysis processing, the operation data of one process includes source analysis data corresponding to the processing objects in the corresponding processing object range, and the pollutant source analysis processing can also be simply referred to as pollutant source analysis.
It should be understood that, in the numerical mode processing service operation process such as air quality mode and pollutant source analysis, a multi-process (i.e. multi-core) parallel computing mode is generally adopted, and each core is responsible for a part of data computing work; these numerical mode processes require additional output of restart data (i.e., restart files) to meet the data requirements of computing a restart in the event of a business need or hardware failure, etc. In the parallel computing process of numerical mode processing, particularly in a business operation scene, the prior art generally adopts a single restarting file so as to facilitate the subsequent use; for example: after the subsequent computing resource is regulated, the core number restarting computation needs to be modified, and the traditional mode can still directly read a single restarting file, but the generated restarting file is larger, so that the restarting file in the process of computing restarting is difficult to read and takes a longer time.
For example, as shown in FIG. 2a, taking the air quality model NAQPMS (Nested Air Quality Prediction Modeling System, nested grid air quality forecast model System) and its source resolution OSAM (a pollutant source resolution model) as an example, the specific process of outputting the restart files is to output partition data (i.e. the operation data included in the initial restart files) responsible for each initial process, and then combine them into a single restart file, which may include PM 2.5 (Fine particulate matter), O 3 Concentration data of a plurality of pollutant species such as (ozone), different industries and different areasIs used to parse the data. As another example, as shown in fig. 2b, taking the air quality mode CMAQ (regional multi-scale air quality model) and source analysis thereof, WRF-Chem (a regional air quality mode) and so on as an example for explanation, the output mode of the restart file is generally that a single process collects data of other processes, and then outputs a single restart file containing all contaminant concentration data or source analysis data of different industries and different regions.
The embodiment of the invention can acquire an initial partition information set and M initial restart files corresponding to the initial partition information set, wherein one initial restart file is used for storing the operation data of a corresponding initial process, and M is a positive integer; then, when the computing is required to restart, a target partition information set can be determined, wherein the target partition information set comprises target partition information corresponding to each target process in N target processes, and N is a positive integer. Based on the above, the operation data of each target process can be obtained from M initial restart files according to the initial partition information set and the target partition information set, and the obtained operation data are added to the data variables of the corresponding target processes, respectively, so as to perform calculation restart based on the data variables of each target process. Therefore, the embodiment of the invention can conveniently acquire the running data of each target process from the M initial restart files through the initial partition information set and the target partition information set without generating a single restart file corresponding to the M initial restart files, and can avoid the situation that the single restart file is too large to be generated, thereby effectively ensuring the calculation continuity.
Based on the above description, the embodiment of the present invention further provides a more specific computing restart method, where an initial restart file includes initial process identifiers of corresponding initial processes. Accordingly, the calculation restart method may be performed by the above-mentioned electronic device (terminal or server); alternatively, the calculation restart method may be performed by the terminal and the server together. For convenience of explanation, the electronic device will be used to execute the calculation restarting method in the following description; referring to fig. 3, the computing restart method may include the following steps S301 to S306:
S301, acquiring an initial partition information set and M initial restart files corresponding to the initial partition information set, wherein the initial partition information set comprises initial partition information corresponding to each initial process in the M initial processes, one initial restart file is used for storing operation data of the corresponding initial process, one partition information comprises a processing object range corresponding to the operation data of the corresponding process, one operation data corresponds to one processing object, and M is a positive integer.
The restart files corresponding to the processes are the restart files corresponding to the cores. In the embodiment of the present invention, a restart file may include a process identifier of a corresponding process, and taking a partition information set as three-dimensional partition data for illustration, the first row description information of the restart file includes, but is not limited to: process identification, X-direction starting grid point identification, X-direction ending grid point identification, Y-direction starting grid point identification, Y-direction ending grid point identification, Z-direction starting grid point identification, and Z-direction ending grid point identification, and so forth; the invention is not limited in this regard.
Optionally, when the computing restart method provided by the invention is applied to a pollutant source analysis scenario, the restart file may further include a source analysis industry and area identifier, a source analysis species identifier, and the like; the source analysis industry and area identifier may be a numeric identifier, a letter identifier, or the like, and likewise the source analysis species identifier may be a numeric identifier or a letter identifier, which is not limited by the present invention. In this case, the content of the restart file from the second line onward includes, but is not limited to: the pollutant concentration and source analysis data within the partition slice range corresponding to the corresponding process (i.e., the range indicated by the X-direction, Y-direction, and Z-direction start and end grid point identifiers, or the range indicated by the X-direction and Y-direction start and end grid point identifiers only), which is not limited by the present invention; the partition slice range refers to the processing object range. It should be understood that the data size of a restart file according to the embodiments of the present invention is relatively small, with a file size generally at most on the order of MB or GB, which avoids the need to generate a large file exceeding 2 TB.
Illustratively, taking the initial restart file as an example, the initial restart file may include a pid_pre field, and the first line of the initial restart file may include: pid_pre, xstart_pre, xend_pre, ystart_pre, yend_pre, zstart_pre, zend_pre, ism_pre, idm_pre, etc., where ism_pre represents the source analysis industry and area identifier and idm_pre represents the source analysis species identifier.
Further, the file name of the initial restart file output by the electronic device may have pid_pre information, that is, may have a process identifier, so that the corresponding initial restart file may be found by the process identifier, as shown in fig. 4; and, the electronic device may output in binary form, thereby reducing the size of the initial restart file.
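A per-process restart file of this kind might be written and read back roughly as in the sketch below. The concrete byte layout (nine little-endian 32-bit integers for the first-line description, followed by 64-bit floats) and the file name pattern `restart_<pid>.bin` are assumptions made purely for illustration; the embodiment only states that the file name carries the process identifier and that the output is binary.

```python
import struct
import tempfile
from pathlib import Path

# Assumed header layout: process id, X start/end, Y start/end, Z start/end,
# source analysis industry-and-area id, source analysis species id.
HEADER_FORMAT = "<9i"


def restart_file_name(pid_pre: int) -> str:
    """Hypothetical naming scheme that embeds the process identifier."""
    return f"restart_{pid_pre:04d}.bin"


def write_restart_file(directory: Path, header: tuple, values: list) -> Path:
    """Write the header followed by the operation data as binary doubles."""
    path = directory / restart_file_name(header[0])
    with open(path, "wb") as fh:
        fh.write(struct.pack(HEADER_FORMAT, *header))
        fh.write(struct.pack(f"<{len(values)}d", *values))
    return path


def read_restart_file(path: Path):
    """Read back the header tuple and the list of operation data values."""
    with open(path, "rb") as fh:
        header = struct.unpack(HEADER_FORMAT, fh.read(struct.calcsize(HEADER_FORMAT)))
        payload = fh.read()
    values = list(struct.unpack(f"<{len(payload) // 8}d", payload))
    return header, values


with tempfile.TemporaryDirectory() as tmp:
    header = (3, 1, 10, 1, 5, 1, 2, 0, 0)
    path = write_restart_file(Path(tmp), header, [0.5, 1.5])
    file_name = path.name
    read_header, read_values = read_restart_file(path)
```

Because the process identifier appears both in the header and in the file name, a reader can locate the right file without opening any of the others.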
S302, when a computing restart is required, determining a target partition information set, wherein the target partition information set comprises target partition information corresponding to each target process in N target processes, and N is a positive integer.
S303, comparing and analyzing the initial partition information set and the target partition information set to obtain a comparison and analysis result.
Specifically, the electronic device may compare the number of initial partition information in the initial partition information set with the number of target partition information in the target partition information set; that is, the electronic device may compare the number of cores used in the previous run (i.e., M) with the number of cores used in the current restart run (i.e., N, which may also be denoted as nproc_cur). If the number of initial partition information differs from the number of target partition information, that is, if the two runs use different numbers of cores, the partitioning of the two runs differs (i.e., the initial partition information set and the target partition information set are different), and the comparison analysis result may be used to indicate that the initial partition information set is different from the target partition information set.
Accordingly, if the number of initial partition information is the same as the number of target partition information, that is, if the two runs use the same number of cores, the electronic device may further check the partition information to complete the comparative analysis of the initial partition information set and the target partition information set. Alternatively, if both runs take place on the same electronic device and in the same operating environment (i.e., the same automatic partition algorithm, the same operating system, the same compiler version, etc.), the electronic device may determine that the initial partition information set is the same as the target partition information set.
Specifically, when further checking the partition information, the electronic device may determine whether the start grid point identifier and the end grid point identifier in each direction are the same in the partition information corresponding to each core; that is, for any initial partition information in the initial partition information set, the electronic device may determine whether the processing object range in that initial partition information is the same as the processing object range in one of the target partition information included in the target partition information set. Based on this, when each line of data is one partition information, the electronic device may compare the two sets line by line; alternatively, the program may invoke a tool such as diff (a command that compares the contents of two files) through a system call to compare the two partition information sets directly. It should be noted that the comparison analysis result may be a numerical value (e.g., 0), text (e.g., "same" or "different"), etc., which is not limited by the present invention. Optionally, when the comparison analysis result is 0, it may be used to indicate that the initial partition information set and the target partition information set are the same; when the comparison analysis result is not 0, it may be used to indicate that the initial partition information set is different from the target partition information set.
In the embodiment of the invention, the comparative analysis uses only a small amount of data, such as the process identifiers and processing object ranges included in the partition information sets, rather than the detailed processing object information sets with their larger data volume; the comparative analysis can therefore be performed quickly, which makes it convenient to determine whether partition adjustment is needed (i.e., whether a partition object mapping information set must be constructed in order to acquire the operation data of each target process).
For example, assuming that the initial partition information set includes initial partition information 1, initial partition information 2, and initial partition information 3, and the target partition information set includes target partition information 1 and target partition information 2, so that M has a value of 3 and N has a value of 2, the electronic device may obtain a comparison analysis result indicating that the initial partition information set and the target partition information set are different. For another example, assuming that the initial partition information set includes initial partition information 1 and initial partition information 2, and the target partition information set includes target partition information 1 and target partition information 2, so that M and N both have a value of 2, the electronic device needs to further check the partition information. Assuming that the processing object range in initial partition information 1 is processing objects 1-30, the processing object range in initial partition information 2 is processing objects 31-50, the processing object range in target partition information 1 is processing objects 1-20, and the processing object range in target partition information 2 is processing objects 21-50, the processing object range in at least one initial partition information differs from the processing object range in every target partition information, so the electronic device may obtain a comparison analysis result indicating that the initial partition information set is different from the target partition information set.
For another example, assuming that the values of M and N are both 2, the processing object ranges in the initial partition information 1 and the target partition information 1 are both processing objects 1 to 30, and the processing object ranges in the initial partition information 2 and the target partition information 2 are both processing objects 31 to 50, then a comparison analysis result indicating that the initial partition information set is the same as the target partition information set may be obtained at this time, and so on.
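The two-stage comparison in step S303 (count check first, then range check) can be sketched as below. The dictionary representation `{process id: (first object, last object)}` and the specific non-zero codes 1 and 2 are illustrative assumptions; the text only fixes that 0 may mean "same" and a non-zero value may mean "different".

```python
from typing import Dict, Tuple

Range = Tuple[int, int]


def compare_partition_sets(initial: Dict[int, Range], target: Dict[int, Range]) -> int:
    """Return 0 when the two partition information sets are the same, non-zero otherwise."""
    # Stage 1: different core counts imply different partitioning.
    if len(initial) != len(target):
        return 1
    # Stage 2: every initial range must equal the range of some target partition.
    target_ranges = set(target.values())
    for rng in initial.values():
        if rng not in target_ranges:
            return 2
    return 0


# The three examples from the text.
different_counts = compare_partition_sets(
    {1: (1, 30), 2: (31, 50), 3: (51, 60)}, {1: (1, 30), 2: (31, 60)}
)
different_ranges = compare_partition_sets(
    {1: (1, 30), 2: (31, 50)}, {1: (1, 20), 2: (21, 50)}
)
same = compare_partition_sets({1: (1, 30), 2: (31, 50)}, {1: (1, 30), 2: (31, 50)})
```

Only process counts and range boundaries are touched here, which is why the check stays cheap even for very large grids.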
S304, if the comparison analysis result indicates that the initial partition information set is the same as the target partition information set, the operation data of each target process are respectively obtained from the M initial restart files based on the target process identifiers of each target process and the initial process identifiers corresponding to the M initial restart files.
The initial process identifiers corresponding to the M initial restart files refer to initial process identifiers corresponding to all initial restart files in the M initial restart files; since the initial process identifier in any initial restart file is the same as the initial process identifier carried by the file name of any initial restart file, the initial process identifier corresponding to one initial restart file may be determined by the initial process identifier in the corresponding initial restart file or may be determined by the initial process identifier carried by the file name of the corresponding initial restart file, which is not limited in this invention.
Specifically, when the initial partition information set is the same as the target partition information set, that is, when the calculated partitions before and after restarting are the same, the electronic device may obtain operation data of each target process from the M initial restart files according to the target process identifier of each target process, and the initial process identifier and the processing object range in the M initial restart files; further, according to the target process identifier and the processing object range in the target partition information corresponding to each target process and the initial process identifier and the processing object range in each initial restart file, the operation data of each target process can be obtained from the M initial restart files. Alternatively, the reading may be performed using PID_PRE, XSTART_PRE, XEND_PRE, YSTART_PRE, YEND_PRE, ZSTART_PRE, and ZEND_PRE of the first row in the initial restart file.
Optionally, because the file name of a restart file may carry the corresponding process identifier, when the initial partition information set is the same as the target partition information set, for any target process, the electronic device may determine, according to the target process identifier of that target process and the file name of each initial restart file, the initial restart file indicated by the initial process identifier that corresponds to the target process identifier, and read data from the determined initial restart file to obtain the operation data of that target process.
It should be understood that when the electronic device reads in the small partition files (i.e., the initial restart files) rather than a single restart file containing all the data, a large amount of data reading time is saved, and the time otherwise spent merging the data is saved as well, so the restart efficiency can be effectively improved.
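When the partitions are unchanged, locating a target process's data reduces to a lookup by process identifier in the file names. The sketch below assumes a hypothetical naming pattern `restart_<pid>.bin`; any scheme that embeds the process identifier would work the same way.

```python
import re
from typing import Dict, Iterable

# Hypothetical naming scheme: restart_<process id>.bin
PID_PATTERN = re.compile(r"restart_(\d+)\.bin$")


def index_restart_files(file_names: Iterable[str]) -> Dict[int, str]:
    """Map each initial process identifier to the restart file whose name carries it."""
    index = {}
    for name in file_names:
        match = PID_PATTERN.search(name)
        if match:
            index[int(match.group(1))] = name
    return index


def restart_file_for(target_pid: int, index: Dict[int, str]) -> str:
    """Same partitions before and after restart: the target pid selects the file directly."""
    return index[target_pid]


index = index_restart_files(["restart_0000.bin", "restart_0001.bin", "notes.txt"])
chosen = restart_file_for(1, index)
```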
Optionally, the electronic device may include a multi-core read-in module, which may also be referred to as a read-in module; based on the above, the electronic device can read in partition data required by the corresponding CPU core through the multi-core read-in module, that is, can obtain operation data of the corresponding target process through the multi-core read-in module, where one partition data refers to operation data included in one initial restart file.
S305, if the comparison analysis result indicates that the initial partition information set is different from the target partition information set, constructing a partition object mapping information set, and respectively acquiring the operation data of each target process from M initial restart files according to the partition object mapping information set.
When one processing object is a grid in the grid model, the partition object mapping information set may also be called a partition grid point mapping information set, and the partition object mapping information set may be stored in a data table form or in an array or linked list form, which is not limited in the present invention; accordingly, when the partition object mapping information set is stored in the form of a data table, the partition object mapping information set may also be referred to as a partition grid point mapping table or a new and old partition grid point mapping table.
Since one initial partition information includes the initial process identifier of the corresponding initial process, one target partition information includes the target process identifier of the corresponding target process, and the process identifier in one partition information is used to indicate the process to which the corresponding processing object range belongs, when constructing the partition object mapping information set, the electronic device may determine an initial processing object information set according to the processing object range and the initial process identifier included in each initial partition information in the initial partition information set, where one initial processing object information includes the initial process identifier of the initial process corresponding to the corresponding processing object, and the initial process identifier in one initial processing object information is used to indicate the initial restart file in which the operation data corresponding to the corresponding processing object is located. The electronic device may likewise determine a target processing object information set according to the processing object range and the target process identifier included in each target partition information in the target partition information set, where one target processing object information includes the target process identifier of the target process corresponding to the corresponding processing object, and the target process identifier in one target processing object information is used to indicate the target process that needs to receive the operation data corresponding to the corresponding processing object. The electronic device may then construct the partition object mapping information set from the initial processing object information set and the target processing object information set.
In a specific implementation, if the partition information set includes all information required for determining the processing object information set, when determining the initial processing object information set according to the processing object range and the initial process identifier included in each initial partition information in the initial partition information set, the electronic device may perform object conversion processing on the initial partition information set according to the processing object range and the initial process identifier included in each initial partition information in the initial partition information set, to obtain the initial processing object information set; accordingly, when the target processing object information set is determined according to the processing object range and the target process identifier included in each target partition information in the target partition information set, the electronic device may perform object conversion processing on the target partition information set according to the processing object range and the target process identifier included in each target partition information in the target partition information set, to obtain the target processing object information set.
In another specific implementation, if the partition information set does not include all information required for determining the processing object information set, when determining the initial processing object information set according to the processing object range and the initial process identifier included in each piece of initial partition information in the initial partition information set, the electronic device may generate the initial processing object information set according to the processing object range and the initial process identifier included in each piece of initial partition information in the initial partition information set and the initial processing object generation information of each processing object; accordingly, when determining the target processing object information set according to the processing object range and the target process identifier included in each target partition information in the target partition information set, the electronic device may generate the target processing object information set according to the processing object range and the target process identifier included in each target partition information in the target partition information set, and the target processing object generation information of each processing object, and so on. It can be seen that the processing object information set generated at this time may include processing object generation information of the respective processing objects.
Wherein the processing object generation information of one processing object includes, but is not limited to: local object identification, global object identification and the like of the corresponding processing object, which is not limited by the invention; the details of the local object identifier and the global object identifier are shown below, and the embodiments of the present invention are not described herein.
It should be noted that, because the data volume of any partition information set in the initial partition information set and the target partition information set is smaller, and only the partition information is included, the electronic device may further establish and output a processing object information set, and one processing object information may provide a more detailed correspondence between a process and a processing object; when the processing object information sets are stored in the form of a data table and one processing object is a grid, the initial processing object information set may also be referred to as an old process grid point information table, and the target processing object information set may also be referred to as a new process grid point information table. Alternatively, the set of processing object information may be stored in the form of an array, which is not limited by the present invention.
It should be understood that, in the embodiment of the present invention, the determination time of the initial processing object information set is not limited, that is, the electronic device may determine the initial processing object information set when acquiring the processing object range and the initial process identifier included in each initial partition information in the initial partition information set; the initial processing object information set may also be determined when a result of the comparative analysis indicating that the initial partition information set is different from the target partition information set is obtained, that is, the initial processing object information set may be determined when the partition object mapping information set needs to be constructed, and so on. Similarly, the determining time of the target processing object information set is not limited, that is, the electronic device may determine the target processing object information set when acquiring the processing object range and the target process identifier included in each target partition information in the target partition information set; the target processing object information set may also be determined when a result of the comparative analysis indicating that the initial partition information set is different from the target partition information set is obtained, that is, the target processing object information set may be determined when the partition object mapping information set needs to be constructed, and so on.
Further, one initial processing object information may include an initial global object identifier of the corresponding processing object, one target processing object information may include a target global object identifier of the corresponding processing object, and the global object identifier of a processing object refers to the object identifier of that processing object in the processing object set; that is, one initial processing object information may include the initial process identifier of the initial process corresponding to the corresponding processing object and the initial global object identifier of the corresponding processing object, and one target processing object information may include the target process identifier of the target process corresponding to the corresponding processing object and the target global object identifier of the corresponding processing object. Based on this, when constructing the partition object mapping information set from the initial processing object information set and the target processing object information set, the electronic device may connect each initial processing object information with the matched target processing object information according to the initial global object identifier included in each initial processing object information in the initial processing object information set and the target global object identifier included in each target processing object information in the target processing object information set, so as to construct the partition object mapping information set.
The partition object mapping information is information obtained by connecting initial processing object information with matched target processing object information, and the initial global object identification in any initial processing object information is the same as the target global object identification in the matched target processing object information; that is, the electronic device may merge the initial processing object information and the target processing object information, which are identical in the initial global object identification and the target global object identification, together, such as in a line, to thereby achieve the mapping of the initial processing object information and the target processing object information.
In a specific implementation, the process of constructing the partition object mapping information set may refer to: a step of merging the initial processing object information set and the target processing object information set; because the electronic equipment performs the matching of the initial processing object information and the target processing object information through the global object identification, the electronic equipment can construct a partition object mapping information set which is ordered by taking the global object identification as a main key, so that the data positioning is performed on the initial restart file corresponding to the processing object through the partition object mapping information set, and binary data files (namely the initial restart files) are not required to be opened one by one in a time-consuming way for searching.
Note that, the global object identifier may be a numeric identifier (i.e., a global data number), or may be a letter identifier, which is not limited in this invention. Correspondingly, the processing object set refers to a set containing all processing objects; when one processing object is one mesh in the mesh model, the processing object set includes each mesh in the mesh model.
In an embodiment of the present invention, when a processing object is a grid, the processing object information may include, but is not limited to: a process identifier, a global object identifier, a local object identifier, a nesting region identifier, etc. The local object identifier of a processing object refers to the object identifier of that processing object within the processing object range of its corresponding process; the local object identifier may be a numeric identifier (i.e., a local data number) or a letter identifier, which is not limited by the present invention. Accordingly, the nesting region identifier is used to indicate one nesting layer among the at least one nesting layer involved in the grid model, that is, the nesting layer in which the processing object is located in the grid model; the nesting region identifier may be a numeric identifier (i.e., a nesting region ID), a letter identifier, etc., which is not limited by the present invention.
For example, assume that the processing object set includes processing objects 1 to 6 and that the initial partition information set includes initial partition information 1 and initial partition information 2, where the processing object range in initial partition information 1 includes processing objects 1, 2, and 3, and the processing object range in initial partition information 2 includes processing objects 4, 5, and 6. Then the initial global object identifiers of processing objects 1 to 6 may be 1 to 6, the initial local object identifiers of processing objects 1 to 3 may be 1 to 3, and the initial local object identifiers of processing objects 4 to 6 may also be 1 to 3.
Illustratively, assuming that PID_PRE, IG_PRE, IL_PRE, and NE_PRE represent an initial process identifier, an initial global object identifier, an initial local object identifier, and an initial nesting region identifier, respectively, the initial processing object information may include, but is not limited to: PID_PRE, IG_PRE, IL_PRE, NE_PRE, etc. Accordingly, assuming that PID_CUR, IG_CUR, IL_CUR, and NE_CUR represent a target process identifier, a target global object identifier, a target local object identifier, and a target nesting region identifier, respectively, the target processing object information may include, but is not limited to: PID_CUR, IG_CUR, IL_CUR, NE_CUR, etc. Based on this, one piece of partition object mapping information may include, but is not limited to: PID_PRE, IG_PRE, IL_PRE, NE_PRE, PID_CUR, IG_CUR, IL_CUR, NE_CUR, etc.
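The merge described above can be sketched as a join on the global object identifier. The following is a minimal illustration only, not the patented implementation: the record types `ObjectInfo` and `MappingRow`, the helper names, the lowercase field names, and the three-way target partition are all assumptions introduced here. The initial partition follows the six-object example given earlier.

```python
from typing import NamedTuple


class ObjectInfo(NamedTuple):
    pid: int  # process identifier
    ig: int   # global object identifier
    il: int   # local object identifier within the partition
    ne: int   # nesting region identifier


class MappingRow(NamedTuple):
    pid_pre: int
    ig_pre: int
    il_pre: int
    ne_pre: int
    pid_cur: int
    ig_cur: int
    il_cur: int
    ne_cur: int


def build_object_info(partitions, ne=1):
    """Expand {pid: [global ids in stored order]} into per-object records."""
    return [
        ObjectInfo(pid, ig, il, ne)
        for pid, global_ids in partitions.items()
        for il, ig in enumerate(global_ids, start=1)
    ]


def build_mapping(initial, target):
    """Join initial and target records on the global identifier, sorted by it."""
    target_by_ig = {t.ig: t for t in target}
    rows = [MappingRow(*p, *target_by_ig[p.ig]) for p in initial]
    return sorted(rows, key=lambda row: row.ig_pre)


# Initial run: 2 processes; restarted run: 3 processes (hypothetical split).
initial = build_object_info({1: [1, 2, 3], 2: [4, 5, 6]})
target = build_object_info({1: [1, 2], 2: [3, 4], 3: [5, 6]})
mapping = build_mapping(initial, target)
```

In this sketch, processing object 3 was local object 3 of initial process 1 and becomes local object 1 of target process 2, so its row tells the restarted run which file to open and where the value belongs.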
Further, each piece of partition object mapping information includes the initial process identifier of the initial process corresponding to the processing object and the target process identifier of the target process corresponding to that processing object. Based on this, when acquiring the running data of each target process from the M initial restart files according to the partition object mapping information set, the electronic device may, for any one of the N target processes, traverse the partition object mapping information in the set, taking the currently traversed partition object mapping information as the current partition object mapping information and the processing object it corresponds to as the current processing object. If the target process identifier of that target process is the same as the target process identifier in the current partition object mapping information, the electronic device determines the initial process identifier from the current partition object mapping information and acquires, for that target process, the running data corresponding to the current processing object from the initial restart file indicated by the determined initial process identifier. Once the partition object mapping information set has been fully traversed, the running data of that target process is obtained.
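As a rough sketch of this traversal, the function below walks a mapping set for one target process and groups the required reads by initial process identifier, i.e., by which initial restart file has to be consulted. The function name `plan_reads`, the reduced four-field row layout `(pid_pre, il_pre, pid_cur, il_cur)`, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def plan_reads(mapping, pid_cur):
    """Group the reads needed by one target process by initial restart file.

    Each mapping row is (pid_pre, il_pre, pid_cur, il_cur). The result maps
    an initial process identifier to a list of (il_pre, il_cur) pairs.
    """
    reads = defaultdict(list)
    for row_pid_pre, il_pre, row_pid_cur, il_cur in mapping:
        # Only rows whose target process identifier matches are relevant.
        if row_pid_cur == pid_cur:
            reads[row_pid_pre].append((il_pre, il_cur))
    return dict(reads)


# Six processing objects moved from 2 initial processes to 3 target processes.
mapping = [
    (1, 1, 1, 1), (1, 2, 1, 2), (1, 3, 2, 1),
    (2, 1, 2, 2), (2, 2, 3, 1), (2, 3, 3, 2),
]
reads_for_target_2 = plan_reads(mapping, 2)
```

Here target process 2 needs one value from initial restart file 1 and one from initial restart file 2; grouping by file is a design choice of this sketch so that each file is visited once per target process.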
Specifically, the partition object mapping information includes an initial local object identifier and a target local object identifier of the corresponding processing object. The initial local object identifier of a processing object is the object identifier of that processing object within the processing object range indicated by the corresponding initial partition information, and the target local object identifier of a processing object is the object identifier of that processing object within the processing object range indicated by the corresponding target partition information. In this case, when acquiring the running data corresponding to the current processing object for any target process from the initial restart file indicated by the determined initial process identifier, the electronic device may first determine the initial local object identifier of the current processing object from the current partition object mapping information; then calculate, based on that initial local object identifier, the initial position information of the running data corresponding to the current processing object in the initial restart file indicated by the determined initial process identifier; and finally, according to the initial position information, acquire the running data corresponding to the current processing object for that target process from that initial restart file.
It should be noted that, if an initial restart file includes the initial process identifier of the corresponding initial process and the processing object range corresponding to that initial process, then when calculating the initial position information based on the initial local object identifier of the current processing object, the electronic device may calculate the initial position information of the operation data corresponding to the current processing object, in the initial restart file indicated by the determined initial process identifier, based on both the initial local object identifier of the current processing object and the processing object range corresponding to the initial process indicated by the determined initial process identifier.
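One way a local object identifier and a processing object range can be combined is to decode the local number back into grid indices. The sketch below assumes, purely for illustration, that local numbers start at 1 and run fastest along X, then Y, then the vertical layers, and that a range is given as `(xstart, xend, ystart, yend, zstart, zend)`; the text does not fix this ordering.

```python
def local_to_ijk(local_number, object_range):
    """Decode a 1-based local number into (i, j, k) grid indices.

    Assumed ordering: X varies fastest, then Y, then the vertical layer.
    object_range is (xstart, xend, ystart, yend, zstart, zend), inclusive.
    """
    xstart, xend, ystart, yend, zstart, _zend = object_range
    nx = xend - xstart + 1
    ny = yend - ystart + 1
    layer, rest = divmod(local_number - 1, nx * ny)
    row, col = divmod(rest, nx)
    return (xstart + col, ystart + row, zstart + layer)


# A hypothetical partition covering x in 3..4, y in 1..2, layers 1..2.
example_range = (3, 4, 1, 2, 1, 2)
first = local_to_ijk(1, example_range)
fifth = local_to_ijk(5, example_range)
```

With the indices in hand, a position inside the file can be derived from the layer start and the row width of the stored block, whichever layout the restart file actually uses.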
S306, adding the acquired operation data to the data variables of the corresponding target processes respectively, so as to perform the calculation restart based on the data variables of each target process.
In one embodiment, if the initial partition information set is the same as the target partition information set, the electronic device may call the multi-core read-in module to read in the operation data required by each target process, so that each target process reads in the operation data of the corresponding initial process; for example, each target process reads in the data required by the CPU core with the corresponding number, as shown in fig. 5.
In another embodiment, if the initial partition information set differs from the target partition information set, then for any one of the N target processes, with the current partition object mapping information and the current processing object as described above, the electronic device may add the acquired operation data to the data variable of that target process as follows: determine the target local object identifier of the current processing object from the current partition object mapping information; calculate, based on that target local object identifier, the target position information of the running data corresponding to the current processing object in the data variable of that target process; and add the running data corresponding to the current processing object to the data variable of that target process according to the target position information.
Specifically, when calculating the target position information based on the target local object identifier of the current processing object, the electronic device may calculate the target position information of the running data corresponding to the current processing object, in the data variable of that target process, based on both the target local object identifier of the current processing object and the processing object range corresponding to that target process. It should be appreciated that when position information is calculated from a local object identifier, the electronic device may perform the calculation using the local data number indicated by that local object identifier.
It should be noted that the electronic device may read the partition object mapping information one by one in the order of the partition object mapping information set, open the corresponding initial restart file, and read the corresponding data (i.e., the running data) into the data variable of the target process responsible for it. Alternatively, the electronic device may read the partition object mapping information one by one, map the corresponding initial restart file into memory, and send the running data in the initial restart file to the data variable of the corresponding target process through point-to-point communication. Accordingly, when the operation data at any initial position is added to the data variable of the corresponding target process, the electronic device may assign that operation data directly to the corresponding target process, or send it to the data variable of the corresponding target process through MPI (Message Passing Interface) communication functions (e.g., MPI_Send(), MPI_Recv(), etc.). It should be understood that mapping an initial restart file into memory is more efficient than directly opening it, so the embodiment of the invention may preferably read data by mapping the initial restart file into memory.
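The memory-mapped read mentioned above might look like the following sketch, which uses Python's standard `mmap` and `struct` modules. The record layout (one little-endian 64-bit float per processing object, no header), the helper names, and the temporary file are assumptions for illustration; an actual restart file format would define its own layout, and the MPI transfer is not shown.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout: one little-endian float64 per processing object.
RECORD = struct.Struct("<d")


def write_records(path, values):
    """Write values back to back as fixed-size binary records."""
    with open(path, "wb") as f:
        for value in values:
            f.write(RECORD.pack(value))


def read_record(path, position):
    """Read the record at a 0-based position through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return RECORD.unpack_from(mapped, position * RECORD.size)[0]


# Stand-in for one initial restart file holding three values.
fd, path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
try:
    write_records(path, [1.5, 2.5, 3.5])
    second = read_record(path, 1)
finally:
    os.remove(path)
```

Because the map is addressed by byte offset, only the pages that hold the requested records need to be touched, which is the property that makes this approach attractive for sparse lookups in large files.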
For example, when the calculation restarting method provided by the present invention is applied to a pollutant source analysis scenario, assume that the current processing object (i.e., the current grid point) is the grid point at (i, j, k), where i ∈ [1, NX], j ∈ [1, NY], k ∈ [1, NZ], and NX, NY, and NZ are respectively the number of grids in the X direction, the number of grids in the Y direction, and the number of vertical layers of the grid model. The target position information, in the data variable of any target process, of the running data corresponding to the current processing object may then be: k-layer start position + (XEND_CUR - XSTART_CUR + BUFFER × 2 + 1) × (j - YSTART_CUR + 1) + i - XSTART_CUR + 1 + IL_CUR, where IL_CUR denotes the local data number indicated by the target local object identifier; that is, when the local object identifier is not itself a local data number, it needs to be converted into the corresponding local data number before the calculation. Correspondingly, the initial position information, in the initial restart file indicated by the determined initial process identifier, of the running data corresponding to the current processing object may be: k-layer start position + (XEND_PRE - XSTART_PRE + BUFFER × 2 + 1) × (j - YSTART_PRE + 1) + i - XSTART_PRE + 1 + IL_PRE, where IL_PRE denotes the local data number indicated by the initial local object identifier. BUFFER is the number of grid points by which each partition is expanded in the X and Y directions for partition communication in the air quality model and in pollutant source analysis, that is, it provides boundary data for parallel communication; its value may be 1 or 2, etc., which is not limited by the invention.
For example, as shown in fig. 6, take the target processes after the restart as an example. Assume that the grid model includes 2 layers, each layer includes 12 grids (i.e., processing objects), the target global object identifiers of the grids in the grid model are 1 to 24, and the target partition information set includes target partition information 1, target partition information 2, and target partition information 3. The processing object range in target partition information 1 may be represented as (1,2,1,2,1,2), that is, XSTART_CUR, XEND_CUR, YSTART_CUR, YEND_CUR, ZSTART_CUR, and ZEND_CUR are 1, 2, 1, 2, 1, and 2, respectively, so the processing object range in target partition information 1 may include grids 1, 2, 5, 6, 13, 14, 17, and 18. Similarly, the processing object range in target partition information 2 may be represented as (3,4,1,2,1,2), and the processing object range in target partition information 3 may be represented as (1,4,3,3,1,2), so that the grid model is divided into 3 data blocks. Correspondingly, assuming that BUFFER is 1, when calculating the position of any grid data of the data block corresponding to target partition information 1 in the data variable of the corresponding target process, the electronic device may first expand that data block by BUFFER and then perform the position calculation on the expanded data block, that is, on the minimum rectangle containing the expanded data block. In this case, when calculating the target position information, in the data variable of the corresponding target process, of the layer-2 grid data of the data block corresponding to target partition information 1, the k-layer start position (i.e., the layer-2 start position) may be 16.
It should be appreciated that if an expanded grid point in the expanded data block coincides with a grid in the grid model, the data of that expanded grid point may be equal to the data of that grid; for example, the expanded grid point to the right of the grid with target global object identifier 14 coincides with the grid indicated by target global object identifier 15. If an expanded grid point in the expanded data block does not coincide with any grid in the grid model, the electronic device may calculate a value according to the numerical calculation boundary condition and use it as the data of that expanded grid point, or select the data of some grid (such as the nearest grid) as the data of that expanded grid point, etc.; for example, the expanded grid point to the left of the grid with target global object identifier 13 does not coincide with any grid in the grid model. The numerical calculation boundary condition may be set empirically or according to actual needs, which is not limited in the present invention.
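The numbers in the fig. 6 example can be checked with a short sketch. It assumes a 4 × 3 × 2 grid model whose global identifiers run fastest along X, then Y, then the vertical layers (an assumption that reproduces the grid lists given in the example), and it treats the layer start as a 0-based offset into a block expanded by BUFFER on each side in X and Y. The function names are hypothetical.

```python
def global_ids(object_range, nx, ny):
    """List global identifiers inside (xs, xe, ys, ye, zs, ze), X fastest."""
    xs, xe, ys, ye, zs, ze = object_range
    return [
        (k - 1) * nx * ny + (j - 1) * nx + i
        for k in range(zs, ze + 1)
        for j in range(ys, ye + 1)
        for i in range(xs, xe + 1)
    ]


def expanded_layer_size(object_range, buffer):
    """Number of points in one layer after expanding by buffer in X and Y."""
    xs, xe, ys, ye, _zs, _ze = object_range
    return (xe - xs + 1 + 2 * buffer) * (ye - ys + 1 + 2 * buffer)


def layer_start(object_range, buffer, k):
    """0-based offset at which layer k of the expanded block begins."""
    zs = object_range[4]
    return (k - zs) * expanded_layer_size(object_range, buffer)


NX, NY = 4, 3  # 12 grids per layer, as in the example
partition_1 = (1, 2, 1, 2, 1, 2)
partition_2 = (3, 4, 1, 2, 1, 2)
partition_3 = (1, 4, 3, 3, 1, 2)

ids_1 = global_ids(partition_1, NX, NY)
start_of_layer_2 = layer_start(partition_1, buffer=1, k=2)
```

Partition 1 spans 2 × 2 grids per layer, so with BUFFER equal to 1 each expanded layer holds 4 × 4 = 16 points, which is why layer 2 begins at offset 16.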
For another example, when a restart file stores the operation data of the processing objects in the processing object range indicated by the corresponding partition information sequentially, in the order of their local object identifiers within that range, the target position information in the data variable of any target process may be: the operation data storage start position corresponding to the data variable of that target process + IL_CUR - 1, where the operation data storage start position is the first storage position used for storing the operation data of the processing objects in the processing object range indicated by the corresponding partition information. Correspondingly, the initial position information, in the initial restart file indicated by the determined initial process identifier, of the operation data corresponding to the current processing object may be: the operation data storage start position corresponding to the initial restart file indicated by the determined initial process identifier + IL_PRE - 1, and so on. The invention does not limit the specific calculation process of the position information.
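The sequential layout just described reduces the position calculation to one addition. The sketch below models an initial restart file and a target data variable as Python lists; the list contents, the one-slot header, and the function names are hypothetical and serve only to show the arithmetic start position + local number - 1.

```python
def sequential_position(storage_start, local_number):
    """Position of a value stored in local-number order from storage_start."""
    return storage_start + local_number - 1


def transfer(initial_file, start_pre, il_pre, target_variable, start_cur, il_cur):
    """Copy one value from an initial restart file into a target data variable."""
    source = sequential_position(start_pre, il_pre)
    destination = sequential_position(start_cur, il_cur)
    target_variable[destination] = initial_file[source]


# Initial restart file 2: one header slot, then data for local objects 1..3.
initial_file_2 = ["header", 40.0, 50.0, 60.0]
# Data variable of target process 2: two slots, data starts at position 0.
target_variable_2 = [None, None]

# The object with IL_PRE = 1 in file 2 becomes IL_CUR = 2 in target process 2.
transfer(initial_file_2, 1, 1, target_variable_2, 0, 2)
```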
In the embodiment of the invention, the electronic device may also generate a target restart file for each target process based on the data variable of that target process, so that these target restart files serve as the restart files for the next calculation restart; each target restart file stores the running data of the corresponding target process.
The embodiment of the invention can acquire the initial partition information set, acquire M initial restart files corresponding to the initial partition information set, and determine the target partition information set when the restart is required to be calculated, wherein the target partition information set comprises target partition information corresponding to each target process in N target processes. Then, the initial partition information set and the target partition information set can be subjected to comparison analysis to obtain comparison analysis results; if the comparison analysis result indicates that the initial partition information set is the same as the target partition information set, acquiring the running data of each target process from the M initial restart files based on the target process identification of each target process and the initial process identifications corresponding to the M initial restart files; if the comparison analysis result indicates that the initial partition information set is different from the target partition information set, constructing a partition object mapping information set, and respectively acquiring the operation data of each target process from M initial restart files according to the partition object mapping information set, so that the acquired operation data are respectively added into the data variables of the corresponding target processes to perform calculation restart based on the data variables of each target process. 
Therefore, the embodiment of the invention can either read the data directly, or use the partition object mapping information set to read or send the data in each initial restart file to the data variables of the target processes. It can conveniently map restart files whether the parallel partition is unchanged, the number of parallel cores differs, or the parallel partition differs, avoids generating one large single restart file from the initial restart files, and can effectively improve restart efficiency.
Based on the above description, the following takes the application of the calculation restarting method provided by the invention to a pollutant source analysis scenario as an example. In this case, a restart file may also be called a source analysis data file or a source analysis restart file; the partition information set is described as a partition table, and the processing object information set as a process grid point information table. It should be understood that when the resolution is high (i.e., the area indicated by one grid is smaller than a preset threshold, such as 3 km or 1 km) and the number of pollutant species and source analysis IDs is large, the number of pollutant source analysis calculation grid points for a high-resolution, multi-ID case such as 3 km nationwide can exceed 1 million, and a single source analysis restart file generated at a nationwide horizontal resolution of 3 km or higher can exceed 2 TB or even tens of TB. The available memory of current high-performance computers is generally hundreds of GB, and common operating systems also limit the size of a single file, so the traditional approach of generating a single restart file becomes infeasible on common operating systems because of insufficient memory, operating-system disk file management, and so on. For example, in a source analysis scenario with 3 km horizontal resolution nationwide and 1 km horizontal resolution in a key area (i.e., a designated area), NX = 2025 grids in the X direction, NY = 2025 grids in the Y direction, NZ = 20 vertical layers, and 18 city IDs in all counties 371 in the whole country, the steps shown in fig. 7 may be performed to obtain the source analysis data files corresponding to different cores under the new setting (i.e., the target restart files corresponding to each target process) and to implement the calculation restart:
1. Perform the first model calculation, establish the old partition table, and build the old process grid point information table from the old partition table. For convenience of explanation, the following description takes a two-dimensional partition table as an example (i.e., the number of vertical layers is 1). Assuming that 3600 cores (i.e., NPROC_PRE = 3600) are used in the first model calculation, 3600 initial processes may be used for the calculation; the old partition table (i.e., the initial partition information set) may be as shown in fig. 8a, and the old process grid point information table (i.e., the initial processing object information set) may be as shown in fig. 8b.
2. Output the source analysis data files (i.e., the initial restart files) corresponding to the different cores under the first setting. As can be seen from the above, the size of each source analysis data file (e.g., 568MB in fig. 4) is much smaller than the single-node memory limit of 256GB, so generating the files poses no difficulty.
3. Restart the calculation (i.e., calculation restart), output the new partition table under the new setting (i.e., the target partition information set at the time of the calculation restart), and generate the new process grid point information table from the new partition table. Assuming that the number of cores available at restart is reduced to 960 because of resource adjustment, i.e., NPROC_CUR = 960, the new partition table (i.e., the target partition information set) may be as shown in fig. 8c, and the new process grid point information table (i.e., the target processing object information set) may be as shown in fig. 8d.
4. Compare the new core count with the old core count (i.e., the core counts after and before the calculation restart), that is, compare the number of target processes with the number of initial processes.
5. If the new core count is the same as the old core count, compare the new partition table with the old partition table, that is, compare the target partition information set with the initial partition information set.
6A. If the new partition table and the old partition table are the same, call the multi-core read-in module to read the partition data required by each core, that is, the operation data required by each target process.
6B. If the new core count differs from the old core count, or the new partition table differs from the old partition table, construct the new-old partition grid point mapping table (i.e., the partition object mapping information set). It should be understood that when NPROC_PRE = 3600 and NPROC_CUR = 960, the new and old core counts are obviously different, so in this example the flow jumps directly to step 6B; the new-old partition grid point mapping table may be as shown in fig. 8e.
7. Open the old source analysis data files (i.e., the initial restart files) and read or send the corresponding data to the data variables of the corresponding processes.
8. Output the source analysis data files (i.e., the target restart files) corresponding to the different cores under the new setting, for use in the next restart.
In this way, in scenarios such as pollutant source analysis, where a large amount of data must be output for a calculation restart, the restart files output in parallel can either be read in directly or be mapped across different numbers of parallel cores, so that the calculation restart is conveniently achieved.
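The branching in steps 4 to 6B above can be summarized in a few lines. In this sketch a partition table is modeled as a dictionary from process identifier to a processing object range; that representation, the function name, and the returned labels are assumptions for illustration only.

```python
def choose_restart_path(old_table, new_table):
    """Decide between a direct read and building a grid point mapping table.

    Each table maps a process identifier to its processing object range.
    """
    # Step 4: compare the old and new core (process) counts.
    if len(old_table) != len(new_table):
        return "build_mapping"   # step 6B
    # Step 5: same count, so compare the partition tables themselves.
    if old_table != new_table:
        return "build_mapping"   # step 6B
    return "direct_read"         # step 6A


old_table = {1: (1, 2, 1, 2), 2: (3, 4, 1, 2)}
same_table = {1: (1, 2, 1, 2), 2: (3, 4, 1, 2)}
moved_table = {1: (1, 4, 1, 1), 2: (1, 4, 2, 2)}
fewer_cores = {1: (1, 4, 1, 2)}

paths = [
    choose_restart_path(old_table, same_table),
    choose_restart_path(old_table, moved_table),
    choose_restart_path(old_table, fewer_cores),
]
```

Checking the counts first is only a shortcut: differing counts imply differing tables, so the more expensive table comparison is needed only when the counts agree.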
Based on the above description of the embodiments of the calculation restarting method, the embodiments of the present invention also provide a calculation restarting apparatus, which may be a computer program (including program code) running in an electronic device. As shown in fig. 9, the calculation restarting apparatus may include an acquisition unit 901 and a processing unit 902. The calculation restarting apparatus may perform the calculation restarting method shown in fig. 1 or fig. 3, i.e., the calculation restarting apparatus may run the following units:
an obtaining unit 901, configured to obtain an initial partition information set, and obtain M initial restart files corresponding to the initial partition information set, where the initial partition information set includes initial partition information corresponding to each initial process in M initial processes, one initial restart file is used to store operation data of the corresponding initial process, one partition information includes a processing object range corresponding to the operation data of the corresponding process, one operation data corresponds to one processing object, and M is a positive integer;
The processing unit 902 is configured to determine, when a restart is required to be calculated, a target partition information set, where the target partition information set includes target partition information corresponding to each target process in N target processes, and N is a positive integer;
the processing unit 902 is further configured to obtain, from the M initial restart files, operation data of each target process according to the initial partition information set and the target partition information set, and add the obtained operation data to data variables of corresponding target processes, respectively, so as to perform a computing restart based on the data variables of each target process.
In one embodiment, an initial restart file includes the initial process identifier of the corresponding initial process; when obtaining the operation data of each target process from the M initial restart files according to the initial partition information set and the target partition information set, the processing unit 902 may be specifically configured to:
performing comparative analysis on the initial partition information set and the target partition information set to obtain a comparative analysis result;
if the comparison analysis result indicates that the initial partition information set is the same as the target partition information set, acquiring operation data of each target process from the M initial restart files based on target process identifiers of each target process and initial process identifiers corresponding to the M initial restart files respectively;
If the comparison analysis result indicates that the initial partition information set is different from the target partition information set, constructing a partition object mapping information set, and respectively acquiring the operation data of each target process from the M initial restart files according to the partition object mapping information set.
In another embodiment, one piece of initial partition information includes the initial process identifier of the corresponding initial process, one piece of target partition information includes the target process identifier of the corresponding target process, and the process identifier in a piece of partition information indicates the process corresponding to each processing object in the processing object range of that partition information; when constructing the partition object mapping information set, the processing unit 902 may be specifically configured to:
determining an initial processing object information set according to the processing object range and the initial process identifier included in each piece of initial partition information in the initial partition information set, wherein each piece of initial processing object information includes the initial process identifier of the initial process corresponding to the corresponding processing object, and that initial process identifier indicates the initial restart file in which the running data of the corresponding processing object is located;
determining a target processing object information set according to the processing object range and the target process identifier included in each piece of target partition information in the target partition information set, wherein each piece of target processing object information includes the target process identifier of the target process corresponding to the corresponding processing object, and that target process identifier indicates the target process that needs to receive the running data of the corresponding processing object;
and constructing a partition object mapping information set by adopting the initial processing object information set and the target processing object information set.
In another embodiment, the initial processing object information includes an initial global object identifier of the corresponding processing object, the target processing object information includes a target global object identifier of the corresponding processing object, and the global object identifier of the corresponding processing object refers to an object identifier of the corresponding processing object in the processing object set; the processing unit 902, when using the initial processing object information set and the target processing object information set to construct a partition object mapping information set, may be specifically configured to:
according to the initial global object identification included in each initial processing object information in the initial processing object information set and the target global object identification included in each target processing object information in the target processing object information set, respectively connecting each initial processing object information with the matched target processing object information to construct a partition object mapping information set;
The partition object mapping information is information obtained by connecting initial processing object information with matched target processing object information, and the initial global object identification in any initial processing object information is the same as the target global object identification in the matched target processing object information.
In another embodiment, the partition object mapping information includes the initial process identifier of the initial process corresponding to the corresponding processing object and the target process identifier of the target process corresponding to the corresponding processing object; when obtaining the operation data of each target process from the M initial restart files according to the partition object mapping information set, the processing unit 902 may be specifically configured to:
traversing each partition object mapping information in the partition object mapping information set aiming at any one of the N target processes, taking the currently traversed partition object mapping information as current partition object mapping information, and taking a processing object corresponding to the current partition object mapping information as a current processing object;
if the target process identification of any target process is the same as the target process identification in the current partition object mapping information, determining an initial process identification from the current partition object mapping information, and acquiring running data corresponding to the current processing object for any target process from an initial restart file indicated by the determined initial process identification;
And after traversing the partition object mapping information set, obtaining the running data of any target process.
In another embodiment, each piece of partition object mapping information includes an initial local object identifier and a target local object identifier of the respective processing object. The initial local object identifier of a processing object is the object identifier of that processing object within the processing object range indicated by the corresponding initial partition information, and the target local object identifier of a processing object is the object identifier of that processing object within the processing object range indicated by the corresponding target partition information. When acquiring, for that target process, the operation data corresponding to the current processing object from the initial restart file indicated by the determined initial process identifier, the processing unit 902 may be specifically configured to:
determining the initial local object identifier of the current processing object from the current partition object mapping information;
calculating, based on the initial local object identifier of the current processing object, initial position information of the operation data corresponding to the current processing object in the initial restart file indicated by the determined initial process identifier;
acquiring, for that target process, the operation data corresponding to the current processing object from the initial restart file indicated by the determined initial process identifier according to the initial position information;
when adding the acquired operation data to the data variable of the corresponding target process, the processing unit 902 may be specifically configured to:
determining the target local object identifier of the current processing object from the current partition object mapping information;
calculating, based on the target local object identifier of the current processing object, target position information of the operation data corresponding to the current processing object in the data variable of that target process;
and adding the operation data corresponding to the current processing object to the data variable of that target process according to the target position information.
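As a minimal sketch of the read-and-place steps of this embodiment, the following Python fragment assumes, purely for illustration, that each restart file and each data variable is a flat list and that a local object identifier can be used directly as a zero-based position. The function name `fill_data_variable` and the dictionary keys are hypothetical.

```python
def fill_data_variable(target_pid, mapping_set, restart_arrays, n_target_objects):
    """Copy operation data into the data variable of one target process.

    Assumption: a local object identifier is used directly as a list index.
    """
    data_variable = [None] * n_target_objects
    for entry in mapping_set:
        if entry["target_pid"] != target_pid:
            continue
        # Initial position: where the value sits in the initial restart file.
        initial_position = entry["initial_local_id"]
        # Target position: where the value belongs in the data variable.
        target_position = entry["target_local_id"]
        source = restart_arrays[entry["initial_pid"]]
        data_variable[target_position] = source[initial_position]
    return data_variable


# Initial process 0 stored objects 0-2, initial process 1 stored objects 3-5.
restart_arrays = {0: [10.0, 11.0, 12.0], 1: [13.0, 14.0, 15.0]}
mapping_set = [
    {"initial_pid": 0, "initial_local_id": 2, "target_pid": 1, "target_local_id": 0},
    {"initial_pid": 1, "initial_local_id": 0, "target_pid": 1, "target_local_id": 1},
    {"initial_pid": 1, "initial_local_id": 1, "target_pid": 2, "target_local_id": 0},
]
data_variable_1 = fill_data_variable(1, mapping_set, restart_arrays, 2)
```

The same object thus has two positions: one relative to the initial partition, used for reading, and one relative to the target partition, used for writing.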
In another embodiment, an initial restart file includes the initial process identifier of the corresponding initial process and the processing object range corresponding to that initial process. When calculating, based on the initial local object identifier of the current processing object, the initial position information of the operation data corresponding to the current processing object in the initial restart file indicated by the determined initial process identifier, the processing unit 902 may be specifically configured to:
calculating, based on the initial local object identifier of the current processing object and the processing object range corresponding to the initial process indicated by the determined initial process identifier, the initial position information of the operation data corresponding to the current processing object in the initial restart file indicated by the determined initial process identifier;
when calculating, based on the target local object identifier of the current processing object, the target position information of the operation data corresponding to the current processing object in the data variable of that target process, the processing unit 902 may be specifically configured to:
calculating, based on the target local object identifier of the current processing object and the processing object range corresponding to that target process, the target position information of the operation data corresponding to the current processing object in the data variable of that target process.
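The embodiment does not fix a particular formula for the position calculation. One possible reading, given here only as a hedged sketch, assumes that a processing object range is a rectangular block of grid cells, that a local object identifier is a (row, column) pair relative to the origin of that block, and that data are stored in row-major order. The names `ObjectRange`, `local_id`, and `position` are hypothetical.

```python
from typing import NamedTuple


class ObjectRange(NamedTuple):
    """Assumed rectangular processing object range; end bounds are exclusive."""
    row_start: int
    row_end: int
    col_start: int
    col_end: int


def local_id(global_row, global_col, rng):
    """Local (row, col) identifier of a grid cell relative to a range."""
    return (global_row - rng.row_start, global_col - rng.col_start)


def position(local, rng):
    """Row-major position of a local identifier inside the range (assumed layout)."""
    n_rows = rng.row_end - rng.row_start
    n_cols = rng.col_end - rng.col_start
    row, col = local
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise IndexError("local identifier lies outside the processing object range")
    return row * n_cols + col


# The same grid cell (row 3, column 2) under an initial and a target range.
initial_range = ObjectRange(row_start=0, row_end=4, col_start=0, col_end=3)
target_range = ObjectRange(row_start=2, row_end=4, col_start=0, col_end=6)

initial_pos = position(local_id(3, 2, initial_range), initial_range)
target_pos = position(local_id(3, 2, target_range), target_range)
```

The width of the range enters the calculation, which is why the local identifier alone does not suffice in this layout and the processing object range is also consulted.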
In another embodiment, the processing unit 902 may be further configured to:
generating a target restart file for each target process based on the data variable of that target process, so that the target restart files can serve as the restart files for the next computing restart;
wherein one target restart file is used for storing the operation data of the corresponding target process.
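A minimal sketch of writing one target restart file per target process is given below. The JSON layout, the file naming pattern `restart_NNNN.json`, and the function names are assumptions made for illustration only; the embodiment does not prescribe a file format.

```python
import json
import tempfile
from pathlib import Path


def write_target_restart_files(data_variables, ranges, out_dir):
    """Write one restart file per target process (hypothetical JSON layout)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for pid, values in data_variables.items():
        payload = {
            "process_id": pid,            # identifier of the target process
            "object_range": ranges[pid],  # processing object range of that process
            "data": values,               # operation data of that process
        }
        path = out_dir / f"restart_{pid:04d}.json"
        path.write_text(json.dumps(payload))
        paths[pid] = path
    return paths


def read_restart_file(path):
    """Load a restart file written by write_target_restart_files."""
    return json.loads(Path(path).read_text())


with tempfile.TemporaryDirectory() as tmp:
    paths = write_target_restart_files(
        data_variables={0: [1.0, 2.0], 1: [3.0]},
        ranges={0: [0, 2], 1: [2, 3]},
        out_dir=tmp,
    )
    written_pids = sorted(paths)
    reloaded = read_restart_file(paths[1])
```

Storing the process identifier and the processing object range alongside the data means the files written here could play the role of initial restart files at the next computing restart.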
In another embodiment, the M initial processes are used to run numerical mode processing before the computing restart, and the N target processes are used to run the numerical mode processing after the computing restart;
wherein, one processing object is one grid in a grid model, and the grid model is a division result of a target area divided into a plurality of grids;
when the numerical mode processing is pollutant source analysis processing, the operation data of one process comprises source analysis data corresponding to the processing objects in the corresponding processing object range.
According to one embodiment of the invention, the steps involved in the method of fig. 1 or 3 may be performed by the units in the computing restarting device of fig. 9. For example, step S101 shown in fig. 1 may be performed by the acquisition unit 901 shown in fig. 9, and steps S102 and S103 may each be performed by the processing unit 902 shown in fig. 9. As another example, step S301 shown in fig. 3 may be performed by the acquisition unit 901 shown in fig. 9, steps S302 to S306 may each be performed by the processing unit 902 shown in fig. 9, and so on.
According to another embodiment of the present invention, each unit in the computing restarting device shown in fig. 9 may be separately or completely combined into one or several other units, or some unit(s) may be further split into a plurality of units with smaller functions, which may achieve the same operation without affecting the implementation of the technical effects of the embodiments of the present invention. The above units are divided based on logic functions, and in practical applications, the functions of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present invention, any computing restarting device may also include other units, and in practical applications, these functions may also be implemented with assistance from other units, and may be implemented by cooperation of multiple units.
According to another embodiment of the present invention, the computing restart apparatus shown in fig. 9 may be constructed, and the computing restart method of the embodiment of the present invention may be implemented, by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 1 or 3 on a general-purpose electronic device such as a computer that includes processing elements and storage elements, for example a Central Processing Unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer storage medium, and loaded into and run in the above-described electronic device through the computer storage medium.
The embodiment of the invention can acquire an initial partition information set and M initial restart files corresponding to the initial partition information set, wherein one initial restart file is used for storing the operation data of the corresponding initial process, and M is a positive integer. Then, when a computing restart is required, a target partition information set can be determined, wherein the target partition information set comprises target partition information corresponding to each of N target processes, and N is a positive integer. On this basis, the operation data of each target process can be obtained from the M initial restart files according to the initial partition information set and the target partition information set, and the obtained operation data can be added to the data variables of the corresponding target processes, so that the computing restart is performed based on the data variables of the target processes. The embodiment of the invention can therefore conveniently obtain the operation data of each target process from the M initial restart files by means of the initial partition information set and the target partition information set, without merging the M initial restart files into a single restart file. This avoids the situation in which a single restart file cannot be generated because it would be too large, and thereby effectively ensures computational continuity.
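The overall flow summarized above might be sketched end to end as follows. For brevity the sketch assumes that each partition is a half-open range of consecutive global object identifiers and that each restart file is an in-memory list; the function names `expand` and `redistribute` are hypothetical and the sketch is not intended as a definitive implementation.

```python
def expand(partitions):
    """Per-object view of a partition set: global_id -> (pid, local_id).

    partitions maps a process identifier to an assumed half-open range
    (start, stop) of global object identifiers.
    """
    objects = {}
    for pid, (start, stop) in partitions.items():
        for global_id in range(start, stop):
            objects[global_id] = (pid, global_id - start)
    return objects


def redistribute(initial_parts, target_parts, restart_files):
    """Build the data variable of every target process from M restart files."""
    # Identical partition sets: each target process reads the restart file
    # whose initial process identifier equals its own identifier.
    if initial_parts == target_parts:
        return {pid: list(restart_files[pid]) for pid in target_parts}

    # Different partition sets: match objects by global identifier and move
    # each value from its initial position to its target position.
    initial_objects = expand(initial_parts)
    target_objects = expand(target_parts)
    data_variables = {
        pid: [None] * (stop - start) for pid, (start, stop) in target_parts.items()
    }
    for global_id, (target_pid, target_local) in target_objects.items():
        initial_pid, initial_local = initial_objects[global_id]
        data_variables[target_pid][target_local] = restart_files[initial_pid][initial_local]
    return data_variables


initial_parts = {0: (0, 3), 1: (3, 6)}             # M = 2 initial processes
target_parts = {0: (0, 2), 1: (2, 4), 2: (4, 6)}   # N = 3 target processes
restart_files = {0: [10, 11, 12], 1: [13, 14, 15]}

redistributed = redistribute(initial_parts, target_parts, restart_files)
unchanged = redistribute(initial_parts, initial_parts, restart_files)
```

At no point does the sketch concatenate the M restart files into one; each value is read from the file that already holds it.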
Based on the description of the method embodiment and the apparatus embodiment, the exemplary embodiment of the present invention further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor for causing the electronic device to perform a method according to an embodiment of the invention when executed by the at least one processor.
The exemplary embodiments of the present invention also provide a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the present invention.
The exemplary embodiments of the invention also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the invention.
Referring to fig. 10, a block diagram of an electronic device 1000, which may be a server or a client of the present invention, will now be described; it is an example of a hardware device to which aspects of the present invention may be applied. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 10, the electronic device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Various components in the electronic device 1000 are connected to the I/O interface 1005, including: an input unit 1006, an output unit 1007, a storage unit 1008, and a communication unit 1009. The input unit 1006 may be any type of device capable of inputting information to the electronic device 1000, and the input unit 1006 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 1007 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 1008 may include, but is not limited to, magnetic disks and optical disks. The communication unit 1009 allows the electronic device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, Wi-Fi devices, WiMAX devices, cellular communication devices, and/or the like.
The computing unit 1001 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1001 performs the respective methods and processes described above. For example, in some embodiments, the computing restart method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1000 via the ROM 1002 and/or the communication unit 1009. In some embodiments, the computing unit 1001 may be configured to perform the computing restart method by any other suitable means (e.g., by means of firmware).
Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It is also to be understood that the foregoing is merely illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.

Claims (11)

1. A computing restart method, comprising:
acquiring an initial partition information set and M initial restart files corresponding to the initial partition information set, wherein the initial partition information set comprises initial partition information corresponding to each initial process in M initial processes, one initial restart file is used for storing operation data of the corresponding initial process, one partition information comprises a processing object range corresponding to the operation data of the corresponding process, one operation data corresponds to one processing object, and M is a positive integer; an initial restart file includes initial process identifiers of corresponding initial processes;
when a computing restart is required, determining a target partition information set, wherein the target partition information set comprises target partition information corresponding to each target process in N target processes, and N is a positive integer;
according to the initial partition information set and the target partition information set, respectively acquiring operation data of each target process from the M initial restart files, comprising:
performing comparative analysis on the initial partition information set and the target partition information set to obtain a comparative analysis result;
if the comparison analysis result indicates that the initial partition information set is the same as the target partition information set, acquiring operation data of each target process from the M initial restart files based on target process identifiers of each target process and initial process identifiers corresponding to the M initial restart files respectively;
if the comparison analysis result indicates that the initial partition information set is different from the target partition information set, constructing a partition object mapping information set, and respectively acquiring the operation data of each target process from the M initial restart files according to the partition object mapping information set;
and adding the acquired operation data to the data variables of the corresponding target processes respectively, so as to perform the computing restart based on the data variables of the target processes.
2. The method of claim 1, wherein one piece of initial partition information includes the initial process identifier of the corresponding initial process, one piece of target partition information includes the target process identifier of the corresponding target process, and the process identifier in a piece of partition information is used to indicate the process corresponding to each processing object in the processing object range of that partition information; the constructing a partition object mapping information set comprises:
determining an initial processing object information set according to the processing object range and the initial process identifier included in each piece of initial partition information in the initial partition information set, wherein one piece of initial processing object information includes the initial process identifier of the initial process corresponding to the respective processing object, and the initial process identifier in a piece of initial processing object information is used to indicate the initial restart file in which the operation data corresponding to the respective processing object is located;
determining a target processing object information set according to the processing object range and the target process identifier included in each piece of target partition information in the target partition information set, wherein one piece of target processing object information includes the target process identifier of the target process corresponding to the respective processing object, and the target process identifier in a piece of target processing object information is used to indicate the target process that needs to receive the operation data corresponding to the respective processing object;
and constructing the partition object mapping information set using the initial processing object information set and the target processing object information set.
3. The method of claim 2, wherein one initial processing object information includes an initial global object identification of the corresponding processing object, one target processing object information includes a target global object identification of the corresponding processing object, and one global object identification of the corresponding processing object refers to an object identification of the corresponding processing object in the processing object set; the step of constructing a partition object mapping information set by using the initial processing object information set and the target processing object information set includes:
according to the initial global object identification included in each initial processing object information in the initial processing object information set and the target global object identification included in each target processing object information in the target processing object information set, respectively connecting each initial processing object information with the matched target processing object information to construct a partition object mapping information set;
the partition object mapping information is information obtained by connecting initial processing object information with matched target processing object information, and the initial global object identification in any initial processing object information is the same as the target global object identification in the matched target processing object information.
4. A method according to any one of claims 1 to 3, wherein one partition object mapping information includes an initial process identifier of an initial process corresponding to a corresponding processing object and a target process identifier of a target process corresponding to a corresponding processing object, and the obtaining, according to the partition object mapping information set, running data of each target process from the M initial restart files respectively includes:
for any one of the N target processes, traversing each piece of partition object mapping information in the partition object mapping information set, taking the currently traversed piece as the current partition object mapping information, and taking the processing object corresponding to the current partition object mapping information as the current processing object;
if the target process identifier of that target process is the same as the target process identifier in the current partition object mapping information, determining the initial process identifier from the current partition object mapping information, and acquiring, for that target process, the operation data corresponding to the current processing object from the initial restart file indicated by the determined initial process identifier;
and after the partition object mapping information set has been traversed, obtaining the operation data of that target process.
5. The method of claim 4, wherein one piece of partition object mapping information includes an initial local object identifier and a target local object identifier of the respective processing object, the initial local object identifier of a processing object being the object identifier of that processing object within the processing object range indicated by the corresponding initial partition information, and the target local object identifier of a processing object being the object identifier of that processing object within the processing object range indicated by the corresponding target partition information; the acquiring, for that target process, the operation data corresponding to the current processing object from the initial restart file indicated by the determined initial process identifier comprises:
determining the initial local object identifier of the current processing object from the current partition object mapping information;
calculating, based on the initial local object identifier of the current processing object, initial position information of the operation data corresponding to the current processing object in the initial restart file indicated by the determined initial process identifier;
acquiring, for that target process, the operation data corresponding to the current processing object from the initial restart file indicated by the determined initial process identifier according to the initial position information;
the adding the acquired operation data to the data variables of the corresponding target processes respectively comprises:
determining the target local object identifier of the current processing object from the current partition object mapping information;
calculating, based on the target local object identifier of the current processing object, target position information of the operation data corresponding to the current processing object in the data variable of that target process;
and adding the operation data corresponding to the current processing object to the data variable of that target process according to the target position information.
6. The method of claim 5, wherein one initial restart file includes the initial process identifier of the corresponding initial process and the processing object range corresponding to that initial process; the calculating, based on the initial local object identifier of the current processing object, initial position information of the operation data corresponding to the current processing object in the initial restart file indicated by the determined initial process identifier comprises:
calculating, based on the initial local object identifier of the current processing object and the processing object range corresponding to the initial process indicated by the determined initial process identifier, the initial position information of the operation data corresponding to the current processing object in the initial restart file indicated by the determined initial process identifier;
the calculating, based on the target local object identifier of the current processing object, target position information of the operation data corresponding to the current processing object in the data variable of that target process comprises:
calculating, based on the target local object identifier of the current processing object and the processing object range corresponding to that target process, the target position information of the operation data corresponding to the current processing object in the data variable of that target process.
7. The method according to claim 1, wherein the method further comprises:
generating a target restart file for each target process based on the data variable of that target process, so that the target restart files can serve as the restart files for the next computing restart;
wherein one target restart file is used for storing the operation data of the corresponding target process.
8. The method of claim 1, wherein the M initial processes are used to run numerical mode processing before the computing restart, and the N target processes are used to run the numerical mode processing after the computing restart;
wherein, one processing object is one grid in a grid model, and the grid model is a division result of a target area divided into a plurality of grids;
when the numerical mode processing is pollutant source analysis processing, the operation data of one process comprises source analysis data corresponding to the processing objects in the corresponding processing object range.
9. A computing restart apparatus, the apparatus comprising:
an acquisition unit, configured to acquire an initial partition information set and M initial restart files corresponding to the initial partition information set, wherein the initial partition information set comprises initial partition information corresponding to each initial process in M initial processes, one initial restart file is used for storing operation data of the corresponding initial process, one piece of partition information comprises a processing object range corresponding to the operation data of the corresponding process, one piece of operation data corresponds to one processing object, and M is a positive integer; an initial restart file includes the initial process identifier of the corresponding initial process;
a processing unit, configured to determine a target partition information set when a computing restart is required, wherein the target partition information set comprises target partition information corresponding to each target process in N target processes, and N is a positive integer;
the processing unit is further configured to obtain, according to the initial partition information set and the target partition information set, operation data of each target process from the M initial restart files, respectively, which includes: performing comparative analysis on the initial partition information set and the target partition information set to obtain a comparative analysis result; if the comparative analysis result indicates that the initial partition information set is the same as the target partition information set, acquiring the operation data of each target process from the M initial restart files based on the target process identifier of each target process and the initial process identifiers corresponding to the M initial restart files respectively; if the comparative analysis result indicates that the initial partition information set is different from the target partition information set, constructing a partition object mapping information set, and respectively acquiring the operation data of each target process from the M initial restart files according to the partition object mapping information set; and adding the acquired operation data to the data variables of the corresponding target processes respectively, so as to perform the computing restart based on the data variables of the target processes.
10. An electronic device, comprising:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to any of claims 1-8.
11. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
CN202311015486.1A 2023-08-14 2023-08-14 Computing restarting method and device, storage medium and electronic equipment Active CN116755938B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311015486.1A CN116755938B (en) 2023-08-14 2023-08-14 Computing restarting method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN116755938A CN116755938A (en) 2023-09-15
CN116755938B CN116755938B (en) 2023-11-03

Family

ID=87953583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311015486.1A Active CN116755938B (en) 2023-08-14 2023-08-14 Computing restarting method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116755938B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110992992A (en) * 2019-10-31 2020-04-10 苏州浪潮智能科技有限公司 Hard disk test method, device and storage medium
CN113806139A (en) * 2021-06-15 2021-12-17 荣耀终端有限公司 Operating system recovery method, operating system recovery device, storage medium and computer program product
CN115185745A (en) * 2022-07-26 2022-10-14 元心信息科技集团有限公司 Data processing method, system, electronic device and computer readable storage medium
CN116340053A (en) * 2023-03-13 2023-06-27 西安万像电子科技有限公司 Log processing method, device, computer equipment and medium for system crash

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6745209B2 (en) * 2001-08-15 2004-06-01 Iti, Inc. Synchronization of plural databases in a database replication system
US20070067366A1 (en) * 2003-10-08 2007-03-22 Landis John A Scalable partition memory mapping system
US8732108B2 (en) * 2010-10-07 2014-05-20 International Business Machines Corporation Rule authoring for events in a grid environment
US20220398221A1 (en) * 2021-06-10 2022-12-15 EMC IP Holding Company LLC Persistent memory tiering supporting fast failover in a deduplicated file system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
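Resolve the issue of insufficient disk space on a Linux instance; Alibaba Cloud; SME Support Program; full text *
Using WMI to monitor server performance from mobile smart devices; Xian Hai et al.; Computer Programming Skills & Maintenance (No. 24) *
Parallel I/O for multi-block structured-grid CFD programs based on HDF5; Yang Lipeng et al.; Journal of Computer Research and Development (No. 04); full text *
Proxy-based metadata optimization and implementation for parallel file systems; Yi Jianliang et al.; Journal of Computer Research and Development (No. 02); full text *
Application of computation-communication overlap and parallel I/O in particle simulation; Yan Xiaoyang et al.; Journal of Computer Applications (No. S1); full text *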

Also Published As

Publication number Publication date
CN116755938A (en) 2023-09-15

Similar Documents

Publication Publication Date Title
CN111800443B (en) Data processing system and method, device and electronic equipment
CN111930770A (en) Data query method and device and electronic equipment
CN112860412B (en) Service data processing method and device, electronic equipment and storage medium
US10552419B2 (en) Method and system for performing an operation using map reduce
CN113094125A (en) Business process processing method, device, server and storage medium
CN116755938B (en) Computing restarting method and device, storage medium and electronic equipment
CN112181724A (en) Big data disaster tolerance method and device and electronic equipment
CN111767126A (en) System and method for distributed batch processing
CN113886353B (en) Data configuration recommendation method and device for hierarchical storage management software and storage medium
CN114327673B (en) Task starting method and device, electronic equipment and storage medium
CN113590217B (en) Function management method and device based on engine, electronic equipment and storage medium
CN115373861A (en) GPU resource scheduling method and device, electronic equipment and storage medium
CN116185578A (en) Scheduling method of computing task and executing method of computing task
CN109902067B (en) File processing method and device, storage medium and computer equipment
CN114564249A (en) Recommendation scheduling engine, recommendation scheduling method, and computer-readable storage medium
CN113691403A (en) Topological node configuration method, related device and computer program product
CN112988738A (en) Data slicing method and device for block chain
CN110750362A (en) Method and apparatus for analyzing biological information, and storage medium
CN113568687A (en) Method for displaying Web page, related equipment and computer readable storage medium
CN111026571B (en) Processor down-conversion processing method and device and electronic equipment
CN116561106B (en) Configuration item data management method and system
CN113568936B (en) Real-time stream data storage method, device and terminal equipment
CN110727654B (en) Data extraction method and device for distributed system, server and storage medium
CN115729552A (en) Method and device for setting parallelism of operator level
CN116028109A (en) System entry monitoring method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant