CN114217871A - Multi-computer cluster parallel simulation method and multi-computer cluster system

Info

Publication number: CN114217871A
Application number: CN202111528670.7A
Authority: CN
Inventors: 朱静; 高亚辉; 姜殿文; 余丹妮; 杨晖
Original assignee: AECC Aero Engine Control System Institute
Current assignee: AECC Aero Engine Control System Institute
Priority date: 2021-12-14
Filing date: 2021-12-14
Publication date: 2022-03-22

Abstract

The invention relates to the technical field of parallel simulation of aero-engines, and particularly discloses a parallel simulation method of a multi-computer cluster, which comprises the steps of sending a model file to a file server, wherein the model file comprises a simulation task configuration file and a simulation task file; determining a target node computer, and sending a task instruction to the target node computer to distribute a simulation task; after the target node computer acquires the model file in the file server according to the task instruction, sending a starting instruction to the target node computer to start a simulation task execution tool in the target node computer, wherein the simulation task execution tool can execute a corresponding simulation task according to the simulation task configuration file; and after the target node computer completes the simulation task, acquiring a simulation result of the target node computer, and analyzing and processing the simulation result. The invention also discloses an aircraft engine multi-computer cluster system. The multi-computer cluster parallel simulation method provided by the invention can realize parallel simulation.

Description

Multi-computer cluster parallel simulation method and multi-computer cluster system

Technical Field

The invention relates to the technical field of parallel simulation of aero-engines, in particular to a multi-computer cluster parallel simulation method and an aero-engine multi-computer cluster system.

Background

The development of an aircraft engine control system involves multiple disciplines and specialties, such as systems, electronics, software, hydraulics and fluids, and is a typical cross-discipline, multi-specialty collaborative and iterative development process. The traditional development process is to design and manufacture first, and then to verify the design and resolve problems through physical tests. Because problem resolution relies on physical tests, the iteration cycle is long and the development cost is high. Digital simulation can verify the design in advance through virtual tests, so that a large number of problems can be resolved at the desktop, the system test requirements are reduced, and the risks of engine bench tests and flight tests are lowered. Simulation also deepens the understanding of the internal operating principles and laws of the aircraft engine, exposes possible faults in advance and reveals design defects, which improves development efficiency and quality, reduces repeated physical tests, lowers development risk and cost, and accelerates the development process.

The multidisciplinary joint simulation platform of the aero-engine control system integrates an engine model, a control model, a sensor model, an electronics model, a fuel and actuation model and the like into a fully digital functional prototype of the control system, and uses this prototype to carry out simulation analysis of the functions, performance, failure effects and the like of the engine control system. Unlike earlier performance models expressed by simplified mathematical formulas, the fully digital functional prototype is composed of the minimum design units of the control system, so it can be used to analyze the effect of a failure of a minimum design unit on the whole system.

Because the control system is a serial process from measurement and acquisition through calculation to output and driving, and because a structured model is adopted, the multidisciplinary joint simulation model becomes very complex and the overall computational load is extremely large. In an initial run of a simulation model of the main fuel control loop, the simulated time was set to 25 ms and the run took 73 hours to complete, even though 25 ms covers only one major closed-loop control period. Therefore, to apply multidisciplinary joint simulation in engineering practice, the simulation speed must be increased, and multi-computer parallel simulation is an important and urgent means of doing so.

Disclosure of Invention

The invention provides a multi-computer cluster parallel simulation method and an aircraft engine multi-computer cluster system, which solve the problem that parallel simulation cannot be realized in the related technology.

As a first aspect of the present invention, a multi-computer cluster parallel simulation method is provided. The method is applied to an aircraft engine multi-computer cluster system that includes a file server and multiple node computers performing instruction interaction through MDCS, where each node computer is in communication connection with the file server and any node computer can initiate a simulation task to the other node computers in the system. The method includes:

sending a model file to the file server, wherein the model file comprises a simulation task configuration file and a simulation task file;

determining a target node computer and sending a task instruction to the target node computer to distribute a simulation task;

after the target node computer acquires the model file in the file server according to the task instruction, sending a starting instruction to the target node computer to start a simulation task execution tool in the target node computer, wherein the simulation task execution tool can execute a corresponding simulation task according to the simulation task configuration file;

and after the target node computer completes the simulation task, acquiring a simulation result of the target node computer, and analyzing and processing the simulation result.

Further, the sending the model file to the file server includes:

compiling a simulation task configuration file according to the simulation task;

and forming the simulation task into a simulation task file, forming the configuration file and the simulation task file into a model file, and sending the model file to the file server.

Further, the determining a target node computer and sending task instructions to the target node computer to distribute simulation tasks includes:

checking and displaying the currently available node computers;

and determining target node computers from the available node computers according to the number of the received model files, and sending task instructions to each target node computer to distribute simulation tasks.

Further, the determining target node computers from the available node computers according to the number of the received model files and sending task instructions to each of the target node computers to distribute simulation tasks includes:

initializing a node computer real-time state table and a simulation task running state table, wherein the node computer real-time state table is used for recording the number of available node computers, and the simulation task running state table is used for recording the number of simulation task files to be executed;

determining a target node computer according to the comparison result of the number of the simulation task files to be executed currently and the number of the available node computers;

and sending a task instruction to each determined target node computer to distribute simulation tasks.

Further, the determining a target node computer according to a comparison result between the number of the simulation task files to be executed currently and the number of the available node computers includes:

if the number of the simulation task files to be executed currently is smaller than the number of the available node computers, directly determining target node computers corresponding to the number of the simulation task files from the available node computers;

and if the number of the simulation task files to be executed currently is not less than the number of the available node computers, determining all the currently available node computers as target node computers.

Further, the sending a task instruction to each determined target node computer to distribute a simulation task includes:

when the number of the simulation task files to be executed at present is smaller than that of the available node computers, directly sending task instructions to the determined target node computers to distribute simulation tasks;

and when the number of the simulation task files to be executed currently is not less than the number of the available node computers, sending a task instruction to the target node computer according to a dynamic recursive allocation algorithm to allocate simulation tasks.

Further, when the number of the simulation task files to be executed currently is not less than the number of the available node computers, sending a task instruction to the target node computer according to a dynamic recursive allocation algorithm to allocate a simulation task, including:

dynamically sending task instructions to all current target node computers according to the number of the current simulation task files to be executed;

acquiring simulation result files of all target node computers;

comparing whether the number of the sent task instructions is consistent with the number of the received simulation result files;

and if the two numbers are not consistent, locating the cause of the inconsistency according to whether the target node computer is abnormal and whether the simulation task is abnormal.

Further, the locating the cause of the inconsistency according to whether the target node computer is abnormal and whether the simulation task is abnormal includes:

judging whether the target node computer is abnormal;

if the target node computer is abnormal, determining the number of times the target node computer has been abnormal;

if the number of times the target node computer has been abnormal is greater than a preset count threshold, deleting the target node computer from the node computer real-time state table, and updating the node computer real-time state table;

if the target node computer is normal, judging whether the simulation task is abnormal;

if the simulation task is abnormal, determining the number of times the simulation task has been abnormal;

and if the number of times the simulation task has been abnormal is greater than a preset count threshold, deleting the simulation task from the simulation task running state table, and updating the simulation task running state table.

Further, if the two numbers are consistent, the node computer real-time state table and the simulation task running state table are updated, and the step of dynamically sending task instructions to all current target node computers according to the number of simulation task files currently to be executed is repeated until all simulation task files to be executed have been simulated.

As another aspect of the invention, an aircraft engine multi-computer cluster system is provided, comprising a file server and a plurality of node computers which carry out instruction interaction through MDCS, wherein each node computer is in communication connection with the file server, any node computer can initiate simulation tasks to other node computers in the system, each node computer comprises a memory and a processor, the memory is used for storing computer instructions, and when any node computer initiates a simulation task, the processor is used for loading and executing the computer instructions to implement the multi-computer cluster parallel simulation method described above.

According to the multi-computer cluster parallel simulation method provided by the invention, one node computer initiates simulation tasks to other multiple node computers in the system, so that the simulation tasks can be executed in the multiple node computers in parallel, and the speed of joint simulation can be greatly improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a schematic diagram of a multi-computer cluster according to the present invention.

FIG. 2 is a schematic diagram of the operation of the MDCS-based multi-computer parallel simulation according to the present invention.

FIG. 3 is a diagram of a multi-computer parallel simulation scheduling process according to the present invention.

FIG. 4 is a flow chart of the multi-computer cluster parallel simulation method according to the present invention.

FIG. 5 is a flow chart of the dynamic recursive assignment task algorithm provided by the present invention.

Detailed Description

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.

In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances in order to facilitate the description of the embodiments of the invention herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In this embodiment, a multi-computer cluster parallel simulation method is provided, which is applied to an aircraft engine multi-computer cluster system. As shown in FIG. 1, the aircraft engine multi-computer cluster system includes a file server and multiple node computers that perform instruction interaction through MDCS (MATLAB Distributed Computing Server, the distributed computing service toolkit of Matlab). Each node computer is in communication connection with the file server, and any node computer can initiate a simulation task to the other node computers in the system. When a node computer initiates a simulation task, it acts as the host computer and distributes the simulation task to the other node computers; those of the other node computers that are able to execute the simulation task then act as target node computers, so that parallel simulation is realized.

It should be noted that, in the embodiment of the present invention, the distributed computing service toolkit MDCS of Matlab shares computing resources within the local area network over a specific TCP/IP port and is used to perform the multi-computer cluster parallel simulation. MDCS provides a data link service among local area network nodes and supports the exchange of data and instructions based on m-files. A schematic diagram of the operation of the MDCS-based multi-computer parallel simulation is shown in FIG. 2.

In the embodiment of the invention, the MDCS toolbox is used for parallel computing. MDCS builds the channel and platform for information interaction among the computers and is mainly used for sending and receiving command information among the node computers. In an aircraft engine multi-computer cluster system, command interaction is only the basic part of the functionality. A great deal of further work is required, including parallel distribution of tasks, the parallel scheduling algorithm, joint simulation, data recovery, data analysis, simulation data reporting and the like.

As shown in FIG. 3, the file server of the aircraft engine multi-computer cluster system is configured to store the model file sent by the node computer that initiates the simulation task, and to store the simulation result after a target node computer completes the simulation task. When a target node computer acquires the model file according to the task instruction, it copies the model file from the file server; when the node computer that initiated the simulation task needs to recover the simulation result, it acquires the simulation result from the file server.
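The exchange through the file server can be sketched as follows. This is a minimal illustration in Python rather than the m-files the embodiment uses. The directory layout, the file names and the helper functions `publish_model`, `fetch_model`, `store_result` and `collect_results` are all hypothetical, and a local temporary directory stands in for the shared file server.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def publish_model(model_dir: Path, server_root: Path, task_id: str) -> Path:
    """Initiating node: upload a model directory to the file server."""
    dest = server_root / "models" / task_id
    shutil.copytree(model_dir, dest)  # creates missing parent directories
    return dest


def fetch_model(server_root: Path, task_id: str, work_root: Path) -> Path:
    """Target node: copy the model from the file server to a local work area."""
    local = work_root / task_id
    shutil.copytree(server_root / "models" / task_id, local)
    return local


def store_result(result_file: Path, server_root: Path, task_id: str) -> Path:
    """Target node: place a finished simulation result on the file server."""
    results = server_root / "results" / task_id
    results.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(result_file, results / result_file.name))


def collect_results(server_root: Path, task_id: str) -> List[Path]:
    """Initiating node: recover the results of one task from the file server."""
    return sorted((server_root / "results" / task_id).glob("*"))


def demo() -> List[str]:
    """Run one round trip; a temporary directory plays the file server."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "src"
        src.mkdir()
        (src / "task.cfg").write_text("hypothetical configuration")
        server = root / "server"
        server.mkdir()
        publish_model(src, server, "task1")
        local = fetch_model(server, "task1", root / "work")
        result = local / "result.txt"
        result.write_text("simulated output")
        store_result(result, server, "task1")
        return [p.name for p in collect_results(server, "task1")]
```

In this arrangement the node computers never copy files from one another directly; every transfer passes through the server directories, which is the property the embodiment relies on.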

It should be understood that routing files through the file server meets the requirements that current confidentiality management places on network permissions and improves system security.

As shown in FIG. 4, which is a flowchart of the multi-computer cluster parallel simulation method according to an embodiment of the present invention, the method includes:

S110, sending a model file to the file server, wherein the model file comprises a simulation task configuration file and a simulation task file;

In the embodiment of the invention, when one node computer in the aircraft engine multi-computer cluster system initiates a simulation task, the other node computers in the system can serve as target node computers of the initiating node computer and execute the simulation task.

At this time, the node computer initiating the simulation task needs to compile a simulation task configuration file according to the simulation task; and forming the simulation task into a simulation task file, forming the configuration file and the simulation task file into a model file, and sending the model file to the file server.

It should be understood that the node computer initiating the simulation task can list all the simulation tasks to be executed according to the sequence required to be executed, i.e. form a simulation task file, and also compile a simulation task configuration file according to each simulation task required to be executed, such as configuring the interface sequence between the simulation tools, and the like.
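One way to picture the model file is as an ordered task list paired with one configuration entry per task. The sketch below is an assumption made purely for illustration: the source does not specify a file format, and the field names `tool_order` and `stop_time_ms`, the JSON encoding and the task name are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class TaskConfig:
    """Hypothetical per-task configuration entry."""
    tool_order: List[str]   # interface sequence between the simulation tools
    stop_time_ms: float     # hypothetical setting: simulated time in ms


@dataclass
class ModelFile:
    """Simulation task file (ordered list) plus configuration per task."""
    tasks: List[str] = field(default_factory=list)
    configs: Dict[str, TaskConfig] = field(default_factory=dict)

    def add_task(self, name: str, config: TaskConfig) -> None:
        # Tasks keep the order in which they are to be executed.
        self.tasks.append(name)
        self.configs[name] = config

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, lists and dicts.
        return json.dumps(asdict(self), indent=2)


model = ModelFile()
model.add_task("main_fuel_loop", TaskConfig(["Saber", "Matlab", "AMESim"], 25.0))
encoded = model.to_json()
```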

In the embodiment of the present invention, the simulation tool may specifically include Saber, Matlab, and AMESim.

In the embodiment of the invention, the Saber simulation tool mainly realizes the simulation of hardware circuits such as an electronic circuit and the like; the Matlab simulation tool is mainly used for realizing the simulation of a control algorithm; the AMESim simulation tool is mainly used for realizing simulation of an execution mechanism.

S120, determining a target node computer, and sending a task instruction to the target node computer to distribute a simulation task;

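In the embodiment of the present invention, this step may specifically include:

checking and displaying the currently available node computers;

and determining target node computers from the available node computers according to the number of the received model files, and sending task instructions to each target node computer to distribute simulation tasks.

Further specifically, the latter step includes:

initializing a node computer real-time state table and a simulation task running state table, wherein the node computer real-time state table is used for recording the number of available node computers, and the simulation task running state table is used for recording the number of simulation task files to be executed;

determining a target node computer according to the comparison result of the number of the simulation task files to be executed currently and the number of the available node computers;

and sending a task instruction to each determined target node computer to distribute simulation tasks.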

It should be understood that, in the embodiment of the present invention, in order to implement dynamic recursive allocation of a simulation task, a node computer real-time state table and a simulation task running state table are constructed, where the node computer real-time state table records the number of currently available node computers, for example, when one of the node computers determines that an unrecoverable abnormality occurs, the abnormal node computer needs to be deleted from the node computer real-time state table; the simulation task running state table is used for recording the number of the current simulation tasks to be executed, and when the simulation tasks are completed or the abnormality is determined, the simulation tasks need to be deleted from the simulation task running state table.
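The two tables can be pictured as small in-memory structures. The following Python sketch is illustrative only: the class and method names are hypothetical, and the source does not say how the tables are stored.

```python
from typing import List


class NodeStateTable:
    """Records which node computers are currently available."""

    def __init__(self, nodes: List[str]) -> None:
        self._nodes = list(nodes)

    def count(self) -> int:
        return len(self._nodes)

    def nodes(self) -> List[str]:
        return list(self._nodes)

    def remove(self, node: str) -> None:
        # Called when a node is judged to have an unrecoverable abnormality.
        if node in self._nodes:
            self._nodes.remove(node)


class TaskStateTable:
    """Records which simulation tasks are still to be executed."""

    def __init__(self, tasks: List[str]) -> None:
        self._pending = list(tasks)

    def count(self) -> int:
        return len(self._pending)

    def pending(self) -> List[str]:
        return list(self._pending)

    def remove(self, task: str) -> None:
        # Called when a task completes or is judged to be abnormal.
        if task in self._pending:
            self._pending.remove(task)
```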

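In this embodiment of the present invention, the determining a target node computer according to a comparison result between the number of simulation task files to be currently executed and the number of available node computers includes:

if the number of the simulation task files to be executed currently is smaller than the number of the available node computers, directly determining target node computers corresponding to the number of the simulation task files from the available node computers;

and if the number of the simulation task files to be executed currently is not less than the number of the available node computers, determining all the currently available node computers as target node computers.

More specifically, the sending a task instruction to each determined target node computer to distribute a simulation task includes:

when the number of the simulation task files to be executed currently is smaller than the number of the available node computers, directly sending task instructions to the determined target node computers to distribute simulation tasks;

and when the number of the simulation task files to be executed currently is not less than the number of the available node computers, sending a task instruction to the target node computer according to a dynamic recursive allocation algorithm to allocate simulation tasks.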

It should be understood that the target node computers are determined by comparing the number of simulation task files currently to be executed with the number of available node computers. For example, the system shown in FIG. 3 includes a total of 6 node computers. Suppose node 1 is the node computer initiating the simulation tasks, the number of simulation tasks to be executed is 4, and the remaining 5 node computers in the system are all available; then any 4 of them may be selected as the target node computers. If instead the number of tasks to be executed is 20 and the remaining 5 node computers are all available, all 5 available node computers are taken as target node computers, and the simulation tasks to be executed need to be allocated according to the dynamic recursive allocation algorithm.

It should also be understood that if the number of the simulation task files to be executed currently is less than the number of the available node computers, the task instruction can be directly sent to the target node computer without being implemented by a dynamic recursive allocation algorithm, and if the number of the simulation task files to be executed currently is not less than the number of the available node computers, the dynamic recursive allocation algorithm is entered.
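The comparison rule above can be written down in a few lines. This is a minimal Python sketch with a hypothetical function name. Where the text allows the targets to be chosen arbitrarily, the sketch simply takes the first ones in the list.

```python
from typing import List, Tuple


def select_targets(pending_tasks: int,
                   available_nodes: List[str]) -> Tuple[List[str], bool]:
    """Return (target nodes, whether dynamic recursive allocation is needed)."""
    if pending_tasks < len(available_nodes):
        # Fewer tasks than nodes: one node per task, dispatched directly.
        return available_nodes[:pending_tasks], False
    # At least as many tasks as nodes: use every available node and
    # hand the tasks out in rounds.
    return list(available_nodes), True


# The two cases from the text: node 1 initiates, nodes 2 to 6 are available.
nodes = ["node2", "node3", "node4", "node5", "node6"]
few = select_targets(4, nodes)
many = select_targets(20, nodes)
```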

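In this embodiment of the present invention, when the number of simulation task files currently to be executed is not less than the number of available node computers, sending a task instruction to the target node computer according to a dynamic recursive allocation algorithm to allocate a simulation task includes:

dynamically sending task instructions to all current target node computers according to the number of the current simulation task files to be executed;

acquiring simulation result files of all target node computers;

comparing whether the number of the sent task instructions is consistent with the number of the received simulation result files;

and if the two numbers are not consistent, locating the cause of the inconsistency according to whether the target node computer is abnormal and whether the simulation task is abnormal.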

As shown in FIG. 5, the number of simulation task files currently to be executed is obtained first; it can be read directly from the simulation task running state table. Task instructions are then sent dynamically. As described above, if the number of simulation task files to be executed is 20 and the number of currently available target node computers is 5, task instructions are sent to the target node computers in order of the sequence numbers of the simulation task files. For example, the task instructions for the simulation tasks numbered 1 to 5 are first sent to the 5 target node computers; after all 5 target node computers have completed their simulation tasks and produced simulation results, the task instructions for tasks 6 to 10 are sent, and so on until all simulation tasks to be executed have been distributed.
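The round-by-round ordering in this example can be sketched as below. The function name is hypothetical, and the sketch covers only the ordering of task instructions, not the sending itself or any failure handling.

```python
from typing import List, Tuple


def plan_rounds(task_ids: List[int],
                nodes: List[str]) -> List[List[Tuple[int, str]]]:
    """Split tasks into rounds of at most len(nodes), in sequence-number order."""
    if not nodes:
        raise ValueError("at least one target node computer is required")
    rounds = []
    for start in range(0, len(task_ids), len(nodes)):
        batch = task_ids[start:start + len(nodes)]
        # Pair each task in the round with one target node computer.
        rounds.append(list(zip(batch, nodes)))
    return rounds


# The example from the text: 20 tasks on 5 target node computers.
nodes = ["node2", "node3", "node4", "node5", "node6"]
rounds = plan_rounds(list(range(1, 21)), nodes)
```

Each round is dispatched only after the previous round has returned its results, so the 20 tasks of the example take four rounds.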

It should be understood that, after each round of distribution, the corresponding simulation results are obtained and the number of simulation results is compared with the number of distributed simulation tasks. For example, if 5 simulation tasks are distributed in the first round and 5 simulation results are received, no abnormality occurred in the simulation process. If 5 simulation tasks are distributed and only 4 simulation results are received, one simulation task was not completed, and the reason why it was not completed needs to be found.

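Further specifically, the locating the cause of the inconsistency according to whether the target node computer is abnormal and whether the simulation task is abnormal includes:

judging whether the target node computer is abnormal;

if the target node computer is abnormal, determining the number of times the target node computer has been abnormal;

if the number of times the target node computer has been abnormal is greater than a preset count threshold, deleting the target node computer from the node computer real-time state table, and updating the node computer real-time state table;

if the target node computer is normal, judging whether the simulation task is abnormal;

if the simulation task is abnormal, determining the number of times the simulation task has been abnormal;

and if the number of times the simulation task has been abnormal is greater than a preset count threshold, deleting the simulation task from the simulation task running state table, and updating the simulation task running state table.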

It should be understood that when the number of simulation tasks is inconsistent with the number of simulation results, a problem has occurred in the simulation and must be located. The locating logic first checks whether the target node computer executing the simulation task is abnormal, and then checks whether the simulation task itself is abnormal. Once the cause is located, the abnormal-state count of the abnormal target node computer or simulation task is incremented by 1. If the count exceeds the preset threshold, for example exceeds two, the abnormal target node computer is deleted from the node computer real-time state table, or the abnormal simulation task is deleted from the simulation task running state table.
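The locating logic can be sketched as a single bookkeeping function. In this Python illustration the function name and the returned labels are hypothetical. The flags `node_abnormal` and `task_abnormal` are supplied by the caller, because the source does not say how an abnormality is detected, and the threshold of two follows the example given in the text.

```python
from typing import Dict, Set

ABNORMAL_THRESHOLD = 2  # example value from the text: remove after more than two


def locate_cause(task: str, node: str,
                 node_abnormal: bool, task_abnormal: bool,
                 node_counts: Dict[str, int], task_counts: Dict[str, int],
                 available_nodes: Set[str], pending_tasks: Set[str],
                 threshold: int = ABNORMAL_THRESHOLD) -> str:
    """Update the abnormality counters and state tables for one missing result."""
    # Step 1: check the target node computer first.
    if node_abnormal:
        node_counts[node] = node_counts.get(node, 0) + 1
        if node_counts[node] > threshold:
            available_nodes.discard(node)  # delete from the node state table
            return "node removed"
        return "node abnormal"
    # Step 2: the node is normal, so check the simulation task.
    if task_abnormal:
        task_counts[task] = task_counts.get(task, 0) + 1
        if task_counts[task] > threshold:
            pending_tasks.discard(task)    # delete from the task state table
            return "task removed"
        return "task abnormal"
    return "no cause found"
```

Checking the node before the task means a task is not penalized for a failure that its host machine caused.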

It should be noted here that, if the number of simulation tasks is consistent with the number of simulation results, the node computer real-time state table and the simulation task running state table are updated: the completed simulation tasks are deleted from the simulation task running state table, and the node computer real-time state table is initialized. The step of dynamically sending task instructions to all current target node computers according to the number of simulation task files currently to be executed is then repeated until all simulation task files to be executed have been simulated.

The recursive judgment logic above is repeated until all simulation tasks to be executed have been executed, at which point the recursive allocation algorithm exits.
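Putting the pieces together, the allocation procedure can be sketched end to end. This Python illustration makes several simplifying assumptions that are not in the source: the recursion is written as a loop, the tasks of a round run one after another instead of in parallel, a missing result on a healthy node is attributed to the task, and the callables `run_task` and `node_is_healthy` are hypothetical stand-ins for the real dispatch and health check.

```python
from typing import Callable, Dict, List, Optional, Tuple


def allocate(tasks: List[int], nodes: List[str],
             run_task: Callable[[int, str], Optional[str]],
             node_is_healthy: Callable[[str], bool],
             threshold: int = 2) -> Tuple[Dict[int, str], List[int]]:
    """Hand out tasks in rounds until none are pending or no node is left."""
    pending = list(tasks)        # simulation task running state table
    available = list(nodes)      # node computer real-time state table
    node_faults: Dict[str, int] = {}
    task_faults: Dict[int, int] = {}
    results: Dict[int, str] = {}
    dropped: List[int] = []

    while pending and available:
        # One round: the next tasks in sequence order, one per available node.
        round_plan = list(zip(pending[:len(available)], available))
        for task, node in round_plan:
            outcome = run_task(task, node)
            if outcome is not None:
                results[task] = outcome
                pending.remove(task)
            elif not node_is_healthy(node):
                node_faults[node] = node_faults.get(node, 0) + 1
                if node_faults[node] > threshold:
                    available.remove(node)
            else:
                task_faults[task] = task_faults.get(task, 0) + 1
                if task_faults[task] > threshold:
                    pending.remove(task)
                    dropped.append(task)
    return results, dropped


def fake_run(task: int, node: str) -> Optional[str]:
    """Stub: node n3 never answers and task 4 always fails."""
    return None if node == "n3" or task == 4 else f"result-{task}"


results, dropped = allocate([1, 2, 3, 4, 5, 6], ["n1", "n2", "n3"],
                            fake_run, lambda node: node != "n3")
```

The loop terminates because every unsuccessful attempt increments a counter that is bounded by the threshold, after which the node or the task is removed from its table.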

S130, after the target node computer acquires the model file in the file server according to the task instruction, sending a starting instruction to the target node computer to start a simulation task execution tool in the target node computer, wherein the simulation task execution tool can execute a corresponding simulation task according to the simulation task configuration file;

It should be appreciated that, in the dynamic recursive allocation of tasks described above, the target node computer copies the corresponding model file from the file server upon receiving a task instruction. The node computer initiating the simulation task then also needs to send a start instruction to the target node computer. The start instruction may specifically be an m-file (that is, a set of multiple instructions) that specifies the simulation task configuration file for each target node computer and starts the simulation tools in each target node computer in sequence. For example, the m-file schedules and runs the Saber application, and Saber in turn schedules Matlab and AMESim.

S140, after the target node computer completes the simulation task, acquiring a simulation result of the target node computer, and analyzing and processing the simulation result.

It should be understood that each time the target node computer completes a simulation task, the target node computer sends the simulation result to the file server, and the node computer initiating the simulation task obtains the simulation result from the file server and performs subsequent analysis and processing on the simulation result.

In summary, in the multi-computer cluster parallel simulation method provided by the embodiment of the invention, one node computer initiates simulation tasks to other multiple node computers in the system, so that the simulation tasks can be executed in parallel in the multiple node computers, and the speed of joint simulation can be greatly increased.

For example, the relationship between the parallel simulation time, the serial simulation time and the number of node computers is as follows: $T_{\text{parallel}} = T_{\text{serial}} / N$, where $N$ represents the number of node computers executing the simulation task. It can be seen that the speed after parallel simulation is improved by a factor of $N$ compared to serial simulation.
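As a worked illustration of this relation, consider the round-based allocation described earlier under two idealizing assumptions that the source does not state: every simulation task takes the same time $t$, and dispatch and file-transfer overhead is negligible. The symbols $M$ (number of simulation tasks) and $t$ are introduced here only for the illustration.

```latex
% Serial time for M equal tasks, and parallel time in rounds of N tasks
T_{\text{serial}} = M\,t, \qquad
T_{\text{parallel}} = \left\lceil \frac{M}{N} \right\rceil t

% Example from the text: M = 20 tasks, N = 5 target node computers
T_{\text{parallel}} = \left\lceil \frac{20}{5} \right\rceil t = 4\,t
                    = \frac{T_{\text{serial}}}{5}
```

When $N$ does not divide $M$, the last round is only partly filled and the speed-up falls somewhat short of $N$. For instance, $M = 22$ and $N = 5$ need 5 rounds, which gives a speed-up of $22/5 = 4.4$.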

As another embodiment of the present invention, there is provided an aircraft engine multi-computer cluster system, comprising a file server and a plurality of node computers which carry out instruction interaction through MDCS, wherein each node computer is in communication connection with the file server, any node computer can initiate simulation tasks to other node computers in the system, each node computer comprises a memory and a processor, the memory is used for storing computer instructions, and when any node computer initiates a simulation task, the processor is used for loading and executing the computer instructions to implement the multi-computer cluster parallel simulation method described above.

It should be understood that the aircraft engine multi-computer cluster system provided by the embodiment of the invention builds the multi-computer cluster on MDCS, is compatible with the original joint simulation platform to the greatest possible extent, and can effectively increase the simulation speed.

Regarding the working principle of the aircraft engine multi-computer cluster system provided by the embodiment of the present invention, reference may be made to the description of the multi-computer cluster parallel simulation method in the foregoing, and details are not described herein again.

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. A multi-computer cluster parallel simulation method is applied to an aircraft engine multi-computer cluster system, the aircraft engine multi-computer cluster system comprises a file server and a plurality of node computers which carry out instruction interaction through MDCS, each node computer is in communication connection with the file server, any node computer can initiate a simulation task to other node computers in the system, and the multi-computer cluster parallel simulation method comprises the following steps:

2. The method of multi-computer cluster concurrent simulation according to claim 1, wherein said sending a model file to said file server comprises:

3. The multi-computer cluster concurrent simulation method of claim 1, wherein said determining a target node computer and sending task instructions to said target node computer to distribute simulation tasks comprises:

checking and displaying the current available node computer;

4. The method of multi-computer cluster concurrent simulation according to claim 3, wherein said determining target node computers from the available node computers based on the number of received model files and sending task instructions to each of said target node computers to distribute simulation tasks comprises:

5. The method for multi-computer cluster concurrent simulation according to claim 4, wherein said determining a target node computer according to the comparison result of the number of simulation task files to be executed currently and the number of available node computers comprises:

6. The method of multi-computer cluster concurrent simulation according to claim 5, wherein said sending task instructions to each of said determined target node computers to distribute simulation tasks comprises:

7. The multi-computer cluster parallel simulation method of claim 6, wherein when the number of the simulation task files to be executed currently is not less than the number of the available node computers, sending task instructions to the target node computer to allocate simulation tasks according to a dynamic recursive allocation algorithm, comprises:

acquiring simulation result files of all target node computers;

8. The multi-computer cluster parallel simulation method according to claim 7, wherein the locating the cause of the inconsistency according to whether the target node computer is abnormal and whether the simulation task is abnormal comprises:

judging whether the target node computer is abnormal or not;

9. The multi-computer cluster parallel simulation method of claim 7,

and if the two numbers are consistent, updating the node computer real-time state table and the simulation task running state table, and repeatedly executing the step of dynamically sending task instructions to all current target node computers according to the number of the current simulation task files to be executed until all the simulation task files to be executed have been simulated.

10. An aircraft engine multi-computer cluster system, comprising a file server and a plurality of node computers which carry out instruction interaction through MDCS, wherein each node computer is in communication connection with the file server, any node computer can initiate simulation tasks to other node computers in the system, each node computer comprises a memory and a processor, the memory is used for storing computer instructions, and when any node computer initiates a simulation task, the processor is used for loading and executing the computer instructions to implement the multi-computer cluster parallel simulation method as claimed in any one of claims 1 to 9.