CN116578415B - Parallel method for large sample load simulation - Google Patents

Parallel method for large sample load simulation

Info

Publication number
CN116578415B
CN116578415B (application CN202310465935.6A)
Authority
CN
China
Prior art keywords
sample
simulation
resource
starting
service model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310465935.6A
Other languages
Chinese (zh)
Other versions
CN116578415A (en)
Inventor
宋海凌
马晓斌
强艳辉
邢向向
谭丹
Current Assignee
Chinese People's Liberation Army 92942 Army
Original Assignee
Chinese People's Liberation Army 92942 Army
Priority date
Filing date
Publication date
Application filed by Chinese People's Liberation Army 92942 Army
Priority to CN202310465935.6A
Publication of CN116578415A
Application granted
Publication of CN116578415B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022Mechanisms to release resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The specification discloses a parallel method for large-sample load simulation, in the technical field of distributed simulation system design. The method comprises: judging the request type of a simulation task based on the simulation task request; if the request type is a scheduling sample, traversing the sample and pre-starting the simulation service models in the sample, and if the request type is a closing sample, managing the resources corresponding to the sample; judging whether the sample can be started successfully based on the resource demand of the sample and the available resources of the corresponding working node, applying for the resources required by the sample according to an application policy if so, and adding the sample to the tail of a global queue if not; and starting the simulation service models in the sample to carry out the simulation test. This solves the problem that existing radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, making the execution time cost of simulation test tasks too high.

Description

Parallel method for large sample load simulation
Technical Field
The invention belongs to the technical field of distributed simulation system design, and particularly relates to a parallel method for large sample load simulation.
Background
The simulation test of a large-scale distributed radar system involves large data exchanges among signal-level simulation models, rapid sharing of massive test data, and high real-time requirements on the simulation test. Traditional simulation tests were smaller in scale, test tasks were usually executed sequentially, and testers did not need to pay attention to the time cost of executing them. However, to minimize the time required for task execution and improve task parallelism when processing large-scale test task sets, the parallelism of large-sample load simulation must be improved.
Current traditional simulation systems lack a task-parallel scheduling optimization method oriented to the test running process, and parallelism is an important means of improving the execution efficiency of simulation tasks. A large-scale distributed radar system is generally equipped with multiple computing nodes, each with several CPU and GPU hardware resources; if tasks can be scheduled in parallel dynamically according to the resource usage of the computing nodes, the utilization efficiency of the test platform can be greatly improved and the total execution time of all simulation tasks reduced. In simulation task scheduling, parallelism must consider both inter-model parallelism and inter-sample parallelism. 1. Inter-model parallelism mainly considers how to schedule the related models within the same simulation test sample onto the GPUs of the same computing node to run in parallel, sharing data through a dedicated high-speed channel and thereby shortening the execution of a single task step. 2. Inter-sample parallelism mainly considers how to dynamically schedule different task samples onto different computing nodes for execution under resource constraints, improving the overall execution efficiency of test samples while maintaining load balancing.
Therefore, conventional radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, so the execution time cost of simulation test tasks is too high.
Disclosure of Invention
The invention aims to provide a parallel method oriented to large-sample load simulation, to solve the problem that conventional radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, making the execution time cost of simulation test tasks too high.
In order to achieve the above purpose, the invention adopts the following technical scheme:
In one aspect, the present disclosure provides a parallel method for large sample load simulation, including:
step 102, judging the request type of the simulation task based on the simulation task request; if the request type is a scheduling sample, traversing the sample and pre-starting the simulation service models in the sample, and if the request type is a closing sample, managing the resources corresponding to the sample;
Step 104, judging whether the sample can be started successfully based on the resource demand of the sample and the available resources of the corresponding working node; if so, applying for the resources required by the sample according to the application policy, and if not, adding the sample to the tail of a global queue, where the global queue controls the starting order of samples and stores the samples waiting to start simulation;
and step 106, starting the simulation service models in the sample to carry out the simulation test based on the resources required by the sample and the application policy.
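The core of steps 102–106 can be sketched as a small dispatch function. This is a hypothetical illustration, not the patent's implementation: the names (`Sample`, `dispatch`) are invented, and the node's resources are collapsed to a single number for brevity.

```python
from collections import deque

# Hypothetical sketch of steps 102-106; names are illustrative and
# resources are collapsed to one number for brevity.

class Sample:
    def __init__(self, sample_id, demand):
        self.sample_id = sample_id
        self.demand = demand  # resources the sample needs to start

def dispatch(sample, available, queue):
    """Step 104: start the sample if the working node has enough free
    resources; otherwise add it to the tail of the global queue."""
    if sample.demand <= available:
        return available - sample.demand, True   # resources applied for, sample starts
    queue.append(sample)                         # wait at the tail for resources
    return available, False

queue = deque()
free, started = dispatch(Sample("s1", 4), available=8, queue=queue)
free2, started2 = dispatch(Sample("s2", 16), available=free, queue=queue)
```

A sample that cannot start is never dropped: it waits at the tail of the global queue until resources are released, which is what keeps multiple samples schedulable under limited resources.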
In another aspect, the present disclosure provides a parallel apparatus for large sample load simulation, including:
The scheduling module is used for judging the request type of the simulation task based on the simulation task request, traversing the sample and pre-starting the simulation service models in the sample if the request type is a scheduling sample, and managing the resources corresponding to the sample if the request type is a closing sample;
The allocation module is used for judging whether the sample can be started successfully based on the resource demand of the sample and the available resources of the corresponding working node, applying for the resources required by the sample according to the application policy if so, and adding the sample to the tail of the global queue if not, where the global queue controls the starting order of samples and stores the samples waiting to start simulation;
and the starting module is used for starting the simulation service models in the sample to carry out the simulation test based on the resources required by the sample and the application policy.
Based on the technical scheme, the following technical effects can be obtained in the specification:
According to the method, whether a task schedules a sample is judged from the type of the task request. If so, whether the sample can be started successfully is judged from the resource demands of the different samples and the available resources of the corresponding working nodes: a sample that can start applies for its resources, and a sample that cannot is added to a global queue, which controls the starting order of samples. The simulation test then proceeds on the samples that obtained resources according to the application policy. Multiple samples are thus scheduled, or scheduled in a timely manner, under limited resources, guaranteeing efficient resource utilization and solving the problem that conventional radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, which makes the execution time cost of simulation test tasks too high.
Drawings
FIG. 1 is a flow chart of a parallel method for large sample load simulation in an embodiment of the invention.
FIG. 2 is a flow chart of a parallel method for large sample load simulation in an embodiment of the invention.
FIG. 3 is a flow chart of a parallel method for large sample load simulation in another embodiment of the invention.
FIG. 4 is a schematic diagram of a scheduling architecture of a parallel system for large sample load simulation according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a parallel device for large sample load simulation according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The advantages and features of the present invention will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. It should be noted that the drawings are in a greatly simplified form and not to precise scale, and serve merely to aid in describing the embodiments of the invention conveniently and clearly.
It should be noted that, to illustrate the invention clearly, its various embodiments are presented to describe different implementations; the embodiments listed are not exhaustive. Furthermore, for simplicity of description, content mentioned in an earlier embodiment is often omitted in later ones; anything not mentioned in a later embodiment can therefore be found in an earlier one.
Example 1
Referring to fig. 1, fig. 1 shows a parallel method for large sample load simulation according to this embodiment. In this embodiment, the method includes:
step 102, judging the request type of the simulation task based on the simulation task request; if the request type is a scheduling sample, traversing the sample and pre-starting the simulation service models in the sample, and if the request type is a closing sample, managing the resources corresponding to the sample;
In this embodiment, before traversing the sample and pre-starting the simulation service models in the sample, the method further includes checking the parameters of the simulation task; if the parameters pass the check, the sample is traversed and the simulation service models are started.
In this embodiment, the step of managing the resources corresponding to the samples includes:
step 202, obtaining scheduling information of the samples based on the sample identifiers of the samples;
step 204, releasing the resources called by the sample based on the scheduling information.
When the request type is a closing sample, the scheduling information of all simulation service models in the current sample is queried in the database according to the sample identifier in the method parameters. The scheduling information includes the model identifier, the name of the node where each model currently resides, the model's identifier name in the container cloud platform, the resource information used by the model, and so on. A resource release request based on the scheduling information is then initiated to the cloud platform; after receiving it, the cloud platform releases the resources of the simulation service models under the sample, and the sample scheduling information in the database is updated once the release succeeds.
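The release flow of steps 202–204 can be sketched as follows. This is a minimal illustration under assumed data shapes: the `db` and `platform` dictionaries and the field names are inventions, not the patent's actual storage format.

```python
# Hypothetical sketch of closing a sample: the scheduling records
# persisted at start-up are looked up by sample identifier, each
# model's resources are released back to the platform, and the record
# is updated. The dict shapes (db, platform) are assumptions.

def close_sample(sample_id, db, platform):
    records = db.get(sample_id, [])           # model id, node, resources used, ...
    freed = 0
    for rec in records:
        platform["free"] += rec["resources"]  # release the model's resources
        freed += rec["resources"]
    db[sample_id] = []                        # update scheduling info after release
    return freed

db = {"s1": [{"model": "m1", "node": "n1", "resources": 2},
             {"model": "m2", "node": "n1", "resources": 3}]}
platform = {"free": 5}
freed = close_sample("s1", db, platform)
```

Updating the database only after the release succeeds keeps the persisted scheduling information consistent with the platform's actual resource state.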
Step 104, judging whether the sample can be started successfully based on the resource demand of the sample and the available resources of the corresponding working node; if so, applying for the resources required by the sample according to the application policy, and if not, adding the sample to the tail of a global queue, where the global queue controls the starting order of samples and stores the samples waiting to start simulation;
In this embodiment, whether the sample can be started successfully is judged by comparing the resource demand of the sample with the currently available resources of the corresponding working node: if the resources for starting the sample are sufficient, the start succeeds; if not, it fails.
In this embodiment, before step 104, the method further includes:
Operation resources, including CPU, memory, GPU, and so on, are allocated to each working node of the simulation platform, and the resource allocation result is returned; meanwhile, the resource usage of the working nodes is collected at fixed intervals.
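The sufficiency check over the resource types named above can be sketched as a per-type comparison. The dictionary encoding (`cpu`, `mem_gb`, `gpu` keys) is an assumption made for illustration.

```python
# A minimal sufficiency check across the resource types the text names
# (CPU, memory, GPU). The dictionary encoding is an assumption.

def can_start(demand, node_free):
    """The sample starts only if every resource type is sufficient."""
    return all(node_free.get(kind, 0) >= amount
               for kind, amount in demand.items())

node_free = {"cpu": 8, "mem_gb": 32, "gpu": 2}
ok = can_start({"cpu": 4, "mem_gb": 16, "gpu": 1}, node_free)
insufficient = can_start({"cpu": 4, "mem_gb": 16, "gpu": 4}, node_free)
```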
And 106, starting a simulation service model in the sample to carry out a simulation test based on the resources required by the sample and the application strategy.
In this embodiment, the application policies include a full application policy and an on-demand application policy. The full application policy applies for the resources required by all simulation service models in the sample, while the on-demand application policy applies for the resources required by the simulation service models of the sample that run in the same period.
In this embodiment, one implementation manner of step 106 is:
if the application policy is the full application policy, the engine only needs to send a control instruction to start the simulation service models;
if the application policy is the on-demand application policy, the engine needs to direct the simulation service models to start according to the starting policy of each simulation service model in the sample.
In this embodiment, the starting policy is formulated for each simulation service model by the engine based on the resource information required by each simulation service model in the sample and the resource information actually applied for.
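The difference between the two application policies can be made concrete with a small sketch: the full policy sums the demand of every model in the sample, while the on-demand policy applies only for the peak total among models that run in the same time period. The `(period, demand)` encoding of a model is an assumption for illustration.

```python
# Sketch of the two application policies. "full" sums every model's
# demand; "on-demand" takes the largest total among models sharing a
# time period. The (period, demand) encoding is an assumption.

def resources_to_apply(models, policy):
    if policy == "full":
        return sum(demand for _, demand in models)   # all models at once
    by_period = {}                                   # on-demand: peak per period
    for period, demand in models:
        by_period[period] = by_period.get(period, 0) + demand
    return max(by_period.values())

models = [(0, 2), (0, 3), (1, 4)]   # (time period, resource demand)
full = resources_to_apply(models, "full")
peak = resources_to_apply(models, "on-demand")
```

On-demand application trades engine complexity (per-model starting policies) for a smaller resource footprint, which is what allows more samples to run concurrently on the same nodes.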
In this embodiment, after step 106, the method further includes:
and storing the scheduling information of all models in the current sample.
Specifically, if the sample is started successfully, the scheduling information of all models under the current sample needs to be additionally stored, including the model identifier, the node where the model currently resides, the model's identifier name in the container cloud platform, the resource information used by the model, and so on.
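The persisted record can be sketched as below. The field list follows the text (model identifier, current node, platform identifier name, resources used), but the field names themselves and the in-memory `db` dictionary are assumptions.

```python
# Sketch of persisting per-model scheduling information after a
# successful start; field names are assumptions, the field list
# follows the text.

def persist_scheduling_info(db, sample_id, models):
    db[sample_id] = [{"model_id": m["id"],
                      "node": m["node"],
                      "platform_name": m["platform_name"],
                      "resources": m["resources"]}
                     for m in models]

db = {}
persist_scheduling_info(db, "s1",
                        [{"id": "m1", "node": "n1",
                          "platform_name": "pod-m1", "resources": 2}])
```

This record is exactly what the closing-sample path later queries to know which resources to release and on which nodes.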
In this embodiment, after step 106, the method further includes:
The running state of the sample is monitored; when the sample finishes running, its resources are released and the next sample is started based on the global queue.
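The monitoring step can be sketched as a simple handler: when a sample finishes, its resources are released and the head of the global queue is tried; a sample that still cannot start returns to the tail. Samples are reduced to their resource demand here, and all names are illustrative rather than the patent's own.

```python
from collections import deque

# Hypothetical sketch of the monitoring step; samples are reduced to
# a single resource-demand number for brevity.

def on_sample_finished(finished_demand, free, queue):
    free += finished_demand          # release the finished sample's resources
    if queue:
        head = queue.popleft()       # take the head of the global queue
        if head <= free:
            free -= head             # enough resources: start the next sample
        else:
            queue.append(head)       # still insufficient: back to the tail
    return free, list(queue)

queue = deque([3, 10])
free, remaining = on_sample_finished(finished_demand=4, free=0, queue=queue)
```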
In summary, the method judges from the type of the task request whether a task schedules a sample. If so, it judges from the resource demands of the different samples and the available resources of the corresponding working nodes whether the sample can be started successfully: a sample that can start applies for its resources, and a sample that cannot is added to the global queue, which controls the starting order. The simulation test is then carried out on the samples that obtained resources according to the application policy. Multiple samples are thus scheduled, or scheduled in a timely manner, under limited resources, ensuring that multiple simulation tasks can run simultaneously and that resources are used efficiently.
Example 2
Referring to fig. 3, the parallel method for large sample load simulation provided in this embodiment includes:
First, receive simulation tasks
Specifically, a dispatch control module deployed on the container cloud platform is responsible for receiving schedule-sample or close-sample requests from the simulation program. When a request arrives, the dispatch controller judges its type from the request method and parameters. If it is a close-sample request, the dispatch control module performs simple data processing and submits the task to the platform resource management module, which handles all subsequent work. If it is a schedule-sample request, simple data processing is performed and the task is submitted to the back-end scheduling module.
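This routing can be sketched as a two-way branch. The callables stand in for the platform resource management module and the back-end scheduling module; the request shape and all names are assumptions made for illustration.

```python
# Hypothetical sketch of the dispatch controller routing a request by
# type: close-sample requests go to the platform resource manager,
# schedule-sample requests to the back-end scheduling module.

def route_request(request, schedule, close):
    if request["type"] == "close":
        return close(request["sample_id"])
    return schedule(request["sample_id"])

handled = route_request({"type": "close", "sample_id": "s1"},
                        schedule=lambda s: ("schedule", s),
                        close=lambda s: ("release", s))
```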
Second, the scheduling module starts the simulation sample
Specifically, when the scheduling module receives a scheduling task from the dispatch controller, it first checks the parameters of the task. When the parameter check passes, the sample is traversed and the simulation service models of the sample begin to start: the container cloud platform scheduling module is called and the resource parameters of the simulation models are passed in.
Third, the allocation module allocates resources
Specifically, the scheduling module starts the simulation model containers in the sample and starts the resource allocation module, which allocates running resources, including CPU, memory, GPU, and so on, for the task application and returns the allocation result. The allocation module periodically obtains the resource usage of the working nodes from the monitoring module, then calculates from the total available resources of the working nodes and the resource request of the simulation sample whether the resources for starting the sample are sufficient. If the resources are insufficient, a sample start failure result is returned.
Fourth, apply for sample resources;
Specifically, if the resources are sufficient, the scheduling module first applies for the resources required by the sample; the sample resource application policy can be configured as full application or on-demand application. Because the models in a sample run in different time periods, the multi-task module either applies for the resources required by all models according to the configured policy, or directly applies, at the sample level, for the maximum resources usable in the same time period. Under the full application policy, all models are started in advance for the engine, which only needs to send a control instruction; under the on-demand application policy, the engine needs to control the individual starting policy of each model and perform resource allocation.
Fifth, model start
Specifically, after the resource application succeeds, the simulation service models in the sample are started. If the sample selects the on-demand application policy to apply for the resources required by its models, the on-demand application is executed and then fed back to the engine layer, which assigns a starting policy to each model in the sample based on the resource information each model applied for and the resource information actually granted; the engine then directs each model to start according to its assigned starting policy.
Sixth, persisting task information
Specifically, if the sample is started successfully, the scheduling information of all models under the current sample needs to be additionally stored, including the model identifier, the node where the model currently resides, the model's identifier name in the container cloud platform, the resource information used by the model, and so on; a sample that fails to start is added to the tail of the global queue. The global queue controls the starting order of the simulation sample tasks and stores the simulation samples waiting to start; it holds the unique identifiers of the simulation tasks for later use by the scheduling module.
Seventh, simulation monitoring
Specifically, after the simulation model sample is started successfully, the monitoring module monitors its running state. When the sample finishes, its resources are released; the scheduling module then reads the global queue and, if it is not empty, takes the sample at the head of the queue and repeats the second step. If that sample starts successfully, the next sample is taken and started in turn; if it fails to start, it is added back to the tail.
Eighth, resource release
Specifically, when the platform resource management module receives a close-sample task from the dispatch controller, it queries the database for the scheduling information of all models under the current sample according to the sample identifier in the method parameters. The query result includes the model identifier, the node where each model currently resides, the model's identifier name in the container cloud platform, the resource information used by the model, and so on. The platform resource manager processes and packages the query result into the data format required by the container cloud platform and initiates a resource release request to it. After receiving the request, the container cloud releases the resources and, once the release succeeds, returns the result to the platform resource manager, which updates the sample scheduling information in the database accordingly.
Based on the method, whether a task schedules a sample is judged from the type of the task request. If so, whether the sample can be started successfully is judged from the resource demands of the different samples and the available resources of the corresponding working nodes: a sample that can start applies for its resources, and a sample that cannot is added to a global queue, which controls the starting order. The simulation test then continues on the samples that obtained resources according to the application policy. Multiple samples are thus scheduled, or scheduled in a timely manner, under limited resources, ensuring that multiple simulation tasks run simultaneously and that resources are used efficiently, and solving the problem that conventional radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, which makes the execution time cost of simulation test tasks too high.
Example 3
Referring to fig. 4, fig. 4 shows that the present embodiment provides a parallel system for large sample load simulation, which includes:
the system comprises a global queue module, an allocation controller, a scheduling module, a sample resource application strategy module and a model starting strategy module.
A global queue module: mainly controls the starting order of task applications in the task scheduling queue to realize scheduling and caching of application tasks. When the scheduling module fails to start the simulation model services of a sample, all simulation model services under that sample are stopped and the simulation sample is added to the global queue to wait for the cloud platform to release resources, after which the scheduling module restarts the simulation sample.
An allocation controller module: the allocation controller allocates running resources (CPU, memory, GPU, and so on) for the task application;
A scheduling module: realizes automatic scheduling of task applications and ensures that simulation task applications run stably on the most suitable task nodes;
A sample resource application policy module: configured as full application or on-demand application;
A model starting policy module: injects the starting parameter settings (starting order, starting signal) into the models, divided into a full mode and an on-demand mode.
Based on the method, the system runs multiple simulation tasks and samples in parallel in a cluster, optimizing cluster resource utilization. It can improve the resource utilization and reliability of the all-digital signal-level simulation platform of a radar system, refine the control granularity of simulation resources, make system management more flexible, and improve the overall running efficiency of all-digital signal-level simulation tests of the radar system.
Example 4
Referring to fig. 5, fig. 5 shows that the present embodiment provides a parallel device for large sample load simulation, which includes:
A scheduling module, configured to determine the request type of a simulation task based on the simulation task request; if the request type is a scheduling sample, traverse the sample and pre-start the simulation service models in the sample; if the request type is a closing sample, manage the resources corresponding to the sample;
A distribution module, configured to determine whether the sample can be started successfully based on the sample's resource demand and the available resources of the corresponding working node; if so, apply for the resources the sample requires according to the application policy; if not, append the sample to the tail of a global queue, where the global queue controls the start order of samples and stores the samples waiting to start simulation;
A starting module, configured to start the simulation service models in the sample to carry out a simulation test, based on the resources required by the sample and the application policy.
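The distribution module's admission check can be sketched as follows. This is a simplified illustration under assumed names (`try_admit`, a `demand` dictionary of resource quantities); the patent does not specify the data structures.

```python
from collections import deque

def try_admit(sample, node_available, global_queue):
    """Admission check of the distribution module (sketch): a sample starts
    only if every resource it demands fits within the available capacity of
    its working node; otherwise it is appended to the tail of the global queue."""
    demand = sample["demand"]
    if all(node_available.get(k, 0) >= v for k, v in demand.items()):
        for k, v in demand.items():
            node_available[k] -= v  # apply for the resources the sample needs
        return True
    global_queue.append(sample)     # wait at the tail of the global queue
    return False
```

Appending to the tail preserves arrival order, so samples that could not start immediately are retried first-in first-out as resources are released.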
Optionally, the device further includes:
A parameter verification module, configured to verify the parameters of the simulation task and, when the parameters pass verification, traverse the sample and start the simulation service model.
Optionally, the starting module includes:
A full-volume application starting unit, configured so that, if the application policy is the full-volume application policy, the engine need only send a control instruction to start the simulation service models;
An on-demand application starting unit, configured so that, if the application policy is the on-demand application policy, the engine directs the simulation service models to start according to the start policy of each simulation service model in the sample.
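The difference between the two starting units can be illustrated as below. This is a sketch, not the patent's engine: `send_control` stands in for the engine's control channel, and using each model's `start_order` to drive on-demand start-up is an assumption.

```python
def start_sample(models, policy, send_control):
    """How the engine might drive model start-up under each policy (sketch)."""
    if policy == "full":
        # Full-volume: a single control instruction starts every model.
        send_control("start_all")
    else:
        # On-demand: the engine directs each model per its own start policy,
        # approximated here by the model's start order.
        for m in sorted(models, key=lambda m: m["start_order"]):
            send_control(f"start:{m['name']}")
```

The full-volume path is a single broadcast, while the on-demand path issues one instruction per model in the order their start policies dictate.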
Optionally, the method further comprises:
A scheduling information storage unit, configured to store the scheduling information of all models in the current sample.
Optionally, the method further comprises:
A running state monitoring module, configured to monitor the running state of the sample; if the sample has finished running, release the sample's resources and start the next sample based on the global queue.
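The release-and-restart step performed when a sample finishes can be sketched as follows. The names (`on_sample_finished`, the `demand` dictionary) are illustrative, as is the choice to start the queued sample only if it now fits.

```python
from collections import deque

def on_sample_finished(finished, node_available, global_queue):
    """Run-state monitoring (sketch): when a sample finishes, release its
    resources back to the node, then start the next queued sample if it fits."""
    for k, v in finished["demand"].items():
        node_available[k] = node_available.get(k, 0) + v  # release resources
    if global_queue:
        nxt = global_queue[0]
        if all(node_available.get(k, 0) >= v for k, v in nxt["demand"].items()):
            global_queue.popleft()
            for k, v in nxt["demand"].items():
                node_available[k] -= v
            return nxt  # the next sample to start
    return None
```

Releasing before re-checking the queue head is what lets a sample that previously failed admission start as soon as capacity returns.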
Based on the method, the device determines from the request type whether the task is to schedule a sample. If so, it judges, from each sample's resource demand and the available resources of the corresponding working node, whether the sample can be started successfully; if it can, the device applies for the sample's resources, and if not, it appends the sample to a global queue, which controls the start order of samples. Samples that have obtained resources then proceed to simulation testing according to the application policy. In this way, multiple samples can be scheduled promptly under limited resources, so that multiple simulation tasks run simultaneously and resources are used efficiently. This solves the problem that traditional radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, which makes the time cost of executing simulation test tasks excessive.
Example 5
Referring to fig. 6, the present embodiment provides an electronic device, which includes a processor, an internal bus, a network interface, memory, and nonvolatile storage, and may further include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile storage into memory and runs it, forming, at the logical level, the parallel method for large sample load simulation. Of course, this specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the following processing flow is not limited to logic units, and may also be hardware or logic devices.
The network interface, processor and memory may be interconnected by a bus system. The buses may be classified into address buses, data buses, control buses, and the like.
The memory is used for storing programs. In particular, the program may include program code including computer-operating instructions. The memory may include read only memory and random access memory and provide instructions and data to the processor.
The processor is used for executing the program stored in the memory and specifically executing:
step 102, determining the request type of a simulation task based on the simulation task request; if the request type is a scheduling sample, traversing the sample and pre-starting the simulation service models in the sample; if the request type is a closing sample, managing the resources corresponding to the sample;
step 104, determining whether the sample can be started successfully based on the sample's resource demand and the available resources of the corresponding working node; if so, applying for the resources the sample requires according to the application policy; if not, appending the sample to the tail of a global queue, where the global queue controls the start order of samples and stores the samples waiting to start simulation;
step 106, starting the simulation service models in the sample to carry out a simulation test based on the resources required by the sample and the application policy.
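The request-type dispatch of step 102 can be sketched as a small router. `RecordingScheduler` is a hypothetical stand-in for the scheduling component, and the request field names are assumptions for illustration only.

```python
class RecordingScheduler:
    """Stand-in for the scheduling component; records the actions taken."""
    def __init__(self):
        self.actions = []

    def prestart(self, sample):
        self.actions.append(("prestart", sample))

    def release(self, sample_id):
        self.actions.append(("release", sample_id))

def handle_request(request, scheduler):
    """Route a simulation task request by its type (sketch of step 102)."""
    if request["type"] == "schedule_sample":
        for sample in request["samples"]:
            scheduler.prestart(sample)       # traverse and pre-start models
    elif request["type"] == "close_sample":
        scheduler.release(request["sample_id"])  # manage / release resources
```

A scheduling request fans out over every sample in the task, whereas a closing request acts on a single identified sample.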
The processor may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method may be completed by an integrated logic circuit in the processor's hardware or by instructions in software form.
Based on the same inventive concept, the embodiments of the present specification further provide a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the parallel method for large sample load simulation provided by the embodiments corresponding to figs. 1 to 2.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-readable storage media having computer-usable program code embodied therein.
In addition, since the device embodiments described above are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments. It should also be noted that, in the modules of the system of the present application, the components are divided logically according to the functions to be implemented, but the present application is not limited thereto, and the components may be re-divided or combined as needed.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results; in some embodiments, multitasking and parallel processing may also be possible or advantageous.
The foregoing description is only exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (5)

1. A parallel method for large sample load simulation, comprising:
Judging the request type of a simulation task based on a simulation task request, if the request type is a scheduling sample, traversing the sample and pre-starting a simulation service model in the sample, and if the request type is a closing sample, managing resources corresponding to the sample;
Judging whether the sample can be started successfully or not based on the resource demand of the sample and the resource usable quantity of the corresponding working node, if so, applying for the resource required by the sample according to an application strategy, and if not, adding the sample into the tail of a global queue, wherein the global queue controls the starting sequence of the sample and stores the sample needing to start simulation;
starting a simulation service model in the sample to carry out a simulation test based on the resources required by the sample and the application strategy;
Before the sample is traversed and the simulation service model in the sample is pre-started, checking parameters of the simulation task is further included, and if the parameters pass the checking, the sample is traversed and the simulation service model is started;
The method for judging whether the sample can be started successfully based on the resource demand of the sample and the resource available amount of the corresponding working node is to judge whether the sample starting resource is sufficient based on the resource demand of the sample and the resource available amount of the current corresponding working node, if so, the starting is successful, and if not, the starting is failed;
The application strategies comprise a full-volume application strategy and an on-demand application strategy, wherein the full-volume application strategy applies for the resources required by all simulation service models in the sample, and the on-demand application strategy applies only for the resources required by the simulation service models of the sample in the same period;
Based on the resources required by the sample and the application strategy, the step of starting the simulation service model in the sample to carry out the simulation test comprises the following steps:
If the application strategy is a full-quantity application strategy, only an engine is required to send out a control instruction to control the start of the simulation service model;
if the application strategy is an on-demand application strategy, the engine is required to guide the simulation service models to start according to the starting strategy of each simulation service model in the sample;
The starting strategy is formulated for each simulation service model by the engine based on the resource information required by each simulation service model in the sample and the applied resource information.
2. The method of claim 1, further comprising storing scheduling information for all models in a current sample after said launching the simulation service model in the sample for simulation testing.
3. The method of claim 1, further comprising, after said starting the simulation service model in the sample for simulation testing, monitoring a running state of the sample, and, if the sample has finished running, releasing the resources of the sample and starting a next sample based on the global queue.
4. The method of claim 1, wherein the step of managing resources corresponding to the samples comprises:
Acquiring scheduling information of the sample based on the sample identification of the sample;
And releasing the resources called by the sample based on the scheduling information.
5. A parallel device for large sample load simulation, comprising:
the scheduling module is used for judging the request type of the simulation task based on the simulation task request, traversing the sample and pre-starting a simulation service model in the sample if the request type is a scheduling sample, and managing resources corresponding to the sample if the request type is a closing sample;
The distribution module is used for judging whether the sample can be started successfully or not based on the resource demand of the sample and the resource available quantity of the corresponding working node, if so, applying for the resource required by the sample according to the application strategy, if not, adding the sample into the tail of a global queue, and controlling the starting sequence of the sample and storing the sample needing to start simulation by the global queue;
The starting module is used for starting a simulation service model in the sample to carry out a simulation test based on the resources required by the sample and the application strategy;
Before the sample is traversed and the simulation service model in the sample is pre-started, checking parameters of the simulation task is further included, and if the parameters pass the checking, the sample is traversed and the simulation service model is started;
The method for judging whether the sample can be started successfully based on the resource demand of the sample and the resource available amount of the corresponding working node is to judge whether the sample starting resource is sufficient based on the resource demand of the sample and the resource available amount of the current corresponding working node, if so, the starting is successful, and if not, the starting is failed;
The application strategies comprise a full-volume application strategy and an on-demand application strategy, wherein the full-volume application strategy applies for the resources required by all simulation service models in the sample, and the on-demand application strategy applies only for the resources required by the simulation service models of the sample in the same period;
Based on the resources required by the sample and the application strategy, the step of starting the simulation service model in the sample to carry out the simulation test comprises the following steps:
If the application strategy is a full-quantity application strategy, only an engine is required to send out a control instruction to control the start of the simulation service model;
if the application strategy is an on-demand application strategy, the engine is required to guide the simulation service models to start according to the starting strategy of each simulation service model in the sample;
The starting strategy is formulated for each simulation service model by the engine based on the resource information required by each simulation service model in the sample and the applied resource information.
CN202310465935.6A 2023-04-26 2023-04-26 Parallel method for large sample load simulation Active CN116578415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310465935.6A CN116578415B (en) 2023-04-26 2023-04-26 Parallel method for large sample load simulation


Publications (2)

Publication Number Publication Date
CN116578415A CN116578415A (en) 2023-08-11
CN116578415B true CN116578415B (en) 2024-06-04

Family

ID=87542394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310465935.6A Active CN116578415B (en) 2023-04-26 2023-04-26 Parallel method for large sample load simulation

Country Status (1)

Country Link
CN (1) CN116578415B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101436959A (en) * 2008-12-18 2009-05-20 中国人民解放军国防科学技术大学 Method for distributing and scheduling parallel artificial tasks based on background management and control architecture
CN104463492A (en) * 2014-12-23 2015-03-25 国家电网公司 Operation management method of electric power system cloud simulation platform
CN111866187A (en) * 2020-06-30 2020-10-30 中科院计算所西部高等技术研究院 Task scheduling method of distributed deep learning reasoning cloud platform
CN113486036A (en) * 2021-07-07 2021-10-08 广州博冠信息科技有限公司 Virtual resource management method and device, electronic equipment and storage medium
KR20220006490A (en) * 2021-12-29 2022-01-17 케이웨어 (주) Hybrid cloud resource allocation method for workload dynamic resource placement and optimization performance management
CN114500530A (en) * 2021-12-31 2022-05-13 北方信息控制研究院集团有限公司 Automatic adjustment method for civil edge information system
CN115391035A (en) * 2022-08-22 2022-11-25 北京计算机技术及应用研究所 Method for collaborative management and scheduling of heterogeneous computing resources
CN115712501A (en) * 2022-11-11 2023-02-24 江苏徐工工程机械研究院有限公司 Cloud simulation method and system suitable for engineering machinery
CN115828646A (en) * 2023-02-21 2023-03-21 中国人民解放军国防科技大学 Method and device for simulating short-term scheme adjustment strategy in operation of space station under emergency


Also Published As

Publication number Publication date
CN116578415A (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN108845884B (en) Physical resource allocation method, device, computer equipment and storage medium
CN109144710B (en) Resource scheduling method, device and computer readable storage medium
CN100495346C (en) Performing thread distribution method for multi-nucleus multi-central processing unit
US9875139B2 (en) Graphics processing unit controller, host system, and methods
CN103593242A (en) Resource sharing control system based on Yarn frame
CN102868573B (en) Method and device for Web service load cloud test
CN111104208B (en) Process scheduling management method, device, computer equipment and storage medium
CN112862098B (en) Cluster training task processing method and system
CN113434284B (en) Privacy computation server side equipment, system and task scheduling method
CN113986534A (en) Task scheduling method and device, computer equipment and computer readable storage medium
Jiang et al. Symbiosis: Network-aware task scheduling in data-parallel frameworks
CN114389955B (en) Method for managing heterogeneous resource pool of embedded platform
CN113157411B (en) Celery-based reliable configurable task system and device
CN111190691A (en) Automatic migration method, system, device and storage medium suitable for virtual machine
CN112948113A (en) Cluster resource management scheduling method, device, equipment and readable storage medium
CN117311990B (en) Resource adjustment method and device, electronic equipment, storage medium and training platform
CN111506407B (en) Resource management and job scheduling method and system combining Pull mode and Push mode
CN116578415B (en) Parallel method for large sample load simulation
US20150212859A1 (en) Graphics processing unit controller, host system, and methods
CN116578416B (en) Signal-level simulation acceleration method based on GPU virtualization
CN116974994A (en) High-efficiency file collaboration system based on clusters
CN112925640B (en) Cluster training node distribution method and electronic equipment
CN112214286B (en) Container starting method and device and electronic equipment
CN113900811A (en) Event-driven task scheduling method and device
CN113407305A (en) Task deployment method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant