CN116578415B - Parallel method for large sample load simulation - Google Patents

Parallel method for large sample load simulation

Info

Publication number
CN116578415B
CN116578415B (application CN202310465935.6A)
Authority
CN
China
Prior art keywords
sample
simulation
resource
starting
service model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310465935.6A
Other languages
Chinese (zh)
Other versions
CN116578415A (en)
Inventor
宋海凌
马晓斌
强艳辉
邢向向
谭丹
Current Assignee
Chinese People's Liberation Army 92942 Army
Original Assignee
Chinese People's Liberation Army 92942 Army
Priority date
Filing date
Publication date
Application filed by Chinese People's Liberation Army 92942 Army
Priority to CN202310465935.6A
Publication of CN116578415A
Application granted
Publication of CN116578415B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022Mechanisms to release resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The specification discloses a parallel method for large-sample load simulation, in the technical field of distributed simulation system design. The method comprises: judging the request type of a simulation task based on the simulation task request; if the request type is a scheduling sample, traversing the sample and pre-starting the simulation service models in the sample, and if the request type is a closing sample, managing the resources corresponding to the sample; judging whether the sample can be started successfully based on the resource demand of the sample and the available resources of the corresponding working node, applying for the resources required by the sample according to an application policy if so, and adding the sample to the tail of a global queue if not; and starting the simulation service models in the sample to carry out the simulation test. This solves the problem that existing radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, making the execution time cost of simulation test tasks too high.

Description

Parallel method for large sample load simulation
Technical Field
The invention belongs to the technical field of distributed simulation system design, and particularly relates to a parallel method for large sample load simulation.
Background
The simulation test of a large-scale distributed radar system involves large data exchanges among signal-level simulation models, rapid sharing of massive test data, and high real-time requirements on the simulation test. Traditional simulation tests were smaller in scale, test tasks were usually executed sequentially, and testers did not need to pay attention to the time cost of executing them. However, to minimize the time required for task execution and improve task parallelism when processing large-scale test task sets, the parallelism of large-sample load simulation must be improved.
Current traditional simulation systems lack a task-parallel scheduling optimization method oriented to the test running process, and parallelism is an important means of improving the execution efficiency of simulation tasks. A large-scale distributed radar system is generally equipped with multiple computing nodes, each with several CPU and GPU hardware resources; if tasks can be scheduled in parallel dynamically according to the resource usage of the computing nodes, the utilization efficiency of the test platform can be greatly improved and the total execution time of all simulation tasks reduced. In simulation task scheduling, parallelism must consider both inter-model parallelism and inter-sample parallelism. 1. Inter-model parallelism mainly considers how to schedule the related models within the same simulation test sample onto the GPUs of the same computing node to run in parallel, sharing data through a dedicated high-speed channel and thereby shortening the execution of a single task step. 2. Inter-sample parallelism mainly considers how to dynamically schedule different task samples onto different computing nodes for execution under resource constraints, improving the overall execution efficiency of test samples while maintaining load balancing.
Therefore, conventional radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, so the execution time cost of simulation test tasks is too high.
Disclosure of Invention
The invention aims to provide a parallel method oriented to large-sample load simulation, to solve the problem that conventional radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, making the execution time cost of simulation test tasks too high.
In order to achieve the above purpose, the invention adopts the following technical scheme:
In one aspect, the present disclosure provides a parallel method for large sample load simulation, including:
step 102, judging the request type of the simulation task based on the simulation task request; if the request type is a scheduling sample, traversing the sample and pre-starting the simulation service models in the sample, and if the request type is a closing sample, managing the resources corresponding to the sample;
Step 104, judging whether the sample can be started successfully based on the resource demand of the sample and the available resources of the corresponding working node; if so, applying for the resources required by the sample according to the application policy, and if not, adding the sample to the tail of a global queue, where the global queue controls the starting order of samples and stores the samples waiting to start simulation;
and step 106, starting the simulation service models in the sample to carry out the simulation test based on the resources required by the sample and the application policy.
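The core of steps 102–106 can be sketched as a small dispatch function. This is a hypothetical illustration, not the patent's implementation: the names (`Sample`, `dispatch`) are invented, and the node's resources are collapsed to a single number for brevity.

```python
from collections import deque

# Hypothetical sketch of steps 102-106; names are illustrative and
# resources are collapsed to one number for brevity.

class Sample:
    def __init__(self, sample_id, demand):
        self.sample_id = sample_id
        self.demand = demand  # resources the sample needs to start

def dispatch(sample, available, queue):
    """Step 104: start the sample if the working node has enough free
    resources; otherwise add it to the tail of the global queue."""
    if sample.demand <= available:
        return available - sample.demand, True   # resources applied for, sample starts
    queue.append(sample)                         # wait at the tail for resources
    return available, False

queue = deque()
free, started = dispatch(Sample("s1", 4), available=8, queue=queue)
free2, started2 = dispatch(Sample("s2", 16), available=free, queue=queue)
```

A sample that cannot start is never dropped: it waits at the tail of the global queue until resources are released, which is what keeps multiple samples schedulable under limited resources.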
In another aspect, the present disclosure provides a parallel apparatus for large sample load simulation, including:
The scheduling module is used for judging the request type of the simulation task based on the simulation task request, traversing the sample and pre-starting the simulation service models in the sample if the request type is a scheduling sample, and managing the resources corresponding to the sample if the request type is a closing sample;
The allocation module is used for judging whether the sample can be started successfully based on the resource demand of the sample and the available resources of the corresponding working node, applying for the resources required by the sample according to the application policy if so, and adding the sample to the tail of the global queue if not, where the global queue controls the starting order of samples and stores the samples waiting to start simulation;
and the starting module is used for starting the simulation service models in the sample to carry out the simulation test based on the resources required by the sample and the application policy.
Based on the technical scheme, the following technical effects can be obtained in the specification:
According to the method, whether a task schedules a sample is judged from the type of the task request. If so, whether the sample can be started successfully is judged from the resource demands of the different samples and the available resources of the corresponding working nodes: a sample that can start applies for its resources, and a sample that cannot is added to a global queue, which controls the starting order of samples. The simulation test then proceeds on the samples that obtained resources according to the application policy. Multiple samples are thus scheduled, or scheduled in a timely manner, under limited resources, guaranteeing efficient resource utilization and solving the problem that conventional radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, which makes the execution time cost of simulation test tasks too high.
Drawings
FIG. 1 is a flow chart of a parallel method for large sample load simulation in an embodiment of the invention.
FIG. 2 is a flow chart of a parallel method for large sample load simulation in an embodiment of the invention.
FIG. 3 is a flow chart of a parallel method for large sample load simulation in another embodiment of the invention.
FIG. 4 is a schematic diagram of a scheduling architecture of a parallel system for large sample load simulation according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a parallel device for large sample load simulation according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The advantages and features of the present invention will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. It should be noted that the drawings are in a greatly simplified form and not to precise scale, and serve merely to aid in describing the embodiments of the invention conveniently and clearly.
It should be noted that, to illustrate the invention clearly, its various embodiments are presented to describe different implementations; the embodiments listed are not exhaustive. Furthermore, for simplicity of description, content mentioned in an earlier embodiment is often omitted in later ones; anything not mentioned in a later embodiment can therefore be found in an earlier one.
Example 1
Referring to fig. 1, fig. 1 shows a parallel method for large sample load simulation according to this embodiment. In this embodiment, the method includes:
step 102, judging the request type of the simulation task based on the simulation task request; if the request type is a scheduling sample, traversing the sample and pre-starting the simulation service models in the sample, and if the request type is a closing sample, managing the resources corresponding to the sample;
In this embodiment, before traversing the sample and pre-starting the simulation service models in the sample, the method further includes checking the parameters of the simulation task; if the parameters pass the check, the sample is traversed and the simulation service models are started.
In this embodiment, the step of managing the resources corresponding to the samples includes:
step 202, obtaining scheduling information of the samples based on the sample identifiers of the samples;
step 204, releasing the resources called by the sample based on the scheduling information.
When the request type is a closing sample, the scheduling information of all simulation service models in the current sample is queried in the database according to the sample identifier in the method parameters. The scheduling information includes the model identifier, the name of the node where each model currently resides, the model's identifier name in the container cloud platform, the resource information used by the model, and so on. A resource release request based on the scheduling information is then initiated to the cloud platform; after receiving it, the cloud platform releases the resources of the simulation service models under the sample, and the sample scheduling information in the database is updated once the release succeeds.
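The release flow of steps 202–204 can be sketched as follows. This is a minimal illustration under assumed data shapes: the `db` and `platform` dictionaries and the field names are inventions, not the patent's actual storage format.

```python
# Hypothetical sketch of closing a sample: the scheduling records
# persisted at start-up are looked up by sample identifier, each
# model's resources are released back to the platform, and the record
# is updated. The dict shapes (db, platform) are assumptions.

def close_sample(sample_id, db, platform):
    records = db.get(sample_id, [])           # model id, node, resources used, ...
    freed = 0
    for rec in records:
        platform["free"] += rec["resources"]  # release the model's resources
        freed += rec["resources"]
    db[sample_id] = []                        # update scheduling info after release
    return freed

db = {"s1": [{"model": "m1", "node": "n1", "resources": 2},
             {"model": "m2", "node": "n1", "resources": 3}]}
platform = {"free": 5}
freed = close_sample("s1", db, platform)
```

Updating the database only after the release succeeds keeps the persisted scheduling information consistent with the platform's actual resource state.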
Step 104, judging whether the sample can be started successfully based on the resource demand of the sample and the available resources of the corresponding working node; if so, applying for the resources required by the sample according to the application policy, and if not, adding the sample to the tail of a global queue, where the global queue controls the starting order of samples and stores the samples waiting to start simulation;
In this embodiment, whether the sample can be started successfully is judged by comparing the resource demand of the sample with the currently available resources of the corresponding working node: if the resources for starting the sample are sufficient, the start succeeds; if not, it fails.
In this embodiment, before step 104, the method further includes:
Operation resources, including CPU, memory, GPU, and so on, are allocated to each working node of the simulation platform, and the resource allocation result is returned; meanwhile, the resource usage of the working nodes is collected at fixed intervals.
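The sufficiency check over the resource types named above can be sketched as a per-type comparison. The dictionary encoding (`cpu`, `mem_gb`, `gpu` keys) is an assumption made for illustration.

```python
# A minimal sufficiency check across the resource types the text names
# (CPU, memory, GPU). The dictionary encoding is an assumption.

def can_start(demand, node_free):
    """The sample starts only if every resource type is sufficient."""
    return all(node_free.get(kind, 0) >= amount
               for kind, amount in demand.items())

node_free = {"cpu": 8, "mem_gb": 32, "gpu": 2}
ok = can_start({"cpu": 4, "mem_gb": 16, "gpu": 1}, node_free)
insufficient = can_start({"cpu": 4, "mem_gb": 16, "gpu": 4}, node_free)
```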
And 106, starting a simulation service model in the sample to carry out a simulation test based on the resources required by the sample and the application strategy.
In this embodiment, the application policies include a full application policy and an on-demand application policy. The full application policy applies for the resources required by all simulation service models in the sample, while the on-demand application policy applies for the resources required by the simulation service models of the sample that run in the same period.
In this embodiment, one implementation manner of step 106 is:
if the application policy is the full application policy, the engine only needs to send a control instruction to start the simulation service models;
if the application policy is the on-demand application policy, the engine needs to direct the simulation service models to start according to the starting policy of each simulation service model in the sample.
In this embodiment, the starting policy is formulated for each simulation service model by the engine based on the resource information required by each simulation service model in the sample and the resource information actually applied for.
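The difference between the two application policies can be made concrete with a small sketch: the full policy sums the demand of every model in the sample, while the on-demand policy applies only for the peak total among models that run in the same time period. The `(period, demand)` encoding of a model is an assumption for illustration.

```python
# Sketch of the two application policies. "full" sums every model's
# demand; "on-demand" takes the largest total among models sharing a
# time period. The (period, demand) encoding is an assumption.

def resources_to_apply(models, policy):
    if policy == "full":
        return sum(demand for _, demand in models)   # all models at once
    by_period = {}                                   # on-demand: peak per period
    for period, demand in models:
        by_period[period] = by_period.get(period, 0) + demand
    return max(by_period.values())

models = [(0, 2), (0, 3), (1, 4)]   # (time period, resource demand)
full = resources_to_apply(models, "full")
peak = resources_to_apply(models, "on-demand")
```

On-demand application trades engine complexity (per-model starting policies) for a smaller resource footprint, which is what allows more samples to run concurrently on the same nodes.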
In this embodiment, after step 106, the method further includes:
and storing the scheduling information of all models in the current sample.
Specifically, if the sample is started successfully, the scheduling information of all models under the current sample needs to be additionally stored, including the model identifier, the node where the model currently resides, the model's identifier name in the container cloud platform, the resource information used by the model, and so on.
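The persisted record can be sketched as below. The field list follows the text (model identifier, current node, platform identifier name, resources used), but the field names themselves and the in-memory `db` dictionary are assumptions.

```python
# Sketch of persisting per-model scheduling information after a
# successful start; field names are assumptions, the field list
# follows the text.

def persist_scheduling_info(db, sample_id, models):
    db[sample_id] = [{"model_id": m["id"],
                      "node": m["node"],
                      "platform_name": m["platform_name"],
                      "resources": m["resources"]}
                     for m in models]

db = {}
persist_scheduling_info(db, "s1",
                        [{"id": "m1", "node": "n1",
                          "platform_name": "pod-m1", "resources": 2}])
```

This record is exactly what the closing-sample path later queries to know which resources to release and on which nodes.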
In this embodiment, after step 106, the method further includes:
The running state of the sample is monitored; when the sample finishes running, its resources are released and the next sample is started based on the global queue.
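The monitoring step can be sketched as a simple handler: when a sample finishes, its resources are released and the head of the global queue is tried; a sample that still cannot start returns to the tail. Samples are reduced to their resource demand here, and all names are illustrative rather than the patent's own.

```python
from collections import deque

# Hypothetical sketch of the monitoring step; samples are reduced to
# a single resource-demand number for brevity.

def on_sample_finished(finished_demand, free, queue):
    free += finished_demand          # release the finished sample's resources
    if queue:
        head = queue.popleft()       # take the head of the global queue
        if head <= free:
            free -= head             # enough resources: start the next sample
        else:
            queue.append(head)       # still insufficient: back to the tail
    return free, list(queue)

queue = deque([3, 10])
free, remaining = on_sample_finished(finished_demand=4, free=0, queue=queue)
```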
In summary, the method judges from the type of the task request whether a task schedules a sample. If so, it judges from the resource demands of the different samples and the available resources of the corresponding working nodes whether the sample can be started successfully: a sample that can start applies for its resources, and a sample that cannot is added to the global queue, which controls the starting order. The simulation test is then carried out on the samples that obtained resources according to the application policy. Multiple samples are thus scheduled, or scheduled in a timely manner, under limited resources, ensuring that multiple simulation tasks can run simultaneously and that resources are used efficiently.
Example 2
Referring to fig. 3, the parallel method for large sample load simulation provided in this embodiment includes:
First, receive simulation tasks
Specifically, a dispatch control module deployed on the container cloud platform is responsible for receiving schedule-sample or close-sample requests from the simulation program. When a request arrives, the dispatch controller judges its type from the request method and parameters. If it is a close-sample request, the dispatch control module performs simple data processing and submits the task to the platform resource management module, which handles all subsequent work. If it is a schedule-sample request, simple data processing is performed and the task is submitted to the back-end scheduling module.
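This routing can be sketched as a two-way branch. The callables stand in for the platform resource management module and the back-end scheduling module; the request shape and all names are assumptions made for illustration.

```python
# Hypothetical sketch of the dispatch controller routing a request by
# type: close-sample requests go to the platform resource manager,
# schedule-sample requests to the back-end scheduling module.

def route_request(request, schedule, close):
    if request["type"] == "close":
        return close(request["sample_id"])
    return schedule(request["sample_id"])

handled = route_request({"type": "close", "sample_id": "s1"},
                        schedule=lambda s: ("schedule", s),
                        close=lambda s: ("release", s))
```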
Second, the scheduling module starts the simulation sample
Specifically, when the scheduling module receives a scheduling task from the dispatch controller, it first checks the parameters of the task. When the parameter check passes, the sample is traversed and the simulation service models of the sample begin to start: the container cloud platform scheduling module is called and the resource parameters of the simulation models are passed in.
Third, the allocation module allocates resources
Specifically, the scheduling module starts the simulation model containers in the sample and starts the resource allocation module, which allocates running resources, including CPU, memory, GPU, and so on, for the task application and returns the allocation result. The allocation module periodically obtains the resource usage of the working nodes from the monitoring module, then calculates from the total available resources of the working nodes and the resource request of the simulation sample whether the resources for starting the sample are sufficient. If the resources are insufficient, a sample start failure result is returned.
Fourth, apply for sample resources;
Specifically, if the resources are sufficient, the scheduling module first applies for the resources required by the sample; the sample resource application policy can be configured as full application or on-demand application. Because the models in a sample run in different time periods, the multi-task module either applies for the resources required by all models according to the configured policy, or directly applies, at the sample level, for the maximum resources usable in the same time period. Under the full application policy, all models are started in advance for the engine, which only needs to send a control instruction; under the on-demand application policy, the engine needs to control the individual starting policy of each model and perform resource allocation.
Fifth, model start
Specifically, after the resource application succeeds, the simulation service models in the sample are started. If the sample selects the on-demand application policy to apply for the resources required by its models, the on-demand application is executed and then fed back to the engine layer, which assigns a starting policy to each model in the sample based on the resource information each model applied for and the resource information actually granted; the engine then directs each model to start according to its assigned starting policy.
Sixth, persisting task information
Specifically, if the sample is started successfully, the scheduling information of all models under the current sample needs to be additionally stored, including the model identifier, the node where the model currently resides, the model's identifier name in the container cloud platform, the resource information used by the model, and so on; a sample that fails to start is added to the tail of the global queue. The global queue controls the starting order of the simulation sample tasks and stores the simulation samples waiting to start; it holds the unique identifiers of the simulation tasks for later use by the scheduling module.
Seventh, simulation monitoring
Specifically, after the simulation model sample is started successfully, the monitoring module monitors its running state. When the sample finishes, its resources are released; the scheduling module then reads the global queue and, if it is not empty, takes the sample at the head of the queue and repeats the second step. If that sample starts successfully, the next sample is taken and started in turn; if it fails to start, it is added back to the tail.
Eighth, resource release
Specifically, when the platform resource management module receives a close-sample task from the dispatch controller, it queries the database for the scheduling information of all models under the current sample according to the sample identifier in the method parameters. The query result includes the model identifier, the node where each model currently resides, the model's identifier name in the container cloud platform, the resource information used by the model, and so on. The platform resource manager processes and packages the query result into the data format required by the container cloud platform and initiates a resource release request to it. After receiving the request, the container cloud releases the resources and, once the release succeeds, returns the result to the platform resource manager, which updates the sample scheduling information in the database accordingly.
Based on the method, whether a task schedules a sample is judged from the type of the task request. If so, whether the sample can be started successfully is judged from the resource demands of the different samples and the available resources of the corresponding working nodes: a sample that can start applies for its resources, and a sample that cannot is added to a global queue, which controls the starting order. The simulation test then continues on the samples that obtained resources according to the application policy. Multiple samples are thus scheduled, or scheduled in a timely manner, under limited resources, ensuring that multiple simulation tasks run simultaneously and that resources are used efficiently, and solving the problem that conventional radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, which makes the execution time cost of simulation test tasks too high.
Example 3
Referring to fig. 4, fig. 4 shows that the present embodiment provides a parallel system for large sample load simulation, which includes:
the system comprises a global queue module, an allocation controller, a scheduling module, a sample resource application strategy module and a model starting strategy module.
A global queue module: mainly controls the starting order of task applications in the task scheduling queue to realize scheduling and caching of application tasks. When the scheduling module fails to start the simulation model services of a sample, all simulation model services under that sample are stopped and the simulation sample is added to the global queue to wait for the cloud platform to release resources, after which the scheduling module restarts the simulation sample.
An allocation controller module: the allocation controller allocates running resources (CPU, memory, GPU, and so on) for the task application;
A scheduling module: realizes automatic scheduling of task applications and ensures that simulation task applications run stably on the most suitable task nodes;
A sample resource application policy module: configured as full application or on-demand application;
A model starting policy module: injects the starting parameter settings (starting order, starting signal) into the models, divided into a full mode and an on-demand mode.
Based on the method, the system runs multiple simulation tasks and samples in parallel in a cluster, optimizing cluster resource utilization. It can improve the resource utilization and reliability of the all-digital signal-level simulation platform of a radar system, refine the control granularity of simulation resources, make system management more flexible, and improve the overall running efficiency of all-digital signal-level simulation tests of the radar system.
Example 4
Referring to fig. 5, fig. 5 shows that the present embodiment provides a parallel device for large sample load simulation, which includes:
A scheduling module, configured to determine the request type of a simulation task based on the simulation task request; if the request type is a scheduling sample, traverse the sample and pre-start the simulation service models in the sample; if the request type is a closing sample, manage the resources corresponding to the sample;
A distribution module, configured to determine whether the sample can be started successfully based on the sample's resource demand and the available resources of the corresponding working node; if so, apply for the resources the sample requires according to the application policy; if not, append the sample to the tail of a global queue, where the global queue controls the start order of samples and stores the samples waiting to start simulation;
A starting module, configured to start the simulation service models in the sample to carry out a simulation test, based on the resources required by the sample and the application policy.
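The distribution module's admission check can be sketched as follows. This is a simplified illustration under assumed names (`try_admit`, a `demand` dictionary of resource quantities); the patent does not specify the data structures.

```python
from collections import deque

def try_admit(sample, node_available, global_queue):
    """Admission check of the distribution module (sketch): a sample starts
    only if every resource it demands fits within the available capacity of
    its working node; otherwise it is appended to the tail of the global queue."""
    demand = sample["demand"]
    if all(node_available.get(k, 0) >= v for k, v in demand.items()):
        for k, v in demand.items():
            node_available[k] -= v  # apply for the resources the sample needs
        return True
    global_queue.append(sample)     # wait at the tail of the global queue
    return False
```

Appending to the tail preserves arrival order, so samples that could not start immediately are retried first-in first-out as resources are released.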
Optionally, the device further includes:
A parameter verification module, configured to verify the parameters of the simulation task and, when the parameters pass verification, traverse the sample and start the simulation service model.
Optionally, the starting module includes:
A full-volume application starting unit, configured so that, if the application policy is the full-volume application policy, the engine need only send a control instruction to start the simulation service models;
An on-demand application starting unit, configured so that, if the application policy is the on-demand application policy, the engine directs the simulation service models to start according to the start policy of each simulation service model in the sample.
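The difference between the two starting units can be illustrated as below. This is a sketch, not the patent's engine: `send_control` stands in for the engine's control channel, and using each model's `start_order` to drive on-demand start-up is an assumption.

```python
def start_sample(models, policy, send_control):
    """How the engine might drive model start-up under each policy (sketch)."""
    if policy == "full":
        # Full-volume: a single control instruction starts every model.
        send_control("start_all")
    else:
        # On-demand: the engine directs each model per its own start policy,
        # approximated here by the model's start order.
        for m in sorted(models, key=lambda m: m["start_order"]):
            send_control(f"start:{m['name']}")
```

The full-volume path is a single broadcast, while the on-demand path issues one instruction per model in the order their start policies dictate.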
Optionally, the method further comprises:
A scheduling information storage unit, configured to store the scheduling information of all models in the current sample.
Optionally, the method further comprises:
A running state monitoring module, configured to monitor the running state of the sample; if the sample has finished running, release the sample's resources and start the next sample based on the global queue.
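The release-and-restart step performed when a sample finishes can be sketched as follows. The names (`on_sample_finished`, the `demand` dictionary) are illustrative, as is the choice to start the queued sample only if it now fits.

```python
from collections import deque

def on_sample_finished(finished, node_available, global_queue):
    """Run-state monitoring (sketch): when a sample finishes, release its
    resources back to the node, then start the next queued sample if it fits."""
    for k, v in finished["demand"].items():
        node_available[k] = node_available.get(k, 0) + v  # release resources
    if global_queue:
        nxt = global_queue[0]
        if all(node_available.get(k, 0) >= v for k, v in nxt["demand"].items()):
            global_queue.popleft()
            for k, v in nxt["demand"].items():
                node_available[k] -= v
            return nxt  # the next sample to start
    return None
```

Releasing before re-checking the queue head is what lets a sample that previously failed admission start as soon as capacity returns.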
Based on the method, the device determines from the request type whether the task is to schedule a sample. If so, it judges, from each sample's resource demand and the available resources of the corresponding working node, whether the sample can be started successfully; if it can, the device applies for the sample's resources, and if not, it appends the sample to a global queue, which controls the start order of samples. Samples that have obtained resources then proceed to simulation testing according to the application policy. In this way, multiple samples can be scheduled promptly under limited resources, so that multiple simulation tasks run simultaneously and resources are used efficiently. This solves the problem that traditional radar system simulation test methods cannot schedule simulation samples in parallel under limited resources, which makes the time cost of executing simulation test tasks excessive.
Example 5
Referring to fig. 6, the present embodiment provides an electronic device, which includes a processor, an internal bus, a network interface, memory, and nonvolatile storage, and may further include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile storage into memory and runs it, forming, at the logical level, the parallel method for large sample load simulation. Of course, this specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the following processing flow is not limited to logic units, and may also be hardware or logic devices.
The network interface, processor and memory may be interconnected by a bus system. The buses may be classified into address buses, data buses, control buses, and the like.
The memory is used for storing programs. In particular, the program may include program code including computer-operating instructions. The memory may include read only memory and random access memory and provide instructions and data to the processor.
The processor is used for executing the program stored in the memory and specifically executing:
step 102, determining the request type of a simulation task based on the simulation task request; if the request type is a scheduling sample, traversing the sample and pre-starting the simulation service models in the sample; if the request type is a closing sample, managing the resources corresponding to the sample;
step 104, determining whether the sample can be started successfully based on the sample's resource demand and the available resources of the corresponding working node; if so, applying for the resources the sample requires according to the application policy; if not, appending the sample to the tail of a global queue, where the global queue controls the start order of samples and stores the samples waiting to start simulation;
step 106, starting the simulation service models in the sample to carry out a simulation test based on the resources required by the sample and the application policy.
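The request-type dispatch of step 102 can be sketched as a small router. `RecordingScheduler` is a hypothetical stand-in for the scheduling component, and the request field names are assumptions for illustration only.

```python
class RecordingScheduler:
    """Stand-in for the scheduling component; records the actions taken."""
    def __init__(self):
        self.actions = []

    def prestart(self, sample):
        self.actions.append(("prestart", sample))

    def release(self, sample_id):
        self.actions.append(("release", sample_id))

def handle_request(request, scheduler):
    """Route a simulation task request by its type (sketch of step 102)."""
    if request["type"] == "schedule_sample":
        for sample in request["samples"]:
            scheduler.prestart(sample)       # traverse and pre-start models
    elif request["type"] == "close_sample":
        scheduler.release(request["sample_id"])  # manage / release resources
```

A scheduling request fans out over every sample in the task, whereas a closing request acts on a single identified sample.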
The processor may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method may be completed by an integrated logic circuit in the processor's hardware or by instructions in software form.
Based on the same inventive concept, the embodiments of the present specification further provide a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the parallel method for large sample load simulation provided by the embodiments corresponding to figs. 1 to 2.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-readable storage media having computer-usable program code embodied therein.
In addition, since the device embodiments described above are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments. It should also be noted that, in the modules of the system of the present application, the components are divided logically according to the functions to be implemented, but the present application is not limited thereto, and the components may be re-divided or combined as needed.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results; in some embodiments, multitasking and parallel processing may also be possible or advantageous.
The foregoing description is only exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (5)

1. A parallel method for large sample load simulation, comprising:
Judging the request type of a simulation task based on a simulation task request, if the request type is a scheduling sample, traversing the sample and pre-starting a simulation service model in the sample, and if the request type is a closing sample, managing resources corresponding to the sample;
Judging whether the sample can be started successfully or not based on the resource demand of the sample and the resource usable quantity of the corresponding working node, if so, applying for the resource required by the sample according to an application strategy, and if not, adding the sample into the tail of a global queue, wherein the global queue controls the starting sequence of the sample and stores the sample needing to start simulation;
starting a simulation service model in the sample to carry out a simulation test based on the resources required by the sample and the application strategy;
Before the sample is traversed and the simulation service model in the sample is pre-started, checking parameters of the simulation task is further included, and if the parameters pass the checking, the sample is traversed and the simulation service model is started;
The method for judging whether the sample can be started successfully based on the resource demand of the sample and the resource available amount of the corresponding working node is to judge whether the sample starting resource is sufficient based on the resource demand of the sample and the resource available amount of the current corresponding working node, if so, the starting is successful, and if not, the starting is failed;
The application strategies comprise a full-volume application strategy and an on-demand application strategy, wherein the full-volume application strategy applies for the resources required by all simulation service models in the sample, and the on-demand application strategy applies only for the resources required by the simulation service models of the sample in the same period;
Based on the resources required by the sample and the application strategy, the step of starting the simulation service model in the sample to carry out the simulation test comprises the following steps:
If the application strategy is a full-quantity application strategy, only an engine is required to send out a control instruction to control the start of the simulation service model;
if the application strategy is an on-demand application strategy, the engine is required to guide the simulation service models to start according to the starting strategy of each simulation service model in the sample;
The starting strategy is formulated for each simulation service model by the engine based on the resource information required by each simulation service model in the sample and the applied resource information.
2. The method of claim 1, further comprising storing scheduling information for all models in a current sample after said launching the simulation service model in the sample for simulation testing.
3. The method of claim 1, further comprising, after said starting the simulation service model in the sample for simulation testing, monitoring a running state of the sample, and, if the sample has finished running, releasing the resources of the sample and starting a next sample based on the global queue.
4. The method of claim 1, wherein the step of managing resources corresponding to the samples comprises:
Acquiring scheduling information of the sample based on the sample identification of the sample;
And releasing the resources called by the sample based on the scheduling information.
5. A parallel device for large sample load simulation, comprising:
the scheduling module is used for judging the request type of the simulation task based on the simulation task request, traversing the sample and pre-starting a simulation service model in the sample if the request type is a scheduling sample, and managing resources corresponding to the sample if the request type is a closing sample;
The distribution module is used for judging whether the sample can be started successfully or not based on the resource demand of the sample and the resource available quantity of the corresponding working node, if so, applying for the resource required by the sample according to the application strategy, if not, adding the sample into the tail of a global queue, and controlling the starting sequence of the sample and storing the sample needing to start simulation by the global queue;
The starting module is used for starting a simulation service model in the sample to carry out a simulation test based on the resources required by the sample and the application strategy;
Before the sample is traversed and the simulation service model in the sample is pre-started, checking parameters of the simulation task is further included, and if the parameters pass the checking, the sample is traversed and the simulation service model is started;
The method for judging whether the sample can be started successfully based on the resource demand of the sample and the resource available amount of the corresponding working node is to judge whether the sample starting resource is sufficient based on the resource demand of the sample and the resource available amount of the current corresponding working node, if so, the starting is successful, and if not, the starting is failed;
The application strategies comprise a full-volume application strategy and an on-demand application strategy, wherein the full-volume application strategy applies for the resources required by all simulation service models in the sample, and the on-demand application strategy applies only for the resources required by the simulation service models of the sample in the same period;
Based on the resources required by the sample and the application strategy, the step of starting the simulation service model in the sample to carry out the simulation test comprises the following steps:
If the application strategy is a full-quantity application strategy, only an engine is required to send out a control instruction to control the start of the simulation service model;
if the application strategy is an on-demand application strategy, the engine is required to guide the simulation service models to start according to the starting strategy of each simulation service model in the sample;
The starting strategy is formulated for each simulation service model by the engine based on the resource information required by each simulation service model in the sample and the applied resource information.
CN202310465935.6A 2023-04-26 2023-04-26 Parallel method for large sample load simulation Active CN116578415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310465935.6A CN116578415B (en) 2023-04-26 2023-04-26 Parallel method for large sample load simulation


Publications (2)

Publication Number Publication Date
CN116578415A CN116578415A (en) 2023-08-11
CN116578415B true CN116578415B (en) 2024-06-04

Family

ID=87542394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310465935.6A Active CN116578415B (en) 2023-04-26 2023-04-26 Parallel method for large sample load simulation

Country Status (1)

Country Link
CN (1) CN116578415B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101436959A (en) * 2008-12-18 2009-05-20 中国人民解放军国防科学技术大学 Method for distributing and scheduling parallel artificial tasks based on background management and control architecture
CN104463492A (en) * 2014-12-23 2015-03-25 国家电网公司 Operation management method of electric power system cloud simulation platform
CN111866187A (en) * 2020-06-30 2020-10-30 中科院计算所西部高等技术研究院 Task scheduling method of distributed deep learning reasoning cloud platform
CN113486036A (en) * 2021-07-07 2021-10-08 广州博冠信息科技有限公司 Virtual resource management method and device, electronic equipment and storage medium
KR20220006490A (en) * 2021-12-29 2022-01-17 케이웨어 (주) Hybrid cloud resource allocation method for workload dynamic resource placement and optimization performance management
CN114500530A (en) * 2021-12-31 2022-05-13 北方信息控制研究院集团有限公司 Automatic adjustment method for civil edge information system
CN115391035A (en) * 2022-08-22 2022-11-25 北京计算机技术及应用研究所 Method for collaborative management and scheduling of heterogeneous computing resources
CN115712501A (en) * 2022-11-11 2023-02-24 江苏徐工工程机械研究院有限公司 Cloud simulation method and system suitable for engineering machinery
CN115828646A (en) * 2023-02-21 2023-03-21 中国人民解放军国防科技大学 Method and device for simulating short-term scheme adjustment strategy in operation of space station under emergency


Also Published As

Publication number Publication date
CN116578415A (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN108845884B (en) Physical resource allocation method, device, computer equipment and storage medium
CN109144710B (en) Resource scheduling method, device and computer readable storage medium
CN100495346C (en) Performing thread distribution method for multi-nucleus multi-central processing unit
US9875139B2 (en) Graphics processing unit controller, host system, and methods
CN103593242A (en) Resource sharing control system based on Yarn frame
CN102868573B (en) Method and device for Web service load cloud test
CN111104208B (en) Process scheduling management method, device, computer equipment and storage medium
CN112862098B (en) Cluster training task processing method and system
CN113434284B (en) Privacy computation server side equipment, system and task scheduling method
CN113986534A (en) Task scheduling method and device, computer equipment and computer readable storage medium
Jiang et al. Symbiosis: Network-aware task scheduling in data-parallel frameworks
CN114389955B (en) Method for managing heterogeneous resource pool of embedded platform
CN113157411B (en) Celery-based reliable configurable task system and device
CN111190691A (en) Automatic migration method, system, device and storage medium suitable for virtual machine
CN112948113A (en) Cluster resource management scheduling method, device, equipment and readable storage medium
CN117311990B (en) Resource adjustment method and device, electronic equipment, storage medium and training platform
CN111506407B (en) Resource management and job scheduling method and system combining Pull mode and Push mode
CN116578415B (en) Parallel method for large sample load simulation
US20150212859A1 (en) Graphics processing unit controller, host system, and methods
CN116578416B (en) Signal-level simulation acceleration method based on GPU virtualization
CN116974994A (en) High-efficiency file collaboration system based on clusters
CN112925640B (en) Cluster training node distribution method and electronic equipment
CN112214286B (en) Container starting method and device and electronic equipment
CN113900811A (en) Event-driven task scheduling method and device
CN113407305A (en) Task deployment method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant