CN108710536A - Multi-level fine-grained virtualized GPU scheduling optimization method - Google Patents

Multi-level fine-grained virtualized GPU scheduling optimization method

Info

Publication number
CN108710536A
CN108710536A (application number CN201810285080.8A)
Authority
CN
China
Prior art keywords
scheduling
gpu
ring
event
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810285080.8A
Other languages
Chinese (zh)
Other versions
CN108710536B (en)
Inventor
姚建国 (Yao Jianguo)
赵晓辉 (Zhao Xiaohui)
高平 (Gao Ping)
管海兵 (Guan Haibing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201810285080.8A priority Critical patent/CN108710536B/en
Publication of CN108710536A publication Critical patent/CN108710536A/en
Application granted granted Critical
Publication of CN108710536B publication Critical patent/CN108710536B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Abstract

The invention discloses a multi-level fine-grained virtualized GPU scheduling optimization method that optimizes the scheduling policy in three ways: scheduling based on time and events, seamless pipeline-based scheduling, and mixed scheduling based on both rings and virtual machines. These three strategies respectively exploit the overhead incurred when two virtual machines switch, the division of a virtual machine's work into stages that can run concurrently, and the ability of multiple virtual machines to work simultaneously on different rings. By modifying the scheduler and the scheduling policy, the invention greatly reduces switching overhead and supports parallel execution among multiple virtual GPUs, so the performance of multiple virtual GPUs sharing one physical GPU is markedly improved, raising overall performance. The invention increases the utilization of the physical GPU and, in turn, the performance of each virtual GPU. In addition, the method ensures that the virtual GPUs still meet their quality-of-service requirements.

Description

Multi-level fine-grained virtualized GPU scheduling optimization method
Technical field
The present invention relates to the field of GPU virtualization and its task scheduling, and in particular to a multi-level fine-grained virtualized GPU scheduling optimization method. Specifically, the performance of GPU virtualization is improved mainly through an optimized scheduling policy. By optimizing the GPU scheduling policy, the original coarse-grained scheduling becomes fine-grained scheduling, making full use of previously unusable time and resources, so that the performance of the virtual GPUs and the overall utilization of the GPU improve without any change to the physical hardware.
Background technology
Nowadays, GPU technology is increasingly important in high-performance computing: fields such as AI, deep learning, data analysis, and cloud gaming all require GPUs. GPU cloud services have emerged accordingly; Tencent and Alibaba both offer GPU cloud servers to users as a new computing paradigm.
High-performance computing therefore places high demands on GPU virtualization technology. The current solution is full GPU virtualization. Its advantages are good isolation and security, and it requires no special hardware support. However, in current full GPU virtualization, the scheduling granularity is coarse, leaving considerable room for performance improvement. The experimental part of the present invention is based on GVT-g (Intel Graphics Virtualization Technology for shared vGPU, Intel's GPU virtualization technology), which allows an Intel GPU to be virtualized into multiple virtual GPUs for use by multiple virtual machines. Although the experiments target Intel GPUs, the method remains general.
The existing GVT-g scheduler runs once every 1 millisecond. Using round-robin scheduling, each invocation selects the virtual machine that may execute during the next period; within that period, the selected virtual machine's workloads are allowed to execute on the physical GPU. Each virtual machine has quality-of-service requirements of three kinds: cap, weight (proportion), and priority; the original scheduling policy guarantees that these requirements are met. Whenever the timer fires, the scheduler determines whether the current task has completed. If it has, the scheduler dispatches a new task; otherwise it chooses the next task according to the quality-of-service requirements. Every task executes only when the scheduler dispatches it, and execution is serialized overall: no two tasks are ever executed simultaneously.
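The baseline just described can be sketched as a small simulation. This is an illustrative model only, not GVT-g source code: the class names, the 100-slice accounting period, and the budget arithmetic are all hypothetical, and only the "weight" class of quality-of-service requirement is modeled.

```python
# Toy model of the baseline: every 1 ms a round-robin pass picks the next
# vGPU whose workload may run on the physical GPU, with a per-vGPU weight
# steering the share of time slices each one receives.

from dataclasses import dataclass, field
from collections import deque

TIME_SLICE_MS = 1  # the fixed 1 ms scheduling quantum described above

@dataclass
class VGpu:
    name: str
    weight: int            # QoS proportion ("weight")
    budget: int = 0        # remaining slices in the current accounting period
    queue: deque = field(default_factory=deque)  # pending workloads

class RoundRobinScheduler:
    def __init__(self, vgpus, period_slices=100):
        self.vgpus = vgpus
        self.period = period_slices
        self._refill()

    def _refill(self):
        # Hand out slice budgets in proportion to each vGPU's weight.
        total = sum(v.weight for v in self.vgpus)
        for v in self.vgpus:
            v.budget = self.period * v.weight // total

    def tick(self):
        """Called once per 1 ms timer tick: pick the next runnable vGPU."""
        for _ in range(len(self.vgpus)):
            v = self.vgpus[0]
            self.vgpus.append(self.vgpus.pop(0))  # rotate round-robin order
            if v.budget > 0 and v.queue:
                v.budget -= 1
                return v  # this vGPU owns the GPU for the next slice
        self._refill()  # nothing runnable: start a new accounting period
        return None     # the GPU idles this slice
```

With two vGPUs of weights 3 and 1 and always-full queues, 100 ticks split 75/25 — the coarse, strictly serialized behavior that the following sections set out to refine.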
This scheduling scheme is simple and blunt, and it makes quality-of-service accounting easy. At the same time, however, its granularity is coarse, and much idle time is wasted.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a multi-level fine-grained virtualized GPU scheduling optimization method. It consists of three compatible scheduling strategies that exploit, from three angles, capacity that was previously left unused, thereby improving performance. The three strategies are: scheduling based on time and events, seamless pipeline-based scheduling, and mixed scheduling based on both rings and virtual machines.
The present invention is realized by the following technical scheme:
A multi-level fine-grained virtualized GPU scheduling optimization method, characterized by comprising the following steps:
Step S1: add scheduling based on time and events, reducing the overhead of switching between two virtual GPUs;
Step S2: add seamless pipeline-based scheduling, allowing some of the virtual GPUs to run in parallel and improving efficiency when virtual GPUs work together;
Step S3: add mixed scheduling based on both rings and virtual machines, allowing different virtual machines to use the physical GPU fully concurrently and improving overall utilization.
In the above technical scheme, step S1 comprises the following steps:
Step S101: decouple the scheduling policy framework from the workload scheduler: the scheduling policy framework implements the modified scheduling policy, while the workload scheduler carries out the actual scheduling;
Step S102: add a context-complete event, which is triggered when a context finishes and passed to the scheduling policy framework to trigger the corresponding task scheduling;
Step S103: add a context-submit event; when the workload scheduler receives a context-submit event while the virtual GPU is idle, it handles the event immediately and executes the task;
Step S104: modify the scheduling policy framework to support the added events; upon receiving a timer tick or an event, the framework handles it and submits work to the workload scheduler for execution;
Step S105: modify the quality-of-service accounting in the scheduling policy framework.
In the above technical scheme, step S2 comprises the following steps:
Step S201: decompose the scheduler workflow into an audit-and-shadow stage and a scheduling-and-execution stage, the former being the preparation stage and the latter the execution stage;
Step S202: split the work submission path so that multiple workloads can be submitted simultaneously, making full use of the pipelining advantage;
Step S203: move the original shadow-related code out of task dispatch, separating the code of the different stages. The whole workflow code is thereby divided into two relatively independent parts, audit-and-shadow and scheduling-and-execution, so that different stages can run simultaneously without interfering, improving efficiency.
Step S204: when a virtual GPU has only one shadow context, and that virtual GPU is not the current GPU, shadow its first workload immediately.
In the above technical scheme, step S3 comprises the following steps:
Step S301: introduce scheduling by ring (or engine) and add support for rings, changing the minimum scheduling unit from the whole GPU to each individual ring, so that workloads running on different rings execute simultaneously without interfering;
Step S302: modify the workload scheduler, changing all relevant code from single variables to arrays and from per-virtual-machine units to per-ring units; this involves rescheduling and updating the current GPU and ring and the next GPU and ring;
Step S303: modify the workload scheduler, restructuring the relevant code logic: logic that supported only per-virtual-machine units is replaced by new per-ring logic;
Step S304: modify the scheduling policy framework, changing the scheduling data structure from a single instance to an array so as to support multiple rings running simultaneously;
Step S305: modify the scheduling policy framework, changing timer and event triggering to per-ring support;
Step S306: per-ring scheduling requires consistent global state; use the CRC32 checksum of the state referenced by the pointer to judge whether the global states of the current virtual machine and the next virtual machine are consistent;
Step S307: each time MMIO state is saved or restored, compute the CRC32 value; if the values are consistent, use the per-ring scheduling of steps S301-S305; if inconsistent, use the original per-virtual-machine scheduling;
Step S308: for quality-of-service maintenance, account for each ring separately and redefine the quality-of-service parameters, guaranteeing that per-ring scheduling also maintains correct quality of service;
Step S309: implement the switching between per-virtual-machine and per-ring scheduling, guaranteeing correct program execution.
Compared with the prior art, the present invention has the following advantageous effects:
The present invention makes full use of idle and wasted time, improving overall performance without changing the hardware and while maintaining quality of service. For example, a virtual GPU service provider can, with the same funds and the same hardware configuration, obtain more overall performance after applying the present invention, sell to more users, and earn more revenue. A user with a specific target, for example making a program reach 60 frames per second, can buy less equipment to reach that target, reducing expense.
Description of the drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 shows the overall framework of scheduling based on time and events;
Fig. 2 compares time-based scheduling with scheduling based on time and events;
Fig. 3 illustrates seamless pipeline-based scheduling;
Fig. 4 illustrates mixed scheduling based on rings and virtual machines;
Fig. 5-1 shows 3dmark06 benchmark scores for scheduling based on time and events;
Fig. 5-2 shows heaven benchmark scores for scheduling based on time and events;
Fig. 6-1 shows 3dmark06 benchmark scores for seamless pipeline-based scheduling;
Fig. 6-2 shows heaven benchmark scores for seamless pipeline-based scheduling;
Fig. 7-1 shows benchmark scores for per-ring scheduling;
Fig. 7-2 shows benchmark scores for mixed scheduling based on rings and virtual machines.
Detailed description of the embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art further understand the present invention, but do not limit it in any way. It should be pointed out that a person of ordinary skill in the art can make several changes and improvements without departing from the inventive concept; these all belong to the protection scope of the present invention.
First, in the existing scheduling method, scheduling is triggered entirely by the timer, yet after each task completes there may be idle time; especially when small tasks dominate, this idle time is wasted. The present invention therefore adds a task-completion event that triggers the scheduler early, making full use of this time. Fig. 5-1 shows 3dmark06 benchmark scores and Fig. 5-2 shows heaven benchmark scores for scheduling based on time and events; the figures show a considerable improvement after applying the invention. Experiments show that with scheduling based on time and events, overall GPU performance improves by 3.2%-21.5%.
Second, in the existing scheduling method, tasks execute strictly in order: only after the former has completed can the latter execute. In fact each task can be divided into two stages: the first is a preparation stage, and only the second is real execution. The two stages invoke different parts of the physical GPU, so they can be scheduled in the manner of a two-stage pipeline. The present invention uses this pipeline-like scheduling method, allowing the two stages of different tasks to be scheduled simultaneously. Fig. 6-1 shows 3dmark06 benchmark scores and Fig. 6-2 shows heaven benchmark scores for seamless pipeline-based scheduling; some benchmarks improve considerably after applying this method. Experiments show that with seamless pipeline-based scheduling, overall GPU performance improves by 0%-19.7%.
Finally, in the existing scheduling method, tasks are all scheduled with the entire GPU as the unit: even if multiple tasks invoke different rings/engines, for example image rendering and streaming-media encoding/decoding, the latter must wait for the former to complete. The present invention therefore introduces per-ring scheduling: if tasks need different rings, they are allowed to execute concurrently. Because GPU hardware support is insufficient, invoking this method requires that the virtual machines' operating systems be the same and their global states be consistent; when they are inconsistent, the original method is used instead. Fig. 7-1 shows benchmark scores for per-ring scheduling, and Fig. 7-2 shows benchmark scores for mixed scheduling based on rings and virtual machines. Experiments show that with this mixed scheduling based on rings and virtual machines, when two tasks invoke different rings their performance improves by 34.0% and 70.6% respectively, with little impact on the remaining parts.
The multi-level fine-grained virtualized GPU scheduling optimization method of the present invention is characterized by comprising the following steps:
Step S1: add scheduling based on time and events, reducing the overhead of switching between two virtual GPUs;
Step S2: add seamless pipeline-based scheduling, allowing some of the virtual GPUs to run in parallel and improving efficiency when virtual GPUs work together;
Step S3: add mixed scheduling based on both rings and virtual machines, allowing different virtual machines to use the physical GPU fully concurrently and improving overall utilization.
Fig. 1 shows the overall framework of scheduling based on time and events; the essence is adding time and event handling and modifying the framework. Fig. 2 compares time-based scheduling (top, which produces idle time) with scheduling based on time and events (bottom, which makes full use of that idle time). Adding scheduling based on time and events comprises the following steps:
Step S101: decouple the scheduling policy framework from the workload scheduler: the scheduling policy framework implements the modified scheduling policy, while the workload scheduler carries out the actual scheduling. In the original framework the two are intermingled and cannot be modified into a custom policy-driven scheduling method, so they must be separated. After separation, the scheduling policy framework is only responsible for implementing the scheduling policy, and the workload scheduler only for carrying out scheduling. This separation of concerns makes the subsequent modifications possible.
Step S102: add a context-complete event, which is triggered when a context finishes and passed to the scheduling policy framework to trigger the corresponding task scheduling. The original design has only the timer-triggered event, firing once per 1 ms. The present invention adds, on top of the original scheduler, a context-complete event that fires when a context finishes and is passed to the scheduling policy framework to trigger the corresponding task scheduling.
Step S103: add a context-submit event, which allows a task to be executed while the current virtual GPU is idle. As soon as the workload scheduler receives a context-submit event while the virtual GPU is idle, it handles the event immediately and executes the task. In the original method, lacking this event, the workload scheduler would idle-wait and waste working time.
Step S104: modify the scheduling policy framework to support the added events. The original design supports only timer triggering; the added event triggering requires framework support. Upon receiving a timer tick or an event, the scheduling policy framework handles it and submits work to the workload scheduler for execution.
Step S105: modify the quality-of-service accounting in the scheduling policy framework. In the original scheme, maintaining the required quality of service only requires counting the corresponding scheduling time; since scheduling is purely time-based, the accounting is simple. In the present invention, because event-based scheduling is added, the quality-of-service accounting must be redone: not only the time but also the influence of each event must be considered. The present invention redesigns the accounting algorithm for this part to guarantee that the overall quality of service still meets the given requirements.
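A rough illustration of why the added events of steps S102-S103 help: a purely timer-driven scheduler can only act on tick boundaries, so any task shorter than a tick strands the remainder of its slice, while a completion event lets the next dispatch start immediately. The numbers and function names below are made up for illustration and are not measurements from the patent's experiments.

```python
# Compare GPU busy time under timer-only scheduling vs. scheduling that also
# reacts to a context-complete event, for a batch of small workloads.

TICK_US = 1000  # 1 ms timer period, in microseconds

def makespan_timer_only(durations_us):
    """Timer-only: each task occupies whole ticks; leftover tick time is wasted."""
    busy = 0
    for d in durations_us:
        ticks = -(-d // TICK_US)  # ceiling division: dispatch only on tick edges
        busy += ticks * TICK_US
    return busy

def makespan_with_events(durations_us):
    """A completion event triggers the next dispatch immediately: no stranded time."""
    return sum(durations_us)

tasks = [300, 1200, 150, 800]  # small workloads, mostly shorter than one tick
saved = makespan_timer_only(tasks) - makespan_with_events(tasks)  # 2550 us reclaimed
```

In this toy batch the timer-only scheduler spends 5000 µs where the event-driven one needs 2450 µs, mirroring the idle-time argument above; the real gain depends on the workload mix, as the 3.2%-21.5% range in the experiments reflects.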
Fig. 3 illustrates seamless pipeline-based scheduling: the top shows the original design, and the bottom the modified design, which decomposes tasks and lets scheduling complete in pipelined fashion. Seamless pipeline-based scheduling comprises the following steps:
Step S201: decompose the scheduler workflow into an audit-and-shadow (audit & shadow) stage and a scheduling-and-execution (scheduling & execution) stage, the former being the preparation stage and the latter the execution stage. In the original design the workflow executes strictly in order without careful decomposition: each task runs from beginning to end before the next task starts.
Step S202: split the work submission path so that multiple workloads can be submitted simultaneously, making full use of the pipelining advantage. The original design supports only one task submitted at a time; the present invention allows several, which is what lets pipeline scheduling pay off.
Step S203: move the original shadow-related code out of task dispatch, separating the code of the different stages. At this point the whole workflow code is divided into two relatively independent parts, audit-and-shadow and scheduling-and-execution. The purpose of the separation is that different stages can run simultaneously without interfering: when multiple events arrive, the execution stage of the previous event can run at the same time as the preparation stage of the next event, improving efficiency.
Step S204: when a virtual GPU has only one shadow context, and that virtual GPU is not the current GPU, shadow its first workload immediately. This step guarantees that the scheduling method executes correctly in all cases, matching the original design in correctness.
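The overlap of steps S201-S204 follows the standard two-stage pipeline recurrence: stage two of task i must wait for stage two of task i-1 and for its own stage one. A toy timing model with made-up durations:

```python
# Compare the original serial workflow with the two-stage pipeline:
# (audit & shadow) = prepare, (scheduling & execution) = execute.

def serial_makespan(tasks):
    """Original design: each task's prepare and execute run back to back."""
    return sum(prep + execute for prep, execute in tasks)

def pipelined_makespan(tasks):
    """Pipeline: task i+1 is prepared while task i executes.
    Standard recurrence: exec(i) starts after exec(i-1) and prep(i)."""
    prep_done = exec_done = 0
    for prep, execute in tasks:
        prep_done = prep_done + prep               # prepare stages run serially
        exec_done = max(exec_done, prep_done) + execute
    return exec_done

tasks = [(2, 5), (3, 4), (1, 6)]  # hypothetical (prepare, execute) durations
```

For these three tasks the serial makespan is 21 time units versus 17 pipelined; when preparation is short relative to execution, the saving approaches the total preparation time, consistent with the 0%-19.7% range reported in the experiments.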
Fig. 4 illustrates mixed scheduling based on rings and virtual machines, comparing the original design, per-ring scheduling, and the mixed scheduling mode; the advantage of mixed scheduling is also visible in the figure. Mixed scheduling based on rings and virtual machines comprises the following steps:
Step S301: introduce scheduling by ring (or engine) and add support for rings, changing the minimum scheduling unit from the whole GPU to each individual ring, so that workloads running on different rings execute simultaneously without interfering. In the original design, virtual GPUs are scheduled with the entire virtual machine as the unit; each virtual machine has its own workloads running on different rings. Typically a GPU has three or more rings, each handling a different type of task (such as image rendering or streaming-media encoding/decoding).
Step S302: modify the workload scheduler, changing all relevant code from single variables to arrays and from per-virtual-machine units to per-ring units; this involves rescheduling and updating the current GPU and ring and the next GPU and ring;
Step S303: modify the workload scheduler, restructuring the relevant code logic: logic that supported only per-virtual-machine units is replaced by new per-ring logic;
Step S304: modify the scheduling policy framework, changing the scheduling data structure from a single instance to an array so as to support multiple rings running simultaneously. This mainly concerns the scheduling data structures, the scheduling policy, and related parts. In the original design, a single structure sufficed to complete scheduling.
Step S305: modify the scheduling policy framework, changing timer and event triggering to per-ring support. The original design has only timer triggering, and it triggers only the GPU as a whole; the present invention must support both timer and event triggering, and moreover respond for each ring individually.
Step S306: per-ring scheduling requires consistent global state; use the CRC32 checksum of the state referenced by the pointer to judge whether the global states of the current virtual machine and the next virtual machine are consistent. The pointer refers to the content of the virtual GPU's global state information; because the system design can retain only part of that content, the checksum over it serves as the consistency judgment.
Step S307: each time MMIO state is saved or restored, compute the CRC32 value; if the values are consistent, use the per-ring scheduling of steps S301-S305; if inconsistent, use the original per-virtual-machine scheduling. The CRC32 value identifies the corresponding global state: when the operating systems currently executing are consistent, that state can be multiplexed; if not, per-ring scheduling cannot be used and the scheduler must switch back to the original per-virtual-machine scheduling.
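The consistency test of steps S306-S307 can be sketched with Python's `zlib.crc32` standing in for whatever checksum the implementation actually computes; the idea of representing each vGPU's saved MMIO state as a byte snapshot is an assumption made for illustration, not the patent's data layout.

```python
# Decide whether per-ring scheduling is safe by comparing CRC32 checksums
# of the global state saved for the current and the next virtual machine.

import zlib

def state_checksum(mmio_snapshot: bytes) -> int:
    """CRC32 over the (hypothetical) serialized MMIO/global-state snapshot."""
    return zlib.crc32(mmio_snapshot)

def can_schedule_per_ring(current_state: bytes, next_state: bytes) -> bool:
    """Per-ring scheduling is used only when both VMs share one global state;
    otherwise the scheduler falls back to per-virtual-machine scheduling."""
    return state_checksum(current_state) == state_checksum(next_state)
```

The checksum is recomputed at each MMIO save/restore, so the decision tracks state changes over time; equal checksums mean the saved state can be multiplexed across the two virtual machines, as step S307 describes.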
Step S308: for quality-of-service maintenance, account for each ring separately and redefine the quality-of-service parameters, guaranteeing that per-ring scheduling also maintains correct quality of service;
Step S309: implement the switching between per-virtual-machine and per-ring scheduling, guaranteeing correct program execution. The present invention completes the switching between the two scheduling strategies.
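The mixed policy of step S3 as a whole — dispatch per ring when global states match, fall back to whole-VM scheduling otherwise — can be sketched as follows. This shows only the dispatch structure; the workload representation and the one-slot-per-ring rule are simplifying assumptions, not GVT-g internals.

```python
# Mixed per-ring / per-VM dispatch: pick the batch of workloads allowed to
# run in parallel this round. A workload is a (vm, ring) pair, where rings
# are engines such as render, blitter, or video.

def dispatch(pending, states_consistent):
    """pending: ordered list of (vm, ring) workloads.
    Returns the batch that may run concurrently this round."""
    if not states_consistent:
        # Fall back to per-VM scheduling: only the first VM's workloads run.
        vm = pending[0][0]
        return [w for w in pending if w[0] == vm]
    # Per-ring scheduling: at most one workload per ring, across all VMs.
    batch, used_rings = [], set()
    for vm, ring in pending:
        if ring not in used_rings:
            used_rings.add(ring)
            batch.append((vm, ring))
    return batch
```

With consistent states, a render workload from one VM and a video workload from another run in the same round — the case where the experiments report 34.0% and 70.6% gains; with inconsistent states the batch degenerates to a single VM, reproducing the original behavior.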
The present invention combines these three scheduling methods; they do not conflict and can be used on the same GPU at the same time, so the overall performance of the GPU can be greatly increased. In the specific experiments, the present invention uses Intel GPUs as the test objects, but the method is a general one and can also be applied to GPUs made by other manufacturers.
Specific embodiments of the present invention have been described above. It is to be understood that the invention is not limited to the above particular embodiments; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substantive content of the present invention. In the absence of conflict, the features in the embodiments of this application can be combined with one another arbitrarily.

Claims (4)

1. A multi-level fine-grained virtualized GPU scheduling optimization method, characterized by comprising the following steps:
Step S1: add scheduling based on time and events, reducing the overhead of switching between two virtual GPUs;
Step S2: add seamless pipeline-based scheduling, allowing some of the virtual GPUs to run in parallel and improving efficiency when virtual GPUs work together;
Step S3: add mixed scheduling based on both rings and virtual machines, allowing different virtual machines to use the physical GPU fully concurrently and improving overall utilization.
2. The multi-level fine-grained virtualized GPU scheduling optimization method according to claim 1, characterized in that step S1 comprises the following steps:
Step S101: decouple the scheduling policy framework from the workload scheduler: the scheduling policy framework implements the modified scheduling policy, while the workload scheduler carries out the actual scheduling;
Step S102: add a context-complete event, which is triggered when a context finishes and passed to the scheduling policy framework to trigger the corresponding task scheduling;
Step S103: add a context-submit event; when the workload scheduler receives a context-submit event while the virtual GPU is idle, it handles the event immediately and executes the task;
Step S104: modify the scheduling policy framework to support the added events; upon receiving a timer tick or an event, the framework handles it and submits work to the workload scheduler for execution;
Step S105: modify the quality-of-service accounting in the scheduling policy framework.
3. The multi-level fine-grained virtualized GPU scheduling optimization method according to claim 1, characterized in that step S2 comprises the following steps:
Step S201: decompose the scheduler workflow into an audit-and-shadow stage and a scheduling-and-execution stage, the former being the preparation stage and the latter the execution stage;
Step S202: split the work submission path so that multiple workloads can be submitted simultaneously, making full use of the pipelining advantage;
Step S203: move the original shadow-related code out of task dispatch, separating the code of the different stages; the whole workflow code is thereby divided into two relatively independent parts, audit-and-shadow and scheduling-and-execution, so that different stages can run simultaneously without interfering, improving efficiency;
Step S204: when a virtual GPU has only one shadow context, and that virtual GPU is not the current GPU, shadow its first workload immediately.
4. The multi-level fine-grained virtualized GPU scheduling optimization method according to claim 1, characterized in that step S3 comprises the following steps:
Step S301: Introduce per-ring (per-engine) scheduling by adding ring support, changing the minimum scheduling unit from the whole GPU to a single ring, so that jobs running on different rings proceed concurrently without interfering with each other;
Step S302: Modify the task scheduler, changing all related code from a single variable to an array, and changing the scheduling unit from the virtual machine to the ring; this covers rescheduling and the switching of the current GPU and ring and the next GPU and ring;
Step S303: Modify the task scheduler, restructuring the related code logic: replace the original logic, which supports one virtual machine at a time, with new logic organized per ring;
Step S304: Modify the scheduling policy framework, changing the scheduling data structure from a single instance to an array so that multiple rings can run simultaneously;
Step S305: Modify the scheduling policy framework, changing timer and event triggering to support each ring separately;
Step S306: Per-ring scheduling requires the global state to be consistent; use a CRC32 checksum to judge whether the global state of the current virtual machine and the next virtual machine is consistent;
Step S307: Compute the CRC32 value each time MMIO state is saved or restored; if the values are consistent, use the per-ring scheduling of steps S301-S305, otherwise fall back to the original per-virtual-machine scheduling;
Step S308: For quality-of-service maintenance, compute QoS for each ring separately and redefine the QoS parameters, guaranteeing that correct quality of service is also maintained under per-ring scheduling;
Step S309: Modify the switching between per-virtual-machine scheduling and per-ring scheduling to guarantee correct program execution.
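The consistency check of steps S306–S307 can be sketched with CRC32 over snapshots of MMIO state: per-ring scheduling is only chosen when the checksums of the current and next virtual machine match, otherwise the scheduler falls back to per-VM switching. The dictionary layout and the byte-string snapshots below are hypothetical; only the CRC32-based decision mirrors the claim.

```python
import zlib

def choose_granularity(mmio_snapshots):
    """Sketch of steps S306-S307: compare CRC32 checksums of each virtual
    machine's MMIO state; per-ring scheduling is only safe when the global
    state is consistent, otherwise fall back to per-VM scheduling."""
    crcs = {vm: zlib.crc32(state) for vm, state in mmio_snapshots.items()}
    # One distinct checksum means every VM sees the same global state.
    return "per-ring" if len(set(crcs.values())) == 1 else "per-vm"

consistent = {"vm0": b"\x01\x02\x03", "vm1": b"\x01\x02\x03"}
divergent  = {"vm0": b"\x01\x02\x03", "vm1": b"\xff\xfe\xfd"}
print(choose_granularity(consistent))  # per-ring
print(choose_granularity(divergent))   # per-vm
```

The checksum is recomputed on every MMIO save/restore, so the decision tracks the actual state rather than being fixed at configuration time.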
CN201810285080.8A 2018-04-02 2018-04-02 Multilevel fine-grained virtualized GPU (graphics processing Unit) scheduling optimization method Active CN108710536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810285080.8A CN108710536B (en) 2018-04-02 2018-04-02 Multilevel fine-grained virtualized GPU (graphics processing Unit) scheduling optimization method

Publications (2)

Publication Number Publication Date
CN108710536A true CN108710536A (en) 2018-10-26
CN108710536B CN108710536B (en) 2021-08-06

Family

ID=63867079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810285080.8A Active CN108710536B (en) 2018-04-02 2018-04-02 Multilevel fine-grained virtualized GPU (graphics processing Unit) scheduling optimization method

Country Status (1)

Country Link
CN (1) CN108710536B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109656714A (en) * 2018-12-04 2019-04-19 成都雨云科技有限公司 A kind of GPU resource dispatching method virtualizing video card
CN109753134A (en) * 2018-12-24 2019-05-14 四川大学 A kind of GPU inside energy consumption control system and method based on overall situation decoupling
CN109766189A (en) * 2019-01-15 2019-05-17 北京地平线机器人技术研发有限公司 Colony dispatching method and apparatus
CN110442389A (en) * 2019-08-07 2019-11-12 北京技德系统技术有限公司 A kind of shared method using GPU of more desktop environments
CN113274736A (en) * 2021-07-22 2021-08-20 北京蔚领时代科技有限公司 Cloud game resource scheduling method, device, equipment and storage medium
CN113742085A (en) * 2021-09-16 2021-12-03 中国科学院上海高等研究院 Execution port time channel safety protection system and method based on branch filtering
US11321126B2 (en) 2020-08-27 2022-05-03 Ricardo Luis Cayssials Multiprocessor system for facilitating real-time multitasking processing

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102662725A (en) * 2012-03-15 2012-09-12 中国科学院软件研究所 Event-driven high concurrent process virtual machine realization method
CN103336718A (en) * 2013-07-04 2013-10-02 北京航空航天大学 GPU thread scheduling optimization method
CN104714850A (en) * 2015-03-02 2015-06-17 心医国际数字医疗系统(大连)有限公司 Heterogeneous joint account balance method based on OPENCL
CN106663021A (en) * 2014-06-26 2017-05-10 英特尔公司 Intelligent gpu scheduling in a virtualization environment
US20170221173A1 (en) * 2016-01-28 2017-08-03 Qualcomm Incorporated Adaptive context switching
CN107357661A (en) * 2017-07-12 2017-11-17 北京航空航天大学 A kind of fine granularity GPU resource management method for mixed load

Also Published As

Publication number Publication date
CN108710536B (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN108710536A (en) A kind of multi-level fine-grained virtualization GPU method for optimizing scheduling
EP3425502B1 (en) Task scheduling method and device
CN106802826A (en) A kind of method for processing business and device based on thread pool
CN103069389B (en) High-throughput computing method and system in a hybrid computing environment
CN101894047B (en) Kernel virtual machine scheduling policy-based implementation method
CN101751289B (en) Mixed scheduling method of embedded real-time operating system
CN103069390B (en) Method and system for re-scheduling workload in a hybrid computing environment
US20040199927A1 (en) Enhanced runtime hosting
US20200012507A1 (en) Control system for microkernel architecture of industrial server and industrial server comprising the same
CN105183698B (en) A kind of control processing system and method based on multi-core DSP
CN104866374A (en) Multi-task-based discrete event parallel simulation and time synchronization method
CN112463709A (en) Configurable heterogeneous artificial intelligence processor
CN103365718A (en) Thread scheduling method, thread scheduling device and multi-core processor system
CN101694633A (en) Equipment, method and system for dispatching of computer operation
Hirales-Carbajal et al. A grid simulation framework to study advance scheduling strategies for complex workflow applications
CN110795254A (en) Method for processing high-concurrency IO based on PHP
CN105550040A (en) KVM platform based virtual machine CPU resource reservation algorithm
CN114637536A (en) Task processing method, computing coprocessor, chip and computer equipment
CN100440153C (en) Processor
CN111258655A (en) Fusion calculation method and readable storage medium
CN103810041A (en) Parallel computing method capable of supporting dynamic compand
CN115794355B (en) Task processing method, device, terminal equipment and storage medium
CN110928659A (en) Numerical value pool system remote multi-platform access method with self-adaptive function
Aggarwal et al. On the optimality of scheduling dependent mapreduce tasks on heterogeneous machines
Lam et al. Performance guarantee for online deadline scheduling in the presence of overload

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant