CN101901042A - Method for reducing power consumption based on dynamic task migrating technology in multi-GPU (Graphic Processing Unit) system


Info

Publication number
CN101901042A
CN101901042A
Authority
CN
China
Prior art keywords
gpu
power consumption
task
utilization factor
reducing power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010102641204A
Other languages
Chinese (zh)
Other versions
CN101901042B (en)
Inventor
过敏意
马曦
朱寅
郑龙
沈耀
周憬宇
曹朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN2010102641204A priority Critical patent/CN101901042B/en
Publication of CN101901042A publication Critical patent/CN101901042A/en
Application granted granted Critical
Publication of CN101901042B publication Critical patent/CN101901042B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Power Sources (AREA)
  • Stored Programmes (AREA)

Abstract

The invention relates to a method for reducing power consumption based on dynamic task migration technology in a multi-GPU (Graphic Processing Unit) system, in the technical field of computers, which comprises the following steps: installing a GPU utilization monitor on each GPU to obtain the average utilization of each GPU over a time T; when the utilization of the i-th GPU is R1, migrating all tasks on the i-th GPU to a GPU whose utilization is R2 and shutting down the i-th GPU; when the utilization of the j-th GPU is 100 percent, migrating part of the tasks on the j-th GPU to a GPU whose utilization is R3; when the utilization of all running GPUs exceeds a threshold R4 and the system has a shut-down GPU, automatically starting a shut-down GPU and allocating new computing tasks to the just-started GPU; and repeating these steps until all GPUs are used to run the program. The invention provides real-time resource-utilization monitoring, effectively reduces the power consumption of the GPUs, and optimizes the communication among the GPUs.

Description

Method for reducing power consumption based on dynamic task migration technology in a multi-GPU system
Technical field
The present invention relates to a method in the field of computer technology, and more specifically to a method for reducing power consumption based on dynamic task migration technology in a multi-GPU (Graphic Processing Unit, graphics processing unit) system.
Background technology
In recent years the GPU has developed rapidly; it is very well suited to carrying out large-scale, high-performance parallel numerical computation efficiently and at low cost. The GPU is a concept derived from the CPU (central processing unit). It is the key component of the graphics card and, together with dedicated on-board memory or shared CPU memory, forms a subsystem that has become the key to the graphics performance of a PC. The growing number of graphics applications makes the GPU ever more important in the modern computer; after decades in which the CPU dominated PC performance, this chip has emerged rapidly in recent years and in some applications has even reached a status on a par with the CPU. NVIDIA first proposed the concept of the GPU when it released the GeForce 256 graphics card in 1999; this card reduced the dependence on the CPU and took over part of the CPU's work, particularly in 3D image processing. The core technologies adopted by the GPU include hardware T&L (Transform and Lighting), cube environment texture mapping and vertex blending, texture compression and bump mapping, and a 256-bit rendering engine with dual texture pixels, and their appearance greatly improved the graphics-processing performance of the machine. A GPGPU (General-Purpose computing on Graphics Processing Units) is a specialized graphics processor that can take on general-purpose computing tasks originally handled by the central processing unit. In the full sense, a GPGPU can not only perform graphics processing but can also complete the computing work of a CPU, so it is better suited to high-performance computing, supports higher-level programming languages, and is more powerful in performance and versatility. In the narrow sense, a GPGPU is simply a GPU with enhanced functions, so the advantages of the GPU are naturally also the advantages of the GPGPU, and it makes up for the serious inadequacy of the CPU in floating-point computing power. Compared with the GPU, the biggest weakness of the CPU is its insufficient floating-point capability: whether for Intel's or AMD's CPU products, current floating-point performance is mostly below tens of Gflops (billions of floating-point operations per second), while the floating-point performance of the GPU was already several times that of mainstream processors by 2006.
The Tesla C2050 released by NVIDIA in 2010 has 448 processing cores, a memory bandwidth of 144 GB/s and a power consumption of 247 W, and its double-precision and single-precision floating-point performance reach 515 Gflops and 1 Tflops (one trillion floating-point operations per second) respectively; in floating-point computation the GPU clearly offers high performance that the CPU cannot replace. In addition, the powerful processing capability and enormous memory bandwidth of the GPU can be used effectively for graphics-rendering computation, and GPUs are widely applied in fields such as image processing, video transmission, signal processing, artificial intelligence, pattern recognition, financial analysis, numerical computation, petroleum exploration, astronomical calculation, fluid mechanics, biological computation, molecular dynamics calculation, database management, and encryption, where speed-ups of one to two orders of magnitude over the CPU have been obtained, with striking results.
Yet even if GPGPU processing performance exceeds that of an ordinary CPU by one to two orders of magnitude, this is still hard pressed to satisfy the requirements for high-performance computing in large-scale application systems. Parallel computing can, to a certain extent, solve those problems whose running time can be reduced by large amounts of parallel computation. Current parallel systems are mainly realized through distributed systems, computer clusters, multi-core processors and GPUs. In distributed systems and clusters, programs are mainly developed with the MPI library; on multi-core processors, multithreaded programs are mainly developed with OpenMP and POSIX threads on Linux; for GPU development there are Microsoft's HLSL, OpenGL's GLSL, and Stanford University's RTSL, and on NVIDIA's newest "Fermi" GPU architecture developers use the CUDA programming environment, in which the parallelism of an application can be realized whether one chooses the C language, C++, OpenCL, DirectCompute or Fortran, and the NVIDIA Parallel Nsight tool can be used as well. On a single machine, GPUs turn an ordinary desktop computer into a personal supercomputer: the NVIDIA Tesla personal supercomputer, for example, has nearly 960 parallel processing cores and 1 Tflops of floating-point computing power based on the revolutionary NVIDIA CUDA parallel computing architecture, equivalent to the computing capability of a data-center cluster system, and is therefore faster and more energy-efficient.
Given the different characteristics of CPUs and GPUs, the current mainstream architecture takes the CPU, which runs the operating system and database systems, as the core and the GPU, which handles large-scale parallel computation, as a coprocessor. Multi-GPU systems are the inevitable trend for satisfying users' future high-performance demands; here a multi-GPU system means multiple GPUs on the motherboard inside one chassis. There is, however, a technical problem: although GPU performance is excellent, it places higher requirements on the user's programs, which must conform to the programming models of different GPU architectures and must be written with different degrees of parallelism for different GPUs.
Shane Ryoo, Christopher I. Rodrigues and others of the University of Illinois at Urbana-Champaign pointed out in 2008, in the paper "Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA" presented at the top parallel-computing conference PPoPP (Principles and Practice of Parallel Programming), that on the GeForce 8800GTX graphics card different applications need more than 3000 threads executing in parallel at the same time in order to hide the bandwidth bottleneck between main memory and video memory and the latency of GPU reads from global video memory. The GeForce 8800GTX card has 16 stream multiprocessors with 8 cores each, i.e. 128 physical cores; the Tesla C2050 card has 448 processing cores and a power consumption of 247 W, and to fully utilize these cores and reach or approach its theoretical double-precision peak of 515 Gflops, more than 3000 threads must execute in parallel at the same time. Therefore, for the GPU clusters built from multi-GPU hosts that are appearing on the high-end market, for example NVIDIA's Tesla S2050 1U computing system with four GPUs on the motherboard, a theoretical double-precision peak of 2.0 Tflops and a power consumption of 900 W, how to use resources efficiently and save energy is one of the problems that GPU manufacturers and application developers care about most.
Existing task-allocation technologies for multi-GPU systems come mainly from three companies: NVIDIA, AMD and the Israeli company Lucid. NVIDIA uses SLI, i.e. Scalable Link Interface technology (main reference: http://www.slizone.com/page/slizone_learn.html); this technology can only interconnect graphics cards of the same model. AMD uses CrossFire technology (main reference: http://game.amd.com/us-en/crossfirex_about.aspx), which can interconnect ATI cards of different series, so users do not have to discard their original card when upgrading, avoiding wasted resources. Lucid mainly uses the HYDRA Engine technology; the HYDRA Engine is an arbiter for the GPUs, responsible for distributing tasks among all the computing units, and its main feature is that it can not only make cards of the same brand but different models work together, but can also run cards of different brands simultaneously, giving it very strong compatibility (main reference: http://www.lucid-tech.com/). These three technologies approach the problem from the angle of task distribution: according to the load of the application they quickly evaluate the computing resources and allocate them accurately, avoiding waste. However, they do not consider the situation in which the computing resources far exceed the computing demand, for example when the number of threads that can run concurrently is limited and only one or a few GPUs are needed to satisfy the computation; in that case many GPU computing units sit idle, causing unnecessary energy costs.
Summary of the invention
The objective of the present invention is to overcome the above-mentioned deficiencies of the prior art by providing a method for reducing power consumption based on dynamic task migration technology in a multi-GPU system. By migrating the tasks on a GPU to other GPUs and shutting the original GPU down, the present invention markedly improves GPU utilization and achieves the beneficial effect of saving power.
The present invention is achieved by the following technical solution and comprises the following steps:
First step: a GPU utilization monitor is installed on each GPU to count the number of operations N1 executed by all the SPs (stream processing units) on that GPU during time T, and the average utilization μ = N1/N2 of each GPU over time T is obtained, where N2 is the theoretical peak calculation amount of that GPU during time T.
Second step: when the utilization of the i-th GPU is R1, all the tasks on the i-th GPU are migrated to a GPU whose utilization is R2, and the third step is performed; when the utilization of the j-th GPU is 100%, part of the tasks on the j-th GPU are migrated to a GPU whose utilization is R3, and the fourth step is performed.
The range of R1 is: 0% ≤ R1 < 20%.
The range of R2 is: 25% ≤ R2 < 90%.
The range of R3 is: 25% ≤ R3 < 90%.
The migration is performed as follows: the contents of the registers and the caches at all levels of each SP on GPU A are flushed to the SP's on-chip memory; the on-chip memory contents of the SPs on GPU A are transferred to GPU B through the SLI connector; GPU B directly accesses the global memory of GPU A, and the video memories of the GPUs in the system are distributed in a ring; in this way the tasks on GPU A are migrated to GPU B.
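Purely as an illustration, the migration sequence just described can be sketched as host-side C++ pseudocode. The helper functions flush_sp_state, sli_transfer and enable_peer_access are hypothetical placeholders for the hardware steps (register/cache flush to on-chip memory, transfer over the SLI connector, direct peer access to global video memory); they are not real CUDA or driver API calls.

// Minimal host-side sketch of the migration sequence; the helpers are stubs.
#include <cstdio>
#include <vector>

struct Gpu {
    int id;
    std::vector<int> tasks;   // identifiers of tasks currently resident
    bool powered_on = true;
};

// Flush each SP's registers and per-level caches into its on-chip memory.
void flush_sp_state(Gpu& src) { std::printf("flush SP state on GPU %d\n", src.id); }

// Copy the SPs' on-chip memory contents to the destination GPU over the SLI link.
void sli_transfer(Gpu& src, Gpu& dst) {
    std::printf("SLI transfer GPU %d -> GPU %d\n", src.id, dst.id);
}

// Allow the destination GPU to access the source GPU's global video memory
// directly; the per-GPU video memories are assumed to form a ring.
void enable_peer_access(Gpu& src, Gpu& dst) {
    std::printf("enable peer access GPU %d -> GPU %d\n", dst.id, src.id);
}

// Migrate every task from GPU A to GPU B, after which A may be powered off.
void migrate_all_tasks(Gpu& a, Gpu& b) {
    flush_sp_state(a);
    sli_transfer(a, b);
    enable_peer_access(a, b);
    b.tasks.insert(b.tasks.end(), a.tasks.begin(), a.tasks.end());
    a.tasks.clear();
    a.powered_on = false;     // the now-idle GPU is shut down
}

int main() {
    Gpu a{0, {1, 2, 3}}, b{1, {4}};
    migrate_all_tasks(a, b);
    std::printf("GPU %d now holds %zu tasks\n", b.id, b.tasks.size());
}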
Third step: after the i-th GPU has forwarded its new tasks to the GPU whose utilization is R2, the i-th GPU no longer accepts new tasks and its utilization becomes 0; the i-th GPU is then shut down automatically, and the fourth step is performed.
Fourth step: when the utilization of all the running GPUs exceeds the threshold R4 and the system has a shut-down GPU, the system automatically starts one shut-down GPU and assigns new computing tasks to the just-started GPU.
The value of the threshold R4 is: 80% ≤ R4 ≤ 90%.
Fifth step: the above steps are repeated continuously until all the GPUs are running the program.
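A minimal sketch of the control loop formed by the five steps above is given below, again as illustrative C++ only. The utilization values are assumed to be the averages μ = N1/N2 reported by the monitors, the thresholds are example values taken from the ranges stated above (R1 = 20%, R2 and R3 targets in [25%, 90%), R4 = 90%), and migrate_all/migrate_some stand in for the migration mechanism already described.

// One pass of the monitoring/migration policy; hypothetical helpers only.
#include <cstdio>
#include <vector>

struct Gpu { int id; double util = 0.0; bool on = true; };

constexpr double R1 = 0.20;                   // below this, vacate and power off
constexpr double R2_LO = 0.25, R2_HI = 0.90;  // target range for a full migration
constexpr double R3_LO = 0.25, R3_HI = 0.90;  // target range for a partial migration
constexpr double R4 = 0.90;                   // all running GPUs above this: power one on

// Placeholders for the migration mechanism described in the text.
void migrate_all(Gpu& from, Gpu& to)  { std::printf("all tasks:  GPU %d -> GPU %d\n", from.id, to.id); }
void migrate_some(Gpu& from, Gpu& to) { std::printf("some tasks: GPU %d -> GPU %d\n", from.id, to.id); }

Gpu* find_target(std::vector<Gpu>& gpus, const Gpu& src, double lo, double hi) {
    for (auto& g : gpus)
        if (g.on && g.id != src.id && g.util >= lo && g.util < hi) return &g;
    return nullptr;
}

// One pass of steps 2 to 4; the fifth step simply calls this repeatedly.
void control_step(std::vector<Gpu>& gpus) {
    Gpu* off_gpu = nullptr;
    bool all_above_r4 = true;
    for (auto& g : gpus) {
        if (!g.on) { off_gpu = &g; continue; }
        if (g.util < R1) {                                       // step 2, first case
            if (Gpu* t = find_target(gpus, g, R2_LO, R2_HI)) {
                migrate_all(g, *t);
                g.on = false;                                    // step 3: shut the emptied GPU down
            }
        } else if (g.util >= 1.0) {                              // step 2, second case (saturated)
            if (Gpu* t = find_target(gpus, g, R3_LO, R3_HI)) migrate_some(g, *t);
        }
        if (g.on && g.util <= R4) all_above_r4 = false;
    }
    if (all_above_r4 && off_gpu) off_gpu->on = true;             // step 4: restart a GPU
}

int main() {
    std::vector<Gpu> gpus{{0, 0.10}, {1, 0.60}, {2, 1.00}, {3, 0.40}};
    control_step(gpus);   // migrates GPU 0 away and powers it off, offloads part of GPU 2
}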
Compared with the prior art, the present invention has the following beneficial effects:
1. Real-time resource-utilization monitoring. The present invention requires no software intervention: the GPU utilization is monitored directly by hardware, which is highly responsive, so GPUs can be started and shut down in a timely and effective manner.
2. The power consumption of the GPUs is effectively reduced. The degree of parallelism of many current applications falls far short of the tens of thousands of concurrent threads that a multi-GPU system can support, so in many applications energy can be saved by reducing the idle time of GPUs and shutting them down. How much power is saved depends on the application's degree of parallelism; in a 4-GPU system, the power consumption of the multi-GPU system can be reduced by up to 75%.
3. The communication between the GPUs is optimized. In current commercial multi-GPU systems, inter-GPU communication is mainly realized through PCI-Express or the SLI connector. The former only satisfies the inter-GPU communication requirements of common applications; the latter, introduced by NVIDIA for the GeForce 6600GT and later upgraded versions, brings the theoretical peak inter-GPU communication speed to 1 GB/s.
Description of drawings
Fig. 1 is a structural diagram of the system for monitoring GPU utilization in the embodiment;
Fig. 2 is a schematic diagram of task migration in the embodiment.
Embodiment
The method of the present invention is further described below in conjunction with the accompanying drawings. This embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation and a concrete operating process are given, but the protection scope of the present invention is not limited to the following embodiment.
Embodiment
The system in this embodiment is NVIDIA's Tesla S2050 1U computing system based on the "Fermi" architecture. The system contains 4 GPUs, each of which achieves up to 515 Gflops of double-precision peak performance, so that 2 Tflops of double-precision performance is realized in a 1U space at a TDP of 900 W. The power-reduction procedure comprises the following steps:
First step: a GPU utilization monitor is installed on each GPU to count the number N1 of double-precision floating-point operations executed by all the SPs on that GPU during time T, and the average utilization μ = N1/N2 of each GPU over time T is obtained.
In this embodiment N1 is the number of double-precision floating-point operations executed in time T, N2 is the theoretical peak calculation amount of each GPU in time T, namely 515 million double-precision floating-point operations, and T is 1 microsecond.
In this embodiment each GPU has m SPs and each SP has n processing units. During program execution, SIMD (single instruction, multiple data) operations are carried out with the SP as the unit, so a dedicated register in each SP records the number of operations Si executed by that SP per microsecond. Specifically, whenever the control unit issues data to an ALU (arithmetic logic unit), the amount of computation that the instruction will issue to the ALU is stored in a dedicated register; based on the clock frequency of the card, the operation count Si of each SP within one microsecond can be accumulated. The calculation amounts of the m SPs are then added together to give the calculation amount of the GPU, which is compared with the GPU's theoretical peak calculation amount for one microsecond to yield the utilization Ri of the GPU within that microsecond, as shown in Fig. 1.
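Purely to illustrate the arithmetic in this step, the following C++ sketch sums hypothetical per-SP operation counts Si for one sampling window and divides by the window's theoretical peak to obtain the utilization Ri. Reading the actual hardware counters is outside its scope, and the example values are made up.

// Utilization of one GPU for one sampling window: Ri = (sum of Si) / peak.
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <vector>

// si: operations executed by each SP during the sampling window T.
// peak_ops: theoretical peak operation count of the whole GPU during T.
double gpu_utilization(const std::vector<std::uint64_t>& si, std::uint64_t peak_ops) {
    const std::uint64_t n1 = std::accumulate(si.begin(), si.end(), std::uint64_t{0});
    return static_cast<double>(n1) / static_cast<double>(peak_ops);   // Ri = N1 / N2
}

int main() {
    // Illustrative only: 14 SPs, each reporting its per-window operation count.
    std::vector<std::uint64_t> si(14, 20000);
    std::uint64_t peak = 515000;   // illustrative peak count for the window
    std::printf("utilization = %.2f%%\n", 100.0 * gpu_utilization(si, peak));
}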
Second step: when the utilization of the i-th GPU is R1, all the tasks on the i-th GPU are migrated to a GPU whose utilization is R2, and the third step is performed; when the utilization of the j-th GPU is 100%, part of the tasks on the j-th GPU are migrated to a GPU whose utilization is R3, and the fourth step is performed.
This embodiment uses a shared global-video-memory technique. When no tasks are migrated between GPUs, each GPU uses its own designated video memory. When tasks are migrated between GPUs, for example when a task on card A needs to be migrated to card B, the contents of the registers and the caches at all levels in each SP on card A are first flushed to the SP's on-chip memory; the on-chip memory contents of the SPs on card A are then transferred to card B through the SLI connector, and card B can directly access the global video memory of card A. This effectively reduces the amount of data transferred; at the same time, to reduce the access latency for the less frequently accessed data in other cards' video memories, the video memories of the GPUs are distributed in a ring, as shown in Fig. 2.
In this embodiment R1 is less than 20%, R2 is between 25% and 75%, and R3 is between 25% and 75%.
Third step: after the i-th GPU has forwarded its new tasks to the GPU whose utilization is R2, the i-th GPU no longer accepts new tasks and its utilization becomes 0; the i-th GPU is then shut down automatically, and the fourth step is performed.
Fourth step: when the utilization of all the running GPUs exceeds the threshold R4 and the system has a shut-down GPU, one shut-down GPU is started automatically, and new tasks are assigned to the just-started card until the utilization of that card reaches R5 or above.
In this embodiment R4 is 90% and R5 is 30%.
Fifth step: the above steps are repeated continuously until all the GPUs are running the program.
This embodiment uses this 4-GPU system to compute protein molecular fields based on quantum chemistry, at a spatial resolution of 256 × 256 × 256, and computes visualized molecular-field results for three classes of protein molecules: 1A30, 1GCV and 1DPS. 1A30 is an HIV-1 protease containing 201 amino acid residues; its conformation is a C2-symmetric homodimer formed by one small inhibitor and two polypeptide chains of 99 amino acids each, each monomer containing two motifs composed entirely of antiparallel β-sheets. 1GCV is a hemoglobin containing 552 amino acid residues. 1DPS is a DPS protein containing 1855 amino acid residues; each of its monomers folds roughly like ferritin, with a pore on its three-fold symmetry axis and a central cavity that is an important active region for the binding and release of ferric ions.
Because the three proteins differ in complexity, the amount of computation needed for their molecular fields also differs. The 1A30 protein structure is relatively simple, so its computation is the smallest and its degree of parallelism the lowest; the 1DPS structure is the most complex, so computing its molecular field uses the most parallel threads and can make relatively full use of the GPU computing resources. In this embodiment, the utilization of the 4 GPUs only reaches 90% or more when each GPU has more than 10,000 parallel threads at a given moment; when simulating protein molecular fields the number of parallel threads is in many cases below 10,000, so energy is saved by shutting GPUs down. When computing the 1A30 molecular field, if the power of each GPU is taken as 1 unit and the total running time is 4, the conventional approach of using all the GPUs consumes 1 × 4 × 4 = 16; with the task migration technique proposed in this embodiment, 4 GPUs are used for a time of 0.5, 3 GPUs for 1.5, 2 GPUs for 1.5 and 1 GPU for 1, the total computing time is 4.5, the power consumption is 4 × 0.5 + 3 × 1.5 + 2 × 1.5 + 1 × 1 = 10.5, the computation is about 0.5/4 = 12.5% slower, and the energy saved is 5.5/16 = 34.4%. When computing the 1DPS molecular field, again with each GPU taken as 1 unit of power and a total running time of 15, the conventional approach consumes 1 × 4 × 15 = 60; with the task migration technique, 4 GPUs are used for a time of 5, 3 GPUs for 6, 2 GPUs for 4 and 1 GPU for 2, the total computing time is 17, the power consumption is 4 × 6 + 3 × 6 + 2 × 4 + 1 × 2 = 50, the computation is about 2/17 = 11.8% slower, and the energy saved is 10/60 = 16.7%.
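The energy accounting for the 1A30 example above can be reproduced with a short illustrative C++ sketch (not part of the patented method): energy is simply the sum of the number of active GPUs multiplied by time over the phases of the run, with per-GPU power normalized to one unit.

// Reproduces the 1A30 energy accounting from the embodiment.
#include <cstdio>
#include <utility>
#include <vector>

// Each phase is (number of active GPUs, duration); energy = sum of gpus * time.
double energy(const std::vector<std::pair<int, double>>& phases) {
    double e = 0.0;
    for (const auto& [gpus, time] : phases) e += gpus * time;
    return e;
}

int main() {
    const double baseline = energy({{4, 4.0}});                               // 16 units
    const double migrated = energy({{4, 0.5}, {3, 1.5}, {2, 1.5}, {1, 1.0}}); // 10.5 units
    std::printf("baseline=%.1f migrated=%.1f saved=%.1f%% slowdown=%.1f%%\n",
                baseline, migrated,
                100.0 * (baseline - migrated) / baseline,                     // 34.4%
                100.0 * (4.5 - 4.0) / 4.0);                                   // 12.5%
}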
It can be seen that the dynamic task migration technique for multi-GPU systems proposed in this embodiment achieves the goal of saving power while ensuring that performance is not significantly affected.

Claims (6)

  1. A method for reducing power consumption based on dynamic task migration technology in a multi-GPU system, characterized in that it comprises the following steps:
    First step: a GPU utilization monitor is installed on each GPU to count the number of operations N1 executed by all the SPs on that GPU during time T, and the average utilization μ = N1/N2 of each GPU over time T is obtained, where N2 is the theoretical peak calculation amount of that GPU during time T;
    Second step: when the utilization of the i-th GPU is R1, all the tasks on the i-th GPU are migrated to a GPU whose utilization is R2, and the third step is performed; when the utilization of the j-th GPU is 100%, part of the tasks on the j-th GPU are migrated to a GPU whose utilization is R3, and the fourth step is performed;
    Third step: after the i-th GPU has forwarded its new tasks to the GPU whose utilization is R2, the i-th GPU no longer accepts new tasks and its utilization becomes 0; the i-th GPU is then shut down automatically, and the fourth step is performed;
    Fourth step: when the utilization of all the running GPUs exceeds the threshold R4 and the system has a shut-down GPU, the system automatically starts one shut-down GPU and assigns new computing tasks to the just-started GPU;
    Fifth step: the above steps are repeated continuously until all the GPUs are running the program.
  2. The method for reducing power consumption based on dynamic task migration technology in a multi-GPU system according to claim 1, characterized in that the range of R1 is: 0% ≤ R1 < 20%.
  3. The method for reducing power consumption based on dynamic task migration technology in a multi-GPU system according to claim 1, characterized in that the range of R2 is: 25% ≤ R2 < 90%.
  4. The method for reducing power consumption based on dynamic task migration technology in a multi-GPU system according to claim 1, characterized in that the range of R3 is: 25% ≤ R3 < 90%.
  5. The method for reducing power consumption based on dynamic task migration technology in a multi-GPU system according to claim 1, characterized in that the migration is performed as follows: the contents of the registers and the caches at all levels of the SPs on GPU A are flushed to the on-chip memory of the SPs; the on-chip memory contents of the SPs on GPU A are transferred to GPU B through the SLI connector; GPU B directly accesses the global memory of GPU A, and the video memories of the GPUs in the system are distributed in a ring, whereby the tasks on GPU A are migrated to GPU B.
  6. The method for reducing power consumption based on dynamic task migration technology in a multi-GPU system according to claim 1, characterized in that the value of the threshold R4 is: 80% ≤ R4 ≤ 90%.
CN2010102641204A 2010-08-27 2010-08-27 Method for reducing power consumption based on dynamic task migrating technology in multi-GPU (Graphic Processing Unit) system Expired - Fee Related CN101901042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102641204A CN101901042B (en) 2010-08-27 2010-08-27 Method for reducing power consumption based on dynamic task migrating technology in multi-GPU (Graphic Processing Unit) system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102641204A CN101901042B (en) 2010-08-27 2010-08-27 Method for reducing power consumption based on dynamic task migrating technology in multi-GPU (Graphic Processing Unit) system

Publications (2)

Publication Number Publication Date
CN101901042A true CN101901042A (en) 2010-12-01
CN101901042B CN101901042B (en) 2011-07-27

Family

ID=43226638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102641204A Expired - Fee Related CN101901042B (en) 2010-08-27 2010-08-27 Method for reducing power consumption based on dynamic task migrating technology in multi-GPU (Graphic Processing Unit) system

Country Status (1)

Country Link
CN (1) CN101901042B (en)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385668A (en) * 2011-09-19 2012-03-21 浙江大学 Method for predicting interaction loci based on protein molecular field
CN102857533A (en) * 2011-07-01 2013-01-02 云联(北京)信息技术有限公司 Remote interaction type system on basis of cloud computing
CN102880785A (en) * 2012-08-01 2013-01-16 北京大学 Method for estimating transmission energy consumption of source code grade data directed towards GPU program
CN103019367A (en) * 2012-12-03 2013-04-03 福州瑞芯微电子有限公司 Embedded type GPU (Graphic Processing Unit) dynamic frequency modulating method and device based on Android system
CN103037109A (en) * 2012-12-12 2013-04-10 中国联合网络通信集团有限公司 Multicore equipment energy consumption management method and device
CN103077082A (en) * 2013-01-08 2013-05-01 中国科学院深圳先进技术研究院 Method and system for distributing data center load and saving energy during virtual machine migration
CN103105895A (en) * 2011-11-15 2013-05-15 辉达公司 Computer system and display cards thereof and method for processing graphs of computer system
CN103150212A (en) * 2011-12-06 2013-06-12 曙光信息产业股份有限公司 Method and device for realizing quantum mechanics calculation
CN103428228A (en) * 2012-05-14 2013-12-04 辉达公司 Graphic display card for conducting cooperative calculation through wireless technology
CN103577269A (en) * 2012-08-02 2014-02-12 英特尔公司 Media workload scheduler
WO2014105303A1 (en) * 2012-12-27 2014-07-03 Intel Corporation Methods, systems and apparatus to manage power consumption of a graphics engine
CN104407920A (en) * 2014-12-23 2015-03-11 浪潮(北京)电子信息产业有限公司 Data processing method and system based on inter-process communication
CN105046638A (en) * 2015-08-06 2015-11-11 骆凌 Processor system and image data processing method thereof
CN106211511A (en) * 2016-07-25 2016-12-07 青岛海信电器股份有限公司 The method of adjustment of horse race lamp rolling speed and display device
CN107122245A (en) * 2017-04-25 2017-09-01 上海交通大学 GPU task dispatching method and system
CN108022269A (en) * 2017-11-24 2018-05-11 中国航空工业集团公司西安航空计算技术研究所 A kind of modeling structure of GPU compressed textures storage Cache
CN108170525A (en) * 2016-12-07 2018-06-15 晨星半导体股份有限公司 The device and method of the task load configuration of dynamic adjustment multi-core processor
CN108694151A (en) * 2017-04-09 2018-10-23 英特尔公司 Computing cluster in universal graphics processing unit is seized
CN109753134A (en) * 2018-12-24 2019-05-14 四川大学 A kind of GPU inside energy consumption control system and method based on overall situation decoupling
CN109992385A (en) * 2019-03-19 2019-07-09 四川大学 A kind of inside GPU energy consumption optimization method of task based access control balance dispatching
CN110399252A (en) * 2019-07-19 2019-11-01 广东浪潮大数据研究有限公司 A kind of data back up method, device, equipment and computer readable storage medium
CN110457135A (en) * 2019-08-09 2019-11-15 重庆紫光华山智安科技有限公司 A kind of method of resource regulating method, device and shared GPU video memory
CN111651131A (en) * 2020-05-18 2020-09-11 武汉联影医疗科技有限公司 Image display method and device and computer equipment
CN111930593A (en) * 2020-07-27 2020-11-13 长沙景嘉微电子股份有限公司 GPU occupancy rate determination method, device, processing system and storage medium
CN112000468A (en) * 2020-08-03 2020-11-27 苏州浪潮智能科技有限公司 GPU management device and method based on detection and adjustment module and GPU server
CN112181124A (en) * 2020-09-11 2021-01-05 华为技术有限公司 Method for power consumption management and related device
CN113157407A (en) * 2021-03-18 2021-07-23 浙大宁波理工学院 Dynamic task migration scheduling method for parallel processing of video compression in GPU
US11262831B2 (en) 2018-08-17 2022-03-01 Hewlett-Packard Development Company, L.P. Modifications of power allocations for graphical processing units based on usage
CN116954929A (en) * 2023-09-20 2023-10-27 四川并济科技有限公司 Dynamic GPU scheduling method and system for live migration

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7256788B1 (en) * 2002-06-11 2007-08-14 Nvidia Corporation Graphics power savings system and method
CN101231552A (en) * 2007-01-24 2008-07-30 惠普开发有限公司 Regulating power consumption
US20090135180A1 (en) * 2007-11-28 2009-05-28 Siemens Corporate Research, Inc. APPARATUS AND METHOD FOR VOLUME RENDERING ON MULTIPLE GRAPHICS PROCESSING UNITS (GPUs)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7256788B1 (en) * 2002-06-11 2007-08-14 Nvidia Corporation Graphics power savings system and method
CN101231552A (en) * 2007-01-24 2008-07-30 惠普开发有限公司 Regulating power consumption
US20090135180A1 (en) * 2007-11-28 2009-05-28 Siemens Corporate Research, Inc. APPARATUS AND METHOD FOR VOLUME RENDERING ON MULTIPLE GRAPHICS PROCESSING UNITS (GPUs)

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102857533A (en) * 2011-07-01 2013-01-02 云联(北京)信息技术有限公司 Remote interaction type system on basis of cloud computing
CN102857533B (en) * 2011-07-01 2015-11-18 云联(北京)信息技术有限公司 A kind of long-distance interactive system based on cloud computing
CN102385668A (en) * 2011-09-19 2012-03-21 浙江大学 Method for predicting interaction loci based on protein molecular field
CN103105895A (en) * 2011-11-15 2013-05-15 辉达公司 Computer system and display cards thereof and method for processing graphs of computer system
CN103150212B (en) * 2011-12-06 2016-04-06 曙光信息产业股份有限公司 The implementation method of Quantum mechanical calculation and device
CN103150212A (en) * 2011-12-06 2013-06-12 曙光信息产业股份有限公司 Method and device for realizing quantum mechanics calculation
US9256914B2 (en) 2012-05-14 2016-02-09 Nvidia Corporation Graphic card for collaborative computing through wireless technologies
CN103428228A (en) * 2012-05-14 2013-12-04 辉达公司 Graphic display card for conducting cooperative calculation through wireless technology
CN102880785A (en) * 2012-08-01 2013-01-16 北京大学 Method for estimating transmission energy consumption of source code grade data directed towards GPU program
CN103577269A (en) * 2012-08-02 2014-02-12 英特尔公司 Media workload scheduler
CN103019367B (en) * 2012-12-03 2015-07-08 福州瑞芯微电子有限公司 Embedded type GPU (Graphic Processing Unit) dynamic frequency modulating method and device based on Android system
CN103019367A (en) * 2012-12-03 2013-04-03 福州瑞芯微电子有限公司 Embedded type GPU (Graphic Processing Unit) dynamic frequency modulating method and device based on Android system
CN103037109B (en) * 2012-12-12 2015-02-25 中国联合网络通信集团有限公司 Multicore equipment energy consumption management method and device
CN103037109A (en) * 2012-12-12 2013-04-10 中国联合网络通信集团有限公司 Multicore equipment energy consumption management method and device
WO2014105303A1 (en) * 2012-12-27 2014-07-03 Intel Corporation Methods, systems and apparatus to manage power consumption of a graphics engine
US9098282B2 (en) 2012-12-27 2015-08-04 Intel Corporation Methods, systems and apparatus to manage power consumption of a graphics engine
US9460483B2 (en) 2012-12-27 2016-10-04 Intel Corporation Methods, systems and apparatus to manage power consumption of a graphics engine
CN103077082A (en) * 2013-01-08 2013-05-01 中国科学院深圳先进技术研究院 Method and system for distributing data center load and saving energy during virtual machine migration
CN103077082B (en) * 2013-01-08 2016-12-28 中国科学院深圳先进技术研究院 A kind of data center loads distribution and virtual machine (vm) migration power-economizing method and system
CN104407920A (en) * 2014-12-23 2015-03-11 浪潮(北京)电子信息产业有限公司 Data processing method and system based on inter-process communication
CN104407920B (en) * 2014-12-23 2018-02-09 浪潮(北京)电子信息产业有限公司 A kind of data processing method and system based on interprocess communication
CN105046638A (en) * 2015-08-06 2015-11-11 骆凌 Processor system and image data processing method thereof
CN105046638B (en) * 2015-08-06 2019-05-21 骆凌 Processor system and its image processing method
CN106211511A (en) * 2016-07-25 2016-12-07 青岛海信电器股份有限公司 The method of adjustment of horse race lamp rolling speed and display device
CN108170525A (en) * 2016-12-07 2018-06-15 晨星半导体股份有限公司 The device and method of the task load configuration of dynamic adjustment multi-core processor
CN108694151A (en) * 2017-04-09 2018-10-23 英特尔公司 Computing cluster in universal graphics processing unit is seized
CN107122245A (en) * 2017-04-25 2017-09-01 上海交通大学 GPU task dispatching method and system
CN107122245B (en) * 2017-04-25 2019-06-04 上海交通大学 GPU task dispatching method and system
CN108022269A (en) * 2017-11-24 2018-05-11 中国航空工业集团公司西安航空计算技术研究所 A kind of modeling structure of GPU compressed textures storage Cache
US11262831B2 (en) 2018-08-17 2022-03-01 Hewlett-Packard Development Company, L.P. Modifications of power allocations for graphical processing units based on usage
CN109753134A (en) * 2018-12-24 2019-05-14 四川大学 A kind of GPU inside energy consumption control system and method based on overall situation decoupling
CN109753134B (en) * 2018-12-24 2022-04-15 四川大学 Global decoupling-based GPU internal energy consumption control system and method
CN109992385B (en) * 2019-03-19 2021-05-14 四川大学 GPU internal energy consumption optimization method based on task balance scheduling
CN109992385A (en) * 2019-03-19 2019-07-09 四川大学 A kind of inside GPU energy consumption optimization method of task based access control balance dispatching
CN110399252A (en) * 2019-07-19 2019-11-01 广东浪潮大数据研究有限公司 A kind of data back up method, device, equipment and computer readable storage medium
CN110457135A (en) * 2019-08-09 2019-11-15 重庆紫光华山智安科技有限公司 A kind of method of resource regulating method, device and shared GPU video memory
CN111651131A (en) * 2020-05-18 2020-09-11 武汉联影医疗科技有限公司 Image display method and device and computer equipment
CN111651131B (en) * 2020-05-18 2024-02-27 武汉联影医疗科技有限公司 Image display method and device and computer equipment
CN111930593A (en) * 2020-07-27 2020-11-13 长沙景嘉微电子股份有限公司 GPU occupancy rate determination method, device, processing system and storage medium
CN112000468A (en) * 2020-08-03 2020-11-27 苏州浪潮智能科技有限公司 GPU management device and method based on detection and adjustment module and GPU server
CN112000468B (en) * 2020-08-03 2023-02-24 苏州浪潮智能科技有限公司 GPU management device and method based on detection and adjustment module and GPU server
CN112181124A (en) * 2020-09-11 2021-01-05 华为技术有限公司 Method for power consumption management and related device
CN112181124B (en) * 2020-09-11 2023-09-01 华为技术有限公司 Method for managing power consumption and related equipment
CN113157407A (en) * 2021-03-18 2021-07-23 浙大宁波理工学院 Dynamic task migration scheduling method for parallel processing of video compression in GPU
CN113157407B (en) * 2021-03-18 2024-03-01 浙大宁波理工学院 Dynamic task migration scheduling method for parallel processing video compression in GPU
CN116954929A (en) * 2023-09-20 2023-10-27 四川并济科技有限公司 Dynamic GPU scheduling method and system for live migration
CN116954929B (en) * 2023-09-20 2023-12-01 四川并济科技有限公司 Dynamic GPU scheduling method and system for live migration

Also Published As

Publication number Publication date
CN101901042B (en) 2011-07-27

Similar Documents

Publication Publication Date Title
CN101901042B (en) Method for reducing power consumption based on dynamic task migrating technology in multi-GPU (Graphic Processing Unit) system
Brodtkorb et al. Graphics processing unit (GPU) programming strategies and trends in GPU computing
Hong-Tao et al. K-means on commodity GPUs with CUDA
CN113383310A (en) Pulse decomposition within matrix accelerator architecture
CN113424162A (en) Dynamic memory reconfiguration
Prakash et al. Energy-efficient execution of data-parallel applications on heterogeneous mobile platforms
Rajovic et al. Experiences with mobile processors for energy efficient HPC
Wong et al. Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor
US11042370B2 (en) Instruction and logic for systolic dot product with accumulate
US10560892B2 (en) Advanced graphics power state management
Wynters Parallel processing on NVIDIA graphics processing units using CUDA
Pospichal et al. Parallel genetic algorithm solving 0/1 knapsack problem running on the gpu
US20210103433A1 (en) Kernel fusion for machine learning
US20210267095A1 (en) Intelligent and integrated liquid-cooled rack for datacenters
US20200372337A1 (en) Parallelization strategies for training a neural network
Farber Topical perspective on massive threading and parallelism
CN112233010A (en) Partial write management in a multi-block graphics engine
Fang et al. Parallel Computation of Non-Bonded Interactions in Drug Discovery: Nvidia GPUs vs. Intel Xeon Phi.
Haidar et al. Optimization for performance and energy for batched matrix computations on GPUs
CN103049329A (en) High-efficiency system based on central processing unit (CPU)/many integrated core (MIC) heterogeneous system structure
Gupta et al. Performance Analysis of GPU compared to Single-core and Multi-core CPU for Natural Language Applications
Wang Power analysis and optimizations for GPU architecture using a power simulator
US11822926B2 (en) Device link management
Singh et al. Accelerating smith-waterman on heterogeneous cpu-gpu systems
US20220309017A1 (en) Multi-format graphics processing unit docking board

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110727

Termination date: 20140827

EXPY Termination of patent right or utility model