CN108257076B - Low-power-consumption gated clock for unified dyeing array of graphics processor - Google Patents

Low-power-consumption gated clock for unified dyeing array of graphics processor

Info

Publication number
CN108257076B
Authority
CN
China
Legal status
Active
Application number
CN201711283981.5A
Other languages
Chinese (zh)
Other versions
CN108257076A (en)
Inventor
张骏
郑新建
任向隆
韩立敏
刘航
Current Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date
2017-12-06
Filing date
2017-12-06
Publication date
2021-10-15

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/20: Processor architectures; Processor configuration, e.g. pipelining

Abstract

The invention relates to the technical field of computer hardware and discloses a low-power gated clock for the unified shader array of a graphics processor, comprising a unified shader array module (1), a clock and power consumption control module (2), and a shader task scheduling unit (3). The unified shader array module (1) consists of at least two shader clusters and executes vertex or pixel shader programs in parallel. The clock and power consumption control module (2) generates the working clocks of all functional modules in the graphics processor and implements clock gating for every shader cluster. The shader task scheduling unit (3) schedules vertex and pixel shading tasks and determines on which shader cluster a given vertex or pixel shader program runs.

Description

Low-power-consumption gated clock for unified dyeing array of graphics processor
Technical Field
The invention relates to the technical field of computer hardware, and in particular to a low-power gated clock for the unified shader array of a graphics processor.
Background
As graphics applications have multiplied, early solutions that rendered graphics on the CPU alone could no longer keep up with growing performance and technical demands, and graphics processing units (GPUs) emerged. Since Nvidia released the first GPU product in 1999, GPU technology has passed through three main stages: the fixed-function pipeline, the separate shader architecture, and the unified shader architecture. Graphics processing capability has improved continuously, and the application field has gradually expanded from graphics rendering to general-purpose computing. The GPU pipeline is fast, parallel, and flexibly programmable, which makes it a good platform for both graphics processing and general-purpose parallel computing.
At present, there is no domestically developed GPU in China based on a unified shader architecture, and display control systems in many fields rely on large numbers of imported commercial GPU chips. In the military field in particular, imported commercial GPU chips have several drawbacks: poor temperature and environmental adaptability; no assurance that the circuits or supporting software are free of back doors; many redundant functional units that military applications do not need; power consumption that does not meet requirements; and rapid product turnover that makes long-term support of weapons and equipment difficult. These drawbacks pose serious risks to safety, reliability, and supportability. Moreover, for political, military, and economic reasons, foreign suppliers impose technology blockades and product monopolies on China, so low-level technical data about GPU chips, such as register documentation, detailed internal micro-architecture, and core software source code, is hard to obtain. As a result, the functions and performance of the GPU cannot be fully exploited, and portability is poor. These problems seriously restrict the independent development of display systems in China.
In particular, low-power design of the unified shader array is a core technology of GPU graphics processing architecture. A breakthrough in this key low-power design technology is urgently needed to develop high-performance graphics processor chips.
Disclosure of Invention
The invention discloses a low-power gated clock for the unified shader array of a graphics processor. According to the behavior and data volume of the scene being rendered, it can dynamically gate, in real time, the working clocks of any number of shader clusters, up to the total number of shader clusters in the unified shader array, thereby reducing the power consumption of the array.
The technical solution of the invention is as follows:
A low-power gated clock for the unified shader array of a graphics processor, comprising:
a unified shader array module (1), a clock and power consumption control module (2), and a shader task scheduling unit (3);
the unified shader array module (1) consists of at least two shader clusters and executes vertex or pixel shader programs in parallel;
the clock and power consumption control module (2) generates the working clocks of all functional modules in the graphics processor and implements clock gating for every shader cluster;
the shader task scheduling unit (3) schedules vertex and pixel shading tasks and determines on which shader cluster a given vertex or pixel shader program runs.
The clock and power consumption control module (2) provides an independent working clock for each of the at least two shader clusters in the unified shader array;
the clock and power consumption control module (2) comprises a unified shader array clock gating control register for the at least two shader clusters in the unified shader array module (1); the register contains at least N bits, where N equals the number of shader clusters in the unified shader array; each bit corresponds to one shader cluster and is used to gate that cluster's working clock;
the clock and power consumption control module (2) comprises an independent clock gating circuit for each of the at least two shader clusters in the unified shader array; each clock gating circuit corresponds to one shader cluster and, when the corresponding bit in the unified shader array clock gating control register is 0, turns off the working clock of that shader cluster, thereby gating it;
the shader task scheduling unit (3) monitors the current usage of task context resources in the unified shader array and, when a shader cluster has been idle for a long time, sets the corresponding bit in the unified shader array clock gating control register to 0, thereby gating that cluster's working clock.
The technical effects of the invention are as follows:
1. A unified shader array is normally a single complete functional module, and existing low-power techniques, such as dynamic voltage adjustment and dynamic clock frequency scaling, control that module as a whole. Because the unified shader array actually consists of multiple independent shader clusters, each cluster can be controlled independently for low-power purposes. The proposed scheme can therefore clock-gate the unified shader array as a whole and can also gate each shader cluster independently at a finer granularity. This allows the working clocks of any number of shader clusters, up to the total number in the array, to be gated dynamically in real time according to the behavior and data volume of the scene being rendered. The power consumption of the unified shader array can thus be adjusted dynamically, and control over it becomes significantly more flexible.
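The benefit of the finer granularity can be made concrete with a rough first-order estimate. This is an illustrative sketch, not a result stated in the patent: it assumes N identical shader clusters, uses the standard CMOS dynamic power approximation (activity factor α, switched capacitance C, supply voltage V, clock frequency f), and introduces the symbols P_c, P_leak, and k purely for illustration.

```latex
% Clock-driven dynamic power of one running cluster (standard CMOS approximation)
P_c = \alpha \, C \, V^{2} f

% Array power with k of the N cluster clocks running.
% Clock gating removes clock-driven switching but not leakage.
P_{\text{array}}(k) \approx k \, P_c + P_{\text{leak}}, \qquad 0 \le k \le N
```

Under these assumptions, gating the array as a whole offers only k = 0 or k = N, whereas per-cluster gating can select any k in between.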
Drawings
FIG. 1 is a block diagram of a low power gated clock scheme for a unified shader array of a graphics processor according to the present invention.
Detailed Description
The technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings and the specific embodiments. It is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than the whole embodiments, and that all other embodiments, which can be derived by a person skilled in the art without inventive step based on the embodiments of the present invention, belong to the scope of protection of the present invention.
As shown in FIG. 1, a low-power gated clock for the unified shader array of a graphics processor includes:
a unified shader array module 1, a clock and power consumption control module 2, and a shader task scheduling unit 3;
the unified shader array module 1 consists of at least two shader clusters and executes vertex or pixel shader programs in parallel;
the clock and power consumption control module 2 generates the working clocks of all functional modules in the graphics processor and implements clock gating for every shader cluster;
the shader task scheduling unit 3 schedules vertex and pixel shading tasks and determines on which shader cluster a given vertex or pixel shader program runs.
The clock and power consumption control module 2 provides an independent working clock for each of the at least two shader clusters in the unified shader array;
the clock and power consumption control module 2 comprises a unified shader array clock gating control register for the at least two shader clusters in the unified shader array module 1; the register contains at least N bits, where N equals the number of shader clusters in the unified shader array; each bit corresponds to one shader cluster and is used to gate that cluster's working clock;
the clock and power consumption control module 2 comprises an independent clock gating circuit for each of the at least two shader clusters in the unified shader array; each clock gating circuit corresponds to one shader cluster and, when the corresponding bit in the unified shader array clock gating control register is 0, turns off the working clock of that shader cluster, thereby gating it;
the shader task scheduling unit 3 monitors the current usage of task context resources in the unified shader array and, when a shader cluster has been idle for a long time, sets the corresponding bit in the unified shader array clock gating control register to 0, thereby gating that cluster's working clock.
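The interaction between task dispatch and the gating register can be sketched as a small software model. This is only an illustration under stated assumptions: the class name `ShaderArrayModel`, the `dispatch` method, and the "prefer a cluster whose clock is already running" policy are hypothetical and are not specified by the patent. The bit convention (1 = clock running, 0 = clock gated) follows the Examples section.

```python
from typing import Dict, List, Optional


class ShaderArrayModel:
    """Toy model of N shader clusters plus an N-bit clock gating register.

    Assumed convention (from the Examples section): bit = 1 keeps the
    cluster clock running, bit = 0 gates it off.
    """

    def __init__(self, num_clusters: int) -> None:
        if num_clusters < 2:
            raise ValueError("the unified shader array has at least two clusters")
        self.num_clusters = num_clusters
        self.gate_reg = (1 << num_clusters) - 1  # start with all clocks running
        self.busy: List[bool] = [False] * num_clusters
        self.assignment: Dict[str, int] = {}

    def clock_running(self, cluster: int) -> bool:
        return bool((self.gate_reg >> cluster) & 1)

    def dispatch(self, task_id: str) -> Optional[int]:
        """Choose the cluster that will run a vertex or pixel shader task.

        Hypothetical policy: prefer an idle cluster whose clock is already
        running; otherwise ungate an idle cluster. Returns None if every
        cluster is busy.
        """
        idle = [c for c in range(self.num_clusters) if not self.busy[c]]
        if not idle:
            return None
        running = [c for c in idle if self.clock_running(c)]
        target = running[0] if running else idle[0]
        self.gate_reg |= 1 << target  # make sure the chosen cluster is clocked
        self.busy[target] = True
        self.assignment[task_id] = target
        return target
```

In this model a gated cluster is only woken when no clocked cluster is free, which keeps the number of running clocks as small as the workload allows.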
Examples
As shown in FIG. 1, the shader task scheduling unit monitors the current usage of task context resources in the unified shader array. When one or more shader clusters have been idle for a long time, it sets their corresponding bits in the unified shader array clock gating control register to 0, and it sets the bits corresponding to busy shader clusters to 1. The working clocks of the shader clusters whose register bits are 0 are turned off, that is, gated.
The clock and power consumption control module includes a unified shader array clock gating control register for the shader clusters in the unified shader array. The register contains at least N bits, where N equals the number of shader clusters in the unified shader array. Each bit corresponds to one shader cluster and is used to gate that cluster's working clock.
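The monitoring rule described above can be sketched as a function that recomputes the register value once per monitoring interval. The patent does not define "idle for a long time", so the `idle_threshold` parameter, the per-cluster idle counters, and the function name are assumptions made for illustration. A cluster that is idle but has not yet reached the threshold is assumed to keep its clock running.

```python
from typing import List, Tuple


def update_gate_register(
    busy: List[bool], idle_cycles: List[int], idle_threshold: int
) -> Tuple[int, List[int]]:
    """Return (register value, updated idle counters).

    Bit convention from the text: 1 for a busy cluster, 0 for a cluster
    that has been idle for a long time (here: >= idle_threshold intervals).
    """
    reg = 0
    new_idle: List[int] = []
    for cluster, is_busy in enumerate(busy):
        # A busy cluster resets its idle counter; an idle one counts up.
        count = 0 if is_busy else idle_cycles[cluster] + 1
        new_idle.append(count)
        if is_busy or count < idle_threshold:
            reg |= 1 << cluster  # keep this cluster's clock running
    return reg, new_idle
```

A larger threshold avoids gating clusters that are only briefly idle, at the cost of keeping their clocks running longer.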
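Because the register has "at least N bits", it may be physically wider than the number of clusters. The sketch below shows one way to encode and decode such a register. The 32-bit width, the treatment of upper bits as reserved, and both function names are hypothetical; the bit convention (0 = gated) follows the Examples section.

```python
from typing import Iterable, List

REG_WIDTH = 32  # hypothetical physical register width, not from the patent


def encode_gate_register(num_clusters: int, running: Iterable[int]) -> int:
    """Build a register value with bit c = 1 for each running cluster c."""
    if not 2 <= num_clusters <= REG_WIDTH:
        raise ValueError("cluster count must fit in the register")
    value = 0
    for cluster in running:
        if not 0 <= cluster < num_clusters:
            raise ValueError(f"no such cluster: {cluster}")
        value |= 1 << cluster
    return value


def gated_clusters(reg: int, num_clusters: int) -> List[int]:
    """List clusters whose bit is 0; bits at or above N are ignored."""
    return [c for c in range(num_clusters) if not (reg >> c) & 1]
```

For example, with four clusters, `encode_gate_register(4, [0, 3])` leaves clusters 1 and 2 gated.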
The clock and power consumption control module provides independent working clocks for the shader clusters in the unified shader array: the PLL generates M shader-cluster working clocks, where M equals the number of shader clusters in the unified shader array. The M working clocks are connected to the independent clock gating circuits of the clock and power consumption control module, one per clock. After passing through the gating circuits, the output clocks are connected to the M shader clusters in the unified shader array.
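The clock path from the PLL through the M gating circuits to the M clusters can be modeled at cycle level by counting how many clock pulses each cluster actually receives. This is a simplified sketch: it assumes all M clocks share one period, treats the register value as constant within a cycle, and uses the hypothetical name `delivered_pulses`. The bit convention (1 = clock passes, 0 = clock gated) follows the Examples section.

```python
from typing import List


def delivered_pulses(reg_trace: List[int], num_clusters: int) -> List[int]:
    """Count the clock pulses that reach each cluster.

    reg_trace holds the gating register value for each PLL cycle.
    """
    pulses = [0] * num_clusters
    for reg in reg_trace:
        for cluster in range(num_clusters):
            if (reg >> cluster) & 1:  # gating circuit passes the clock
                pulses[cluster] += 1
    return pulses
```

The pulse counts give a rough proxy for clock-driven switching activity in each cluster over the traced interval.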
The clock and power consumption control module contains an independent clock gating circuit for each shader cluster in the unified shader array. Each clock gating circuit corresponds to one shader cluster and, when the corresponding bit in the unified shader array clock gating control register is 0, turns off the working clock of that shader cluster, thereby gating it.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
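The patent does not describe the internals of the clock gating circuit. A common way to build one is a latch-based gating cell, in which the enable is captured by a latch that is transparent while the clock is low and the output is the clock ANDed with the latched enable, so the gated clock never shows a truncated pulse. The sketch below models that technique as an assumption, with the register bit used directly as the enable (1 = clock passes, 0 = gated); the function name `gated_clock` is hypothetical.

```python
from typing import List


def gated_clock(clk_samples: List[int], enable_samples: List[int]) -> List[int]:
    """Model a latch-based clock gate over sampled clk and enable waveforms."""
    latched = 0
    out: List[int] = []
    for clk, enable in zip(clk_samples, enable_samples):
        if clk == 0:
            latched = enable  # latch is transparent only while clk is low
        out.append(clk & latched)  # AND gate produces the gated clock
    return out


# Enable drops while clk is high (sample 3) and rises while clk is high
# (sample 7); in both cases the change takes effect only after clk goes low.
clk = [0, 1, 0, 1, 0, 1, 0, 1]
enable = [1, 1, 1, 0, 0, 0, 0, 1]
waveform = gated_clock(clk, enable)
```

In this trace the second pulse completes even though the enable falls during it, and no partial pulse appears when the enable rises during the final high phase.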

Claims (4)

1. A low-power gated clock for the unified shader array of a graphics processor, comprising:
a unified shader array module (1), a clock and power consumption control module (2), and a shader task scheduling unit (3);
the unified shader array module (1) consists of at least two shader clusters and executes vertex or pixel shader programs in parallel;
the clock and power consumption control module (2) generates the working clocks of all functional modules in the graphics processor, implements clock gating for every shader cluster, and dynamically gates, in real time, the working clocks of any number of shader clusters up to the total number of shader clusters in the unified shader array;
the shader task scheduling unit (3) schedules vertex and pixel shading tasks and determines on which shader cluster a given vertex or pixel shader program runs.
2. The low-power gated clock for the unified shader array of a graphics processor according to claim 1, wherein: the clock and power consumption control module (2) provides an independent working clock for each of the at least two shader clusters in the unified shader array.
3. The low-power gated clock for the unified shader array of a graphics processor according to claim 1, wherein: the clock and power consumption control module (2) comprises a unified shader array clock gating control register for the at least two shader clusters in the unified shader array module (1); the register contains at least N bits, where N equals the number of shader clusters in the unified shader array; each bit corresponds to one shader cluster and is used to gate that cluster's working clock;
the clock and power consumption control module (2) comprises an independent clock gating circuit for each of the at least two shader clusters in the unified shader array; each clock gating circuit corresponds to one shader cluster and, when the corresponding bit in the unified shader array clock gating control register is 0, turns off the working clock of that shader cluster, thereby gating it.
4. The low-power gated clock for the unified shader array of a graphics processor according to claim 1, wherein: the shader task scheduling unit (3) monitors the current usage of task context resources in the unified shader array and, when a shader cluster has been idle for a long time, sets the corresponding bit in the unified shader array clock gating control register to 0, thereby gating that cluster's working clock.
CN201711283981.5A 2017-12-06 2017-12-06 Low-power-consumption gated clock for unified dyeing array of graphics processor Active CN108257076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711283981.5A CN108257076B (en) 2017-12-06 2017-12-06 Low-power-consumption gated clock for unified dyeing array of graphics processor

Publications (2)

Publication Number Publication Date
CN108257076A CN108257076A (en) 2018-07-06
CN108257076B true CN108257076B (en) 2021-10-15

Family

ID=62721003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711283981.5A Active CN108257076B (en) 2017-12-06 2017-12-06 Low-power-consumption gated clock for unified dyeing array of graphics processor

Country Status (1)

Country Link
CN (1) CN108257076B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109669770B (en) * 2018-12-12 2022-10-11 中国航空工业集团公司西安航空计算技术研究所 Parallel coloring task scheduling unit system of graphics processor

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630441A (en) * 2015-12-11 2016-06-01 中国航空工业集团公司西安航空计算技术研究所 GPU (Graphics Processing Unit) system architecture based on uniform dyeing technology
CN106651744A (en) * 2016-12-12 2017-05-10 中国航空工业集团公司西安航空计算技术研究所 Low power consumption GPU (Graphic Process Unit) staining task and uniform staining array task field mapping structure
CN106709860A (en) * 2016-12-12 2017-05-24 中国航空工业集团公司西安航空计算技术研究所 Debugging structure for GPU unified dyeing processing array
CN106709859A (en) * 2016-12-12 2017-05-24 中国航空工业集团公司西安航空计算技术研究所 Adaptive low-power-consumption clock gating structure of rasterization unit of graphic processing unit

Also Published As

Publication number Publication date
CN108257076A (en) 2018-07-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant