JP2008165746A

JP2008165746A - Accelerator, information processor and information processing method

Info

Publication number: JP2008165746A
Application number: JP2007304273A
Authority: JP
Inventors: Hideki Yasukawa; 英樹安川
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2006-12-06
Filing date: 2007-11-26
Publication date: 2008-07-17
Anticipated expiration: 2027-11-26
Also published as: CN101196776B; CN101196776A; JP4945410B2

Abstract

PROBLEM TO BE SOLVED: To provide an accelerator that has a plurality of operation parts capable of executing a program in parallel and that executes the program by itself determining how the work is shared among those operation parts. SOLUTION: The accelerator 3 is connected to a PC 2 and can execute the program. The accelerator 3 includes a plurality of operation parts 22a capable of executing the program in parallel; an F/V control part 22c for controlling at least one of the operation and the throughput of each of the operation parts 22a; and an operation part 21a that determines at least one of the operation and the throughput of each of the operation parts 22a based on load information about the program to be executed, and controls the F/V control part 22c according to that determination. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an accelerator, an information processing apparatus, and an information processing method, and in particular to an accelerator that is connectable to an information processing apparatus and has a plurality of arithmetic units capable of executing a program by parallel processing, an information processing apparatus connected to such an accelerator, and an information processing method.

Techniques have long been known in which a device having an arithmetic function is added to an information processing apparatus so that the added device takes over part of the processing to be executed. For example, a device having an arithmetic function, called an accelerator, is attached to a personal computer (hereinafter referred to as a PC) serving as the information processing apparatus, and the central processing unit (hereinafter referred to as a CPU) of the PC main body has the accelerator share in the processing of a program, thereby improving processing speed.

More recently, an information processing apparatus in which an accelerator is added to a main body, and which takes power consumption into account rather than merely sharing processing or raising processing speed, has been proposed in, for example, Japanese Patent Application Laid-Open No. 2003-15785.

According to that proposal, the CPU on the main body side reads performance information of the added accelerator and, based on that performance information, determines and sets the drive voltage or drive frequency of the accelerator, so that the accelerator can be driven in a manner suited to a low power consumption mode or the like.

However, in the information processing apparatus of the above proposal, it is the CPU on the main body side that determines the drive voltage and the like of the accelerator. The CPU must therefore execute that determination processing, which imposes overhead on the CPU.

Moreover, the information processing apparatus of the above proposal gives no consideration to the case where the accelerator contains a plurality of arithmetic units.
JP 2003-15785 A

The present invention has been made in view of the above problems, and its object is to provide an accelerator, an information processing apparatus, and an information processing method in which an accelerator having a plurality of arithmetic units capable of executing a program by parallel processing itself determines the sharing among its internal arithmetic units and executes the program.

According to one aspect of the present invention, there can be provided an accelerator that is connectable to an information processing apparatus and capable of executing a program, the accelerator comprising: a plurality of arithmetic units capable of executing the program by parallel processing; an operation control unit that controls at least one of the operation and the processing capability of each of the plurality of arithmetic units; and a control unit that determines, based on load information about the program to be executed, at least one of the operation and the processing capability of each of the plurality of arithmetic units, and controls the operation control unit in accordance with that determination.

According to the present invention, it is possible to realize an accelerator, an information processing apparatus, and an information processing method in which an accelerator having a plurality of arithmetic units capable of executing a program by parallel processing itself determines the sharing among its internal arithmetic units and executes the program.

Embodiments of the present invention will be described below with reference to the drawings.
(First embodiment)
First, the configuration of the information processing apparatus according to the first embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a configuration diagram showing the configuration of the information processing apparatus according to the present embodiment.
The information processing apparatus 1 includes a PC 2, which is a computer having a PC architecture. An accelerator 3 can be added to, that is, connected to, the PC 2. The PC 2 is an information processing apparatus that includes a CPU (Central Processing Unit) 11, an MCH (Memory Controller Hub) 12, an ICH (I/O Controller Hub) 13, a GPU (Graphics Processing Unit) 14, a main memory 15, and a VRAM (Video RAM) 16 serving as an image memory. The information processing apparatus 1 is thus formed by connecting the accelerator 3 to the PC 2 having this PC architecture. Although the present embodiment shows a PC architecture consisting of the CPU 11, the MCH 12, the ICH 13, and the GPU 14, the PC architecture is not limited to this configuration.

In particular, the MCH 12 is a semiconductor chip with a so-called north bridge function, handling functions such as the connection between the CPU 11 and the main memory 15. The ICH 13 is a semiconductor chip with a so-called south bridge function, linking to other components such as a hard disk drive (hereinafter referred to as an HDD) 17 via a PCI bus, USB, and the like. Here, the ICH 13 controls the input and output of signals conforming to standards such as USB2, SATA (Serial ATA), Audio, and PCI Express. The GPU 14, which is a graphics processing device, is a so-called graphics engine: a semiconductor chip that performs the calculations needed to display three-dimensional graphics.

The accelerator (hereinafter abbreviated as AC) 3, an additional device having an arithmetic function, is a chip that is connected to the ICH 13 and also to a RAM 4 (which may be a flash memory or the like) serving as its own working memory. The configuration of the AC 3 as a peripheral device will be described later. The RAM 4 may instead be provided inside the AC 3.
The CPU 11 can execute various application programs, some of which impose a high load and some a low load. The CPU 11 can therefore ask the AC 3 to execute a high-load application program, for example an image recognition program or a video playback program. Specifically, when an application program is to be executed using the AC 3 in the information processing apparatus 1, the CPU 11 outputs a predetermined command to the AC 3, and the AC 3 receives the command and processes the program designated by the CPU 11. When the AC 3 performs the designated processing, for example image recognition, it reads a stream signal from SATA or the like by DMA, performs the recognition processing, and transfers the resulting data by DMA to the GPU 14 or the like for output.

PCI Express has one or more lanes. The ICH 13 and the AC 3 are connected by PCI Express with a predetermined number of lanes, for example 1, 2, 4, or 8. The number of lanes is set by the BIOS or the like. For example, the ICH 13 and the AC 3 are connected by 4-lane PCI Express.

As indicated by the dotted lines in FIG. 1, a plurality of ACs 3 may be connected to the ICH 13, with each AC 3 connected to its own PCI Express lane. This makes it possible to handle application programs with a high computational load by increasing the number of processing units, described later.

Furthermore, when a plurality of ACs 3 are connected to the ICH 13, each AC 3 may be connected to the ICH 13 by a plurality of lanes.

The AC 3 is a semiconductor processor with a multi-core, multiprocessor architecture capable of parallel processing, in which the operation and processing capability of each arithmetic unit are controlled.

In the present embodiment, the AC 3 includes a plurality of arithmetic units capable of processing a program in parallel, and when executing the designated processing, the AC 3 itself determines the sharing among those arithmetic units and has each arithmetic unit execute its part. In determining the sharing, the AC 3 itself decides which of the arithmetic units are to execute the processing, supplies power to the arithmetic units that execute it, and determines and sets the operating frequency to be used for the execution.

Next, the configuration of the AC 3 will be described. FIG. 2 is a block diagram for explaining the configuration of the AC 3. The AC 3 includes a control processing unit (hereinafter abbreviated as CPE) 21, a plurality of (here, four) processing units (hereinafter abbreviated as PEs), and an interface unit (hereinafter abbreviated as I/F unit) 23. The four PEs are denoted PE 22A, PE 22B, PE 22C, and PE 22D; hereinafter, they are referred to as PE 22 when referred to collectively or individually. The AC 3 further includes an I/F unit 24, through which it can read programs and data in the RAM 4 connected to the AC 3. The CPE 21, the PEs 22, the I/F unit 23, and the I/F unit 24 are connected to one another via an internal bus 25. The I/F unit 23 is a circuit that interfaces the internal bus 25 with the bus of the PC architecture. When power is turned on, a program and data for the CPE 21 are loaded from the CPU 11 and stored in the RAM 4. Alternatively, a ROM may be provided in the AC 3 to hold the program and data, and the CPE 21 may read them from that ROM. Other input/output terminals 26, a PLL circuit 27, and a digital temperature sensor (hereinafter abbreviated as DTS) 28 are also provided on the chip of the AC 3.

The CPE 21 internally includes an arithmetic unit 21a, which is a control unit, and a cache memory 21b. Each PE includes an arithmetic unit and a local memory, and each PE is provided with a frequency/voltage control (hereinafter abbreviated as F/V control) unit. Specifically, the PEs 22A, 22B, 22C, and 22D have arithmetic units 22Aa, 22Ba, 22Ca, and 22Da (hereinafter referred to as arithmetic unit 22a, collectively or individually) and local memories 22Ab, 22Bb, 22Cb, and 22Db (hereinafter referred to as local memory 22b, collectively or individually), respectively. Each PE 22 is also provided with an F/V control unit 22Ac, 22Bc, 22Cc, or 22Dc (hereinafter referred to as F/V control unit 22c, collectively or individually).

The arithmetic unit 22a is a circuit that processes a processing program in parallel in response to a request from the CPE 21. The arithmetic unit 22a may be a hardware engine for a specific application, but in the present embodiment it is a programmable general-purpose processing unit. Each arithmetic unit 22a is a resource for internal computation in the AC 3. As described later, the processing program is processed in parallel using one or more of the arithmetic units 22a.
Here, the arithmetic unit 22a is capable of SIMD operations on data with a width of 128 bits. The arithmetic unit 22a is also capable of 32-bit single-precision and 64-bit double-precision floating-point operations.

Each local memory 22b is a storage unit that stores the processing program and the target data, that is, the data to be processed.

For example, when the PEs 22 perform image recognition processing on image data, or codec processing such as encoding and decoding of image data, the data to be processed, read from the HDD 17 or a camera (not shown), is divided to fit the capacity of each local memory 22b and stored in the local memories 22b. Each arithmetic unit 22a then executes predetermined processing on the stored data by SIMD operations and stores the result in its local memory 22b. When the predetermined processing is completed in a PE 22, the processed data is transferred from the local memory 22b to the HDD 17, the data to be processed next is transferred from the HDD 17 to the local memory 22b, and the predetermined processing is performed as described above. By repeating this sequence, the information processing apparatus 1 uses the AC 3 to perform image recognition processing and the like smoothly.
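The division of target data to fit the local memories can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the function name `split_for_local_memories`, the parameter `usable_bytes` (the part of a local memory assumed to be free for target data, since the local memory also holds the processing program and results), and the dealing of chunks to PEs in rounds are all assumptions introduced here.

```python
from typing import List


def split_for_local_memories(data: bytes, usable_bytes: int, num_pes: int) -> List[List[bytes]]:
    """Split `data` into chunks that each fit one local memory, grouped into rounds.

    Each round holds at most `num_pes` chunks, one per PE, to be processed
    concurrently before the next round is transferred in.
    `usable_bytes` is a hypothetical per-PE budget for target data.
    """
    if usable_bytes <= 0 or num_pes <= 0:
        raise ValueError("usable_bytes and num_pes must be positive")
    # Cut the input into pieces no larger than one local memory's budget.
    chunks = [data[i:i + usable_bytes] for i in range(0, len(data), usable_bytes)]
    # Group the pieces so that each round assigns one chunk per PE.
    return [chunks[i:i + num_pes] for i in range(0, len(chunks), num_pes)]


rounds = split_for_local_memories(bytes(range(10)), usable_bytes=4, num_pes=2)
```

With ten bytes of input, a four-byte budget, and two PEs, the first round carries two full chunks and the second round carries the two-byte remainder.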

Each F/V control unit 22c is an operation control unit that controls both the operation and the processing capability of the corresponding arithmetic unit 22a. Specifically, it is a circuit having a function of changing the frequency of the clock signal supplied to the corresponding arithmetic unit 22a, a function of supplying and stopping the clock signal supplied to each circuit in the arithmetic unit 22a, and a function of supplying and stopping the power supplied to each circuit in the arithmetic unit 22a. The clock CLK supplied to each circuit comes from the PLL circuit 27.

Although an F/V control unit 22c is provided in each PE 22 here, a single F/V control unit 22c may instead be provided for the four PEs 22 as a whole, so that the change of clock frequency, the supply and stopping of the clock signal, and the supply and stopping of power are performed for all four PEs 22 together. In that case, the output of the PLL circuit 27 is routed through a switch circuit 29, shown by a dotted line in FIG. 2, and a control signal for stopping the clock supply is supplied from the CPE 21 to the switch circuit 29.

As described later, the function of changing the operating frequency serves to optimize the power consumed by the clock signal by lowering the operating frequency of each arithmetic unit 22a and the like in each PE 22 when the computing performance those arithmetic units can provide exceeds the load of the processing program.

The function of supplying and stopping the clock signal, that is, the clock gating function, supplies and stops the clock signal to each arithmetic unit 22a and the like in each PE 22. When the clock supply is stopped, the power consumed by the clock signal can be reduced to zero.

The function of supplying and stopping power supplies and stops power to each arithmetic unit 22a and the like in each PE 22. When the power supply is stopped, the power consumed by leakage current in the internal circuits can be reduced to zero.

The clock frequency supplied to each arithmetic unit 22a represents the processing capability of that arithmetic unit. An arithmetic unit 22a has its maximum processing capability at its predetermined maximum operating frequency, and each F/V control unit 22c can hold the processing capability of the arithmetic unit 22a at or below the maximum by setting the frequency at or below that maximum operating frequency.

Each F/V control unit 22c can also stop the operation of its arithmetic unit 22a by stopping the clock signal that would otherwise be supplied to it. Likewise, each F/V control unit 22c can stop the operation of the arithmetic unit 22a by stopping the power, for example the supply voltage, that would otherwise be supplied to it. Each F/V control unit 22c can therefore control the operation of its arithmetic unit 22a by changing the frequency of the clock signal, by controlling the supply of the clock signal (clock gating), or by controlling the supply of power to the arithmetic unit 22a.
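The three functions of an F/V control unit (frequency change, clock gating, power gating) can be pictured with a small software model. This is only a sketch under stated assumptions: the class `FVControl` and its field and method names are hypothetical, processing capability is assumed to scale linearly with clock frequency, and cutting power is assumed to stop the clock as well. The actual unit is a hardware circuit whose interface is not given in the text.

```python
from dataclasses import dataclass


@dataclass
class FVControl:
    """Toy model of one F/V control unit 22c (all names are illustrative)."""
    max_freq_mhz: float          # predetermined maximum operating frequency
    freq_mhz: float = 0.0        # currently selected clock frequency
    clock_enabled: bool = False  # clock gating state
    powered: bool = False        # power gating state

    def set_frequency(self, freq_mhz: float) -> None:
        # The frequency may only be set at or below the maximum.
        if not 0 < freq_mhz <= self.max_freq_mhz:
            raise ValueError("frequency must be in (0, max_freq_mhz]")
        self.freq_mhz = freq_mhz

    def gate_clock(self, enable: bool) -> None:
        self.clock_enabled = enable

    def gate_power(self, enable: bool) -> None:
        self.powered = enable
        if not enable:
            # Assumption: an unpowered arithmetic unit receives no clock either.
            self.clock_enabled = False

    @property
    def running(self) -> bool:
        return self.powered and self.clock_enabled

    @property
    def capability(self) -> float:
        """Fraction of maximum capability (assumed linear in frequency); 0.0 when stopped."""
        return self.freq_mhz / self.max_freq_mhz if self.running else 0.0


fv = FVControl(max_freq_mhz=1000.0)
fv.gate_power(True)
fv.gate_clock(True)
fv.set_frequency(500.0)  # half the maximum frequency, so half the capability in this model
```

Stopping either the clock or the power drives `capability` to zero, mirroring the two ways of stopping an arithmetic unit described above.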

In the present embodiment, each F/V control unit 22c controls both the operation and the processing capability of the corresponding arithmetic unit 22a, but it may control at least one of the two.

The arithmetic unit 21a of the CPE 21 controls each PE 22 and each F/V control unit 22c, as described later. Accordingly, each F/V control unit 22c controls the operation and processing capability of its arithmetic unit 22a in accordance with instructions from the arithmetic unit 21a of the CPE 21.

As described above, when the arithmetic unit 21a, which is the control unit, receives from the CPU 11 a command to execute predetermined processing, it outputs predetermined instructions to the four PEs 22. These instructions include which PE 22 is to execute the processing, what operating frequency is to be used, and so on.

The CPE 21 of the AC 3 also outputs a predetermined code signal VID, for example a 6-bit signal, to a VRM (Voltage Regulator Module) 30, an external power supply circuit module serving as a variable power supply, and the VRM 30 supplies the AC 3 with a power supply voltage V corresponding to that code signal VID.

Furthermore, the circuits on the AC 3 are divided into a plurality of (here, 13) blocks, and the AC 3 is configured so that power is supplied separately to each block. That is, the block of circuitry served by each power supply is determined in advance, and each power supply feeds only its corresponding block. Specifically, the block B1 including the CPE 21 is supplied with power from an internal logic power supply PS1. The block B2 including the PLL circuit 27 is supplied with power from an analog power supply PS2 for the PLL section. The block B3 including the DTS 28 is supplied with power from an analog power supply PS3 for the digital temperature sensor section. The block B4 including part of the PCI Express I/F unit 23 is supplied with power from a first PCI Express logic power supply PS4. The block B5 including another part of the PCI Express I/F unit 23 is supplied with power from a second PCI Express logic power supply PS5 and from a PCI Express analog power supply PS6. The block B7 including part of the I/F unit 24 is supplied with power from an analog power supply PS7 for the I/F unit 24. The block B8 including another part of the I/F unit 24 is supplied with power from a logic power supply PS8 for the I/F unit 24. The block B9 including the other input/output terminals 26 is supplied with power from a power supply PS9 for those terminals. The four PEs 22 are supplied with power from PE power supplies PS10, PS11, PS12, and PS13, respectively.

For example, while an application program is running and the AC 3 is in use, the CPU 11 controls power supply so that power is supplied to each circuit section from all of the power supplies PS1 through PS13. When, for example, the AC 3 is not in use, the CPU 11 controls power supply so that unnecessary power is not supplied. More specifically, when the CPU 11 indicates a device state to the AC 3, the CPE 21 receives that device state information and, in accordance with it, instructs an external power supply controller 31 as to the power supply state of each of the power supplies PS1 through PS13. The external power supply controller 31 changes the power supply state of each of the power supplies PS1 through PS13 in accordance with that instruction. The device states include a full state D0, in which power is supplied from all of the power supplies PS1 through PS13 as described above; a state D1, in which power is supplied from only some of the power supplies PS1 through PS13; and a so-called sleep state D2.
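The translation from a device state into per-supply instructions for the power supply controller can be sketched as a lookup table. Only the D0 row (all of PS1 through PS13 on) comes from the text. Which supplies stay on in D1 and D2 is not specified here, so the D1 and D2 rows below are placeholders chosen purely for illustration, and the names `DeviceState`, `SUPPLIES_ON`, and `supply_commands` are hypothetical.

```python
from enum import Enum
from typing import Dict, FrozenSet

ALL_SUPPLIES: FrozenSet[str] = frozenset(f"PS{i}" for i in range(1, 14))
PE_SUPPLIES: FrozenSet[str] = frozenset(f"PS{i}" for i in range(10, 14))


class DeviceState(Enum):
    D0 = "full"     # power from all of PS1 through PS13
    D1 = "partial"  # power from only some supplies
    D2 = "sleep"    # so-called sleep state


# ASSUMPTION: the D1 and D2 subsets are illustrative placeholders; the text
# only says that D1 uses "some" supplies and that D2 is a sleep state.
SUPPLIES_ON: Dict[DeviceState, FrozenSet[str]] = {
    DeviceState.D0: ALL_SUPPLIES,
    DeviceState.D1: ALL_SUPPLIES - PE_SUPPLIES,  # hypothetical: PE supplies off
    DeviceState.D2: frozenset({"PS1"}),          # hypothetical: control logic only
}


def supply_commands(state: DeviceState) -> Dict[str, bool]:
    """Return an on/off instruction for every supply, ordered PS1..PS13."""
    on = SUPPLIES_ON[state]
    names = sorted(ALL_SUPPLIES, key=lambda name: int(name[2:]))
    return {name: name in on for name in names}


commands = supply_commands(DeviceState.D1)
```

Keeping the mapping in one table means a different choice of D1 or D2 subsets changes only the data, not the function that issues the instructions.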

As described above, the CPU 11 controls the power supply to each block in the AC 3 according to the state of the information processing apparatus 1, here according to whether the AC 3 is in use.

FIG. 3 is a flowchart showing an example of the processing flow of the CPU 11. The processing program for the CPU 11 is stored in the main memory 15 and executed by the CPU 11.
The description takes as an example a case where the CPU 11, while executing various kinds of processing, has the AC 3 take over a certain process, here image recognition processing. The CPU 11 performs predetermined preprocessing with the AC 3 before requesting the processing, and then transmits the image recognition program to the AC 3 (step S1). The arithmetic unit 21a of the CPE 21 stores the image recognition program received from the CPU 11 in the RAM 4.

Next, the CPU 11 transmits to the AC 3 the address of the target data to be subjected to the image recognition processing, the address for the result data of the recognition processing, the load information of the image recognition program, and the parallelism information of the image recognition program (step S2). The AC 3 stores the received load information and parallelism information in the RAM 4.

The load information indicates how heavy the processing is, and the parallelism information indicates the degree to which the processing program can be processed in parallel. In the present embodiment, both are described using integers including zero (0, 1, 2, ...). A larger load value indicates a heavier processing load. The parallelism value indicates that the processing can be executed by a corresponding number of PEs.

The load information and the parallelism information are determined in advance for each processing program and stored in the main memory 15. FIG. 4 shows an example of table data holding the load information and the parallelism information.

As shown in FIG. 4, load information and parallelism information are set in advance for each processing program. Processing program A has a load of 2 and a parallelism of 4. Processing program B has a load of 1 and a parallelism of 1. Processing program C has a load of 1 and a parallelism of 4.

Since the table data of FIG. 4 is stored in the main memory 15 in advance, the CPU 11 can read the load information and parallelism information of the processing program it is requesting of the AC 3 from the main memory 15 and transmit them to the AC 3.
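The table of FIG. 4 and the information sent in step S2 can be sketched as a lookup followed by assembling a request. The three table rows are the values given for processing programs A, B, and C. The type names `ProgramProfile` and `AcceleratorRequest`, the function `build_request`, and the field names are assumptions made for this illustration, as is representing addresses as plain integers.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ProgramProfile:
    load: int         # larger value means heavier processing
    parallelism: int  # number of PEs the program can be spread across


# Values taken from the FIG. 4 example.
PROFILE_TABLE: Dict[str, ProgramProfile] = {
    "A": ProgramProfile(load=2, parallelism=4),
    "B": ProgramProfile(load=1, parallelism=1),
    "C": ProgramProfile(load=1, parallelism=4),
}


@dataclass(frozen=True)
class AcceleratorRequest:
    """Hypothetical container for the four items transmitted in step S2."""
    target_addr: int  # address of the data to be processed
    result_addr: int  # address where result data is to be placed
    load: int
    parallelism: int


def build_request(program: str, target_addr: int, result_addr: int) -> AcceleratorRequest:
    """Look up the program's profile and bundle it with the two addresses."""
    profile = PROFILE_TABLE[program]
    return AcceleratorRequest(target_addr, result_addr, profile.load, profile.parallelism)


request = build_request("A", target_addr=0x1000, result_addr=0x2000)
```

Because the profiles are fixed per program, the host only performs a table read before each request; no load estimation happens at request time in this sketch.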

Next, the processing of the arithmetic unit 21a of the CPE 21 in the AC 3 will be described. FIG. 5 is a flowchart showing an example of the processing of the CPE 21.

When the CPU 11 requests the processing described above, the CPE 21 refers to the received load information and parallelism information and stores them in the RAM 4 (step S11).

The CPE 21 determines the PEs to be operated based on the load information and the parallelism information (step S12). That is, the CPE 21 takes the parallelism information into account together with the load information to determine one or more PEs 22 to be operated, which fixes the number of operating PEs 22. In the present embodiment, the parallelism indicates the maximum number of arithmetic units that can process the program in parallel, and the load indicates a ratio relative to the amount of processing that one PE 22 can execute, which is taken as 1. The CPE 21 can therefore determine, from the received load information and parallelism information, how many PEs 22 should execute the processing program and at what operating frequency.

In this determination, the optimum PEs 22 to be operated and their operating frequency are chosen according to the criterion of minimizing the power consumption of the AC 3. A PE 22 that is not used for the processing is controlled so that its power consumption is minimized, for example by stopping its power supply.

The CPE 21 determines the operating frequency and supply voltage of each of the one or more PEs 22 determined to be operated (step S13). That is, the CPE 21 determines the operating frequency and supply voltage of each operating PE 22 and controls the F/V control unit 22c so that each operating PE 22 is supplied with a clock signal of the determined operating frequency and with power at the determined voltage. A PE that does not operate is supplied with neither a clock signal nor the power needed for arithmetic processing.

The operating frequency in step S13 is determined, for example, as follows. FIG. 6 is a flowchart showing an example of the flow of the operating frequency determination process.

First, the CPE 21 determines which PEs 22 are currently usable (step S21). When the processing instruction is received, some of the PEs 22 of the AC 3 may already be executing another process. The CPE 21 monitors the operation of each PE 22 and can tell what process each PE 22 is executing. Therefore, before assigning the process, the CPE 21 first determines which PEs 22 are able to execute it, that is, which PEs 22 are usable (step S21).

Next, the CPE 21 determines the operating frequency and supply voltage according to the load and notifies the F/V control unit 22c of each PE 22 (step S22). For example, suppose the processing program has a load of 2 and a parallelism of 4, like program A in the table of FIG. 4, and three PEs are able to execute it at step S21. With f denoting the maximum operating frequency of each arithmetic unit 22a, the CPE 21 divides 2, the load of the program, by 3, the number of executable PEs 22, and obtains the value 2/3. As a result, the operating frequency of the arithmetic unit 22a of each PE 22 is (2/3)f.

Note that the operating frequency of the PE 22 may be unable to take the value obtained by the division, for example when the PE 22 can operate only at predetermined fixed frequencies such as f, (1/2)f, (1/3)f, (1/4)f, and (1/8)f. In that case, the CPE 21 selects as the operating frequency a value that is close to (2/3)f and larger than (2/3)f.
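The division in step S22 and the rounding up to a supported frequency can be sketched as follows. This is only an illustration under stated assumptions: frequencies are expressed as exact fractions of the maximum frequency f, the set of supported steps is the example set from the text, the number of PEs used is capped by the parallelism, and a supported step exactly equal to the requirement is accepted. The function names are hypothetical.

```python
from fractions import Fraction

# Example set of supported frequencies, as fractions of the maximum frequency f.
SUPPORTED_STEPS = [
    Fraction(1), Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), Fraction(1, 8),
]


def required_fraction(load, usable_pes, parallelism):
    """Step S22: divide the load by the number of PEs that will run."""
    # Assumption: never use more PEs than the parallelism allows.
    pes = min(usable_pes, parallelism)
    if pes < 1:
        raise ValueError("no PE is available")
    return Fraction(load) / pes


def quantize_up(required, steps=SUPPORTED_STEPS):
    """Pick the lowest supported frequency that is not below the requirement."""
    candidates = [s for s in steps if s >= required]
    if not candidates:
        raise ValueError("the load exceeds what the PEs can deliver at f")
    return min(candidates)
```

With program A (load 2, parallelism 4) and three usable PEs, `required_fraction(2, 3, 4)` gives 2/3, and `quantize_up` then selects f itself, since f is the only listed step that is not below (2/3)f.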

In this way, the CPE 21 determines the operating frequency of each PE 22 to be operated and also determines its supply voltage. For a PE 22 to be operated, the supply voltage is the voltage required for operation. A PE 22 that does not operate is not supplied with the voltage required for operation; its supply voltage is 0 or a voltage corresponding to minimum power consumption, as in a standby state.

Returning to FIG. 5, the CPE 21 instructs each operating PE 22 to load the processing program (the image recognition program in the example above) (step S14). Specifically, the CPE 21 notifies the PE 22 of the address of the processing program and instructs it to load that program, that is, it outputs a load command for the processing program. As a result, each operating PE 22 loads the processing program and stores it in its local memory 22b.

The CPE 21 then outputs a start command to each operating PE 22 (step S15). A PE 22 that has received the start command executes the processing program stored in the local memory 22b. At this time, the arithmetic unit 22a of each PE 22 operates at the operating frequency and voltage that were notified to and set in the F/V control unit 22c.

The PE 22 outputs the processed result data to the address specified in step S2.

The CPE 21 monitors the operation of each PE and executes predetermined processing when all the processing has finished.

FIG. 7 is a flowchart showing an example of the flow of processing in the arithmetic unit 21a of the CPE 21 when the processing program ends.

The CPE 21 monitors the execution state of the processing program in each PE 22 and first determines whether all the PEs 22 that it instructed to execute the processing program have finished the processing (step S31).

When all the PEs 22 have finished, the CPE 21 outputs to the CPU 11 an end notification indicating that execution of the requested processing program has finished (step S32).

The CPE 21 then stops supplying the clock signal at the operating frequency and the voltage determined in step S13 to the PEs 22 that have finished processing (step S33). Stopping here means returning to the clock frequency and voltage supply used in the so-called standby state.
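Steps S31 to S33 can be sketched as a single check that the CPE would run while monitoring the PEs. This is a simplified illustration: `finish_program`, the `pe_done` mapping, and the two callbacks are hypothetical stand-ins for the monitoring state, the end notification to the CPU 11, and the F/V control, none of which the patent specifies at this level.

```python
def finish_program(pe_done, notify_cpu, set_standby):
    """Run steps S31 to S33 once; return True if the program has finished.

    pe_done     -- dict mapping each instructed PE name to True when it is done
    notify_cpu  -- callable that sends the end notification (step S32)
    set_standby -- callable that returns one PE to standby clock/voltage (step S33)
    """
    # Step S31: have all instructed PEs finished?
    if not all(pe_done.values()):
        return False
    # Step S32: tell the CPU that the requested program has finished.
    notify_cpu()
    # Step S33: drop each finished PE back to its standby clock and voltage.
    for pe in pe_done:
        set_standby(pe)
    return True
```

Nothing is sent and no PE is touched until every instructed PE reports completion, which mirrors the order of the flowchart in FIG. 7.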

In this way, the CPU 11 requests the AC 3 to run a processing program, and the program is executed in the AC 3.

Next, the above flow of processing will be described using a specific example. FIG. 8 is a diagram for explaining the processing in the CPE 21. FIG. 8 shows an example of changes in the state of the AC 3, which is drawn as containing four PEs 22. In FIG. 8, the node Start indicates the state before the CPE 21 operates, and the node End indicates the state in which the CPE 21 has finished operating. When the CPE 21 starts operating, the AC 3 enters the standby state 101.

In FIG. 8, when the AC 3 is in the standby state 101 and the CPU 11 requests a process W with a load of 1 and a parallelism of 1, the AC 3 enters the state 102.

In the standby state 101, clock gating is applied inside the AC 3 to every circuit portion that can be gated, so that the clock signal supply to it is stopped, and every circuit portion whose clock frequency can be lowered is supplied with a clock signal lowered to the lowest possible frequency. The standby state 101 is therefore the state in which the power consumption of the AC 3 is lowest.

When the process W described above is requested in the standby state 101, the CPE 21 finds that the process W has a load of 1, which one PE 22 can handle, and a parallelism of 1. In that case, the CPE 21 selects one PE 22A as the PE to be operated and sets its operating frequency to the maximum operating frequency f, while applying clock gating to the other PEs 22B, 22C, and 22D and stopping their power supply. In FIG. 8, the hatched PE 22A among the four PEs 22 is the operating PE.

When the process W ends, the AC 3 returns from the state 102 to the standby state 101. When the AC 3 is in the standby state 101 and the CPU 11 requests a process X with a load of 1 and a parallelism of 4, the AC 3 enters the state 103.

Specifically, when the process X described above is requested, the CPE 21 finds that the process X has a load of 1, which one PE 22 can handle, and a parallelism of 4. If the operating method with the lowest power consumption is to share the load evenly among the operable PEs 22, the CPE 21 selects all four PEs 22 as the PEs to be operated and runs each PE 22 at an operating frequency of (1/4)f, where f is the maximum operating frequency.

For the process X with a load of 1 and a parallelism of 4, the other options are to execute it on one PE at an operating frequency of (1/1)f or on two PEs at an operating frequency of (1/2)f. Which method is optimal, that is, which gives the lowest power consumption, depends on how the circuits of the AC 3 are implemented and operated.
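To illustrate why the best option depends on the implementation, the following sketch compares the candidate PE counts under an assumed power model. The model is not taken from the patent: it supposes that supply voltage scales with frequency, so that the dynamic power of one PE grows roughly with the cube of its frequency fraction, and that each active PE also draws a fixed static power. The coefficients `k_dynamic` and `k_static` and both function names are hypothetical.

```python
def total_power(n_pes, load, k_dynamic=1.0, k_static=0.0):
    """Relative power of n_pes PEs sharing `load` evenly (assumed model)."""
    fraction = load / n_pes  # per-PE frequency as a fraction of f
    if fraction > 1:
        raise ValueError("a PE cannot run faster than f")
    # Assumption: voltage tracks frequency, so dynamic power ~ f * V^2 ~ f^3.
    return n_pes * (k_dynamic * fraction ** 3 + k_static)


def best_pe_count(load, parallelism, usable_pes, **model):
    """Return the PE count with the lowest modeled power."""
    limit = min(parallelism, usable_pes)
    feasible = [n for n in range(1, limit + 1) if load / n <= 1]
    if not feasible:
        raise ValueError("not enough PEs for this load")
    return min(feasible, key=lambda n: total_power(n, load, **model))
```

Under this model, process X (load 1, parallelism 4) is cheapest on four PEs when static power is negligible, but a large static cost per active PE tips the choice toward a single PE at f, which is consistent with the text's remark that the optimum depends on the circuit implementation.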

When the process X ends, the AC 3 returns from the state 103 to the standby state 101. When the AC 3 is in the standby state 101 and the CPU 11 requests two processes, a process Y with a load of 1/4 and a parallelism of 2 and a process Z with a load of 2 and a parallelism of 2, the AC 3 enters the state 104.

Specifically, when the processes Y and Z described above are requested, the CPE 21 finds that the process Y has a load of 1/4 of what one PE 22 can handle and a parallelism of 2, and that the process Z has a load of 2, which two PEs 22 can handle, and a parallelism of 2. If the operating method with the lowest power consumption is to share the load evenly among the operable PEs 22, the CPE 21 selects the two PEs 22A and 22B for the process Y and runs them at an operating frequency of (1/8)f, and selects the two PEs 22C and 22D for the process Z and runs them at an operating frequency of (1/1)f. In this case, the program of the process Y is loaded into the PEs 22A and 22B, and the program of the process Z is loaded into the PEs 22C and 22D.
When the processes Y and Z end, the AC 3 returns from the state 104 to the standby state 101.
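The allocation in state 104 can be reproduced with a short sketch that hands out PEs to each requested process in order and splits each load evenly. It assumes the even-sharing policy that the text treats as one possible optimum, and it assumes PEs are assigned in the order 22A to 22D; the function name `allocate` and the tuple layout of a request are hypothetical.

```python
from fractions import Fraction

PES = ["22A", "22B", "22C", "22D"]


def allocate(requests, pes=PES):
    """Assign PEs and per-PE frequency fractions to each (name, load, parallelism).

    Returns (plan, idle), where plan maps a process name to
    (list of PE names, frequency as a fraction of f) and idle lists unused PEs.
    """
    free = list(pes)
    plan = {}
    for name, load, parallelism in requests:
        count = min(parallelism, len(free))
        if count == 0:
            raise RuntimeError("no free PE for process " + name)
        fraction = Fraction(load) / count  # even split of the load
        if fraction > 1:
            raise ValueError("process " + name + " needs more than f per PE")
        chosen, free = free[:count], free[count:]
        plan[name] = (chosen, fraction)
    return plan, free
```

Calling `allocate([("Y", Fraction(1, 4), 2), ("Z", 2, 2)])` yields PEs 22A and 22B at (1/8)f for process Y and PEs 22C and 22D at f for process Z, with no PE left idle. Any PEs returned in `idle` would be the ones to clock-gate and power down.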

As described above, in the AC 3 the operation of each PE 22 is controlled according to the processing program so that the power consumption is optimal, here meaning low, and as a result the power consumption of the AC 3 is controlled so as to change dynamically. That is, within the AC 3, the allocation of the arithmetic units 22a, which are the internal computing resources, and their operating states are changed dynamically according to the load of the processing program. At that time, so that the power consumption of the AC 3 is optimal, an operating frequency and a supply voltage are determined for the arithmetic unit 22a of each operating PE 22, and clock gating, stopping of the voltage supply, and the like are applied to each PE 22 that does not operate. As a result, in the unused PEs 22, the power consumed by the clock signal and the internal leakage current are kept low, and wasteful power consumption is suppressed.

Thus, according to the present embodiment, the AC 3 autonomously determines how processing is shared among its internal PEs 22, determines their operation and processing capability with power consumption taken into account, and executes the processing requested by the CPU 11. The AC 3 can therefore perform the requested processing with optimum power consumption.

(Second Embodiment)
Next, a second embodiment of the present invention will be described. The AC for an information processing device according to the second embodiment has not only a plurality of general-purpose processing units (PEs) but also a plurality of hard macros. It also determines how processing is shared among the hard macros and controls them so that the processing is executed with optimum power consumption.

FIG. 9 is a block diagram showing the configuration of the AC 3A according to the second embodiment. Components that are the same as those of the AC 3 of the first embodiment are given the same reference numerals, and their description is omitted.
As shown in FIG. 9, the AC 3A has, as hard macros, a plurality of (here, two) encoders 26A and 26B and a plurality of (here, two) decoders 26C and 26D, each of which is connected to the CPE 21 via the internal bus 25. Hereinafter, the encoders 26A and 26B and the decoders 26C and 26D are referred to as the hard macro 26 when referred to collectively or when any one of them is meant.

The hard macro 26 is a hardware engine unit; unlike the PE 22, it is not a general-purpose processing unit that can execute a received program. Whereas the PE 22 is a general-purpose processing unit that can execute processing according to a program, the processing of the hard macro 26 is implemented in hardware such as an ASIC and is executed when control data for operation and target data are given.

In the present embodiment, the AC 3A is configured so that the hard macros 26 can execute two kinds of processing, encoding and decoding of image data, in image processing such as MPEG4, H264, and VC1. The two encoders 26A and 26B are hardware circuits that can perform encoding in parallel in response to a request from the CPE 21. The two decoders 26C and 26D are likewise hardware circuits that can perform decoding in parallel in response to a request from the CPE 21.

Accordingly, using the hard macros 26, each kind of which can operate in parallel, the AC 3A can execute encoding, decoding, or both separately from the processing of the PEs 22.

The encoders 26A and 26B and the decoders 26C and 26D are provided with F/V control units 26Ac, 26Bc, 26Cc, and 26Dc, respectively (hereinafter referred to as the F/V control unit 26c when referred to collectively or when any one of them is meant). Each F/V control unit 26c is an operation control unit that controls both the operation and the processing capability of the corresponding hard macro 26. Specifically, it is a circuit with three functions: changing the frequency of the clock signal supplied to the corresponding hard macro 26, supplying and stopping the clock signal to each circuit in the hard macro 26, and supplying and stopping the power to each circuit in the hard macro 26.
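The three functions of an F/V control unit can be modeled in software as a small state holder, which may help when simulating the scheme. This is a behavioral sketch only: the class `FVControl`, its fields, and its methods are hypothetical, the clock frequency is expressed as a fraction of the maximum frequency, and the actual unit is a hardware circuit whose interface the patent does not describe.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class FVControl:
    """Behavioral model of one F/V control unit (hypothetical interface)."""
    clock_fraction: Fraction = Fraction(0)  # clock frequency as a fraction of f
    clock_enabled: bool = False             # clock supply on or off
    power_enabled: bool = False             # power supply on or off

    def run_at(self, fraction):
        """Supply power and a clock at the given fraction of f."""
        if not 0 < fraction <= 1:
            raise ValueError("fraction must be in (0, 1]")
        self.power_enabled = True
        self.clock_enabled = True
        self.clock_fraction = Fraction(fraction)

    def shut_down(self):
        """Stop the clock and the power for an unused unit."""
        self.clock_enabled = False
        self.power_enabled = False
        self.clock_fraction = Fraction(0)
```

A controller model would call `run_at` on the units it selects and `shut_down` on the rest, matching the supply and stop functions listed above.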

Therefore, when an application program is executed in the information processing device 1, the clock signal frequency is changed, the clock signal is supplied or stopped, and the power is supplied or stopped under the control of the CPE 21, according to the usage state of the encoders 26A and 26B and the decoders 26C and 26D, or according to whether they are used.

In the present embodiment as well, an F/V control unit 26c is provided for each of the encoders 26A and 26B and the decoders 26C and 26D, but a single F/V control unit 26c may instead be provided for the encoders 26A and 26B and the decoders 26C and 26D as a whole, so that the clock frequency is changed, the clock signal is supplied or stopped, and the power is supplied or stopped for all of them together. In that case too, as in the first embodiment, the output of the PLL circuit 27 is passed through the switch circuit 29, and a control signal for stopping the clock supply is provided from the CPE 21 to that switch circuit.

Each function is equivalent to the corresponding function for the PE 22 described in the first embodiment.
In the present embodiment as well, each F/V control unit 26c controls both the operation and the processing capability of the corresponding hard macro 26, but it may control at least one of the two.

As described later, the arithmetic unit 21a of the CPE 21 controls each PE 22, each hard macro 26, and the F/V control units 22c and 26c. Thus, the control of the operation and processing capability of the arithmetic unit 22a by each F/V control unit 22c, and the control of the operation and processing capability of the hard macro 26 by each F/V control unit 26c, are performed according to instructions from the arithmetic unit 21a of the CPE 21.

When the arithmetic unit 21a, which serves as a control unit, receives from the CPU 11 a command to execute a predetermined process, it outputs predetermined instructions to the four PEs 22 and the four hard macros 26 in response to the command. These instructions include which PE 22 or which hard macro 26 is to execute the process, what operating frequency is to be used, and so on.

The operation of the AC 3A will now be described for a case in which the AC 3A performs decoding and image recognition on image data obtained, for example, by imaging with a camera or the like. The image recognition and the decoding may or may not be performed at the same time, and they may be performed in synchronization with each other or asynchronously.

As in the first embodiment, when the CPU 11 has the AC 3A run an image recognition application program, the CPU 11 outputs a predetermined command to the AC 3A. The AC 3A receives the command and performs the processing of the application program designated by the CPU 11. In this case, the image recognition application program is executed on the PEs 22, and the operation of the PEs 22 based on the load information and parallelism information is the same as in the first embodiment. That is, the CPE 21 determines the operation of the PEs 22 based on the load information and the parallelism information of the image processing program.
The processing flow of the CPU 11 in that case is the same as in FIGS. 3 and 4. That is, the CPU 11 transmits the image recognition program to the AC 3A, and the arithmetic unit 21a of the CPE 21 stores the image recognition program from the CPU 11 in the RAM 4. The CPU 11 then transmits to the AC 3A the address of the target data for the image recognition process, the address for the result data of the recognition process, the load information for the image recognition program, and the parallelism information for the image recognition program. The AC 3A stores the received load information and parallelism information in the RAM 4.
On the other hand, when the CPU 11 has the AC 3A decode image data, the CPU 11 outputs to the AC 3A a predetermined command that differs from the command for the image recognition process described above. The CPU 11 may request the decoding of the image data at the same time as the image recognition request described above or separately. The AC 3A receives the command and performs the decoding designated by the CPU 11 using the hard macros 26.

FIG. 10 is a flowchart showing an example of the processing flow of the CPU 11 in that case.

When the CPU 11 has the AC 3A take over the decoding of image data, the CPU 11 notifies the AC 3A whether the decoders 26C and 26D are to be used (step S11). Because the CPU 11 is requesting decoding, it notifies the AC 3A that the decoders 26C and 26D will be used, which also amounts to notifying it that the encoders 26A and 26B will not be used.

Next, as in the case of FIG. 3, the CPU 11 transmits the address of the target data, the address for the result data, the load information, and the parallelism information to the AC 3A (step S2). Here, the target data is the data to be decoded, the result data is the result of the decoding, the load information is the load information for the data to be decoded, and the parallelism information is the parallelism information of the decoding process. The load information is determined here according to the resolution, profile, and the like of the image data that is the target data, because the processing load is larger when the resolution is high and smaller when the resolution is low. The AC 3A stores the received load information and parallelism information in the RAM 4.

FIG. 11 shows an example of table data holding load information and parallelism information for the decoding process. As shown in FIG. 11, load information and parallelism information are set in advance according to the resolution level of the image data. Although not illustrated, table data similar to that of FIG. 11 is also prepared for the encoding process.

The processing of the image recognition program in the CPE 21 is the same as in FIGS. 5 to 7 of the first embodiment, so its description is omitted.
The decoding process will be described with reference to FIG. 12. FIG. 12 is a flowchart showing an example of the decoding process in the CPE 21.
When the CPU 11 requests the decoding process described above, the CPE 21 refers to the received load information and parallelism information and stores them in the RAM 4 (step S11).

The CPE 21 determines the hard macros (HMs) to be operated based on the load information and the parallelism information (step S22). That is, the CPE 21 takes the parallelism information into account together with the load information to determine one or more hard macros (HMs) to be operated, which fixes the number of operating hard macros 26.

Here, because the requested process is decoding, the two decoders 26C and 26D are usable, and if the parallelism information is 2, the two hard macros 26C and 26D are determined as the hard macros to be operated.
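The selection of hard macros by kind and parallelism can be sketched as follows. The mapping of reference numerals to encoder or decoder follows the description of FIG. 9; the function `select_hard_macros`, its `busy` parameter, and the rule that every idle unselected hard macro is powered down are assumptions made for this illustration.

```python
# Hard macros of the AC 3A by reference numeral, as described for FIG. 9.
HARD_MACROS = {
    "26A": "encoder",
    "26B": "encoder",
    "26C": "decoder",
    "26D": "decoder",
}


def select_hard_macros(kind, parallelism, busy=()):
    """Choose hard macros of the requested kind, up to `parallelism` of them.

    Returns (active, powered_off): the hard macros to start, and the idle
    ones whose clock and power can be stopped. Entries in `busy` are left alone.
    """
    usable = [name for name, k in HARD_MACROS.items()
              if k == kind and name not in busy]
    active = usable[:parallelism]
    powered_off = [name for name in HARD_MACROS
                   if name not in active and name not in busy]
    return active, powered_off
```

For a decoding request with a parallelism of 2, `select_hard_macros("decoder", 2)` selects 26C and 26D and marks the two encoders as candidates for power-down.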

As in the first embodiment, the CPE 21 can then determine, based on the received load information and parallelism information, at what operating frequency each hard macro 26 should run. Furthermore, any hard macro 26 that does not perform decoding is controlled so that its power consumption is minimized, for example by stopping its power supply.

The CPE 21 therefore determines the operating frequency and supply voltage of each of the one or more hard macros 26 determined to be operated (step S13). A hard macro 26 that does not operate is supplied with neither a clock signal nor the power needed for arithmetic processing. The method of determining the operating frequency and supply voltage according to the load for the hard macros 26 in step S13 is the same as the method for the PEs 22 described with reference to FIG. 6 of the first embodiment, so its description is omitted.

Next, the CPE 21 outputs a start command to each operating hard macro (HM) 26 (step S25). A hard macro (HM) 26 that has received the start command reads the data to be decoded from the designated address, decodes it, and outputs the decoded result data to the designated address. At this time, each hard macro 26 operates at the operating frequency and voltage that were notified to and set in the F/V control unit 26c.

As described above, the AC 3A has a plurality of hard macros in addition to a plurality of general-purpose processing units, and the CPE 21 determines the operation of those hard macros based on the load information about the data to be processed and the parallelism information.
Therefore, according to the present embodiment, the AC 3A autonomously determines how processing is shared among its internal PEs 22 and hard macros 26, determines their operation and processing capability with power consumption taken into account, and executes the processing requested by the CPU 11. The AC 3A can thus perform the requested processing with optimal power consumption.

In the example described above, the processing performed by the hard macros was described as encoding and decoding of image data. However, it may instead be, for example, physical simulation processing (processing that simulates physical phenomena in a virtual space), WiFi communication processing, or cryptographic operation (encryption/decryption) processing.

As described above, the embodiments described above make it possible to realize an accelerator, and an information processing apparatus, in which an accelerator having a plurality of arithmetic units capable of executing a program by parallel processing determines for itself how the work is shared among its internal arithmetic units and executes the program.

The present invention is not limited to the embodiments described above, and various changes and modifications can be made without departing from the gist of the present invention.

FIG. 1 is a block diagram showing the configuration of the information processing apparatus according to the first embodiment of the present invention.
FIG. 2 is a block diagram for explaining the configuration of the accelerator according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing an example of the flow of processing by the CPU according to the first embodiment of the present invention.
FIG. 4 is a diagram showing an example of table data indicating load information and parallelism information according to the first embodiment of the present invention.
FIG. 5 is a flowchart showing an example of processing by the CPE according to the first embodiment of the present invention.
FIG. 6 is a flowchart showing an example of the flow of operating frequency determination processing according to the first embodiment of the present invention.
FIG. 7 is a flowchart showing an example of the flow of processing in the arithmetic unit of the CPE when a processing program ends, according to the first embodiment of the present invention.
FIG. 8 is a diagram for explaining processing in the CPE according to the first embodiment of the present invention.
FIG. 9 is a block diagram showing the configuration of the accelerator according to the second embodiment of the present invention.
FIG. 10 is a flowchart showing an example of the flow of processing by the CPU according to the second embodiment of the present invention.
FIG. 11 is a diagram showing an example of table data indicating load information and parallelism information for decoding processing according to the second embodiment of the present invention.
FIG. 12 is a flowchart showing an example of decoding processing in the CPE according to the second embodiment of the present invention.

Explanation of symbols

1 information processing apparatus, 2 PC, 3 accelerator, 21 control processing unit (CPE), 22 processing unit (PE)

Claims

An accelerator that is connectable to an information processing apparatus and capable of executing a program, the accelerator comprising:
a plurality of arithmetic units capable of executing the program by parallel processing;
an operation control unit that controls at least one of the operation and the processing capability of each of the plurality of arithmetic units; and
a control unit that determines, based on load information about the program to be executed, at least one of the operation and the processing capability of each of the plurality of arithmetic units, and controls the operation control unit in accordance with the determination.

An information processing apparatus comprising an accelerator and a computer connected to the accelerator,
wherein the accelerator is capable of executing a program and comprises:
a plurality of arithmetic units capable of executing the program by parallel processing;
an operation control unit that controls at least one of the operation and the processing capability of each of the plurality of arithmetic units; and
a control unit that determines, based on load information about the program to be executed, at least one of the operation and the processing capability of each of the plurality of arithmetic units, and controls the operation control unit in accordance with the determination.

An accelerator that is connectable to an information processing apparatus, the accelerator comprising:
a plurality of arithmetic units capable of executing a program by parallel processing;
a plurality of hardware engine units capable of executing predetermined processing on target data in parallel;
an operation control unit that controls at least one of the operation and the processing capability of each of the plurality of arithmetic units and the plurality of hardware engine units; and
a control unit that determines, based on load information about the program to be executed, at least one of the operation and the processing capability of each of the plurality of arithmetic units, determines, based on load information about the target data, at least one of the operation and the processing capability of each of the plurality of hardware engine units, and controls the operation control unit in accordance with the determinations.

An information processing apparatus comprising an accelerator and a computer connected to the accelerator,
wherein the accelerator comprises:
a plurality of arithmetic units capable of executing a program by parallel processing;
a plurality of hardware engine units capable of executing predetermined processing on target data in parallel;
an operation control unit that controls at least one of the operation and the processing capability of each of the plurality of arithmetic units and the plurality of hardware engine units; and
a control unit that determines, based on load information about the program to be executed, at least one of the operation and the processing capability of each of the plurality of arithmetic units, determines, based on load information about the target data, at least one of the operation and the processing capability of each of the plurality of hardware engine units, and controls the operation control unit in accordance with the determinations.

An information processing method using an accelerator that has a plurality of arithmetic units capable of executing a program by parallel processing and an operation control unit that controls at least one of the operation and the processing capability of each of the plurality of arithmetic units, the method comprising:
determining, based on load information about the program to be executed, at least one of the operation and the processing capability of each of the plurality of arithmetic units; and
controlling the operation control unit in accordance with the determination.