JP7474061B2

JP7474061B2 - Interface device, data processing device, cache control method, and program

Info

Publication number: JP7474061B2
Application number: JP2020022672A
Authority: JP
Inventors: Tadayuki Ito (忠幸伊藤)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-03-01
Filing date: 2020-02-13
Publication date: 2024-04-24
Anticipated expiration: 2040-02-13
Also published as: JP2020144856A

Description

The present invention relates to an interface device, a data processing device, a cache control method, and a program, and in particular to a shared cache memory.

In recent years, there has been demand for a single product to realize a variety of functions. For example, in a product that has multiple data processing units, a known method realizes a variety of functions by combining the data processing units used according to the application.

In such a configuration, processing speed can be improved by making data transfer between the data processing units more efficient. For example, Patent Document 1 discloses a method of connecting two processors via a shared cache memory device. In that method, the shared cache memory device monitors data writes by the first processor and, when data requested by the second processor is written, transfers that data to the second processor.

Patent Document 1: JP 2012-43031 A

Data processing units are designed according to various concepts and manufactured by various manufacturers. As a result, their data processing specifications or constraints, such as the processing unit (granularity) of data processing, often differ from one another. A device that connects a preceding data processing unit to a subsequent one therefore often cannot transfer the data received from the preceding unit to the subsequent unit immediately, and must store that data, at least temporarily, in some storage unit. Such processing requires additional circuitry or imposes an additional processing load.

The present invention aims to make data transfer from one processing unit to another more efficient in an interface device that connects processing units to each other.

In order to achieve the object of the present invention, an interface device of the present invention has, for example, the following arrangement. That is, an interface device that serves as a shared cache for a plurality of processing units, comprising:
a first port that acquires data from a first processing unit included in the plurality of processing units;
a second port that outputs the data acquired from the first processing unit to a second processing unit included in the plurality of processing units;
a cache unit that caches the data acquired from the first processing unit; and
a control unit that controls, based on information acquired from the second processing unit, whether or not to write back data written to the cache unit to a storage unit different from the cache unit,
wherein the information acquired from the second processing unit indicates that data requested by the second processing unit need not be written back from the cache unit to the storage unit.

In an interface device that connects processing units to each other, data transfer from one processing unit to another can be made more efficient.

FIG. 1 is a block diagram showing an example of the configuration of a data processing device according to an embodiment.
FIG. 2 is a block diagram showing an example of the connection between pre-stage processing and post-stage processing according to an embodiment.
FIG. 3 is a diagram for explaining a data transfer operation.
FIG. 4 is a diagram for explaining tile scanning.
FIG. 5 is a block diagram showing an example of the configuration of an interface device according to an embodiment.
FIG. 6 is a block diagram showing an example of the configuration of a cache determination unit 412.

Embodiments are described in detail below with reference to the attached drawings. The following embodiments do not limit the invention according to the claims. Although the embodiments describe multiple features, not all of them are necessarily essential to the invention, and the features may be combined in any manner. In the attached drawings, the same or similar components are given the same reference numbers, and duplicate descriptions are omitted.

[Embodiment 1]
(Example of configuration of data processing device)
FIG. 1 is a block diagram showing an example of the configuration of a data processing device to which the interface device according to Embodiment 1 can be applied. The processing target of the data processing device is not particularly limited; FIG. 1 shows an image processing device that performs image processing on image data. The data processing device shown in FIG. 1 includes a CPU circuit unit 100, an image reading unit 120, an image input unit 130, an image processing unit 150, and an image display unit 160.

The image reading unit 120 has a lens 124, a CCD sensor 126, and a signal processing unit 127. An image of an original document 110 is formed on the CCD sensor 126 through the lens 124, and the CCD sensor 126 generates an analog electrical signal representing the image. The signal processing unit 127 performs correction processing for each of the R, G, and B colors and then performs analog-to-digital conversion to generate a full-color digital image signal (pixel values). The digital image signal thus generated is input to the image input unit 130. Hereinafter, the set of digital image signals (pixel values) for the pixels contained in one image is referred to as image data.

The image processing unit 150 performs image processing on the image data input to the image input unit 130. Examples of such image processing include, but are not limited to, compensation for individual differences between sensor elements, color correction such as input gamma correction, spatial filtering, color space conversion, density correction, and halftoning. The image processing unit 150 can create image data for printing, for example, by performing image processing intended for printing an image. The image processing unit 150 may also process video data that contains image data for multiple frames.

The image display unit 160 displays the image data after image processing by the image processing unit 150, for example on an image display device such as a display. Instead of or in addition to the image display unit 160, the data processing device may have an image printing unit 170, which prints according to the image data after image processing by the image processing unit 150. The image printing unit 170 may be a printer that has an inkjet head, a thermal head, or the like and records an image on recording paper based on the digital image signals of the image data.

The CPU circuit unit 100 includes a CPU 102, which is a processor for arithmetic control; a ROM 104, which is a memory that stores fixed data or programs; a RAM 106, which is a memory into which data or programs are temporarily loaded; and an external storage device 108. By controlling the image reading unit 120, the image processing unit 150, the image display unit 160, the image printing unit 170, and so on, the CPU circuit unit 100 can comprehensively control the sequence of processing performed by the data processing device. The external storage device 108 is a storage medium, such as a disk, that can store parameters, programs, and correction data used by the data processing device. Data, programs, and the like may be loaded from the external storage device 108 into the RAM 106.

As described above, data is transferred among the image input unit 130, the image processing unit 150, the image display unit 160, and the image printing unit 170. This transfer may be performed via the RAM 106 or the external storage device 108. For example, a WDMAC (Write Direct Memory Access Controller) 192 outputs the digital image signal input to the image input unit 130 and can store the image data in the RAM 106, the external storage device 108, or the like via a shared bus 190. Similarly, a WDMAC 196 can store image data from the image processing unit 150 in the RAM 106, the external storage device 108, or the like.

An RDMAC (Read Direct Memory Access Controller) 194 can read image data stored in the RAM 106, the external storage device 108, or the like via the shared bus 190 and input the digital image signal of each pixel to be processed to the image processing unit 150. Similarly, an RDMAC 198 can input image data read from the RAM 106 or the external storage device 108 to the image display unit 160 or the image printing unit 170.

The CPU 102, the image input unit 130, the image processing unit 150, the image display unit 160, and the image printing unit 170 can configure and start the WDMACs 192 and 196 and the RDMACs 194 and 198.

(Connection between pre-stage processing and post-stage processing)
As described above, data is transferred among the image input unit 130, the image processing unit 150, the image display unit 160, and the image printing unit 170. However, the data processing specifications or constraints may differ among these processing units. The interface device according to Embodiment 1 can connect the processing units while absorbing these differences.

In the following description, the interface device according to Embodiment 1 serves as a shared cache for multiple processing units. It acquires data from a pre-stage processing unit (first processing unit) included in the multiple processing units and outputs the data to a post-stage processing unit (second processing unit) included in them. FIG. 2(A) shows part of the data processing device according to Embodiment 1. As shown in FIG. 2(A), a shared cache I/F 250 (hereinafter simply I/F 250), which is the interface device according to this embodiment, connects a pre-stage processing block 220 and a post-stage processing block 230. Hereinafter, these blocks are referred to simply as pre-stage processing 220 and post-stage processing 230, respectively.

The processing unit 224 included in the pre-stage processing 220 may be, for example, the image processing unit 150, in which case the RDMAC 222 and the WDMAC 226 are the RDMAC 194 and the WDMAC 196, respectively. The processing unit 234 included in the post-stage processing 230 may be, for example, the image display unit 160, in which case the RDMAC 232 is the RDMAC 198. At least one of the RDMAC 222 and the WDMAC 236 may be omitted.

In this example, the pre-stage processing 220 generates a data group by performing first data processing on input data. The post-stage processing 230 then performs second data processing on this data group, thereby generating the result of applying both the first and the second data processing to the input data.

The interface device according to Embodiment 1 may also connect other processing units shown in FIG. 1. The processing units need not generate or modify data through data processing. For example, the pre-stage processing 220 may be the image input unit 130, which may output the received data as is, and the post-stage processing 230 may be the image display unit 160, which may likewise output the received data as is. The types of processing units that the interface device according to Embodiment 1 connects are not limited to those shown in FIG. 1. These processing units may be implemented in hardware such as a pipeline circuit, or by a combination of a processor and a program (software).

When each processing unit processes its target data, it can process the partial data contained in that data sequentially. For example, image processing on image data can process each pixel in raster scan order. Alternatively, the image data can be divided into regions and each region processed in turn.

For example, two-dimensional division can be used as a method of dividing image data into regions. In this case, the image data is divided into multiple tile regions (hereinafter sometimes simply called tiles or blocks), and the image within one tile is called a partial image. Data processing performed for each tile is described below; in the following example, the processing unit (processing granularity) is a partial image. Such per-tile data processing can be called tile processing or block processing. One tile may correspond to a single pixel.

When image processing is performed on image data, the image data before processing is read and the image data after processing is generated. FIG. 3(A) shows an example of image data 300 generated in the pre-stage processing 220. The image data 300 is divided into multiple tiles, of which tiles 302 to 308 are shown in FIG. 3(A). FIG. 4(A) shows an example of such a tile. The tile size is not particularly limited; the length TL and the height TH may each be any number of pixels. In FIG. 3(A), each tile is a rectangular area of 5 × 5 pixels.

The pre-stage processing 220 generates partial images tile by tile (also called tile scanning or block scanning). The data for the pixels of each partial image is generated sequentially in the order of the arrows shown in tile 302. That is, the pre-stage processing 220 generates tiles 302, 304, 306, and 308 in that order, thereby obtaining the processed image data. The pre-stage processing 220 also outputs partial images tile by tile: tiles 302, 304, 306, and 308 are output in that order, and within each tile the pixel data is output in the order of the arrows shown in tile 302. The coordinates of the pixel currently being scanned within the whole image can be calculated from the position of the tile in the image and the scan position within the tile.
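The coordinate calculation in the last sentence can be sketched as follows. This is a minimal illustration, not the circuit the description implies: it assumes tiles are visited left to right and top to bottom, that the arrow order inside tile 302 is raster order within the tile, and that edge tiles are clipped when the image size is not a multiple of the tile size. The names `global_coord` and `tile_scan` are hypothetical.

```python
from typing import Iterator, Tuple


def global_coord(tile_col: int, tile_row: int, dx: int, dy: int,
                 tile_w: int, tile_h: int) -> Tuple[int, int]:
    """Map a tile position plus an in-tile scan offset to whole-image (x, y)."""
    return tile_col * tile_w + dx, tile_row * tile_h + dy


def tile_scan(image_w: int, image_h: int,
              tile_w: int, tile_h: int) -> Iterator[Tuple[int, int]]:
    """Yield whole-image pixel coordinates in tile-scan order.

    Assumed order: tiles left to right, top to bottom; pixels inside a
    tile in raster order.
    """
    tiles_x = -(-image_w // tile_w)  # ceiling division
    tiles_y = -(-image_h // tile_h)
    for row in range(tiles_y):
        for col in range(tiles_x):
            for dy in range(tile_h):
                for dx in range(tile_w):
                    x, y = global_coord(col, row, dx, dy, tile_w, tile_h)
                    if x < image_w and y < image_h:  # clip partial edge tiles
                        yield x, y


if __name__ == "__main__":
    # 5 x 5 tiles as in FIG. 3(A); the 10 x 10 image size is an arbitrary choice.
    order = list(tile_scan(10, 10, 5, 5))
    print(order[:6])  # [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (0, 1)]
    print(order[25])  # (5, 0): first pixel of the second tile
```

The inverse view is equally simple: pixel (x, y) belongs to tile (x // tile_w, y // tile_h) at offset (x % tile_w, y % tile_h).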

FIG. 3(A) also shows an example of the image data 300 as referenced in the post-stage processing 230. The image data generated in the pre-stage processing 220 and the image data referenced in the post-stage processing 230 are the same, but the post-stage processing 230 references the image data 300 in raster scan order. That is, the post-stage processing 230 processes the whole image by referencing the pixel data of line 312, then line 314, then line 316, and so on. In this example, the post-stage processing 230 may reference line 312 of the image data 300 to generate the data for the corresponding line of the image data obtained by processing the image data 300.

In this way, the post-stage processing 230 performs image processing using the image data output from the pre-stage processing 220. However, the order in which the pre-stage processing 220 outputs the pixel data differs from the order in which the post-stage processing 230 references it. Because of such differences in specifications or constraints between the two connected processing units, data output from the pre-stage processing 220 must be held temporarily in some buffer until it is input to the post-stage processing 230. The I/F 250 can provide such a buffer. For example, the pre-stage processing 220 can send to the I/F 250, tile by tile, the data contained in each of multiple tiles of a first size set in the image. The post-stage processing 230, in turn, can receive from the I/F 250, tile by tile, the data contained in each of multiple tiles of a second size, different from the first, set in the image.
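Why a buffer is unavoidable can be checked with a small simulation. The sketch below is illustrative only: it assumes the producer emits pixels in the tile-scan order described for FIG. 3(A), the consumer reads them in another order (raster order, or tiles of a different size), and each pixel is released as soon as the consumer has read it. The helper names are hypothetical, and real hardware would buffer cache lines rather than single pixels.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]


def tile_order(w: int, h: int, tw: int, th: int) -> List[Pixel]:
    """Pixels in tile-scan order (tiles and in-tile pixels both in raster order)."""
    return [(x, y)
            for ty in range(0, h, th)
            for tx in range(0, w, tw)
            for y in range(ty, min(ty + th, h))
            for x in range(tx, min(tx + tw, w))]


def raster_order(w: int, h: int) -> List[Pixel]:
    """Pixels in plain raster order (same as full-width, one-line-high tiles)."""
    return [(x, y) for y in range(h) for x in range(w)]


def peak_buffer(produced: List[Pixel], consumed: List[Pixel]) -> int:
    """Largest number of pixels held at once, measured after each release step."""
    held = set()
    next_read = 0
    peak = 0
    for pixel in produced:
        held.add(pixel)
        # The consumer drains everything it can read in its own order.
        while next_read < len(consumed) and consumed[next_read] in held:
            held.remove(consumed[next_read])
            next_read += 1
        peak = max(peak, len(held))
    return peak


if __name__ == "__main__":
    w = h = 10
    print(peak_buffer(tile_order(w, h, 5, 5), raster_order(w, h)))  # 20
    print(peak_buffer(raster_order(w, h), raster_order(w, h)))      # 0
```

Under these assumptions, 5 × 5 tiles read back in raster order force up to 20 of the 100 pixels to be held at once, while identical orders need no buffering; the exact figure depends entirely on the chosen orders and sizes.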

(Example of configuration of interface device)
As shown in FIG. 5, the I/F 250 according to this embodiment, which connects the pre-stage processing 220 and the post-stage processing 230, has a cache memory 434 that caches data acquired from the pre-stage processing 220. The I/F 250 also has a cache determination unit 412, which controls, based on information acquired from the post-stage processing 230, whether data written to the cache memory 434 is written back to a storage unit other than the cache memory 434. In this way, the cache determination unit 412 implements cache control in the I/F 250.

Hereinafter, the storage destination for transfer data reserved in a storage unit other than the cache memory 434, such as the RAM 106 or the external storage device 108, is collectively called the global buffer. A DRAM, for example, can be used as the global buffer. The cache memory 434 is an on-chip memory such as an SRAM, which can be read and written faster than the global buffer.

In a typical cache memory, to prevent data inconsistency, data written to the cache memory is also written to the main memory at the same time (write-through), or data written to the cache memory is written to the main memory before it is discarded (write-back). In contrast, the I/F 250 can control whether data written to the cache memory 434 is written to the global buffer before being discarded, or is discarded without being written to the global buffer.
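The difference from conventional write-back can be sketched as a software model. Beyond what the text states, it assumes that every line written by the pre-stage is dirty, that the post-stage's synchronization information arrives as a per-read `release` flag meaning "no write-back needed", and that a dict stands in for the global buffer 240. `SharedCacheModel` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Line:
    data: bytes
    dirty: bool = True          # written by the pre-stage, not yet in the global buffer
    no_writeback: bool = False  # set once the post-stage says it no longer needs it


class SharedCacheModel:
    def __init__(self) -> None:
        self.lines: Dict[int, Line] = {}           # stands in for cache memory 434
        self.global_buffer: Dict[int, bytes] = {}  # stands in for global buffer 240
        self.writebacks = 0

    def write(self, addr: int, data: bytes) -> None:
        """Pre-stage write: the data lands in the cache only."""
        self.lines[addr] = Line(data)

    def read(self, addr: int, release: bool = False) -> bytes:
        """Post-stage read; release=True models the 'no write-back needed' info."""
        line = self.lines.get(addr)
        if line is None:
            # Not cached: it must have been written back on eviction
            # (KeyError if it was released and discarded instead).
            return self.global_buffer[addr]
        if release:
            line.no_writeback = True
        return line.data

    def evict(self, addr: int) -> None:
        """Free a line: write it back only if it is dirty and not released."""
        line = self.lines.pop(addr)
        if line.dirty and not line.no_writeback:
            self.global_buffer[addr] = line.data
            self.writebacks += 1
        # A released line is simply discarded, costing no global-buffer traffic.


if __name__ == "__main__":
    cache = SharedCacheModel()
    cache.write(0x00, b"tile-A")
    cache.write(0x40, b"tile-B")
    cache.read(0x00, release=True)  # post-stage consumed A and released it
    cache.evict(0x00)               # discarded without write-back
    cache.evict(0x40)               # B not yet consumed, so it is written back
    print(cache.writebacks)         # 1
```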

Using the I/F 250 configured in this way makes data transfer from the pre-stage processing 220 to the post-stage processing 230 more efficient. More specifically, compared with writing all the image data output from the pre-stage processing 220 to the global buffer 240, processing speed can be increased and power consumption reduced. If all the image data obtained by the pre-stage processing 220 were written to the global buffer 240 and then read back from it, memory accesses corresponding to twice the image's data amount (one write and one read) would occur. In this embodiment, part of the data acquired from the pre-stage processing 220 is never written to the global buffer 240, which suppresses the loss of access speed and the increase in power consumption that the additional memory accesses would cause.
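The "twice the image's data amount" figure can be made concrete. The sketch assumes a 1920 × 1080 image at 3 bytes per pixel, numbers chosen only for illustration, and counts one write plus one read for every byte that actually passes through the global buffer 240; the hypothetical parameter `spilled_pixels` is the number of pixels that must be written back because they are evicted before the post-stage reads them.

```python
from typing import Optional


def global_buffer_traffic(width: int, height: int, bytes_per_pixel: int,
                          spilled_pixels: Optional[int] = None) -> int:
    """Bytes moved to and from the global buffer for one image.

    spilled_pixels=None models the conventional path, in which every
    pixel is written to the global buffer and read back.
    """
    if spilled_pixels is None:
        spilled_pixels = width * height
    return 2 * spilled_pixels * bytes_per_pixel  # one write plus one read each


if __name__ == "__main__":
    print(global_buffer_traffic(1920, 1080, 3))     # 12441600
    print(global_buffer_traffic(1920, 1080, 3, 0))  # 0: nothing spilled
```

Every pixel that the post-stage reads and releases before eviction drops out of this count entirely.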

In addition, with the I/F 250 configured in this way, the capacity of the cache memory 434 can be kept small: there is no need to provide a cache memory that can hold all the image data output from the pre-stage processing 220. Reducing the capacity of the cache memory, which often occupies a large circuit area, reduces the manufacturing cost of the product.

FIG. 5 is a block diagram showing an example of the configuration of the I/F 250. The I/F 250 includes a write port 402, which is a first port that acquires data from the pre-stage processing 220, and a read port 404, which is a second port that outputs the data acquired from the pre-stage processing 220 to the post-stage processing 230. The I/F 250 is also connected to a Network on Chip 210 (hereinafter NoC 210). The global buffer 240 is also connected to the NoC 210, and the I/F 250 includes an access port 406 through which it exchanges data with the global buffer 240 via the NoC 210. As shown in FIG. 2(A), the I/F 250 can access the global buffer 240 via the access port 406, the NoC 210, and a controller 245 such as a DRAM controller. As shown in FIG. 5, the I/F 250 is connected to the pre-stage processing 220 and the post-stage processing 230 without going through the NoC 210.

Furthermore, the I/F 250 has the cache memory 434, which caches data acquired from the pre-stage processing 220, and the cache determination unit 412, which controls, based on information acquired from the post-stage processing 230, whether data written to the cache memory 434 is written back to the global buffer 240.

A specific configuration example of the I/F 250 is described below with reference to FIG. 5. In this example, the I/F 250 is a multi-port shared cache that can accept requests on the write port 402 and the read port 404 simultaneously.

A write request, synchronization information, and write data are input to the I/F 250 via the write port 402. The write data is pixel data input from the pre-stage processing 220. The write request is information, acquired from the pre-stage processing 220, that asks the I/F 250 to accept the write data, and can include information identifying the write data. In the following example, the write request indicates the memory address in the global buffer 240 at which the write data is to be stored (although, as described later, the write data may never actually be stored in the global buffer 240). Alternatively, the write request may indicate the position of the pixel corresponding to the write data. The synchronization information is information (first information) acquired from the pre-stage processing 220; it can indicate that the write data is data to be transferred to the post-stage processing 230. Details are described later.

Via the read port 404, a read request and synchronization information are input to the I/F 250, and read data is output from the I/F 250. The read data is pixel data input to the post-stage processing 230; it is write data that was input from the pre-stage processing 220 and is stored in the cache memory 434 or the global buffer 240. The read request is information, acquired from the post-stage processing 230, that asks for delivery of the read data, and can include information identifying the read data. In the following example, the read request indicates the memory address in the global buffer 240 at which the read data is stored (although, as described later, the read data may not actually be stored in the global buffer 240). Alternatively, the read request may indicate the position of the pixel corresponding to the read data. The synchronization information is information (second information) acquired from the post-stage processing 230; it can indicate, for example, that the read data need not be written back from the cache memory 434 to the global buffer 240. Details are described later.

In this embodiment, the write data and the read data have the same data amount, and write requests and read requests use the same addressing method. The data amount of the write data and the read data is not particularly limited: it may be the data of a single pixel, or the data of the pixels in a pixel block of a predetermined size (for example, 1 pixel high × 8 pixels wide). As described above, the order in which the pre-stage processing 220 outputs the pixel data may differ from the order in which the post-stage processing 230 references it. That is, the write port 402 can acquire the data contained in a data group such as image data from the pre-stage processing 220 in a first order, while the read port 404 can output the data contained in that data group to the post-stage processing 230 in a second order different from the first.
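The request formats described above can be summarized as plain records. This is a sketch only: the field names, the boolean encoding of the first and second synchronization information, and the raster layout assumed for the global buffer 240 are all illustrative assumptions. Because write and read requests share one addressing method, a 1 × 8 pixel block computed from the same pixel position gets the same address on both ports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteRequest:
    address: int        # global-buffer address the write data would occupy
    data: bytes
    sync: bool = False  # first information: data is destined for the post-stage


@dataclass(frozen=True)
class ReadRequest:
    address: int        # same addressing method as WriteRequest
    sync: bool = False  # second information: no write-back needed for this data


def block_address(base: int, x: int, y: int, width: int,
                  bytes_per_pixel: int, block_w: int = 8) -> int:
    """Address of the 1 x block_w pixel block containing pixel (x, y).

    Assumes the image is laid out in raster order starting at `base`.
    """
    x0 = x - x % block_w  # left edge of the block
    return base + (y * width + x0) * bytes_per_pixel


if __name__ == "__main__":
    addr = block_address(0x1000, x=11, y=2, width=32, bytes_per_pixel=4)
    w = WriteRequest(addr, bytes(8 * 4), sync=True)
    r = ReadRequest(addr, sync=True)
    print(hex(w.address), w.address == r.address)  # 0x1120 True
```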

Ｉ／Ｆ２５０は、プリフェッチ部４１０、中間ＦＩＦＯ４２０、及びフェッチ部４３０を有している。プリフェッチ部４１０は、キャッシュ判定及びプリフェッチ動作を行うことができる。本実施形態においてプリフェッチ部４１０は、ライトポート４０２へのライト要求及びリードポート４０４へのリード要求を受け付ける。そして、プリフェッチ部４１０は、プリフェッチ部４１０が有するキャッシュ判定部４１２を用いて、それぞれの要求に対するキャッシュ判定を行う。すなわちキャッシュ判定部４１２は、キャッシュヒット又はキャッシュミスの判定を行うことができる。具体的には、キャッシュ判定部４１２は、ライト要求において指定されたグローバルバッファ２４０のメモリアドレスに対応するデータがキャッシュメモリ４３４に格納されていると判断した場合、キャッシュヒットの判定を行う。一方でキャッシュ判定部４１２は、このデータが格納されていないと判断した場合、キャッシュミスの判定を行う。また、キャッシュ判定部４１２は、リード要求において指定されたリードデータがキャッシュメモリ４３４に格納されていると判断した場合、キャッシュヒットの判定を行い、格納されていないと判断した場合、キャッシュミスの判定を行う。 The I/F 250 has a prefetch unit 410, an intermediate FIFO 420, and a fetch unit 430. The prefetch unit 410 can perform cache judgment and prefetch operations. In this embodiment, the prefetch unit 410 accepts a write request to the write port 402 and a read request to the read port 404. The prefetch unit 410 then uses the cache judgment unit 412 of the prefetch unit 410 to perform cache judgment for each request. That is, the cache judgment unit 412 can perform a cache hit or cache miss judgment. Specifically, when the cache judgment unit 412 judges that data corresponding to the memory address of the global buffer 240 specified in the write request is stored in the cache memory 434, it judges that there is a cache hit. On the other hand, when the cache judgment unit 412 judges that the data is not stored, it judges that there is a cache miss. In addition, if the cache determination unit 412 determines that the read data specified in the read request is stored in the cache memory 434, it performs a cache hit determination, and if it determines that the data is not stored, it performs a cache miss determination.

ライト要求に対するキャッシュ判定結果、ライト要求、及びライトデータは、プリフェッチ部４１０から、中間ＦＩＦＯ４２０を介して、フェッチ部４３０が有するデータ取得部４３２へと送られる。データ取得部４３２は、ライトデータをキャッシュメモリ４３４に格納する。 The cache determination result for the write request, the write request, and the write data are sent from the prefetch unit 410 via the intermediate FIFO 420 to the data acquisition unit 432 of the fetch unit 430. The data acquisition unit 432 stores the write data in the cache memory 434.

通常のライト要求に対して、データ取得部４３２は、通常のキャッシュメモリへの書き込み時に行われる動作を行うことができる。例えば、プリフェッチ部４１０がライト要求に対してキャッシュヒットの判定を行った場合、キャッシュメモリ４３４には、ライト要求において指定されたアドレスのデータが格納されている。このため、フェッチ部４３０は、プリフェッチ部４１０からデータ取得部４３２へ送られたライトデータを、キャッシュメモリ４３４に上書きする。また、プリフェッチ部４１０がライト要求に対してキャッシュミスの判定を行った場合、キャッシュメモリ４３４には、ライト要求において指定されたアドレスのデータが格納されていない。この場合、プリフェッチ部４１０は、アクセスポート４０６を介して、グローバルバッファ２４０に対するリード要求を発行する。そしてフェッチ部４３０は、グローバルバッファ２４０から受け取ったデータにライトデータを上書きし、得られたデータをキャッシュメモリ４３４に格納する。 For a normal write request, the data acquisition unit 432 can perform the operation that is performed when writing to a normal cache memory. For example, when the prefetch unit 410 judges a write request as a cache hit, the cache memory 434 stores data at the address specified in the write request. Therefore, the fetch unit 430 overwrites the cache memory 434 with the write data sent from the prefetch unit 410 to the data acquisition unit 432. Also, when the prefetch unit 410 judges a write request as a cache miss, the cache memory 434 does not store data at the address specified in the write request. In this case, the prefetch unit 410 issues a read request to the global buffer 240 via the access port 406. The fetch unit 430 then overwrites the data received from the global buffer 240 with the write data, and stores the obtained data in the cache memory 434.

一方で本実施形態において、前段処理２２０は、後段処理２３０に転送するデータをＩ／Ｆ２５０に転送する場合、プリロード命令を指定してＩ／Ｆ２５０に対するライト要求を行う。この場合、プリフェッチ部４１０は、ライト要求に対してキャッシュミスの判定を行った場合であっても、グローバルバッファ２４０へのリード要求を発行しない。この場合、データ取得部４３２は、ライト要求と同期して入力されたライトデータを、キャッシュメモリ４３４に格納する。 In this embodiment, however, when the front-stage processing 220 sends data destined for the back-stage processing 230 to the I/F 250, it issues a write request to the I/F 250 that specifies a preload command. For such a write request, the prefetch unit 410 does not issue a read request to the global buffer 240, even if it judges the write request to be a cache miss. The data acquisition unit 432 instead stores the write data, which is input in synchronization with the write request, in the cache memory 434.
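The following Python toy model contrasts a normal write with a preload write on a cache miss. It rests on several assumptions. The global buffer is modeled as a dictionary of fixed-size lines. A preload supplies the data for a freshly allocated line. Cache capacity and eviction are ignored. The names `WritePathSketch`, `write`, and `LINE_SIZE` are hypothetical.

```python
LINE_SIZE = 8  # assumed bytes per cache line (illustrative only)


class WritePathSketch:
    """Toy model of the write path: normal write-allocate vs. preload."""

    def __init__(self, global_buffer):
        self.global_buffer = global_buffer  # dict: line address -> bytes
        self.cache = {}                     # dict: line address -> bytearray
        self.global_reads = 0               # read requests issued to the global buffer

    def write(self, line_addr, offset, data, preload=False):
        if line_addr in self.cache:
            # Cache hit: the line is already present, so overwrite it in place.
            line = self.cache[line_addr]
        elif preload:
            # Preload miss: no read is issued; the producer's data fills a new line.
            line = bytearray(LINE_SIZE)
        else:
            # Normal miss: read the line from the global buffer first, then merge.
            self.global_reads += 1
            line = bytearray(self.global_buffer.get(line_addr, bytes(LINE_SIZE)))
        line[offset:offset + len(data)] = data
        self.cache[line_addr] = line
```

The `global_reads` counter shows the saving. A normal write miss costs one read of the global buffer, but a preload write miss costs none.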

また、リード要求に対するキャッシュ判定結果及びリード要求も、プリフェッチ部４１０から中間ＦＩＦＯ４２０を介してデータ取得部４３２へと送られる。リード要求に対して、データ取得部４３２は、通常のキャッシュメモリへの書き込み時に行われる動作を行うことができる。 The cache determination result for a read request and the read request itself are likewise sent from the prefetch unit 410 to the data acquisition unit 432 via the intermediate FIFO 420. In response to a read request, the data acquisition unit 432 performs the same operations as an ordinary cache memory does for such an access, as described below.

例えば、プリフェッチ部４１０がリード要求に対してキャッシュヒットの判定を行った場合、キャッシュメモリ４３４には、リード要求において指定されたアドレスのデータが格納されている。このため、プリフェッチ部４１０がグローバルバッファ２４０に対するリード要求を発行する必要はない。データ取得部４３２は、リード要求がフェッチ部４３０に届いたときに、キャッシュメモリ４３４からリード要求に示されるデータを取り出し、リードデータとしてリードポート４０４に転送する。 For example, when the prefetch unit 410 judges a read request to be a cache hit, the data at the address specified in the read request is already stored in the cache memory 434. The prefetch unit 410 therefore does not need to issue a read request to the global buffer 240. When the read request reaches the fetch unit 430, the data acquisition unit 432 retrieves the data indicated in the read request from the cache memory 434 and transfers it to the read port 404 as read data.

一方、プリフェッチ部４１０がリード要求に対してキャッシュミスの判定を行った場合、キャッシュメモリ４３４には、リード要求において指定されたアドレスのデータが格納されていない。このためプリフェッチ部４１０は、アクセスポート４０６を介して、グローバルバッファ２４０に対するリード要求を発行する。するとフェッチ部４３０には、リード要求に対してキャッシュミスの判定が行われた場合と同様、リード要求において指定されたメモリアドレスのデータを含むデータが入力される。データ取得部４３２は、リード要求がフェッチ部４３０に届いたときに、グローバルバッファ２４０からのデータを受信してキャッシュメモリ４３４に格納する。そして、データ取得部４３２は、リード要求に示されるデータを、リードデータとしてリードポート４０４に転送する。 Conversely, if the prefetch unit 410 judges the read request to be a cache miss, the data at the address specified in the read request is not stored in the cache memory 434. The prefetch unit 410 therefore issues a read request to the global buffer 240 via the access port 406, and the fetch unit 430 then receives data that includes the data at the specified memory address. When the read request reaches the fetch unit 430, the data acquisition unit 432 receives this data from the global buffer 240 and stores it in the cache memory 434. It then transfers the data indicated in the read request to the read port 404 as read data.
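The read path can be modeled the same way. The Python sketch below ignores capacity and eviction, and its class and method names are hypothetical. A hit returns cached data without touching the global buffer. A miss fills the cache from the global buffer and then returns the data.

```python
class ReadPathSketch:
    """Toy model of the read path; capacity and eviction are ignored."""

    def __init__(self, global_buffer):
        self.global_buffer = global_buffer  # dict: address -> data
        self.cache = {}                     # dict: address -> data
        self.global_reads = 0               # read requests issued to the global buffer

    def read(self, addr):
        if addr not in self.cache:
            # Cache miss: issue a read to the global buffer and fill the cache.
            self.global_reads += 1
            self.cache[addr] = self.global_buffer[addr]
        # Cache hit, or a line that was just filled: return it to the read port.
        return self.cache[addr]
```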

以上のようにＩ／Ｆ２５０は、ライト要求及びリード要求に対して適切な処理を行うことができる。 As described above, I/F 250 can perform appropriate processing for write and read requests.

次に図６を参照してキャッシュ判定部４１２の構成について説明する。以下の例において、連想（ライン選択）方式としてはフルアソシアティブ方式が用いられ、Ｉ／Ｆ２５０はフルアソシアティブ方式に従うキャッシュ動作を行う。Ｉ／Ｆ２５０はマルチポートの共有キャッシュであるため、キャッシュ判定部４１２には複数のポートからの要求が入力される。図６には、複数のポートとして、ポート［０］５１２、ポート［１］５１４、・・・、及びポート［Ｎ－１］５１６が示されている。上述のライトポート４０２及びリードポート４０４は、これらのポートのいずれかである。 Next, the configuration of the cache determination unit 412 will be described with reference to FIG. 6. In the following example, the fully associative method is used as the associativity (line selection) method, and the I/F 250 performs cache operations accordingly. Because the I/F 250 is a multi-port shared cache, the cache determination unit 412 receives requests from multiple ports. FIG. 6 shows these ports as port [0] 512, port [1] 514, ..., and port [N-1] 516. The write port 402 and the read port 404 described above are each one of these ports.

選択回路５１８は、各ポート５１２～５１６から入力された要求を選択する。選択されたリード要求又はライト要求に示されるアドレスは、アドレスレジスタ５２１に記憶される。また、ライトポート４０２又はリードポート４０４に入力された同期情報は、同期情報レジスタ５３０に記憶される。 The selection circuit 518 selects a request input from each of the ports 512 to 516. The address indicated in the selected read request or write request is stored in the address register 521. In addition, the synchronization information input to the write port 402 or the read port 404 is stored in the synchronization information register 530.

キャッシュ判定部４１２は、８個のキャッシュタグ４１４を記憶することができる。この例において、Ｉ／Ｆ２５０は８ノードのフルアソシアティブ方式のキャッシュ装置となる。また８個のキャッシュタグ４１４のそれぞれには、予め定められた番号（［０］～［７］）が付されており、これらの番号は、対応するキャッシュメモリの「相対」キャッシュライン番号を示す。図６の例において、キャッシュメモリ４３４は８個のキャッシュラインを有しており、８個のキャッシュラインにはＦＩＦＯ方式に従ってデータが格納される。なお、キャッシュラインの数、及びそれぞれのキャッシュラインの容量は特に限定されず、適宜設定することができる。 The cache determination unit 412 can store eight cache tags 414. In this example, the I/F 250 is an eight-node fully associative cache device. Each of the eight cache tags 414 is assigned a predetermined number ([0] to [7]), and these numbers indicate the "relative" cache line number of the corresponding cache memory. In the example of FIG. 6, the cache memory 434 has eight cache lines, and data is stored in the eight cache lines according to the FIFO method. The number of cache lines and the capacity of each cache line are not particularly limited and can be set as appropriate.

またキャッシュ判定部４１２は、８個の同期情報５３２を記憶することができる。それぞれの同期情報５３２は８個のキャッシュタグ４１４のうちの１つに対応し、同じ番号（［０］～［７］）が付されている。同期情報５３２は、ライトポート４０２に入力された同期情報、リードポート４０４に入力された同期情報、又はこれらの演算結果を示すことができる。以下の例において、同期情報５３２は、ライトポート４０２に入力された同期情報、又はこれとリードポート４０４に入力された同期情報との演算結果である。 The cache determination unit 412 can also store eight pieces of synchronization information 532. Each piece corresponds to one of the eight cache tags 414 and carries the same number ([0] to [7]). The synchronization information 532 can hold the synchronization information input to the write port 402, the synchronization information input to the read port 404, or the result of an operation on the two. In the following example, it holds either the synchronization information input to the write port 402, or the result of an operation on that information and the synchronization information input to the read port 404.

以下では最も古いデータが格納されているキャッシュラインの「相対」キャッシュライン番号は［０］であり、最も新しいデータが格納されているキャッシュラインの「相対」キャッシュライン番号は［７］である。また、キャッシュミスと判定されると、これから新しいデータが格納される（破棄されるデータが格納されている）キャッシュラインの「相対」キャッシュライン番号が［７］となる。 In the following example, the "relative" cache line number of the cache line in which the oldest data is stored is [0], and the "relative" cache line number of the cache line in which the newest data is stored is [7]. Also, when a cache miss is determined, the "relative" cache line number of the cache line in which new data will be stored (the cache line in which the data to be discarded is stored) becomes [7].

キャッシュ判定部４１２は８個の比較器５２３を有し、それぞれの比較器５２３は８個のキャッシュタグ４１４のうちの１つに対応する。比較器５２３は、対応するキャッシュタグ４１４に格納されたアドレスと、アドレスレジスタ５２１に格納されたアドレスと、の比較を行い、アドレス同士が「一致」するか否かを示す比較結果５２４を判定器５２５へと出力する。 The cache determination unit 412 has eight comparators 523, each of which corresponds to one of the eight cache tags 414. The comparators 523 compare the address stored in the corresponding cache tag 414 with the address stored in the address register 521, and output a comparison result 524 indicating whether the addresses "match" to the determiner 525.

ここで、８個の比較器５２３から出力された８個の比較結果５２４のうち、１つでも「一致」を示す場合、判定器５２５はキャッシュヒットと判定する。一方で、８個の比較結果５２４のうちいずれも「一致」を示していない場合、判定器５２５はキャッシュミスと判定する。 Here, if any one of the eight comparison results 524 output from the eight comparators 523 indicates a "match," the determiner 525 determines that there is a cache hit. On the other hand, if none of the eight comparison results 524 indicates a "match," the determiner 525 determines that there is a cache miss.

キャッシュミスと判定された場合（分岐５２６でＹＥＳ）、アドレスレジスタ５２１に保持されているアドレスを値として有するように、キャッシュタグ４１４が更新される。図６においてキャッシュタグ４１４は、シフトレジスタを有する記憶領域に格納される。判定結果がキャッシュミスである場合、シフト動作が行われ、キャッシュタグの値は下流のキャッシュタグに移動する。すなわち、キャッシュタグ［０］の値はキャッシュタグ［１］の値に変化し、キャッシュタグ［１］の値はキャッシュタグ［２］の値に変化する。同様に移動が繰り返され、キャッシュタグ［６］の値はキャッシュタグ［７］の値に変化する。そして、キャッシュタグ［７］の値は、アドレスレジスタ５２１に格納されているアドレスの値に変化する。 If a cache miss is determined (YES at branch 526), the cache tags 414 are updated so that one of them holds the address in the address register 521. In FIG. 6, the cache tags 414 are stored in a storage area organized as a shift register. On a cache miss, a shift operation moves each tag value one position downstream. Cache tag [0] takes the value previously held in cache tag [1], cache tag [1] takes the value of cache tag [2], and so on, until cache tag [6] takes the value of cache tag [7]. Cache tag [7] then takes the address stored in the address register 521.

このように、図６の例では、古いキャッシュタグ［０］の値が破棄される、「ＦＩＦＯ方式（ラウンドロビン方式）」のキャッシュタグのリプレイス手法が用いられている。このような方式を、フルアソシアティブ方式のキャッシュ装置において採用することにより、装置を簡略化することができる。 In this way, in the example of Figure 6, a "FIFO (round robin)" cache tag replacement method is used in which the value of the old cache tag [0] is discarded. By adopting such a method in a fully associative cache device, the device can be simplified.
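The hit check and FIFO tag shift can be modeled in software as follows. This Python sketch describes behavior only and is not the circuit of FIG. 6. A list stands in for the shift register, and a scan over all tags stands in for the eight comparators 523. The class name `FifoTagArray` is hypothetical.

```python
class FifoTagArray:
    """Fully associative tag store with FIFO (shift-register) replacement.

    tags[0] holds the oldest address and tags[-1] the newest, mirroring the
    "relative" line numbers [0]..[7] used in the text.
    """

    def __init__(self, ways=8):
        self.tags = [None] * ways  # None marks an empty entry

    def lookup(self, addr):
        """Return (miss_flag, line_number, evicted_tag)."""
        # Compare the request address against every tag (one comparator per tag).
        matches = [i for i, tag in enumerate(self.tags) if tag == addr]
        if matches:
            return False, matches[0], None
        # Miss: every tag shifts one step toward [0], the old [0] falls out,
        # and the requested address enters at the newest position.
        evicted = self.tags[0]
        self.tags = self.tags[1:] + [addr]
        return True, len(self.tags) - 1, evicted
```

With FIFO replacement, a miss never has to search for a victim, because the victim is always entry [0]. This is the simplification the text refers to.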

また、キャッシュミスと判定された場合、同期情報レジスタ５３０に格納されている値を保持するように、同期情報５３２が更新される。図６の例において同期情報５３２は、キャッシュタグ４１４と同様に、シフトレジスタを有する記憶領域に格納される。キャッシュミスと判定された場合、キャッシュタグ４１４と同様に、同期情報５３２のシフト動作が行われ、同期情報の値は下流の同期情報に移動する。すなわち、同期情報レジスタ５３０に格納されている値は同期情報［７］に書き込まれ、古い同期情報［０］の値は破棄される。 If a cache miss is determined, the synchronization information 532 is updated to retain the value stored in the synchronization information register 530. In the example of FIG. 6, the synchronization information 532 is stored in a memory area having a shift register, similar to the cache tag 414. If a cache miss is determined, a shift operation of the synchronization information 532 is performed, similar to the cache tag 414, and the value of the synchronization information is moved to the downstream synchronization information. That is, the value stored in the synchronization information register 530 is written to the synchronization information [7], and the value of the old synchronization information [0] is discarded.

一方で、キャッシュヒットと判定された場合、このようなキャッシュタグ４１４及び同期情報５３２の更新は行われない。その一方で、キャッシュヒットと判定された場合、修正器５３５は、キャッシュヒットと判定されたキャッシュタグ４１４に対応する同期情報５３２の修正を行う。すなわち、修正器５３５は、アドレスレジスタ５２１に格納されたアドレスと一致する値を有しているキャッシュタグ４１４の番号（［０］～［７］）と、同じ番号を有する同期情報５３２の値を修正する。 If a cache hit is determined, the cache tags 414 and the synchronization information 532 are not shifted. Instead, the modifier 535 modifies the synchronization information 532 that corresponds to the hit cache tag 414. More precisely, it modifies the value of the synchronization information 532 whose number ([0] to [7]) matches the number of the cache tag 414 whose value equals the address stored in the address register 521.

判定器５２５は、以上のようなキャッシュヒット又はキャッシュミスを示すキャッシュ判定結果を、キャッシュミスフラグ５２８として出力する。また、判定結果がキャッシュヒットである場合、判定器５２５は、アドレスレジスタ５２１に格納されたアドレスと一致する値を有しているキャッシュタグ４１４の番号（［０］～［７］）を、ライン番号５２７として出力する。一方で、判定結果がキャッシュミスである場合、判定器５２５は、７（すなわちキャッシュタグ［７］の番号）をライン番号５２７として出力する。さらにキャッシュ判定部４１２は、キャッシュミスの判定を行った場合、シフト動作により破棄されるキャッシュタグ［０］の値５４０、及び破棄される同期情報［０］の値５４２も、キャッシュ判定結果として出力する。これらの情報に従って、プリフェッチ部４１０及びフェッチ部４３０は上述の動作を行うことができる。 The determiner 525 outputs the cache determination result indicating a cache hit or a cache miss as a cache miss flag 528. If the determination result is a cache hit, the determiner 525 outputs the number ([0] to [7]) of the cache tag 414 having a value matching the address stored in the address register 521 as the line number 527. On the other hand, if the determination result is a cache miss, the determiner 525 outputs 7 (i.e., the number of the cache tag [7]) as the line number 527. Furthermore, when the cache determination unit 412 determines a cache miss, it also outputs the value 540 of the cache tag [0] to be discarded by the shift operation and the value 542 of the synchronization information [0] to be discarded as the cache determination result. According to this information, the prefetch unit 410 and the fetch unit 430 can perform the above-mentioned operations.
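The tag comparison, the synchronization shift register, and the modifier 535 can be combined into one software sketch that produces the outputs listed above: miss flag, line number, discarded tag value, and discarded synchronization value. The sketch follows the description. On a miss, the incoming synchronization bit enters at [7] alongside the address. On a hit, the bit is XORed into the matching entry, which is the operation given in the operation example later in this section. The class name and dictionary keys are hypothetical.

```python
class CacheDeterminer:
    """Tags plus per-line synchronization bits, FIFO shift, XOR modifier."""

    def __init__(self, ways=8):
        self.tags = [None] * ways  # index 0 = oldest entry
        self.sync = [0] * ways     # one synchronization bit per tag

    def judge(self, addr, sync_in):
        for i, tag in enumerate(self.tags):
            if tag == addr:
                # Hit: no shift; the modifier XORs the stored bit with the input bit.
                self.sync[i] ^= sync_in
                return {"miss": False, "line": i,
                        "evicted_tag": None, "evicted_sync": None}
        # Miss: tags and sync bits shift together; entry [0] is discarded.
        evicted_tag, evicted_sync = self.tags[0], self.sync[0]
        self.tags = self.tags[1:] + [addr]
        self.sync = self.sync[1:] + [sync_in]
        return {"miss": True, "line": len(self.tags) - 1,
                "evicted_tag": evicted_tag, "evicted_sync": evicted_sync}
```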

なお、キャッシュヒットと判定された場合、ライト要求を受信したフェッチ部４３０は、ライン番号５２７により示されるキャッシュラインにライトデータを格納する。また、リード要求を受信したフェッチ部４３０は、ライン番号５２７により示されるキャッシュラインからリードデータを読み出す。 When a cache hit is determined, the fetch unit 430 that received the write request stores the write data in the cache line indicated by line number 527. When a read request is received, the fetch unit 430 reads the read data from the cache line indicated by line number 527.

一方、キャッシュミスと判定された場合、フェッチ部４３０は、ライン番号５２７により示されるキャッシュライン［７］に格納されていたデータを、同期情報［０］の値５４２に従って破棄し、又はグローバルメモリにライトバックする。ライトバックを行う場合、フェッチ部４３０は、グローバルメモリのキャッシュタグの値５４０により示されるアドレスに対してライトバックを行う。また、ライト要求を受信したフェッチ部４３０は、ライン番号５２７により示されるキャッシュライン［７］にライトデータを格納する。さらに、リード要求を受信したフェッチ部４３０は、ライン番号５２７により示されるキャッシュライン［７］に、グローバルバッファ２４０から受信したデータを書き込む。 If a cache miss is determined, the fetch unit 430 handles the data previously stored in the cache line [7] indicated by the line number 527 according to the value 542 of the synchronization information [0]. It either discards that data or writes it back to the global memory, at the address indicated by the cache tag value 540. The fetch unit 430 then fills cache line [7]. For a write request it stores the write data there, and for a read request it writes the data received from the global buffer 240 there.

（動作例） (Example of operation)
本実施形態において、Ｉ／Ｆ２５０は前段処理２２０の処理結果を後段処理２３０に転送し、また転送できない処理結果をグローバルバッファ２４０に退避する。このような処理制御は、例えば、以下のように同期情報を使用することにより実現できる。 In this embodiment, the I/F 250 transfers the processing results of the pre-stage processing 220 to the post-stage processing 230 and saves to the global buffer 240 any processing results that cannot be transferred. Such control can be realized, for example, by using synchronization information as follows.

本実施形態において、前段処理２２０が処理結果を後段処理２３０に転送する場合、前段処理２２０はプリロード命令を用いてＩ／Ｆ２５０に対するライト要求を行う。図６の例において前段処理２２０は、プリロード命令を行う際に、値が「１」である同期情報をＩ／Ｆ２５０に入力する。 In this embodiment, when the front-stage processing 220 transfers the processing result to the rear-stage processing 230, the front-stage processing 220 issues a write request to the I/F 250 using a preload command. In the example of FIG. 6, when the front-stage processing 220 issues a preload command, it inputs synchronization information with a value of "1" to the I/F 250.

上述のとおり、キャッシュ判定部４１２はプリロード命令が入力された場合、キャッシュミスの判定を行う。すなわち、上述の通り、キャッシュタグにアドレスを書き込み、前段処理から入力された同期情報の値である「１」を保持するように、同期情報５３２を更新する。また、上述の通り、この場合プリフェッチ部４１０はグローバルバッファ２４０へのリード要求を発行せず、前段処理２２０の処理結果であるライトデータがキャッシュメモリ４３４に格納される。 As described above, when a preload command is input, the cache determination unit 412 performs a cache miss determination. That is, as described above, it writes the address to the cache tag, and updates the synchronization information 532 so that it retains the value "1" of the synchronization information input from the previous stage processing. Also, as described above, in this case, the prefetch unit 410 does not issue a read request to the global buffer 240, and the write data that is the processing result of the previous stage processing 220 is stored in the cache memory 434.

一方で後段処理２３０は、前段処理２２０の処理結果を取得するために、Ｉ／Ｆに対するリード要求を行う。図６の例において後段処理２３０は、リード要求を行う際に、値が「１」である同期情報をＩ／Ｆ２５０に入力する。上述の通り、キャッシュ判定部４１２はリード要求に示されるアドレスに従ってキャッシュ判定を行い、キャッシュヒットと判定された場合、修正器５３５は同期情報５３２を修正する。本実施形態において修正器５３５は、キャッシュヒットと判定されたキャッシュタグ４１４に対応する同期情報５３２と、同期情報レジスタ５３０の値と、のＸＯＲ（Exclusive-OR）演算を行う。そして、ＸＯＲ演算により得られた値で、キャッシュヒットと判定されたキャッシュタグ４１４に対応する同期情報５３２を更新する。本実施形態の場合、プリロード命令により前段処理２２０からのデータがキャッシュメモリ４３４に格納されると、対応する同期情報５３２の値は上記の通り「１」となる。一方、リード要求を受けた際の同期情報レジスタ５３０の値は上記の通り「１」である。したがって、キャッシュメモリ４３４に格納されたデータに対するリード要求が行われると、対応する同期情報５３２の値は「１」から「０」になる。 The post-stage processing 230 issues a read request to the I/F 250 to obtain the processing result of the pre-stage processing 220. In the example of FIG. 6, when issuing a read request, the post-stage processing 230 inputs synchronization information with a value of "1" to the I/F 250. As described above, the cache determination unit 412 performs cache determination on the address indicated in the read request, and on a cache hit the modifier 535 modifies the synchronization information 532. In this embodiment, the modifier 535 XORs (exclusive-ORs) the synchronization information 532 of the hit cache tag 414 with the value of the synchronization information register 530. It then overwrites that synchronization information 532 with the result. When a preload command stores data from the pre-stage processing 220 in the cache memory 434, the corresponding synchronization information 532 is "1", as described above. The synchronization information register 530 also holds "1" when the read request is received, as described above. Therefore, when a read request is made for the data stored in the cache memory 434, the corresponding synchronization information 532 changes from "1" to "0" (1 XOR 1 = 0).
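A few lines of Python can check the effect of the XOR update. The sketch starts from a line stored by a preload command (bit 1) and applies the synchronization bits carried by successive reads. The function names are hypothetical. The bit drops from 1 to 0 only when a read carries 1, which is the case where no write-back is needed. A read carrying 0 leaves the bit at 1.

```python
def modify(stored_bit: int, read_bit: int) -> int:
    """Sketch of modifier 535: the stored bit is XORed with the read-port bit."""
    return stored_bit ^ read_bit


def trace(read_bits):
    """Start from a preloaded line (bit 1) and apply each read-port bit in turn."""
    bit = 1  # value stored together with a preload write
    history = [bit]
    for r in read_bits:
        bit = modify(bit, r)
        history.append(bit)
    return history
```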

前段処理２２０及び後段処理２３０の処理が進むにつれて、キャッシュタグが更新されていき、上記のように一部のキャッシュタグはキャッシュ判定部４１２から破棄される。このとき、破棄されるキャッシュタグの値５４０、破棄される同期情報の値５４２、及びライン番号５２７が、フェッチ部４３０に入力される。 As the processing of the front-stage process 220 and the back-stage process 230 progresses, the cache tags are updated, and some cache tags are discarded by the cache determination unit 412 as described above. At this time, the value 540 of the cache tag to be discarded, the value 542 of the synchronization information to be discarded, and the line number 527 are input to the fetch unit 430.

入力された同期情報の値５４２が「０」である場合、キャッシュタグの値５４０が示すアドレスに対応する前段処理２２０からのデータは、後段処理２３０からのリード要求に従って後段処理２３０に転送されている。このため、このデータをグローバルバッファ２４０に待避させる必要はない。このデータは、プリロード命令により、キャッシュメモリ４３４のライン番号５２７に対応するキャッシュラインに格納されている。このため、入力された同期情報の値５４２が「０」である場合、フェッチ部４３０は、キャッシュメモリ４３４が有する、ライン番号５２７に対応するキャッシュラインのデータを破棄する。 When the input synchronization information value 542 is "0", the data from the pre-stage processing 220 corresponding to the address indicated by the cache tag value 540 has been transferred to the post-stage processing 230 in accordance with a read request from the post-stage processing 230. Therefore, there is no need to save this data in the global buffer 240. This data has been stored in the cache line corresponding to line number 527 of the cache memory 434 by a preload command. Therefore, when the input synchronization information value 542 is "0", the fetch unit 430 discards the data of the cache line corresponding to line number 527 held by the cache memory 434.

一方、入力された同期情報の値５４２が「１」である場合、キャッシュタグの値５４０が示すアドレスに対応する前段処理２２０からのデータは、後段処理２３０からのリード要求がないため後段処理２３０に転送されていない。このデータは、プリロード命令により、キャッシュメモリ４３４のライン番号５２７に対応するキャッシュラインに格納されている。このため、入力された同期情報の値５４２が「１」である場合、フェッチ部４３０は、キャッシュメモリ４３４が有する、ライン番号５２７に対応するキャッシュラインに格納されているデータを、グローバルバッファ２４０に待避させる。具体的にはフェッチ部４３０は、このデータをキャッシュタグの値５４０が示すグローバルバッファ２４０のアドレスに格納（ライトバック）する。 On the other hand, if the input synchronization information value 542 is "1", the data from the pre-stage processing 220 corresponding to the address indicated by the cache tag value 540 has not been transferred to the post-stage processing 230 because there is no read request from the post-stage processing 230. This data is stored in the cache line corresponding to line number 527 of the cache memory 434 by a preload command. Therefore, if the input synchronization information value 542 is "1", the fetch unit 430 evacuates the data stored in the cache line corresponding to line number 527 of the cache memory 434 to the global buffer 240. Specifically, the fetch unit 430 stores (writes back) this data at the address of the global buffer 240 indicated by the cache tag value 540.
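The Python sketch below shows how the fetch unit might decide between discard and write-back when it receives the discarded tag value, synchronization value, and line number. It assumes cache lines are a plain list indexed by line number and that the global buffer is a dictionary. The function name `evict` is hypothetical.

```python
def evict(cache_lines, global_buffer, line_number, evicted_tag, evicted_sync):
    """Sketch of the fetch unit's handling of a line that leaves the cache.

    evicted_sync == 0: the consumer has already read the data, so discard it.
    evicted_sync == 1: the data has not been read yet, so write it back to the
    global buffer at the address held in the discarded tag.
    Returns True if a write-back was performed.
    """
    data = cache_lines[line_number]
    cache_lines[line_number] = None  # the line is freed for the incoming data
    if evicted_sync == 1 and evicted_tag is not None:
        global_buffer[evicted_tag] = data
        return True
    return False
```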

以上のように、前段処理２２０からのライト要求により、ライトデータはキャッシュメモリに一時記憶される。そして、このライトデータをグローバルバッファ２４０にライトバックするかどうかは、後段処理２３０からのリード要求によって制御される。このように、前段処理２２０が送信したライトデータに対するライトバック動作を実行するか否かは、データを受信する後段処理２３０が決定する。より具体的には、後段処理２３０から得られた、リードデータをキャッシュメモリ４３４からグローバルバッファ２４０にライトバックする必要がないことを示す、リードポート４０４に入力された同期情報が参照される。そして、このような同期情報に少なくとも従って、破棄するデータをライトバックするか否かが切り替えられている。以上の例では、キャッシュメモリ４３４に書き込まれているデータを破棄する際に、このような同期情報に従って、ライトバックの有無が切り替えられている。 As described above, a write request from the front-stage processing 220 causes the write data to be stored temporarily in the cache memory, and a read request from the back-stage processing 230 controls whether that write data is written back to the global buffer 240. In other words, the back-stage processing 230, which receives the data, decides whether the write data sent by the front-stage processing 220 is written back. More specifically, the I/F 250 references the synchronization information that the back-stage processing 230 inputs to the read port 404, which indicates that the read data need not be written back from the cache memory 434 to the global buffer 240. Whether the data being discarded is written back is switched at least in accordance with this synchronization information. In the above example, the switch takes effect when data written to the cache memory 434 is discarded.

上記の具体例においては、プリロード命令を用いることによりライトデータが後段処理２３０に転送されるデータであることが示されている場合、キャッシュメモリ４３４に格納されたデータに関連付けて、同期情報の値５４２として「１」が格納される。この同期情報は、前段処理２２０から得られた、ライトデータが後段処理２３０に転送されるデータであることを示している。また、こうしてキャッシュメモリ４３４に格納されたデータは、グローバルバッファ２４０から取得されたものではなく、前段処理２２０から直接取得したものである。一方で、こうしてキャッシュメモリ４３４に格納されたデータを要求する場合、後段処理２３０は、リードポート４０４に同期情報として「１」を入力することができる。この同期情報は、後段処理２３０から得られた、リードデータをキャッシュメモリ４３４からグローバルバッファ２４０にライトバックする必要がないことを示している。これらの情報に従って、フェッチ部４３０は、キャッシュメモリ４３４に書き込まれているデータをグローバルバッファ２４０にライトバックせずに破棄した。 In the specific example above, when a preload command indicates that the write data is to be transferred to the subsequent processing 230, "1" is stored as the synchronization information value 542 in association with the data stored in the cache memory 434. This synchronization information, obtained from the previous processing 220, indicates that the write data is to be transferred to the subsequent processing 230. Data stored in the cache memory 434 in this way comes directly from the previous processing 220, not from the global buffer 240. When requesting such data, the subsequent processing 230 can input "1" as synchronization information to the read port 404. This synchronization information, obtained from the subsequent processing 230, indicates that the read data need not be written back from the cache memory 434 to the global buffer 240. In accordance with this information, the fetch unit 430 discards the data in the cache memory 434 without writing it back to the global buffer 240.

このようにフェッチ部４３０は、前段処理２２０から得られた同期情報と、後段処理２３０から得られた同期情報と、の双方に基づいて、キャッシュメモリ４３４に書き込まれているデータをグローバルバッファ２４０にライトバックするか否かを制御している。とりわけ、上記の例においては、前段処理２２０から得られた同期情報と、後段処理２３０から得られた同期情報の演算結果である「０」が、同期情報の値５４２として格納されている。そして、この同期情報の値５４２に従って、ライトバックの制御が行われた。一方で、このような構成は一例にすぎない。例えば、同期情報の値５４２として、前段処理２２０から得られた同期情報と、後段処理２３０から得られた同期情報と、のそれぞれが格納されてもよい。 In this way, the fetch unit 430 decides whether to write back data from the cache memory 434 to the global buffer 240 based on both the synchronization information obtained from the previous stage processing 220 and that obtained from the subsequent stage processing 230. In the above example in particular, "0", the result of the operation on these two pieces of synchronization information, is stored as the synchronization information value 542, and write-back is controlled according to this value. This configuration is, however, merely one example. For instance, the synchronization information from the previous stage processing 220 and that from the subsequent stage processing 230 may each be stored separately as the synchronization information value 542.
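The whole flow can be tied together in a small end-to-end Python model: preload writes, reads carrying synchronization bits, FIFO replacement, and write-back only of lines whose bit is still 1. It adds two assumptions that the source does not state. Each address is preloaded exactly once, and a line filled on a read miss is marked clean (bit 0) because its data already exists in the global buffer. The class name `SharedCacheSketch` is hypothetical.

```python
class SharedCacheSketch:
    """Toy end-to-end model of the producer -> cache -> consumer path."""

    def __init__(self, ways=8):
        self.tags = [None] * ways  # index 0 = oldest, index -1 = newest
        self.sync = [0] * ways
        self.data = [None] * ways
        self.global_buffer = {}
        self.write_backs = 0

    def _insert(self, addr, sync_bit, value):
        # FIFO replacement: the oldest line leaves and is written back only if its bit is 1.
        if self.tags[0] is not None and self.sync[0] == 1:
            self.global_buffer[self.tags[0]] = self.data[0]
            self.write_backs += 1
        self.tags = self.tags[1:] + [addr]
        self.sync = self.sync[1:] + [sync_bit]
        self.data = self.data[1:] + [value]

    def preload(self, addr, value):
        # Producer write with a preload command: no fill read, bit set to 1.
        assert addr not in self.tags, "sketch assumes one preload per address"
        self._insert(addr, 1, value)

    def read(self, addr, sync_bit):
        # Consumer read; sync_bit 1 means "this data will not be requested again".
        if addr in self.tags:
            i = self.tags.index(addr)
            self.sync[i] ^= sync_bit
            return self.data[i]
        value = self.global_buffer[addr]  # miss: fetch from the global buffer
        self._insert(addr, 0, value)      # assumed clean, so never written back
        return value
```

In the test below, four lines are preloaded and only address 0 is read with bit 1 before eviction. Only address 0 is discarded, and addresses 1 to 3 are written back and stay readable.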

（後段処理でタイル走査が行われる場合の動作例） (Example of operation when tile scanning is performed in the post-stage processing)
実施形態１のようなＩ／Ｆ２５０を用いることにより、前段処理２２０及び後段処理２３０で用いられる走査順序にかかわらず、このような動作を実現することができる。実施形態１では、例えば図３（Ａ）に示すように、前段処理２２０でタイル走査が行われ、後段処理２３０でラスタ走査が行われていたが、前段処理２２０及び後段処理２３０はこれに限定されない。例えば、前段処理２２０で所定サイズのタイルに従うタイル走査が行われ、後段処理２３０で異なる大きさのタイルに従うタイル走査が行われる場合にも、実施形態１の方法は有効である。このような場合、後段処理２３０は、例えば、１つのタイル内の各画素の画素データをＩ／Ｆ２５０から取得し、取得した画素データを用いた処理を行い、このタイル内の各画素の処理後の画素データを生成することができる。後段処理２３０は、このようなタイルごとの処理をそれぞれのタイルについて繰り返すことにより、処理後の画像データを生成することができる。この場合もＩ／Ｆ２５０は、上記のように、後段処理２３０に要求されたデータをキャッシュメモリ４３４から出力し、又はグローバルバッファ２４０から取得して出力することができる。 With an I/F 250 like that of the first embodiment, this operation can be realized regardless of the scanning order used by the front-stage processing 220 and the rear-stage processing 230. In the first embodiment, as shown for example in FIG. 3A, the front-stage processing 220 performs tile scanning and the rear-stage processing 230 performs raster scanning, but the two stages are not limited to this combination. For example, the method of the first embodiment is also effective when the front-stage processing 220 scans tiles of a predetermined size and the rear-stage processing 230 scans tiles of a different size. In such a case, the rear-stage processing 230 can, for example, obtain from the I/F 250 the pixel data of each pixel in one tile, process it, and generate processed pixel data for each pixel in that tile. Repeating this per-tile processing for every tile produces the processed image data. Here too, the I/F 250 can output the data requested by the post-stage processing 230 from the cache memory 434, or obtain it from the global buffer 240 and output it, as described above.

一方、このような１つのタイルについての処理において、タイル外の画素の画素データが参照されることがある。例えば、後段処理２３０が画像データに対してＦＩＲフィルタのようなフィルタ処理を行う場合、ある画素の画素データを算出するために、周辺画素の画素データが参照されることがある。このような場合、後段処理２３０は、１つのタイル内の各画素の画素データに加えて、このタイルの周辺画素を含む、より大きなタイルの画素データをＩ／Ｆ２５０から取得する。 Meanwhile, in such processing of a single tile, pixel data of pixels outside the tile may be referenced. For example, when the post-processing 230 performs filtering processing such as an FIR filter on image data, pixel data of surrounding pixels may be referenced to calculate pixel data for a pixel. In such cases, in addition to the pixel data for each pixel in a single tile, the post-processing 230 obtains pixel data for a larger tile that includes the surrounding pixels of this tile from the I/F 250.

図４（Ｂ）～（Ｅ）の例では、後段処理２３０は、１つ目のタイルを処理する際に、より大きなタイルである領域３９１内のデータを取得し、同様に２～４つ目のタイルを処理する際に、より大きなタイルである領域３９２～３９４内のデータを取得する。図４（Ｂ）～（Ｅ）において、後段処理２３０によって画素のデータが２回以上取得される領域は、ハッチングで示されている。以下、このような領域のことをオーバーラップ領域と呼ぶ。例えば、後段処理２３０において、処理対象画素を中心とする縦５画素×横５画素の計２５画素を参照するフィルタ処理を行う場合、オーバーラップ領域の幅は２画素となる。 In the example of Figures 4 (B) to (E), when processing the first tile, the post-processing 230 obtains data in area 391, which is a larger tile, and similarly when processing the second to fourth tiles, it obtains data in areas 392 to 394, which are larger tiles. In Figures 4 (B) to (E), areas from which pixel data is obtained two or more times by the post-processing 230 are shown hatched. Hereinafter, such areas are referred to as overlap areas. For example, when the post-processing 230 performs filter processing that references a total of 25 pixels, 5 pixels vertically and 5 pixels horizontally, centered on the pixel to be processed, the width of the overlap area is 2 pixels.
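The size of the larger region a filter must read can be computed directly. The Python sketch below assumes a centered k×k kernel with odd k, and regions written as (x_start, y_start, x_end, y_end) with exclusive ends clipped to the image. The function names are hypothetical. For a 5×5 kernel the margin on each side of a tile is 5 // 2 = 2 pixels, the same 2-pixel figure given in the text.

```python
def filter_margin(kernel_size):
    """Pixels needed on each side of a tile for a centered k x k filter (k odd)."""
    return kernel_size // 2


def expanded_tile(x0, y0, w, h, kernel_size, img_w, img_h):
    """Region the filter must read to produce the tile at (x0, y0) of size w x h."""
    r = filter_margin(kernel_size)
    return (max(x0 - r, 0), max(y0 - r, 0),
            min(x0 + w + r, img_w), min(y0 + h + r, img_h))
```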

以下、このようなオーバーラップ領域が存在する場合の、本実施形態に係るインタフェース装置の動作例について、図３（Ｂ）を参照して説明する。図３（Ｂ）で後段処理２３０は、領域３５０の画像データをＩ／Ｆ２５０から取得しようとしている。後段処理２３０は、領域３５０のうち、領域３５１のデータに対するリード要求を行う際には、図３（Ａ）の場合と同様に、リード要求の際に同期情報として「１」を設定する。ここで、領域３５１はオーバーラップ領域ではない領域であり、すなわち後続するタイルの処理において参照されない領域である。この場合、上述の通り、キャッシュヒットした場合は、キャッシュメモリ４３４に格納されているデータがライトバックされずに破棄される。 Below, an example of the operation of the interface device according to this embodiment when such an overlapping area exists will be described with reference to FIG. 3B. In FIG. 3B, the post-processing 230 is attempting to obtain image data of area 350 from the I/F 250. When making a read request for data in area 351 of area 350, the post-processing 230 sets "1" as synchronization information at the time of the read request, as in the case of FIG. 3A. Here, area 351 is an area that is not an overlapping area, that is, an area that is not referenced in the processing of subsequent tiles. In this case, as described above, when a cache hit occurs, the data stored in the cache memory 434 is discarded without being written back.

このように、後段処理２３０は、Ｉ／Ｆ２５０に対しデータを要求する際に、データを後の処理で再度要求するか否かを判定することができる。また、再度要求しないとの判定に応じて、要求するデータをキャッシュメモリ４３４からグローバルバッファ２４０にライトバックする必要がないことを示す同期情報（「１」）を、Ｉ／Ｆ２５０に対して送信することができる。 In this way, when requesting data from the I/F 250, the post-processing 230 can determine whether or not it will request the same data again in a later process. If it determines that it will not, it can send the I/F 250 synchronization information ("1") indicating that the requested data does not need to be written back from the cache memory 434 to the global buffer 240.

一方で後段処理２３０は、領域３５０のうち、領域３５２のデータに対するリード要求する際には、リード要求の際に同期情報として「０」を設定する。ここで、領域３５２はオーバーラップ領域であり、すなわち後続するタイルの処理において参照される領域である。この場合、キャッシュヒットしたとしても、ＸＯＲ演算の結果、キャッシュメモリ４３４に格納されているデータに対応する同期情報５３２の値は「１」のままとなる。このため、キャッシュヒットしたとしても、キャッシュメモリ４３４に格納されているデータはグローバルバッファ２４０にライトバックされる。この結果として、後続するタイルの処理時に、参照する領域のデータをグローバルバッファ２４０から取得することが可能となる。 On the other hand, when the post-processing 230 makes a read request for data in area 352 of area 350, it sets the synchronization information to "0" when making the read request. Here, area 352 is an overlap area, that is, an area that is referenced in the processing of the subsequent tile. In this case, even if there is a cache hit, the value of the synchronization information 532 corresponding to the data stored in the cache memory 434 remains "1" as a result of the XOR operation. Therefore, even if there is a cache hit, the data stored in the cache memory 434 is written back to the global buffer 240. As a result, it becomes possible to obtain the data of the referenced area from the global buffer 240 when processing the subsequent tile.
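The flag handling in the last few paragraphs can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the actual circuit. The class and method names are hypothetical. Eviction is assumed to be strictly oldest-first. A write sets the line's flag to the value sent by the pre stage. A read hit XORs the flag with the value sent by the post stage. Read misses are served from a dictionary that stands in for the global buffer 240, without refilling the cache.

```python
from collections import OrderedDict


class SharedCacheModel:
    """Toy shared-cache I/F with a 1-bit sync flag per line (hypothetical names)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # addr -> [data, sync]; first entry is the oldest line
        self.global_buffer = {}      # stands in for global buffer 240 (e.g. DRAM)
        self.writebacks = 0

    def _evict_oldest(self):
        addr, (data, sync) = self.lines.popitem(last=False)
        if sync == 1:                # data still needed later: save it
            self.global_buffer[addr] = data
            self.writebacks += 1
        # sync == 0: discarded without write-back

    def write(self, addr, data, sync=1):
        """Pre-stage write; sync=1 marks data to be passed on to the post stage."""
        if addr not in self.lines and len(self.lines) >= self.capacity:
            self._evict_oldest()
        self.lines[addr] = [data, sync]

    def read(self, addr, sync):
        """Post-stage read; sync=1 means no write-back is needed after this read."""
        if addr in self.lines:
            line = self.lines[addr]
            line[1] ^= sync          # 1 XOR 1 = 0 (discard later); 1 XOR 0 = 1 (keep)
            return line[0]
        return self.global_buffer[addr]  # miss: served from the saved copy

    def flush(self):
        while self.lines:
            self._evict_oldest()
```

Reading address 0 with "1" (like area 351) lets its line be dropped. Reading address 1 with "0" (like the overlap area 352) leaves the flag at 1, so that line is saved to the global buffer when it is evicted.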

このように、Ｉ／Ｆ２５０に送信される、後段処理２３０が要求したデータをキャッシュメモリ４３４からグローバルバッファ２４０にライトバックする必要性を示す同期情報を制御することができる。後段処理２３０は、このような制御を、Ｉ／Ｆ２５０に対してタイル領域に含まれるデータを要求する際に、データが他のタイル領域に含まれるか否か（すなわちオーバーラップ領域に含まれるか否か）に応じて行うことができる。 In this way, the post-stage processing 230 can control the synchronization information it sends to the I/F 250, which indicates whether the data it has requested must be written back from the cache memory 434 to the global buffer 240. When requesting data contained in a tile region from the I/F 250, the post-stage processing 230 can set this information according to whether the data is also contained in another tile region (i.e., whether it lies in an overlap area).
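Keeping the same one-dimensional simplification, a post stage could derive the value it sends with each read request like this. It sends "0" for columns that a later tile will read again and "1" otherwise. The sketch assumes tiles are processed left to right and that eviction timing is ignored. The helper names are hypothetical.

```python
def read_region(x0, x1, kernel, width):
    """Columns [lo, hi) needed to filter output columns [x0, x1) with a centred kernel."""
    margin = kernel // 2
    return max(0, x0 - margin), min(width, x1 + margin)


def sync_values(tile_index, tiles, kernel, width):
    """Map each column read for tiles[tile_index] to the sync value sent with its read request."""
    lo, hi = read_region(*tiles[tile_index], kernel, width)
    later = [read_region(a, b, kernel, width) for a, b in tiles[tile_index + 1:]]
    return {
        # 0: a later tile reads this column again, so the line must survive (write back)
        # 1: last use, so the cached copy may be discarded without write-back
        x: 0 if any(l <= x < h for l, h in later) else 1
        for x in range(lo, hi)
    }
```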

なお、本実施形態に係るインタフェース装置の動作は、上記のものに限定されない。例えば、図３（Ｃ）の例で、後段処理２３０は、１つのタイルを処理する際に領域３６０のデータを取得する。ここで、図３（Ｃ）の例では行３８１の読み込みが終わった後に行３８２の読み込みが行われる。このため、別のタイルを処理するために領域３７５のデータを読み込む際には、領域３６０の下端にある領域３６４のデータがキャッシュメモリから破棄されている可能性が高い。このため、領域３６４のデータをグローバルバッファ２４０に待避させるために、後段処理２３０は領域３６４のデータに対するリード要求を行う際に同期情報として「０」を設定する。 Note that the operation of the interface device according to this embodiment is not limited to the above. For example, in FIG. 3C, the post-processing 230 acquires the data in area 360 when processing one tile. In this example, row 382 is read only after row 381 has been read completely. Therefore, by the time the data in area 375 is read to process another tile, the data in area 364 at the bottom of area 360 has most likely already been discarded from the cache memory. To save the data in area 364 to the global buffer 240, the post-processing 230 therefore sets "0" as the synchronization information when making a read request for that data.

一方で、図３（Ｃ）の例では領域３６０のデータの読み込みが終わった後に領域３７０のデータの読み込みが行われる。したがって、領域３７０は領域３６２を含んでいるが、領域３７０のデータを読み込む際に領域３６２のデータはキャッシュヒットする。すなわち、領域３６２のデータをグローバルバッファ２４０に待避させる必要はないため、後段処理２３０は領域３６２のデータに対するリード要求を行う際に同期情報として「１」を設定してもよい。 On the other hand, in FIG. 3C the data in area 370 is read after the data in area 360 has been read. Therefore, although area 370 includes area 362, reading the data in area 362 as part of area 370 results in a cache hit. Since there is no need to save the data in area 362 to the global buffer 240, the post-stage processing 230 may set "1" as the synchronization information when making a read request for the data in area 362.

後段処理２３０の種類は特に限定されず、後段処理２３０が解像度変換（任意変倍処理）のような画像の大きさを変更する処理を行う場合にも、本実施形態を適用できる。タイル処理のような領域分割手法を用いて解像度変換を行う場合、変倍率によっては、処理において参照される領域の大きさ、又は処理により出力される領域の大きさが、画像中のタイルの位置によって変動する場合がある。一方で、後段処理２３０はこのような参照する領域の大きさの変動を検知できるため、領域の大きさの変化に応じてリード要求の数を変えることにより、処理に必要なデータを得ることができる。また、後段処理２３０は、参照する領域の大きさの変化と、出力される領域の大きさの変化とを検知することができるため、上述したオーバーラップ領域の変化も検知できる。このため、後段処理２３０は、上述のように同期情報の値を変更することで、ライトバック動作を行うかどうかを制御することができる。 The type of the post-stage processing 230 is not particularly limited. This embodiment can also be applied when the post-stage processing 230 changes the size of an image, for example by resolution conversion (arbitrary scaling). When resolution conversion is performed with a region division method such as tile processing, the size of the region referenced by the processing, or of the region it outputs, may vary with the position of the tile in the image, depending on the scaling ratio. However, the post-stage processing 230 can detect such variation in the size of the referenced region, so it can obtain the data it needs by changing the number of read requests accordingly. In addition, because the post-stage processing 230 can detect changes in both the referenced region size and the output region size, it can also detect the resulting changes in the overlap area described above. The post-stage processing 230 can therefore control whether a write-back is performed by changing the value of the synchronization information as described above.
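The following sketch shows why the referenced region can vary with tile position. It maps fixed-size output tiles back to source columns for a non-integer scaling factor. It assumes a simple nearest-neighbour rule in which output column x samples source column floor(x / scale). A real resolution converter would add filter taps on both sides. The function names are hypothetical. Exact fractions are used to avoid floating-point rounding at tile edges.

```python
from fractions import Fraction
from math import floor


def source_span(x0, x1, scale):
    """Source columns [lo, hi) sampled by output columns [x0, x1) (nearest-neighbour, assumed)."""
    lo = floor(Fraction(x0) / scale)
    hi = floor(Fraction(x1 - 1) / scale) + 1
    return lo, hi


def spans_and_overlaps(out_width, tile_w, scale):
    """Source span per output tile, and how many source columns each tile shares with the next."""
    spans = [source_span(x, min(out_width, x + tile_w), scale)
             for x in range(0, out_width, tile_w)]
    overlaps = [max(0, prev_hi - lo) for (_, prev_hi), (lo, _) in zip(spans, spans[1:])]
    return spans, overlaps


if __name__ == "__main__":
    spans, overlaps = spans_and_overlaps(out_width=20, tile_w=4, scale=Fraction(5, 3))
    print(spans)
    print(overlaps)
```

With a 5/3 enlargement, 4-column output tiles read source spans of 2 or 3 columns. The overlap between consecutive tiles alternates between 0 and 1 column. This is the kind of position-dependent change the post stage has to track when choosing the number of read requests and the synchronization values.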

以上のように本実施形態によれば、Ｉ／Ｆ２５０は、前段処理２２０による処理後のデータの少なくとも一部を、グローバルバッファ２４０への一時保存を行わずに、後段処理２３０に直接転送することができる。また、Ｉ／Ｆ２５０は、このように直接転送できなかったデータのみをグローバルバッファ２４０に待避させる。このように、Ｉ／Ｆ２５０を用いて前段処理２２０と後段処理２３０とを直結することにより、グローバルバッファ２４０に待避することなく後段処理２３０に直接転送されるデータと、グローバルバッファ２４０に待避するデータと、を選り分けることができる。このため、前段処理２２０から出力された画像データの全体をグローバルバッファ２４０に書き出す場合と比較して、処理速度を向上させ、及び消費電力を減少させることができる。このように、Ｉ／Ｆ２５０を用いることにより、前段処理２２０から後段処理２３０へのデータ転送処理を効率化することができる。 As described above, according to this embodiment, the I/F 250 can directly transfer at least a portion of the data processed by the pre-stage processing 220 to the post-stage processing 230 without temporarily storing it in the global buffer 240. The I/F 250 also saves only the data that could not be directly transferred in this way to the global buffer 240. In this way, by directly connecting the pre-stage processing 220 and the post-stage processing 230 using the I/F 250, it is possible to select data that is directly transferred to the post-stage processing 230 without saving it in the global buffer 240 and data that is saved in the global buffer 240. Therefore, it is possible to improve the processing speed and reduce power consumption compared to the case where the entire image data output from the pre-stage processing 220 is written to the global buffer 240. In this way, by using the I/F 250, it is possible to make the data transfer process from the pre-stage processing 220 to the post-stage processing 230 more efficient.

Ｉ／Ｆ２５０はキャッシュメモリ４３４の大きさに応じてこのような選り分け動作を行うことができる。前段処理２２０から後段処理２３０に直接データを転送するためには、データがグローバルバッファ２４０に待避する前に後段処理２３０がリード要求を行う必要がある。このため、キャッシュメモリ４３４の容量が大きいほど、直接データを転送するためのリード要求のタイムリミットが遅くなる。直接のデータ転送が行われると、その後Ｉ／Ｆ２５０はこのデータをグローバルバッファ２４０にライトバックせずにキャッシュメモリ４３４から破棄するため、グローバルバッファ２４０へのアクセス量が減少する。このため、キャッシュメモリの大きさと、グローバルバッファ２４０へのアクセス量と、のバランスを調整することができる。キャッシュメモリ４３４の容量が大きいほど、前段処理２２０と後段処理２３０とを疎結合化でき、Ｉ／Ｆ２５０のシステム上での動作がより安定になる。 The I/F 250 performs this sorting according to the size of the cache memory 434. For data to be transferred directly from the pre-stage processing 220 to the post-stage processing 230, the post-stage processing 230 must issue its read request before the data is saved to the global buffer 240. Therefore, the larger the capacity of the cache memory 434, the later the deadline for a read request that still allows direct transfer. After a direct transfer, the I/F 250 discards the data from the cache memory 434 without writing it back to the global buffer 240, which reduces the amount of access to the global buffer 240. The cache size can thus be traded off against the amount of access to the global buffer 240. The larger the capacity of the cache memory 434, the more loosely the pre-stage processing 220 and the post-stage processing 230 can be coupled, and the more stable the operation of the I/F 250 within the system becomes.
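A small simulation can check the relationship between cache capacity and global-buffer traffic. It makes several assumptions. The cache is fully associative with strict oldest-first replacement. The post stage requests line t - lag right after the pre stage writes line t. Each write-back and each re-read costs one unit of traffic. The function name is hypothetical. Under these assumptions, direct transfer succeeds exactly when the lag is smaller than the capacity.

```python
from collections import deque


def global_buffer_traffic(n_lines, capacity, lag):
    """Write-backs plus re-reads when the post stage trails the pre stage by `lag` lines."""
    cache = deque()          # entries [line, sync]; cache[0] is the oldest line
    written_back = set()
    traffic = 0
    for t in range(n_lines + lag):
        if t < n_lines:                          # pre stage writes line t
            if len(cache) == capacity:
                line, sync = cache.popleft()
                if sync:                         # not yet read: must be saved
                    written_back.add(line)
                    traffic += 1
            cache.append([t, 1])
        r = t - lag                              # post stage requests line r
        if r >= 0:
            for entry in cache:
                if entry[0] == r:
                    entry[1] = 0                 # direct transfer: no write-back needed
                    break
            else:
                assert r in written_back
                traffic += 1                     # re-read from the global buffer
    return traffic
```

For example, with 100 lines and an 8-line cache, a lag of 3 produces no global-buffer traffic. A lag of 8 produces 92 write-backs plus 92 re-reads. Doubling the cache to 16 lines brings a lag of 8 back to zero.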

［実施形態２］ [Embodiment 2]
実施形態１では、１つのチップ内にある前段処理２２０と後段処理２３０とが接続された。しかしながら、前段処理２２０と後段処理２３０が別々のチップに搭載されていてもよい。実施形態２においては、図２（Ｂ）に示されるようにチップ２６５（チップＢ）は、Ｉ／Ｆ２５０と、後段処理２３０とを有している。Ｉ／Ｆ２５０は実施形態１と同様の機能を持ち、チップ２６５とは異なるチップ２６０（チップＡ）が有している前段処理２２０と接続されている。前段処理２２０のＷＤＭＡＣ２２６は、チップ２６５のグローバルバッファ２４０のアドレスへの、処理部２２４による処理結果のライト要求を発行する。図２（Ｂ）ではチップ間のインタフェースの一例としてＰＣＩｅが用いられており、チップ２６０のＰＣＩｅ２２８はライト要求をＰＣＩｅの転送プロトコルに変換してチップ２６５に転送する。チップ２６５のＰＣＩｅ２３８は、チップ２６０からの転送データを受信し、Ｉ／Ｆ２５０にライト要求を行う。チップ２６５の後段処理２３０、ＮｏＣ２１０、コントローラ２４５、及びグローバルバッファ２４０の機能は、実施形態１と同様である。 In the first embodiment, the pre-stage processing 220 and the post-stage processing 230 within one chip are connected. However, the pre-stage processing 220 and the post-stage processing 230 may be mounted on separate chips. In the second embodiment, as shown in FIG. 2B, the chip 265 (chip B) has the I/F 250 and the post-stage processing 230. The I/F 250 has the same functions as in the first embodiment and is connected to the pre-stage processing 220 of a chip 260 (chip A) that is different from the chip 265. The WDMAC 226 of the pre-stage processing 220 issues a request to write the processing result of the processing unit 224 to an address in the global buffer 240 of the chip 265. In FIG. 2B, PCIe is used as an example of the inter-chip interface. The PCIe 228 of the chip 260 converts the write request into the PCIe transfer protocol and transfers it to the chip 265. The PCIe 238 of the chip 265 receives the transferred data from the chip 260 and issues a write request to the I/F 250. The post-stage processing 230, the NoC 210, the controller 245, and the global buffer 240 of the chip 265 function as in the first embodiment.

このような構成により、複数のチップ間にまたがって、処理部間でのデータ処理の仕様又は制約の違いを吸収しながら、処理部間でのデータ転送を行うことができる。この例において、前段処理２２０から出力するデータ量以上のデータを転送する必要はない。また、この例において後段処理２３０を有するチップ２６５がＩ／Ｆ２５０及びグローバルバッファ２４０を有している。したがって、この構成によれば、チップ２６５におけるキャッシュメモリの大きさと、グローバルバッファ２４０へのアクセス量と、のバランスを調整することができる。チップ２６０の前段処理２２０は、チップ間インタフェースを介して実施形態１と同様に予め定められた同期情報を転送することができる。また、チップ２６５のＩ／Ｆ２５０は、前段処理２２０から受け取った同期情報を、実施形態１と同様に後段処理２３０から受け取った同期情報で修正することができる。 This configuration allows data to be transferred between processing units across multiple chips while absorbing differences in their data processing specifications or constraints. In this example, no more data needs to be transferred than the pre-stage processing 220 outputs. Also in this example, the chip 265 that has the post-stage processing 230 also has the I/F 250 and the global buffer 240. This configuration therefore allows the size of the cache memory in the chip 265 to be balanced against the amount of access to the global buffer 240. The pre-stage processing 220 of the chip 260 can transfer predetermined synchronization information via the inter-chip interface, as in the first embodiment. The I/F 250 of the chip 265 can then modify the synchronization information received from the pre-stage processing 220 with the synchronization information received from the post-stage processing 230, also as in the first embodiment.

［実施形態１，２の変形例］ [Modifications of the First and Second Embodiments]
以下、実施形態１，２における同期情報の修正についてさらに詳細に説明する。実施形態１と同様の方式を用いる場合、同期情報の修正は以下のように行うことができる。すなわち、ライトポート４０２へライト要求とともに入力される同期情報と、リードポート４０４へリード要求とともに入力される同期情報とを用いて、所望のキャッシュラインについての同期情報を演算することができる。そして、キャッシュミスが生じると、最も古いキャッシュライン［０］についての同期情報［０］はＩ／Ｆ２５０から破棄される。このとき、破棄される同期情報［０］の値が１である場合には、グローバルバッファ２４０（例えばＤＲＡＭ）にキャッシュデータをライトバックすることができる。 The correction of the synchronization information in the first and second embodiments will be described in more detail below. When using the same method as in the first embodiment, the correction of the synchronization information can be performed as follows. That is, the synchronization information for the desired cache line can be calculated using the synchronization information input to the write port 402 together with the write request and the synchronization information input to the read port 404 together with the read request. Then, when a cache miss occurs, the synchronization information [0] for the oldest cache line [0] is discarded from the I/F 250. At this time, if the value of the synchronization information [0] to be discarded is 1, the cache data can be written back to the global buffer 240 (e.g., DRAM).

一方、実施形態２のようにチップ間でデータを送受信するような場合など、優先的にライトポートからのデータをリードポートに伝達したいことがある。このような場合には、上記のようなライトバックを行わない動作を用いることができる。例えば、ライトポート４０２からの入力をストール（一時停止）し、リードポート４０４からのリード要求を優先的に処理することができる。そして、最も古いキャッシュライン［０］に対するリード要求が入力され、同期情報［０］の値が１から０になり、キャッシュライン［０］のキャッシュデータが破棄可能となった時に、ライトポート４０２の入力ストール（一時停止）を解除することができる。 On the other hand, in cases such as sending and receiving data between chips as in the second embodiment, it may be desirable to prioritize passing data from the write port to the read port. In such a case, an operation that avoids the write-back described above can be used. For example, input from the write port 402 can be stalled (temporarily stopped) while read requests from the read port 404 are processed with priority. When a read request for the oldest cache line [0] is input, the value of the synchronization information [0] changes from 1 to 0 and the cache data of the cache line [0] becomes discardable. At that point, the stall of the write port 402 can be released.
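The following is a minimal model of this stall policy. It assumes the same oldest-first replacement and 1-bit flags as in the first embodiment: every write sets the flag to 1, and a read hit clears it. The class and method names are hypothetical. Because the write port waits until line [0] has been read, no data is ever written back.

```python
from collections import deque


class StallingSharedCache:
    """Read-priority policy: the write port stalls instead of writing data back (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = deque()          # entries [addr, data, sync]; lines[0] is cache line [0]

    def try_write(self, addr, data):
        """Return False (write port stalled) if the cache is full and line [0] is unread."""
        if len(self.lines) == self.capacity:
            if self.lines[0][2] != 0:
                return False
            self.lines.popleft()      # already read: discard without write-back
        self.lines.append([addr, data, 1])
        return True

    def read(self, addr):
        """Read-port request: marks the hit line as consumed (flag 1 -> 0)."""
        for line in self.lines:
            if line[0] == addr:
                line[2] = 0
                return line[1]
        raise KeyError(addr)          # unread data is never evicted under this policy
```

The write that fails while line [0] is unread succeeds as soon as that line has been read. This is the release of the stall described above.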

このような実施形態によれば、ライトポート４０２からのデータ受信より、リードポート４０４へのデータ送信を優先的に行うことにより、グローバルバッファ２４０へのデータの書き戻し量を抑制することができる。また、グローバルバッファ２４０からのデータの再読み出し量も抑制されるため、グローバルバッファ２４０（例えばＤＲＡＭ）へのアクセス帯域を削減し、ライトポート４０２からリードポート４０４への伝達レイテンシを短くすることができる。 According to this embodiment, by giving priority to sending data to the read port 404 over receiving data from the write port 402, it is possible to suppress the amount of data written back to the global buffer 240. In addition, since the amount of data re-read from the global buffer 240 is also suppressed, it is possible to reduce the access bandwidth to the global buffer 240 (e.g., DRAM) and shorten the transmission latency from the write port 402 to the read port 404.

さらに、同期情報を用いた制御手法について詳細に説明する。実施形態１の方式では、同期情報は１ビットのフラグであり、ライトポート４０２から受信するデータと、リードポート４０４に送信するデータと、の間のデータ転送比は１対１であった。一方で、同期情報はＮビット（Ｎは１以上）のカウント値であってもよい。例として、ライトポート４０２からの受信データを、リードポート４０４から７回読み出す場合について説明する。この場合、ライトポート４０２へとライト要求とともに入力される同期情報の値を７にすることができる。こうして入力された同期情報（値＝７）は、キャッシュライン［７］についての同期情報［７］として書き込まれる。そして、リードポート４０４へのリード要求がキャッシュヒットするたびに、対応するキャッシュラインについての同期情報から１が減算される。そして、最も古いキャッシュライン［０］からキャッシュデータが破棄されるときに、対応する同期情報［０］の値が０であればライトバックは行われず、１以上であればグローバルバッファ２４０（例えばＤＲＡＭ）への書き戻しが行われる。この場合、ライトポート４０２からリードポート４０４へのデータ転送比を１：７に制御することができる。 The control method using the synchronization information will now be described in more detail. In the method of the first embodiment, the synchronization information is a 1-bit flag, and the data transfer ratio between the data received from the write port 402 and the data sent to the read port 404 is 1:1. Alternatively, the synchronization information may be an N-bit count value (N ≥ 1). As an example, consider the case where data received from the write port 402 is read from the read port 404 seven times. In this case, the synchronization information input to the write port 402 together with the write request can be set to 7. This synchronization information (value = 7) is written as the synchronization information [7] for the cache line [7]. Each time a read request to the read port 404 hits the cache, 1 is subtracted from the synchronization information of the corresponding cache line. When the cache data of the oldest cache line [0] is discarded, no write-back is performed if the corresponding synchronization information [0] is 0. If it is 1 or more, the data is written back to the global buffer 240 (e.g., DRAM). In this way, the data transfer ratio from the write port 402 to the read port 404 can be controlled to 1:7.
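The counting variant can be sketched as follows. The sketch assumes that a read hit never drives the count below zero and that the initial count must fit in the N-bit field. The function name and the n_bits parameter are hypothetical. With n_bits=1 and an initial count of 1, the behaviour reduces to the 1-bit flag of the first embodiment.

```python
def needs_write_back(initial_count, hits, n_bits=3):
    """Decide, at eviction time, whether a line with an N-bit read count must be written back."""
    if not 0 <= initial_count < (1 << n_bits):
        raise ValueError("count does not fit in the N-bit synchronization field")
    count = initial_count                 # value sent with the write request
    for _ in range(hits):
        count = max(0, count - 1)         # each read-port cache hit consumes one read
    return count >= 1                     # 0: discard, 1 or more: write back
```

With an initial count of 7 and seven hits, the line is discarded on eviction, which gives the 1:7 transfer ratio. If only three of the seven reads have happened when the line is evicted, it is written back so that the remaining reads can be served from the global buffer.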

また、同期情報の使い方を工夫することにより、ライトポート４０２から受信するデータと、リードポート４０４に送信するデータと、のデータ転送比が予め確定していない場合であっても、データ転送比を制御できる。例えば、８ビットの同期情報を用い、前段処理２２０は、ライト要求とともに値として０ｘＦＦ（無限倍）を持つ同期情報を、ライトポート４０２からＩ／Ｆ２５０に書き込むことができる。このとき、前段処理２２０は、送信データが後段処理２３０でどのように利用されるかを知る必要は必ずしもない。データ転送比をどのような大きさにするかは、データを利用する後段処理２３０が決めることができる。この場合も、リードポート４０４へリード要求とともに入力される同期情報の値と、キャッシュヒットしたキャッシュラインについての同期情報と、の演算により、キャッシュラインのキャッシュデータをライトバックするか破棄するかを定めることができる。 In addition, with an appropriate use of the synchronization information, the data transfer ratio can be controlled even if the ratio between the data received from the write port 402 and the data sent to the read port 404 is not determined in advance. For example, using 8-bit synchronization information, the pre-stage processing 220 can write synchronization information with the value 0xFF (meaning an unlimited number of reads) from the write port 402 to the I/F 250 together with the write request. The pre-stage processing 220 then does not necessarily need to know how the transmitted data will be used by the post-stage processing 230. The post-stage processing 230, which uses the data, can decide the data transfer ratio. In this case too, whether to write back or discard the cache data of a cache line can be determined by an operation on the synchronization information input to the read port 404 together with the read request and the synchronization information of the cache line that was hit.

例えば、後段処理２３０は、リードポート４０４へのリード要求により、必要な回数だけ所望のキャッシュデータを読み出すことができる。キャッシュメモリに所望のデータがない場合、グローバルバッファ２４０（例えばＤＲＡＭ）から再読み出しされたデータ及び同期情報が、リードポート４０４に送信される。そして、リードポート４０４へのリード要求により、所望のデータを最後に読み出すときに、キャッシュラインについての同期情報を強制的に０の値で上書きすることができる。このようなキャッシュラインのキャッシュデータは、グローバルバッファ２４０に書き戻されることなく、キャッシュメモリから廃棄される。後段処理２３０は、リードポート４０４へリード要求とともに入力する同期情報を用いて、このような同期情報の上書きを行うことができる。 For example, the post-stage processing 230 can read the desired cache data as many times as necessary through read requests to the read port 404. If the desired data is not in the cache memory, the data and its synchronization information are re-read from the global buffer 240 (e.g., DRAM) and sent to the read port 404. When the desired data is read through the read port 404 for the last time, the synchronization information of the cache line can be forcibly overwritten with 0. The cache data of such a cache line is then discarded from the cache memory without being written back to the global buffer 240. The post-stage processing 230 can perform this overwrite by means of the synchronization information it inputs to the read port 404 together with the read request.
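The sketch below shows one possible encoding of this scheme. The value 0xFF for "unlimited" comes from the text. Treating 0xFF as never decremented is an assumption made for illustration. So is signalling the last read with a boolean last_read flag in place of the real read-port synchronization value. The function and constant names are hypothetical.

```python
UNLIMITED = 0xFF  # 8-bit sync value written by the pre stage: number of reads not fixed


def sync_after_read(line_sync, last_read):
    """New sync value of a hit cache line after one read-port request (hypothetical encoding)."""
    if last_read:
        return 0                      # forced overwrite: line is discarded, never written back
    if line_sync == UNLIMITED:
        return UNLIMITED              # unlimited reads: nothing is consumed
    return max(0, line_sync - 1)      # finite count: behaves like the N-bit counter


def evict_action(line_sync):
    """What happens to the line when it becomes the oldest line and is evicted."""
    return "discard" if line_sync == 0 else "write back"
```

In this encoding, the pre stage always writes 0xFF. The post stage alone decides, by setting last_read on its final request, how many reads the data receives. Any line evicted before that final read is written back, so it can still be re-read later.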

このような実施形態によれば、前段処理２２０の送信データと後段処理２３０の受信データとのデータ転送比を容易に制御することができる。とりわけ、上述の実施形態のように、後段処理２３０が同期情報を制御することにより、柔軟なデータ転送比を実現することができる。この場合、前段処理２２０はデータを単純に送信すればよい。 According to such an embodiment, the data transfer ratio between the data sent by the pre-stage processing 220 and the data received by the post-stage processing 230 can be easily controlled. In particular, having the post-stage processing 230 control the synchronization information, as in the embodiment above, makes a flexible data transfer ratio possible. In this case, the pre-stage processing 220 only needs to transmit the data.

［実施形態３］ [Embodiment 3]
上述の実施形態においては、画像データを異なる走査順序で送受信したり、フィルタ処理のオーバーラップ領域を考慮したりするために、大きなキャッシュメモリを用いることが望ましい。キャッシュメモリが大きいほど、グローバルバッファ（例えばＤＲＡＭ）へのデータ退避及び再読み出しのためのアクセスを抑制することができるため、グローバルバッファへのアクセス帯域を削減できる。 In the embodiments described above, it is desirable to use a large cache memory in order to send and receive image data in different scan orders and to handle the overlap areas of filter processing. The larger the cache memory, the fewer accesses are needed to save data to, and re-read it from, the global buffer (e.g., DRAM), which reduces the access bandwidth to the global buffer.

このため、キャッシュメモリとして、従来のＳＲＡＭの代わりに、spin-transfer torque magnetic RAM（ＳＴＴ－ＭＲＡＭ）のような、次世代の不揮発性メモリを用いることができる。また、次世代のメモリと呼ばれる、ＦＲＡＭ（登録商標）、ＲｅＲＡＭ、ＰＣＭなどを用いることもできる。例えば、ＳＴＴ－ＭＲＡＭは、ＳＲＡＭに比べて回路素子が小さいため、４倍以上の容量を有することが容易である。このため、キャッシュメモリの容量を大きくすることができる。また、ＳＴＴ－ＭＲＡＭの消費電力は、ＳＲＡＭと比べて、リードアクセスについては約１／６０の大きさでありうるが、ライトアクセスについては約１．６倍の大きさとなりうる。しかしながら、上記の変形例のように、本発明の一実施形態に係るインタフェース装置は、前段処理２２０によるライト１回に対する後段処理２３０によるリード回数、すなわちデータ転送比を容易に制御できる。このため、ＳＴＴ－ＭＲＡＭを用いることによる消費電力の抑制効果を活用することができる。 Therefore, instead of conventional SRAM, a next-generation non-volatile memory such as spin-transfer torque magnetic RAM (STT-MRAM) can be used as the cache memory. Other so-called next-generation memories, such as FRAM (registered trademark), ReRAM, and PCM, can also be used. For example, because STT-MRAM has smaller circuit elements than SRAM, it can readily provide four or more times the capacity, which allows the capacity of the cache memory to be increased. Compared with SRAM, the power consumption of STT-MRAM can be about 1/60 for read access but about 1.6 times for write access. However, as in the modifications above, the interface device according to an embodiment of the present invention can easily control the number of reads by the post-stage processing 230 per write by the pre-stage processing 220, that is, the data transfer ratio. The power savings that STT-MRAM offers on reads can therefore be exploited.
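A back-of-the-envelope calculation shows why a high read-to-write ratio favours STT-MRAM. The sketch uses the figures above: about 1/60 of the SRAM energy for a read and about 1.6 times for a write. It also assumes, purely for illustration, that an SRAM read and an SRAM write cost the same energy. Real ratios depend on the specific memory macros. The names are hypothetical.

```python
from fractions import Fraction

READ_RATIO = Fraction(1, 60)   # STT-MRAM read energy / SRAM read energy (figure from the text)
WRITE_RATIO = Fraction(8, 5)   # STT-MRAM write energy / SRAM write energy (about 1.6x)


def relative_energy(reads_per_write):
    """STT-MRAM energy divided by SRAM energy for one write followed by R reads.

    ASSUMPTION: an SRAM read and an SRAM write cost the same energy (1 unit each).
    """
    r = Fraction(reads_per_write)
    return (WRITE_RATIO + r * READ_RATIO) / (1 + r)


if __name__ == "__main__":
    for r in (0, 1, 7):
        print(r, float(relative_energy(r)))
```

Under that assumption, a write that is never read costs 1.6 times as much as on SRAM. One write followed by one read costs 97/120 (about 0.81) of the SRAM energy. One write followed by seven reads, the 1:7 ratio used earlier, costs 103/480 (about 0.21).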

以上のように、キャッシュメモリとしてＳＴＴ－ＭＲＡＭなどの次世代メモリ又は不揮発性メモリを用いることにより、キャッシュ容量を大きくし、データ伝送の効率を向上することができる。また、ライトポートに対するリードポートのデータ転送比が大きいとき、ＳＴＴ－ＭＲＡＭを用いることで効果的に消費電力を抑制できる。 As described above, by using next-generation memory such as STT-MRAM or non-volatile memory as cache memory, it is possible to increase the cache capacity and improve the efficiency of data transmission. Furthermore, when the data transfer ratio of the read port to the write port is high, power consumption can be effectively reduced by using STT-MRAM.

［実施形態４］ [Embodiment 4]
前段処理２２０は、撮像センサなどのセンシングデバイスであってもよい。例えば、撮像センサは単純なラスタ走査順で撮像データを送信することが多い。また、後段処理２３０は、撮像データに対する高画質化処理であってもよい。上述の実施形態によれば、省メモリ化が可能なタイル領域単位の画像処理を用いるための走査変換、及びフィルタ処理のためのオーバーラップ領域の制御を行うことができる。そして、上述の実施形態によれば、前段処理２２０は単純にデータ送信を行うことができ、後段処理２３０が同期情報を制御することにより多彩な方式のデータ受信を行うことができる。したがって、上述の実施形態は、撮像センサなどのセンシングデバイスが共有キャッシュＩ／Ｆに対する単純なデータ送信を行い、複雑な画像処理を行う後段処理２３０がその機能及び動作に応じたデータ受信を行うように使用可能である。 The pre-processing 220 may be a sensing device such as an imaging sensor. An imaging sensor, for example, often transmits imaging data in simple raster scan order. The post-processing 230 may be image-quality enhancement processing applied to the imaging data. The embodiments described above make it possible to perform the scan conversion needed for memory-saving image processing in units of tile regions, and to control the overlap areas required for filter processing. They also allow the pre-processing 220 to simply transmit data, while the post-processing 230 receives data in a variety of ways by controlling the synchronization information. The embodiments can therefore be used so that a sensing device such as an imaging sensor simply sends data to the shared cache I/F, and the post-processing 230, which performs complex image processing, receives the data in a manner suited to its function and operation.
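The scan conversion mentioned here can be illustrated by comparing two orders. A sensor writes pixels in raster order, while a post stage might read them in tile order. The function names are hypothetical. max_lookahead is one simple measure, assumed here, of how many pixels the raster writer must run ahead of the tile reader. That distance is what the shared cache has to absorb.

```python
def raster_order(width, height):
    """Pixel coordinates in the order a raster-scanning sensor emits them."""
    return [(x, y) for y in range(height) for x in range(width)]


def tile_order(width, height, tile_w, tile_h):
    """Pixel coordinates in the order a tile-based post stage consumes them."""
    order = []
    for ty in range(0, height, tile_h):
        for tx in range(0, width, tile_w):
            for y in range(ty, min(ty + tile_h, height)):
                for x in range(tx, min(tx + tile_w, width)):
                    order.append((x, y))
    return order


def max_lookahead(width, height, tile_w, tile_h):
    """Largest gap between a pixel's raster position and its position in tile order."""
    rank = {p: i for i, p in enumerate(raster_order(width, height))}
    return max(rank[p] - i for i, p in enumerate(tile_order(width, height, tile_w, tile_h)))
```

For an 8×4 image split into 4×2 tiles, the reader finishes the first row of a tile and then needs pixels that the writer emits four positions later than the reader's own count. In general, the lookahead grows with the image width and the tile height.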

（その他の実施例） (Other Examples)
本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by a process in which a program for implementing one or more of the functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. The present invention can also be realized by a circuit (e.g., ASIC) that implements one or more of the functions.

発明は上記実施形態に制限されるものではなく、発明の精神及び範囲から離脱することなく、様々な変更及び変形が可能である。従って、発明の範囲を公にするために請求項を添付する。 The invention is not limited to the above-described embodiment, and various modifications and variations are possible without departing from the spirit and scope of the invention. Therefore, the following claims are appended to disclose the scope of the invention.

２２０：前段処理、２３０：後段処理、２５０：共有キャッシュＩ／Ｆ、４０２：ライトポート、４０４：リードポート、４１０：プリフェッチ部、４３０：フェッチ部 220: Pre-processing, 230: Post-processing, 250: Shared cache I/F, 402: Write port, 404: Read port, 410: Prefetch unit, 430: Fetch unit

Claims

1. An interface device that serves as a shared cache for a plurality of processing units, comprising:
a first port that acquires data from a first processing unit included in the plurality of processing units;
a second port that outputs data acquired from the first processing unit to a second processing unit included in the plurality of processing units;
cache means for caching the data acquired from the first processing unit; and
control means for controlling, based on information acquired from the second processing unit, whether or not to write back the data written to the cache means to storage means different from the cache means,
wherein the information acquired from the second processing unit indicates that data requested by the second processing unit does not need to be written back from the cache means to the storage means.

2. The interface device according to claim 1, wherein the first port obtains data included in a data group from the first processing unit in a first order, and
the second port outputs the data included in the data group to the second processing unit in a second order different from the first order.

3. The interface device according to claim 1, wherein the control means controls whether or not to write back the data written to the cache means, further based on the information acquired from the first processing unit.

4. The interface device according to claim 3, wherein the information acquired from the first processing unit indicates that the data is to be transferred to the second processing unit.

5. The interface device according to claim 3, wherein the control means stores, in association with the data written in the cache means, information acquired from the first processing unit, information acquired from the second processing unit, or a result of an operation between the information acquired from the first processing unit and the information acquired from the second processing unit.

6. The interface device according to claim 3, wherein the control means discards the data written to the cache means without writing it back to the storage means when:
the data written to the cache means was not obtained from the storage means;
the information acquired from the first processing unit indicates that the data is to be transferred to the second processing unit; and
the information acquired from the second processing unit indicates that the data requested by the second processing unit does not need to be written back from the cache means to the storage means.

7. The interface device according to claim 1, wherein the control means, when discarding the data written to the cache means, switches whether or not to write back the data to be discarded based on at least information obtained from the second processing unit.

8. The interface device according to claim 1, wherein the cache means performs cache operations using a fully associative scheme.

9. The interface device according to claim 1, wherein said storage means is a DRAM.

10. The interface device according to claim 1, wherein the data is image data.

11. The interface device according to any one of claims 1 to 10, wherein the interface device and the second processing unit are included in a first chip, and the interface device is connected to the first processing unit of a second chip different from the first chip.

12. A data processing device comprising: the first processing unit; the second processing unit; and the interface device according to any one of claims 1 to 11.

13. The data processing device according to claim 12, wherein the first processing unit generates a data group by performing first data processing on input data, and
the second processing unit performs second data processing on the data group to generate a processing result obtained by applying the first data processing and the second data processing to the input data.

14. The data processing device according to claim 12, wherein the first processing unit transmits, to the interface device, data included in each of a plurality of tile regions having a first size set in an image, one tile region at a time, and
the second processing unit receives, from the interface device, data included in each of a plurality of tile regions set in the image and having a second size different from the first size, one tile region at a time.

15. The data processing device according to claim 12, wherein the second processing unit, when requesting data from the interface device, determines whether or not to request the data again in a later process, and, in response to a determination that the data will not be requested again, transmits information to the interface device indicating that the requested data does not need to be written back from the cache means to the storage means.

16. The data processing device according to claim 12, wherein the second processing unit receives, from the interface device, data included in each of a plurality of tile regions set in an image, one tile region at a time, and
when requesting data included in a tile region from the interface device, the second processing unit controls the information it sends to the interface device, which indicates whether the requested data needs to be written back from the cache means to the storage means, depending on whether the data is also included in another tile region.

17. The data processing device according to claim 12, further comprising a network and the storage means connected to the network, wherein
the interface device is connected to the network, and
the interface device is connected to the first processing unit and the second processing unit without passing through the network.

18. A cache control method performed by an interface device that serves as a shared cache for a plurality of processing units, the interface device comprising: a first port that acquires data from a first processing unit included in the plurality of processing units; a second port that outputs the data acquired from the first processing unit to a second processing unit included in the plurality of processing units; and cache means that caches the data acquired from the first processing unit, the method comprising:
a step of controlling, based on information acquired from the second processing unit, whether or not to write back the data written to the cache means to storage means other than the cache means,
wherein the information acquired from the second processing unit indicates that data requested by the second processing unit does not need to be written back from the cache means to the storage means.

19. A program for causing a computer to function as the control means of the interface device according to any one of claims 1 to 11.