JP2006065850A - Microcomputer

Info

Publication number: JP2006065850A
Application number: JP2005216995A
Authority: JP
Inventors: Yutaka Arita; 有田　　裕; Yasuhiro Nakatsuka; 康弘中塚; Kotaro Shimamura; 光太郎島村; Yasuo Watanabe; 泰夫渡邊
Original assignee: Renesas Technology Corp
Current assignee: Renesas Technology Corp
Priority date: 2004-07-28
Filing date: 2005-07-27
Publication date: 2006-03-09
Anticipated expiration: 2025-07-27
Also published as: JP4796346B2

Abstract

PROBLEM TO BE SOLVED: To provide a microcomputer that improves multimedia processing performance by minimizing the memory-access bottleneck caused by data linkage when a CPU and an accelerator operate in cooperation.

SOLUTION: This multimedia microcomputer 1 has a CPU 11 and an accelerator 12 and executes multimedia processing through cooperation between them. To eliminate the bottleneck of memory accesses that arise because data linkage between the CPU 11 and the accelerator 12 goes through a memory 2, an I/O-specific cache 14 accessible by both the CPU 11 and the accelerator 12 is placed in front of the memory 2, and the data required for data linkage is held in the I/O-specific cache 14. This speeds up data linkage between the CPU 11 and the accelerator 12 and thereby speeds up multimedia processing.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a microcomputer, and more particularly to a technique that is effective when applied to a microcomputer that performs communication and multimedia processing and that includes, in addition to a CPU, auxiliary circuits such as accelerators.

According to a study by the present inventors, the following techniques can be considered for a microcomputer that performs multimedia processing.

For example, a microcomputer that performs multimedia processing incorporates, in addition to the CPU, an accelerator that assists the CPU in order to raise multimedia processing performance. The accelerator executes in hardware, at high speed, the time-consuming processing that the CPU handles poorly, and the CPU and the accelerator work jointly (hereinafter referred to as data linkage), which speeds up multimedia processing efficiently.

The CPU and the accelerator each contain a cache to prevent the performance drop caused by waiting for memory access, that is, a bottleneck. Consequently, when another accelerator changes the contents of the memory, the corresponding data in the CPU cache is discarded to resolve the mismatch between the data in the CPU cache and the data in the memory. When the CPU accesses that address again, the data is read from the memory into the cache. This maintains the consistency of data between the cache and the memory, that is, cache coherency.

Therefore, even though the CPU and the accelerator contain caches, data linkage between the CPU and the accelerator gains nothing from those caches and is carried out by accessing the memory directly.

For example, Patent Document 1 and Patent Document 2 describe techniques for accessing a memory from a CPU or an accelerator. Patent Document 1 discloses a technique that enables an accelerator to access a memory quickly. Patent Document 2 discloses a technique that enables a CPU to access a memory quickly.

Patent Document 1: Japanese Patent Laid-Open No. 11-161598
Patent Document 2: Japanese Patent Laid-Open No. 2001-216194

As a result of the present inventors' study of microcomputers that perform multimedia processing as described above, the following became clear.

For example, with recent advances in semiconductor manufacturing technology, multimedia processing systems have been integrated into system LSIs, so that multiple accelerators are mounted on a single chip and the accelerators themselves have become as fast as a CPU.

As a result, the load on the memory has grown and faster memory access has become important. What matters here is how quickly data written to the memory can be read back, that is, the latency. SDRAM and DDR-SDRAM have improved memory access throughput, but the overhead of issuing commands is large and latency has worsened.

Therefore, when data is linked between the CPU and the accelerator, the CPU not only waits for the accelerator to finish processing but also stalls on memory access from the time the accelerator's output is written to the memory until the CPU has read that data back from the memory. In other words, multimedia processing has become rate-limited by the memory, which is slower than the CPU and the accelerator. Moreover, as higher integration places multiple accelerators on one chip, data linkage between the CPU and multiple accelerators makes the slowdown caused by the CPU waiting on memory even more severe.

Accordingly, an object of the present invention is to provide a microcomputer that improves multimedia processing performance by minimizing the memory-access bottleneck caused by data linkage when a CPU and an accelerator operate in cooperation.

The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

A brief outline of representative aspects of the invention disclosed in the present application is as follows.

The present invention is applied to a microcomputer that has a CPU operating as a master and an accelerator operating as a slave, in which both the CPU and the accelerator can access a memory, and it has the following features.

That is, in the microcomputer of the present invention, the data that the CPU and the accelerator access in the memory consists of linkage data, which the CPU and the accelerator exchange with each other, and the data body, which is everything else. The microcomputer has an I/O dedicated cache that holds the linkage data.

In the microcomputer of the present invention, the I/O dedicated cache has a function of determining, on a write access request from the CPU or the accelerator to the memory, whether to hold the data of that request. The accelerator has a function of issuing a hold request to the I/O dedicated cache when it performs a write access to the memory. The I/O dedicated cache has a function of determining, from the hold request output during a write access from the accelerator to the memory, whether to hold the data output by the accelerator. The I/O dedicated cache also has a function of determining, from the address output by the CPU or the accelerator during a write access to the memory, whether to hold the data.

In the microcomputer of the present invention, when the accelerator issues a read access request to the memory and the I/O dedicated cache holds the requested data, the I/O dedicated cache outputs that data to the accelerator.

The microcomputer of the present invention also has a memory controller that controls access to the memory from the CPU and the accelerator. Access requests from the CPU and the accelerator carry priorities, and the memory controller processes them according to those priorities. Furthermore, the memory is SDRAM or DDR-SDRAM, and the memory controller has a function of performing accesses to the same bank and the same row address of the memory consecutively. The memory controller also has a function of managing dependencies among accesses to the same address so that accesses to the memory remain consistent.
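The three memory controller functions described above can be illustrated with a small scheduling sketch. The text at this point does not say how priority, row locality, and dependency tracking are combined, so the ordering rule used here (open-row hit first, then priority, then arrival order) and the `Request` fields are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical access request; field names are not from the patent."""
    seq: int        # arrival order
    priority: int   # larger value = more urgent (assumed convention)
    bank: int
    row: int
    address: int
    is_write: bool


def schedule(requests):
    """Return the sequence numbers in the order the requests are issued."""
    pending = sorted(requests, key=lambda r: r.seq)
    order = []
    open_row = None  # (bank, row) of the most recent access
    while pending:
        # Dependency rule: a request may not overtake an older pending
        # request to the same address.
        ready = [r for i, r in enumerate(pending)
                 if not any(o.address == r.address for o in pending[:i])]

        def key(r):
            row_hit = (r.bank, r.row) == open_row
            # Assumed ordering: row hits first, then priority, then age.
            return (not row_hit, -r.priority, r.seq)

        nxt = min(ready, key=key)
        pending.remove(nxt)
        order.append(nxt.seq)
        open_row = (nxt.bank, nxt.row)
    return order
```

In this model, two requests to the same bank and row are issued back to back even if an unrelated request arrived between them, while a read that follows a write to the same address never moves ahead of that write.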

In the microcomputer of the present invention, the memory is provided outside the microcomputer. Alternatively, the memory is provided inside the microcomputer.

Specifically, the microcomputer of the present invention is a multimedia microcomputer that has a CPU and an accelerator that cooperate to perform multimedia processing. Because data linkage between the CPU and the accelerator goes through the memory, memory access becomes a bottleneck. To eliminate it, an I/O dedicated cache that both the CPU and the accelerator can access is provided in front of the memory, and the linkage data required for data linkage is held in the I/O dedicated cache. This speeds up data linkage between the CPU and the accelerator and thereby speeds up multimedia processing.

In the microcomputer of the present invention, the CPU has an internal cache.

In the microcomputer of the present invention, the microcomputer is connected to an external memory, in which a program area or a work area is formed. A data area for the accelerator is also formed in the external memory.

In the microcomputer of the present invention, the internal cache of the CPU has a snoop function.

The effects obtained by representative aspects of the invention disclosed in the present application are briefly described as follows.

According to the present invention, the memory-access bottleneck caused by data linkage when the CPU and the accelerator operate in cooperation can be minimized, so multimedia processing performance can be improved.

Hereinafter, embodiments of the present invention are described in detail with reference to the drawings. In all drawings for describing the embodiments, components having the same function are in principle given the same reference numerals, and their repeated description is omitted.

An example of the configuration and operation of a multimedia microcomputer according to an embodiment of the present invention is described with reference to FIGS. 1 to 3. FIG. 1 is a block diagram of the multimedia microcomputer. FIG. 2 shows the configuration of the memory. FIG. 3 is a block diagram of another multimedia microcomputer.

As shown in FIG. 1, the multimedia microcomputer 1 of the present embodiment comprises a CPU 11 that operates as a master, a plurality of accelerators 12 (12-1 to 12-n) that operate as slaves, an I/O dedicated cache 14 that is the characteristic feature of the present invention, a bus 13 that connects them, and a memory controller 15. A memory 2 is connected externally to the multimedia microcomputer 1.

The accelerator 12 assists the CPU 11 and executes in hardware, at high speed, the time-consuming processing that the CPU 11 handles poorly. The memory controller 15 is connected to the I/O dedicated cache 14 and the memory 2. In response to memory access requests arriving via the bus 13 and the I/O dedicated cache 14, it accesses the memory 2 by issuing SDRAM or DDR-SDRAM commands.

As shown in FIG. 2, the memory 2 contains a program 21 describing the series of procedures for the multimedia processing executed by the CPU 11, a work area 22, and data areas 23 (23-1 to 23-n) that store the data processed by each accelerator 12. A common data area 23 may also be accessed by multiple accelerators 12.

As shown in FIG. 3, besides the configuration of FIG. 1 in which the memory 2 is connected externally, the multimedia microcomputer of the present embodiment can also be configured as a multimedia microcomputer 10 in which the memory 2 is provided internally and integrated with the CPU 11, the plurality of accelerators 12 (12-1 to 12-n), the I/O dedicated cache 14, the bus 13, and the memory controller 15.

Next, the operation of the multimedia microcomputer 1 shown in FIG. 1 when the I/O dedicated cache 14 is OFF is described. The same applies to the multimedia microcomputer 10 shown in FIG. 3.

The CPU 11 performs processing by accessing the program 21, the work area 22, and the data in the data area 23 in the memory 2 via the bus 13, the I/O dedicated cache 14, and the memory controller 15. In accordance with the program 21, the CPU 11 sets the data to be processed by the accelerator 12 in the data area 23, requests processing from the accelerator 12, and reads the accelerator's result from the data area 23. Multimedia processing such as MPEG and MP3 is realized in this way.

In this way, the multimedia microcomputer 1 executes multimedia processing through data linkage between the CPU 11 and the accelerator 12 via the data area 23 in the memory 2. As a result, the memory 2, whose access speed is low compared with the processing speed of the CPU 11 and the accelerator 12, becomes the bottleneck in multimedia processing, and improving multimedia processing performance has become difficult. In the present embodiment, as described below, data is exchanged smoothly between the CPU 11 and the accelerator 12, which makes faster multimedia processing possible.

That is, as shown in FIG. 1, the I/O dedicated cache 14 is placed on the memory controller 15 side so that it can be accessed from both the CPU 11 and the accelerator 12, and it holds the linkage data exchanged between them. Data linkage between the CPU 11 and the accelerator 12 then takes place in the faster I/O dedicated cache 14, which greatly reduces the overhead of waiting for memory access and allows multimedia processing to run smoothly.

Note also that the data needed for data linkage between the CPU 11 and the accelerator 12 is not all of the data processed by the accelerator 12 but only a part of it, such as headers and commands to the accelerator 12. The I/O dedicated cache 14 therefore holds only the linkage data, which is the data needed for linkage. The data body, which is processed by the CPU 11 alone or by the accelerator 12 alone, is placed in the memory 2 rather than in the I/O dedicated cache 14. This limits the amount of data held in the I/O dedicated cache 14, so the cache is used effectively and its hit rate improves.

The point to note here is that the linkage data to be held by the I/O dedicated cache 14 is always data written to the memory 2 by the CPU 11 or the accelerator 12. The I/O dedicated cache 14 therefore needs to decide whether to cache only for write accesses to the memory 2. There are two ways to make this decision: using the address of the write access, and using a cache request signal sent to the I/O dedicated cache 14. For write accesses from the CPU 11 the decision uses the address; for write accesses from the accelerator 12 the decision can use both the address and the cache request signal.
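The write-side decision just described can be sketched as a small predicate. This is an illustrative model only: the `WriteAccess` type, the `"cpu"`/`"accelerator"` labels, and the choice to treat the accelerator's two criteria as a logical OR are assumptions, since the text says only that both methods can be used for accelerator writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteAccess:
    """Hypothetical write access as seen by the I/O dedicated cache."""
    master: str                  # "cpu" or "accelerator" (illustrative labels)
    address: int
    cache_request: bool = False  # models the cache request signal


def should_cache(write: WriteAccess, area_start: int, area_end: int) -> bool:
    """Decide whether a write carries linkage data that should be held."""
    in_area = area_start <= write.address < area_end
    if write.master == "cpu":
        # CPU writes: address-based decision only.
        return in_area
    # Accelerator writes: address or cache request signal (assumed OR).
    return in_area or write.cache_request
```

Reads never reach this predicate, because only write accesses are candidates for being held.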

For reads from the memory 2, on the other hand, the I/O dedicated cache 14 outputs the data on a hit. On a cache miss it only passes the access through to the memory 2 and does not cache the read data returned from the memory 2. This is because the CPU 11 and the accelerator 12 have their own dedicated caches or buffers, which hold the read data from the memory 2. Furthermore, to support the case where the bus 13 is a split bus, the I/O dedicated cache 14 must be able to output hit data to the bus 13 for a subsequent access request even while an earlier cache miss is still being served by a read access to the memory 2. This is where the I/O dedicated cache 14 differs greatly from conventional caches and buffers.
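The read behavior above (serve hits, pass misses through without allocating) can be modeled in a few lines. The class and attribute names are hypothetical, the memory is a plain dictionary, and write-through is assumed for simplicity; the split-bus hit-under-miss behavior is not modeled.

```python
class IoCacheModel:
    """Toy model of a cache that allocates on writes only."""

    def __init__(self, memory):
        self.memory = memory    # dict: address -> value, stands in for memory 2
        self.lines = {}         # linkage data currently held
        self.memory_reads = 0   # reads that had to go to memory

    def write(self, address, value, hold):
        if hold:
            self.lines[address] = value
        self.memory[address] = value  # write-through assumed in this sketch

    def read(self, address):
        if address in self.lines:
            return self.lines[address]  # hit: served from the cache
        self.memory_reads += 1
        return self.memory[address]     # miss: pass through, no allocation
```

Reading the same uncached address twice costs two memory reads in this model, which reflects the text's point that read data is expected to be kept by each master's own cache or buffer instead.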

Another feature is that the I/O dedicated cache 14 is a cache, so the program 21 executed by the CPU 11 can treat its accesses as ordinary accesses to the memory 2 without being aware that the I/O dedicated cache 14 exists.

Furthermore, to improve the efficiency of access to the memory 2, when the access size requested by the CPU 11 or the accelerator 12 is smaller than the access size of the memory 2, the I/O dedicated cache 14 is used to combine those requests into a single access to the memory 2. This reduces the number of accesses to the memory 2 and the bottleneck caused by waiting for memory.
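As a rough illustration of combining small accesses into memory-sized ones, the following sketch gathers byte writes into aligned blocks and issues one memory write per block. The `WriteCombiner` name, the byte granularity, and the flush policy (flush when a block fills or when a write targets a different block) are assumptions; the text does not specify the mechanism.

```python
class WriteCombiner:
    """Gathers byte writes into aligned blocks of `block_size` bytes."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.base = None          # base address of the block being filled
        self.pending = {}         # offset within block -> byte value
        self.memory_writes = []   # (base, {offset: value}) bursts sent to memory

    def write_byte(self, address, value):
        base = address - address % self.block_size
        if self.base is not None and base != self.base:
            self.flush()          # a different block: send the old one first
        self.base = base
        self.pending[address - base] = value
        if len(self.pending) == self.block_size:
            self.flush()          # block complete: one access to memory

    def flush(self):
        if self.pending:
            self.memory_writes.append((self.base, dict(self.pending)))
        self.base = None
        self.pending = {}
```

With a 4-byte block, eight consecutive byte writes result in two memory accesses in this model rather than eight.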

Next, an example of the flow of multimedia processing executed by the multimedia microcomputer is described with reference to FIG. 4. FIG. 4 shows the flow of multimedia processing.

As shown in FIG. 4, in the multimedia microcomputer 1 the CPU 11 and the accelerator 12 cooperate to perform multimedia processing, which is divided into processing executed by the CPU 11 (1000) and processing executed by the accelerator 12 (1100). The multimedia processing executed by the CPU 11 consists of pre-processing (1001) and post-processing (1009), performed respectively before and after the processing by the accelerator 12 (1005).

First, when the CPU 11 has performed pre-processing (1001), it writes the data to the data area 23 in order to pass it to the accelerator 12 (1002) and issues a start request to the accelerator 12 (1003). In response, the accelerator 12 reads the data from the data area 23 (1004), processes it (1005), writes the result back to the data area 23 (1006), and then sends a processing end report to the CPU 11 (1007). On receiving the report, the CPU 11 reads the accelerator's result from the data area 23 (1008) and performs post-processing (1009). Depending on the processing, there may be no pre-processing (1001), in which case processing starts from the accelerator 12, or no post-processing (1009), in which case processing ends with the accelerator 12.
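The step sequence 1001 to 1009 can be traced with a short sequential sketch. The function name, the callables, and the dictionary standing in for the data area 23 are hypothetical; the real CPU and accelerator run as separate hardware, and the variants without pre-processing or post-processing are not modeled here.

```python
def run_job(data, preprocess, accelerate, postprocess):
    """Walk through steps 1001-1009 once and record the order of steps."""
    data_area = {}  # stands in for data area 23
    trace = []

    prepared = preprocess(data)        # CPU pre-processing
    trace.append(1001)
    data_area["in"] = prepared         # CPU writes to the data area
    trace.append(1002)
    trace.append(1003)                 # CPU issues the start request

    fetched = data_area["in"]          # accelerator reads the data
    trace.append(1004)
    result = accelerate(fetched)       # accelerator processing
    trace.append(1005)
    data_area["out"] = result          # accelerator writes the result back
    trace.append(1006)
    trace.append(1007)                 # processing end report to the CPU

    loaded = data_area["out"]          # CPU reads the result
    trace.append(1008)
    final = postprocess(loaded)        # CPU post-processing
    trace.append(1009)
    return final, trace
```

Steps 1002, 1004, 1006, and 1008 are the four points where the two sides touch the shared data area, which is where memory latency enters the flow.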

In this way, the CPU 11 and the accelerator 12 perform data linkage via the data area 23 to execute multimedia processing.

Next, an example of the data flow of multimedia processing using the I/O dedicated cache is described with reference to FIGS. 5 and 6, referring also to FIG. 4. FIGS. 5 and 6 show the data flow of multimedia processing: FIG. 5 covers pre-processing (1001) through accelerator processing (1005) in FIG. 4, and FIG. 6 covers setting the processing result (1006) through post-processing (1009) in FIG. 4.

First, as shown in FIG. 5, the CPU 11 performs pre-processing (1001) and writes the resulting data to the data area 23 so that the accelerator 12 can process it (1002, 101). The I/O dedicated cache 14 caches this write data from the CPU 11 and also writes it to the data area 23 in the memory 2 (102). The I/O dedicated cache 14 decides whether the data should be cached by checking whether the write destination address, which the CPU 11 outputs together with the write data, falls within the data area 23.

The CPU 11 then outputs a start request signal to the accelerator 12 (1003). In response, the accelerator 12 starts and reads the data from the data area 23 (1004). The linkage data, which is the part of the write data cached in the I/O dedicated cache 14, is read from the I/O dedicated cache 14 (103). The data body, which is the part not cached in the I/O dedicated cache 14, is read directly from the data area 23 of the memory 2 (104). The accelerator 12 then processes the data it has read (1005).

Next, as shown in FIG. 6, when the processing by the accelerator 12 (1005) finishes, the accelerator 12 writes the processing result back to the data area 23 (1006, 111). The I/O dedicated cache 14 caches the write data from the accelerator 12 and also writes the processed data to the data area 23 in the memory 2 (112). The I/O dedicated cache 14 decides whether the data should be cached from the cache request signal or the write destination address that the accelerator 12 outputs together with the processed data.

After receiving the processing end report (1007) from the accelerator 12, the CPU 11 reads the processed data from the data area 23 (1008). Because the data the CPU 11 processes is the linkage data, that is, the part of the processed data cached in the I/O dedicated cache 14, the CPU 11 can perform post-processing (1009) simply by reading from the I/O dedicated cache 14 (113). Only if some part was not cached because of the capacity of the I/O dedicated cache 14 is it read from the data area 23 of the memory 2 (114).

In this way, the CPU 11 and the accelerator 12 perform data linkage through the fast I/O dedicated cache 14, whose access latency is shorter than that of the memory 2. Compared with data linkage through the data area 23 in the memory 2, this greatly reduces the access wait time that constitutes overhead and speeds up multimedia processing.

Furthermore, when the CPU 11 performs post-processing, it rarely reads all of the data processed by the accelerator 12. With this in mind, when the processed data is written to the memory 2, only the linkage data, which is the part the CPU 11 will read, is cached in the I/O dedicated cache 14. The remaining data body is not cached in the I/O dedicated cache 14 and is written directly to the data area 23 in the memory 2.

When the accelerator 12 performs processing, its accesses to the data area 23 are basically to consecutive addresses. Since the memory 2 is a high-throughput memory such as SDRAM or DDR-SDRAM, only the beginning of the data area 23 is held in the I/O dedicated cache 14, and the rest relies on the sequential access performance of the memory 2.
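The idea of holding only the beginning of a sequential transfer can be expressed as a simple split of an address range. The function name and the `head_bytes` parameter are hypothetical; the text does not state how large the held portion is.

```python
def split_transfer(start, length, head_bytes):
    """Split a sequential transfer into a held head and a bypassed body.

    Returns ((head_start, head_length), (body_start, body_length)).
    The head would be held in the I/O dedicated cache; the body would go
    straight to memory and rely on its sequential access throughput.
    """
    head_length = min(length, head_bytes)
    body_length = length - head_length
    return (start, head_length), (start + head_length, body_length)
```

For example, with an assumed 32-byte head, a 256-byte transfer starting at 0x1000 keeps 32 bytes in the cache and sends the remaining 224 bytes directly to memory.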

With the above methods, the amount of linkage data cached in the I/O dedicated cache 14 is reduced and the I/O dedicated cache 14 is used effectively.

Next, the structure and operation of the I/O dedicated cache are described in detail with reference to FIGS. 7 to 14. FIG. 7 shows the configuration of the bus. FIG. 8 shows the configuration of the I/O dedicated cache. FIG. 9 shows the configuration of the registers. FIGS. 10(a) and 10(b) show the register access paths in the I/O dedicated cache. FIG. 11 shows the flow of processing in the determination circuit. FIG. 12 shows the configuration of the address determination circuit. FIG. 13 shows the configuration of the cache. FIG. 14 shows the operation of the cache.

図７に示すように、バス１３は、アドレスバス１３１と、データバス１３２から構成されている。アドレスバス１３１は、アクセス先のアドレス１３１１とアクセス信号１３１２、及び、アクセラレータ１２からのキャッシュ要求信号１３１３から構成されている。また、データバス１３２は、リード用データバス１３２１とライト用データバス１３２２から構成されている。 As shown in FIG. 7, the bus 13 includes an address bus 131 and a data bus 132. The address bus 131 includes an access destination address 1311, an access signal 1312, and a cache request signal 1313 from the accelerator 12. The data bus 132 includes a read data bus 1321 and a write data bus 1322.

図８に示すように、Ｉ／Ｏ専用キャッシュ１４は、バス１３とメモリコントローラ１５に接続されており、レジスタ１４１、判定回路１４２及びキャッシュ１４３から構成される。また、判定回路１４２からキャッシュ１４３に対して、キャッシュ要求１４４が、レジスタ１４１から判定回路１４２へエリアレジスタデータ信号１４５が出力されている。さらに、Ｉ／Ｏ専用キャッシュ１４において、アドレスバス１３１は、判定回路１４２及びキャッシュ１４３に、データバス１３２はキャッシュ１４３に接続されている。 As shown in FIG. 8, the I / O dedicated cache 14 is connected to the bus 13 and the memory controller 15 and includes a register 141, a determination circuit 142, and a cache 143. Further, a cache request 144 is output from the determination circuit 142 to the cache 143, and an area register data signal 145 is output from the register 141 to the determination circuit 142. Further, in the I / O dedicated cache 14, the address bus 131 is connected to the determination circuit 142 and the cache 143, and the data bus 132 is connected to the cache 143.

図９に示すように、レジスタ１４１は、ＣＰＵ１１からアクセス可能であり、Ｉ／Ｏ専用キャッシュ１４の状態及び設定値を保持する複数のレジスタから構成されている。このレジスタ１４１は、Ｉ／Ｏ専用キャッシュ１４の有効・無効をセットさせる動作モードレジスタ１４１１、ライトバックモードやライトスルーモードなどのキャッシュ１４３の動作モードを規定するキャッシュモードレジスタ１４１２、及びＩ／Ｏ専用キャッシュ１４に保持させるデータエリア（アドレス範囲）を指定する連携データエリアレジスタ１４１３から構成されている。 As illustrated in FIG. 9, the register 141 is accessible from the CPU 11 and consists of a plurality of registers that hold the state and setting values of the I/O dedicated cache 14. The register 141 comprises an operation mode register 1411 that enables or disables the I/O dedicated cache 14, a cache mode register 1412 that specifies the operation mode of the cache 143, such as write-back mode or write-through mode, and a cooperative data area register 1413 that designates the data areas (address ranges) to be held in the I/O dedicated cache 14.

この連携データエリアレジスタ１４１３では、１つの連携データエリアは、連携データエリアアドレスレジスタ１４１４（１４１４−１〜１４１４−ｍ）と、連携データエリアマスクレジスタ１４１５（１４１５−１〜１４１５−ｍ）を用いて表し、この２つのレジスタのセットを複数持つことで、複数の連携データエリアをサポート可能とする。また、連携データエリアマスクレジスタ１４１５は、連携データエリアアドレスレジスタ１４１４とアドレス１３１１とで値の比較を行う際に、比較すべきビットを表している。これにより、２つのレジスタ１４１４と１４１５とで、連携データエリアを表現することが可能となる。他には、連携データエリア開始アドレスレジスタと連携データエリア終了アドレスレジスタのセットによる連携データエリアの表現もある。 In the cooperative data area register 1413, one cooperative data area is expressed by a cooperative data area address register 1414 (1414-1 to 1414-m) and a cooperative data area mask register 1415 (1415-1 to 1415-m). Providing a plurality of such register pairs makes it possible to support a plurality of cooperative data areas. The cooperative data area mask register 1415 indicates which bits are to be compared when the value of the cooperative data area address register 1414 is compared with the address 1311. The two registers 1414 and 1415 together can thus express a cooperative data area. Alternatively, a cooperative data area can be expressed by a pair consisting of a cooperative data area start address register and a cooperative data area end address register.

この連携データエリアレジスタ１４１３内のこれらのレジスタ値はエリアレジスタデータ信号１４５として、判定回路１４２に出力される。 These register values in the cooperative data area register 1413 are output to the determination circuit 142 as an area register data signal 145.

なお、このレジスタ１４１に対するＣＰＵ１１からのアクセス経路に関しては、図１０に示すように、バス１３に接続された構成（ａ）と、バス１３とは異なるレジスタアクセス用バス経由でバス１３に接続される構成（ｂ）がある。すなわち、図１０（ａ）の構成では、レジスタ１４１がバス１３に接続され、このバス１３を通じてＣＰＵ１１からアクセスされる。一方、図１０（ｂ）の構成では、レジスタ１４１がレジスタアクセス用バスを経由してバス１３に接続され、このレジスタアクセス用バスを経由してＣＰＵ１１からアクセスされる。 As shown in FIG. 10, there are two configurations for the access path from the CPU 11 to the register 141: configuration (a), in which the register 141 is connected directly to the bus 13, and configuration (b), in which it is connected to the bus 13 via a register access bus separate from the bus 13. That is, in the configuration of FIG. 10A, the register 141 is connected to the bus 13 and is accessed from the CPU 11 through the bus 13. In the configuration of FIG. 10B, on the other hand, the register 141 is connected to the bus 13 via the register access bus and is accessed from the CPU 11 via this register access bus.

判定回路１４２は、ＣＰＵ１１及びアクセラレータ１２からのメモリ２へのライトアクセスに対し、レジスタ１４１からのエリアレジスタデータ信号１４５、アドレスバス１３１及び、アクセラレータ１２からのキャッシュ要求信号１３１３から、そのライトデータをキャッシュ１４３に保持させるかどうかの判定を行い、キャッシュ１４３に対してキャッシュ要求１４４を出力する。この判定方法は、図１１に示すとおりである。 In response to a write access to the memory 2 from the CPU 11 and the accelerator 12, the determination circuit 142 caches the write data from the area register data signal 145 from the register 141, the address bus 131, and the cache request signal 1313 from the accelerator 12. It is determined whether or not the data is held in 143 and a cache request 144 is output to the cache 143. This determination method is as shown in FIG.

図１１に示すように、まず、判定回路１４２は、バス１３からメモリ２へのアクセス要求に対し、アクセス信号１３１２をチェックし、アクセスの種類を調べ（１４２１）、リードアクセスならば、キャッシュ要求１４４は無効とする（１４２６）。 As shown in FIG. 11, first, the determination circuit 142 checks the access signal 1312 in response to an access request from the bus 13 to the memory 2, checks the type of access (1421), and if it is a read access, the cache request 144 Is invalid (1426).

また、１４２１にて、ライトアクセスの場合、当該ライトアクセスのアドレス１３１１及び、レジスタ１４１からのエリアレジスタデータ信号１４５から、当該アドレスが、連携データエリア内であるかどうかを調べ（１４２２）、連携データエリア内ならば（Ｙｅｓ）、キャッシュ要求１４４は有効となる（１４２５）。 In the case of a write access at 1421, it is checked from the address 1311 of the write access and the area register data signal 145 from the register 141 whether or not the address is in the linked data area (1422). If it is within the area (Yes), the cache request 144 becomes valid (1425).

また、１４２２にて、連携データエリア外の場合（Ｎｏ）、当該ライトアクセスのアクセス要求元を調べ（１４２３）、ＣＰＵ１１からのライトアクセスならば、キャッシュ要求１４４は無効となる（１４２６）。 If it is outside the cooperative data area at 1422 (No), the access request source of the write access is checked (1423), and if it is a write access from the CPU 11, the cache request 144 becomes invalid (1426).

また、１４２３にて、アクセス要求元がアクセラレータ１２ならば、当該アクセラレータ１２からのキャッシュ要求信号１３１３が有効か無効かを調べ（１４２４）、有効ならば、キャッシュ要求１４４は有効となる（１４２５）。 If the access request source is the accelerator 12 at 1423, it is checked whether the cache request signal 1313 from the accelerator 12 is valid or invalid (1424). If valid, the cache request 144 is valid (1425).

また、１４２４にて、当該アクセラレータ１２からのキャッシュ要求信号１３１３が無効ならば、キャッシュ要求１４４は無効となる（１４２６）。 If the cache request signal 1313 from the accelerator 12 is invalid at 1424, the cache request 144 is invalidated (1426).
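The decision flow of FIG. 11 described above can be sketched as a small function. This is only an illustrative model in Python, not the circuit itself; the names `Requester` and `cache_request_valid` are hypothetical, and the step numbers in the comments refer to the flow steps 1421 to 1426 in the text.

```python
from enum import Enum


class Requester(Enum):
    CPU = "cpu"
    ACCELERATOR = "accelerator"


def cache_request_valid(is_write, in_cooperative_area, requester, accel_cache_request):
    """Return True when cache request 144 would be valid (hypothetical model)."""
    if not is_write:
        return False                    # step 1421: read access -> invalid (1426)
    if in_cooperative_area:
        return True                     # step 1422: inside a cooperative data area -> valid (1425)
    if requester is Requester.CPU:
        return False                    # step 1423: CPU write outside the area -> invalid (1426)
    return bool(accel_cache_request)    # step 1424: follow cache request signal 1313
```

In this sketch a write from the accelerator outside every cooperative data area is cached only when the accelerator asserts its cache request signal, which matches the last two branches of the flow.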

続いて、前述したライトアクセスのアドレスが連携データエリア内であるかどうかの判定（１４２２）について図１２に示す。 Next, FIG. 12 shows determination (1422) of whether or not the above-described write access address is in the cooperative data area.

図１２に示すように、判定（１４２２）は、レジスタ１４１からのエリアレジスタデータ信号１４５及びアドレス１３１１を入力とし、連携データエリアアドレスレジスタ１４１４−１〜１４１４−ｍとアドレス１３１１との比較を行う。連携データエリアアドレスレジスタ１４１４−１〜１４１４−ｍと連携データエリアマスクレジスタ１４１５とでビット毎の論理積を計算するゲート１４２５−１〜１４２５−ｍと、アドレス１３１１と連携データエリアマスクレジスタ１４１５とでビット毎の論理積を計算するゲート１４２６−１〜１４２６−ｍにより比較するビットのみを比較器１４２７−１〜１４２７−ｍに入力し、各比較器１４２７−１〜１４２７−ｍの比較結果の総論理和をゲート１４２８で計算し、当該アドレス１３１１が、連携データエリアであるかどうかを判定する。 As shown in FIG. 12, in the determination (1422), the area register data signal 145 and the address 1311 from the register 141 are input, and the linked data area address registers 1414-1 to 1414-m and the address 1311 are compared. The gates 1425-1 to 1425 -m that calculate the logical product of each bit by the cooperative data area address registers 1414-1 to 1414 -m and the cooperative data area mask register 1415, and the addresses 1311 and the cooperative data area mask register 1415. Only the bits to be compared are input to the comparators 1427-1 to 1427 -m by the gates 1426-1 to 1426 -m that calculate the logical product for each bit, and the total of the comparison results of the comparators 1427-1 to 1427 -m are input. A logical sum is calculated by the gate 1428, and it is determined whether or not the address 1311 is a cooperative data area.
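The masked comparison of FIG. 12 can be illustrated in a few lines of Python. This is a minimal sketch under the assumption that each cooperative data area is given as an (address register, mask register) pair of integers; the function name and the example values are hypothetical.

```python
def in_cooperative_area(address, areas):
    """Return True if `address` matches any (area_address, area_mask) pair.

    Both sides are ANDed with the mask so that only the masked bits are
    compared; the per-area results are then ORed together.
    """
    return any((address & mask) == (area_address & mask)
               for area_address, mask in areas)


# Example with one hypothetical 4 KiB area starting at 0x80000000.
areas = [(0x80000000, 0xFFFFF000)]
inside = in_cooperative_area(0x80000ABC, areas)    # True
outside = in_cooperative_area(0x80001000, areas)   # False
```

The `any(...)` call plays the role of the final OR gate, and each generator element corresponds to one comparator.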

以上により、判定回路１４２は、メモリ２へのアクセスが連携データエリアへのアクセスかどうかを判定し、キャッシュ１４３にキャッシュ要求１４４を出力する。キャッシュ１４３は、バス１３及びメモリコントローラ１５と接続されており、ライトバックまたはライトスルーキャッシュとして動作し、判定回路１４２からのキャッシュ要求１４４を受け、当該ライトデータをキャッシュする。 As described above, the determination circuit 142 determines whether the access to the memory 2 is an access to the cooperative data area, and outputs the cache request 144 to the cache 143. The cache 143 is connected to the bus 13 and the memory controller 15, operates as a write-back or write-through cache, receives a cache request 144 from the determination circuit 142, and caches the write data.

このキャッシュ１４３の構成を図１３に示す。図１３ではフルアソシアティブ方式で、Ｎ個のエントリを持ち、各エントリに保持しているアドレス情報、データ、制御情報がある。各エントリが保持するデータのサイズは、３２Ｂや６４Ｂぐらいである。また、制御情報は、エントリの入換えを行う際のＬＲＵ情報やエントリにデータが登録されているかどうかのＶａｌｉｄビット及び、データが更新されているかどうかを示すダーティビット（ライトバック時に使用）などがある。また、キャッシュヒットとは、当該アドレスが、本キャッシュ１４３のエントリに登録されている場合、キャッシュミスは、キャッシュ１４３に登録されていない場合のことを示す。 The configuration of the cache 143 is shown in FIG. 13. In FIG. 13, the cache is fully associative with N entries, and each entry holds address information, data, and control information. The data held by each entry is about 32 B or 64 B in size. The control information includes LRU information used when replacing entries, a Valid bit indicating whether data is registered in the entry, and a dirty bit indicating whether the data has been updated (used at write-back). A cache hit means that the address is registered in an entry of the cache 143, and a cache miss means that it is not.

このキャッシュ１４３の動作は、下記の５種類（ライトアクセスで３種類（ａ）−（１），（２），（３）、リードアクセスで２種類（ｂ），（ｃ））に分類される。 The operation of the cache 143 is classified into the following five types (three types (a)-(1), (2), (3) for write access and two types (b), (c) for read access). .

（ａ）−（１）ライトアクセスで、キャッシュ要求１４４が有効及び、キャッシュヒットの場合は、キャッシュ１４３に登録されている当該エントリのデータをデータライトバス１３２２のライトデータで上書きを行い、ダーティビットをＯＮにする。 (a)-(1) For a write access with the cache request 144 valid and a cache hit, the data of the matching entry registered in the cache 143 is overwritten with the write data on the write data bus 1322, and the dirty bit is set to ON.

（ａ）−（２）ライトアクセスで、キャッシュ要求１４４が有効及び、キャッシュミスでキャッシュ１４３に空いているエントリがある場合は、キャッシュ１４３の空いているエントリを探し、当該エントリにライトデータを登録する。これは、エントリを有効にし、アドレス情報にアドレス１３１１の値を書き込む。このとき、データライトバス１３２２からのライトデータサイズがエントリのデータサイズより小さい場合には、メモリ２より当該アドレスの内容データを読み出し、当該エントリのデータ情報に登録した後に、当該ライトデータを書き込む。 (A)-(2) If the cache request 144 is valid for write access and there is a free entry in the cache 143 due to a cache miss, the free entry in the cache 143 is searched and write data is registered in the entry. To do. This validates the entry and writes the value of address 1311 to the address information. At this time, if the write data size from the data write bus 1322 is smaller than the data size of the entry, the content data at the address is read from the memory 2 and registered in the data information of the entry, and then the write data is written.

（ａ）−（３）ライトアクセスで、キャッシュ要求１４４が有効及び、キャッシュミスでキャッシュ１４３に空いているエントリがない場合は、キャッシュ１４３の各エントリの制御情報にあるＬＲＵ情報を調べ、一番古いエントリを破棄し、このエントリに当該ライトデータを登録する。登録手順は、（ａ）−（２）と同じである。 (A)-(3) If the cache request 144 is valid for write access and there is no free entry in the cache 143 due to a cache miss, the LRU information in the control information of each entry in the cache 143 is examined, Discard the old entry and register the write data in this entry. The registration procedure is the same as (a)-(2).

（ｂ）リードアクセスで、キャッシュ１４３にヒットした場合は、キャッシュ１４３に登録されている当該アドレスのエントリのデータ情報をデータリードバス１３２１に出力する。 (B) When the cache 143 is hit by read access, the data information of the entry of the address registered in the cache 143 is output to the data read bus 1321.

（ｃ）リードアクセスで、キャッシュ１４３にてミスした場合は、メモリコントローラ１５に、当該アドレスを出力し、メモリ２から当該アドレスに対応するデータを読み出し、データリードバス１３２１に出力する。なお、このとき読み出したデータをキャッシュ１４３には登録しない。 (C) If there is a miss in the cache 143 in read access, the address is output to the memory controller 15, data corresponding to the address is read from the memory 2, and output to the data read bus 1321. Note that the data read at this time is not registered in the cache 143.

上記処理にて、キャッシュ１４３に登録する際に、全てのエントリが使用中であった場合には、従来のキャッシュと同様にＬＲＵ等のアルゴリズムを用い、キャッシュ１４３から追い出すエントリを探す。このとき、キャッシュ１４３がライトバックモードの場合には、当該エントリのデータをメモリ２へ書き戻しを行う。 If all entries are in use when registered in the cache 143 in the above processing, an entry to be evicted from the cache 143 is searched using an algorithm such as LRU as in the conventional cache. At this time, if the cache 143 is in the write back mode, the data of the entry is written back to the memory 2.
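The five cases above, together with the eviction rule, can be modelled as follows. This is a simplified sketch rather than the described hardware: it assumes every access covers one whole line (so the partial-line fetch of case (a)-(2) is omitted), uses a dict to stand in for the memory 2, and the class name `IoCacheModel` is hypothetical. The handling of a write whose cache request is invalid while the line is already cached is not stated in the text; the sketch drops the cached copy, which is an assumption.

```python
from collections import OrderedDict


class IoCacheModel:
    """Toy model of cache 143: fully associative, LRU, allocates on writes only."""

    def __init__(self, num_entries, memory, write_back=True):
        self.num_entries = num_entries
        self.memory = memory            # dict standing in for memory 2
        self.write_back = write_back
        self.entries = OrderedDict()    # address -> [data, dirty], oldest first

    def write(self, addr, data, cache_request):
        if not cache_request:
            # Assumption: an uncached write drops any stale cached copy.
            self.entries.pop(addr, None)
            self.memory[addr] = data
            return
        if addr not in self.entries and len(self.entries) >= self.num_entries:
            self._evict_lru()                          # case (a)-(3)
        self.entries[addr] = [data, self.write_back]   # cases (a)-(1) and (a)-(2)
        self.entries.move_to_end(addr)                 # mark as most recently used
        if not self.write_back:
            self.memory[addr] = data                   # write-through mode

    def read(self, addr):
        if addr in self.entries:                       # case (b): hit
            self.entries.move_to_end(addr)
            return self.entries[addr][0]
        return self.memory.get(addr)                   # case (c): miss, not registered

    def _evict_lru(self):
        victim, (data, dirty) = self.entries.popitem(last=False)
        if self.write_back and dirty:
            self.memory[victim] = data                 # write the evicted line back
```

Because read misses never allocate an entry, the cache capacity is spent only on data that a producer has written, which is the behaviour the text relies on for CPU and accelerator cooperation.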

以上の手順により、Ｉ／Ｏ専用キャッシュ１４は、ＣＰＵ１１及びアクセラレータ１２からのライトデータをキャッシュ１４３に保持し、ＣＰＵ１１とアクセラレータ１２間のデータ連携をＩ／Ｏ専用キャッシュ１４内で実現することで、データ連携によるボトルネックを解消し、マルチメディア処理の高速化を実現できる。また、本当に連携するデータのみをＩ／Ｏ専用キャッシュ１４に保持させることで、Ｉ／Ｏ専用キャッシュ１４の使用効率を向上し、キャッシュミスによるオーバーヘッドを最小化することが可能となる。 By the above procedure, the I / O dedicated cache 14 holds the write data from the CPU 11 and the accelerator 12 in the cache 143, and realizes data linkage between the CPU 11 and the accelerator 12 in the I / O dedicated cache 14. It eliminates bottlenecks caused by data linkage and can speed up multimedia processing. Further, by holding only data that is really linked to the I / O dedicated cache 14, it is possible to improve the usage efficiency of the I / O dedicated cache 14 and to minimize the overhead due to a cache miss.

さらに、本Ｉ／Ｏ専用キャッシュ１４の処理を高速化及びスプリットバスに対応するため、処理をパイプライン化し、図１４に示すように３ステージ制を採る。なお、キャッシュミスによりメモリ２へのアクセス中のエントリに対しては、当該エントリへの登録処理が終了するまで、同一エントリへのアクセスは待たせ、メモリ競合においても、正しくメモリアクセスが行われるようにする。 Further, to speed up the processing of the I/O dedicated cache 14 and to support a split bus, the processing is pipelined into three stages as shown in FIG. 14. For an entry whose access to the memory 2 is in progress because of a cache miss, any further access to the same entry is made to wait until registration into that entry has completed, so that memory accesses are performed correctly even when they conflict.

すなわち、図１４に示すように、ステージ１では、判定回路１４２がキャッシュ要求判定を行い、キャッシュ１４３がライトアクセス及びリードアクセスの時にヒット判定を行う。ステージ２では、キャッシュの動作において、ライトアクセスの時はヒットの場合にキャッシュ１４３のデータ更新、ミスの場合にメモリ２へのアクセスを行い、リードアクセスの時はヒットの場合にキャッシュ１４３からデータ出力、ミスの場合にメモリ２へのアクセスを行う。ステージ３では、キャッシュの動作において、ライトアクセスの時はミスの場合にキャッシュ１４３への登録を行い、リードアクセスの時はミスの場合にバス１３へのデータ出力を行う。 That is, as shown in FIG. 14, in stage 1, the determination circuit 142 performs cache request determination, and the cache 143 performs hit determination when write access and read access are performed. In stage 2, in the cache operation, the cache 143 is updated in the case of a hit during a write access, the memory 2 is accessed in the case of a miss, and the data is output from the cache 143 in the case of a read access. In case of a miss, the memory 2 is accessed. In stage 3, in the cache operation, registration is made to the cache 143 if there is a miss during write access, and data is output to the bus 13 if there is a miss during read access.

これにより、メモリアクセス中においても、判定回路１４２によるキャッシュ要求判定や、キャッシュ１４３によるキャッシュ判定処理が行えるため、Ｉ／Ｏ専用キャッシュ１４によるオーバーヘッドを小さくすることができる。 As a result, the cache request determination by the determination circuit 142 and the cache determination processing by the cache 143 can be performed even during memory access, so that the overhead by the I / O dedicated cache 14 can be reduced.

さらに、Ｉ／Ｏ専用キャッシュ１４とメモリコントローラ１５とを組合せて、さらに効率を向上させる本実施の形態の応用例を、以下において説明する。 Further, an application example of this embodiment in which the I / O dedicated cache 14 and the memory controller 15 are combined to further improve the efficiency will be described below.

次に、図１５〜図１７を用いて、本実施の形態の応用例として、Ｉ／Ｏ専用キャッシュ１４とメモリコントローラ１５とを組合せて効率を向上させる場合を説明する。図１５は、メモリコントローラの構成を示す図である。図１６は、キャッシュの構成を示す図である。図１７は、アクセス要求のデータ構成を示す図である。 Next, as an application example of the present embodiment, a case where efficiency is improved by combining the I/O dedicated cache 14 with the memory controller 15 will be described with reference to FIGS. 15 to 17. FIG. 15 is a diagram illustrating the configuration of the memory controller. FIG. 16 is a diagram illustrating the configuration of the cache. FIG. 17 is a diagram illustrating the data structure of an access request.

まず、メモリコントローラ１５に、以下の機能を持たせる。 First, the memory controller 15 has the following functions.

（１）メモリ帯域確保のために、メモリアクセスに優先順位を導入する。すなわち、大きな帯域が必要なアクセラレータに対して、優先的にメモリアクセスが行われるようにする。 (1) In order to secure memory bandwidth, priority is introduced to memory access. That is, memory access is preferentially performed for an accelerator that requires a large bandwidth.

（２）メモリアクセスのオーバーヘッドを最小限にするＯｕｔ−ｏｆ−ｏｒｄｅｒアクセスを採用する。すなわち、ＳＤＲＡＭ及びＤＤＲ−ＳＤＲＡＭのバンク毎にアクティブ状態を管理し、各バンクに対してＣＡＳアドレス投入のみでアクセス可能な同じＲｏｗアドレスへのアクセスが連続するようにメモリアクセスの順番の入換えを行う。 (2) Adopt out-of-order access that minimizes memory access overhead. That is, the active state is managed for each bank of the SDRAM and the DDR-SDRAM, and the order of memory access is changed so that accesses to the same Row address that can be accessed only by inputting the CAS address to each bank are continuous. .

また、ライトアクセスはＩ／Ｏ専用キャッシュ１４がアクセス要求を受け取れば、ＣＰＵ１１やアクセラレータ１２は次の処理に移ることができるが、リードアクセスが遅れるとＣＰＵ１１やアクセラレータ１２がメモリ待ちとなるため、リードアクセスを優先して行う必要がある。そのため、本メモリコントローラ１５では、メモリアクセスのみを高速化し、帯域確保のための優先順位制御はリードアクセスに対してのみ行う。 In the write access, if the I / O dedicated cache 14 receives an access request, the CPU 11 and the accelerator 12 can move to the next processing. However, if the read access is delayed, the CPU 11 and the accelerator 12 wait for the memory. It is necessary to give priority to access. For this reason, in the present memory controller 15, only memory access is accelerated, and priority control for securing a bandwidth is performed only for read access.

さらに、注意すべき点は、帯域確保やＯｕｔ−ｏｆ−ｏｒｄｅｒアクセスを行うことにより、メモリ２へのアクセス順序の入換えが発生する。そのため、アクセス順序どおりにアクセスしたのと同じ結果が得られるようにするメモリコンシステンシを保つことが重要となる。このメモリコンシステンシの維持には、以下の配慮が必要となる。 Further, it should be noted that the access order to the memory 2 is changed by performing bandwidth reservation or out-of-order access. For this reason, it is important to maintain a memory consistency so that the same result as that accessed in the access order can be obtained. The following considerations are necessary to maintain this memory consistency.

すなわち、異なるアドレスへの２つのメモリアクセスに関する順序入換えは問題なし。同一アドレスへの２つのメモリアクセスに関して、ライトアクセスを越えた順序の入換えがないようにする。以後、同一アドレスへの２つのメモリアクセス要求のことを、「２つのメモリアクセスには依存関係がある」と呼ぶ。 That is, reordering two memory accesses to different addresses causes no problem. For two memory accesses to the same address, no reordering is allowed that moves an access past a write access. Hereinafter, when two memory access requests target the same address, we say that "the two memory accesses have a dependency".
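The reordering rule above can be written as a single predicate. This is a minimal sketch in Python that represents each access as a hypothetical `(address, is_write)` tuple.

```python
def may_reorder(first, second):
    """Return True if two accesses may be swapped without changing the result.

    Each access is an (address, is_write) tuple (hypothetical representation).
    """
    addr_a, write_a = first
    addr_b, write_b = second
    if addr_a != addr_b:
        return True                     # different addresses: always safe
    return not (write_a or write_b)     # same address: no swap past a write
```

Under this predicate two reads of the same address may still be swapped, since neither changes the memory contents.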

このメモリコントローラ１５の構成を図１５に示す。図１５に示すように、メモリコントローラ１５は、アクセス制御回路１５１、リフレッシュ制御回路１５２、優先順位付きリードアクセス要求ＦＩＦＯ１５３、ライトアクセス要求ＦＩＦＯ１５４、メモリアクセス制御回路１５５から構成されている。リードアクセス要求ＦＩＦＯ１５３は、優先順位毎にＦＩＦＯ（１５３−１〜１５３−ｎ）が存在する。 The configuration of the memory controller 15 is shown in FIG. As shown in FIG. 15, the memory controller 15 includes an access control circuit 151, a refresh control circuit 152, a priority read access request FIFO 153, a write access request FIFO 154, and a memory access control circuit 155. The read access request FIFO 153 includes a FIFO (153-1 to 153-n) for each priority.

また、Ｉ／Ｏ専用キャッシュ１４内にあるキャッシュ１４３の構成を図１６に示す。図１６に示すように、キャッシュ１４３には、前記図１３に示したＮ個の各エントリに保持しているアドレス情報、データ、制御情報に加えて、優先順位を示す優先度が登録されている。 FIG. 16 shows the configuration of the cache 143 in the I / O dedicated cache 14. As shown in FIG. 16, in the cache 143, in addition to the address information, data, and control information held in each of the N entries shown in FIG. 13, a priority indicating priority is registered. .

このような構成による本実施の形態の応用例では、Ｉ／Ｏ専用キャッシュ１４からは、ＣＰＵ１１及びアクセラレータ１２に従った優先順位情報付きのアクセス要求が来る。これを受け、アクセス制御回路１５１は、図１７に示すアクセス要求フォーマットに変換する。このフォーマットは、アクセス要求に関するアクセス属性とメモリコンシステンシを維持するための依存関係情報からなっており、アクセス属性は、各アクセスを管理するためのｔａｇＮｏと、リードライト信号、アドレス、データから構成され、また、依存関係情報は、依存関係のあるメモリアクセス要求のｔａｇＮｏ及び、自分に依存するアクセスがあるかどうかの最終ビットから構成されている。 In the application example of the present embodiment having this configuration, the I/O dedicated cache 14 issues access requests carrying priority information that depends on whether the request originates from the CPU 11 or an accelerator 12. On receiving a request, the access control circuit 151 converts it into the access request format shown in FIG. 17. This format consists of access attributes describing the request and dependency information for maintaining memory consistency. The access attributes consist of a tagNo for managing each access, a read/write signal, an address, and data. The dependency information consists of the tagNo of the memory access request on which this request depends (the dependency tag) and a last bit indicating whether any access depends on this request.

このアクセス制御回路１５１の動作は、Ｉ／Ｏ専用キャッシュ１４から来るアクセス要求は、下記のとおりである。 The access control circuit 151 handles an access request coming from the I/O dedicated cache 14 as follows.

（１）新たなアクセス要求に対して、新たなタグを発行し、ｔａｇＮｏに登録する。また、最終ビットをセットする。 (1) In response to a new access request, a new tag is issued and registered in tagNo. Also, the last bit is set.

（２）続いて、リードアクセス要求ＦＩＦＯ１５３及びライトアクセス要求ＦＩＦＯ１５４にキューイングされている先行アクセス要求を調べ、依存関係があるかを確認する。この確認の結果、依存関係がない場合、リードアクセスの場合には、該当するリードアクセス要求ＦＩＦＯ１５３−１〜１５３−ｎの該当ＦＩＦＯに、また、ライトアクセスの場合は、ライトアクセス要求ＦＩＦＯ１５４にキューイングして終了する。 (2) Subsequently, the preceding access requests queued in the read access request FIFO 153 and the write access request FIFO 154 are checked to determine whether there is a dependency. As a result of this check, if there is no dependency relationship, in the case of read access, the corresponding read access request FIFOs 153-1 to 153-n are queued to the corresponding FIFO, and in the case of write access, they are queued to the write access request FIFO 154. And exit.

また、依存関係がある場合には、下記の手順に従う。 If there is a dependency, follow the procedure below.

（ａ）−（１）本アクセス要求がリードアクセス要求である場合、依存関係のある先行する最新アクセス要求（最終ビットがセット）に対し、ライトアクセス要求の場合は、当該先行アクセス要求のライトアクセスデータを返し、本リードアクセス要求はキューイングせずに終了する（ＦＩＦＯヒット）。 (a)-(1) When the present access request is a read access request and the latest preceding access request with which it has a dependency (the one whose last bit is set) is a write access request, the write data of that preceding request is returned and the present read access request finishes without being queued (FIFO hit).

（ａ）−（２）本アクセス要求がリードアクセス要求である場合、依存関係のある先行する最新アクセス要求（最終ビットがセット）に対し、リードアクセス要求の場合は、当該先行リードアクセス要求のｔａｇＮｏを、本アクセス要求の依存ｔａｇに登録し、先行リードアクセス要求の最終ビットをクリアする。 (a)-(2) When the present access request is a read access request and the latest preceding access request with which it has a dependency (the one whose last bit is set) is also a read access request, the tagNo of that preceding read access request is registered in the dependency tag of the present access request, and the last bit of the preceding read access request is cleared.

（ｂ）キューイングするアクセス要求がライトアクセスである場合、当該先行アクセス要求のｔａｇＮｏを、本アクセス要求の依存ｔａｇに登録し、先行ライトアクセス要求の最終ビットをクリアする。 (B) When the access request to be queued is a write access, the tag No of the preceding access request is registered in the dependency tag of the present access request, and the last bit of the preceding write access request is cleared.

また、メモリアクセス制御回路１５５の動作は、各リードアクセス要求ＦＩＦＯ１５３とライトアクセス要求ＦＩＦＯ１５４に対して、各ＦＩＦＯの優先順位順に、アクセス要求を取り出す。このとき、ＳＤＲＡＭに対して発行するアクセスに対して、同一バンク、同一Ｒｏｗアドレスに関するアクセスについて、リードアクセスとライトアクセスをそれぞれまとめて、メモリ２にアクセスする。この際に、依存ｔａｇＮｏがセットされているアクセス要求に対しては、除外すると共に、メモリ２にアクセスする各アクセス要求に対して、最終ビットがセットされている場合は、依存関係がないので、そのまま終了する。最終ビットがクリアされている場合は、下記の手順に従い、依存関係リストを更新する。 The operation of the memory access control circuit 155 takes out access requests for each read access request FIFO 153 and write access request FIFO 154 in the order of priority of each FIFO. At this time, with respect to the access issued to the SDRAM, the read access and the write access are combined for the access related to the same bank and the same Row address, and the memory 2 is accessed. At this time, the access request for which the dependency tagNo is set is excluded, and if the last bit is set for each access request for accessing the memory 2, there is no dependency relationship. It ends as it is. If the last bit is cleared, the dependency list is updated according to the following procedure.

（ａ）キューイングされている各アクセス要求に対して、依存ｔａｇが終了した本アクセス要求のタグ番号であるかを調べる。 (a) For each queued access request, check whether its dependency tag equals the tag number of the access request that has just completed.

（ｂ）該当するキューイング中のアクセス要求に対して、依存ｔａｇをクリアする。 (B) The dependency tag is cleared for the corresponding queuing access request.
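The tag-based dependency tracking described above can be sketched in Python as follows. This is an illustrative model under simplifying assumptions: a single list replaces the prioritized read FIFOs 153 and the write FIFO 154, bank and row grouping is left out, and the names `Request`, `AccessControl`, `submit`, `ready`, and `complete` are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Request:
    tag_no: int
    is_write: bool
    address: int
    data: Optional[bytes] = None
    dep_tag: Optional[int] = None   # tag this request must wait for
    last: bool = True               # True while no later request depends on it


class AccessControl:
    def __init__(self):
        self.queue = []
        self._tags = count(1)

    def submit(self, is_write, address, data=None):
        """Queue a request; on a FIFO hit return the pending write data instead."""
        req = Request(next(self._tags), is_write, address, data)
        # Latest preceding request to the same address (its last bit is still set).
        pred = next((q for q in reversed(self.queue)
                     if q.address == address and q.last), None)
        if pred is not None:
            if not is_write and pred.is_write:
                return pred.data        # (a)-(1): FIFO hit, request is not queued
            req.dep_tag = pred.tag_no   # (a)-(2) and (b): record the dependency
            pred.last = False
        self.queue.append(req)
        return None

    def ready(self):
        """Requests that may be issued to memory (no pending dependency)."""
        return [q for q in self.queue if q.dep_tag is None]

    def complete(self, req):
        """Retire a request and release the requests that depended on it."""
        self.queue.remove(req)
        if not req.last:
            for q in self.queue:
                if q.dep_tag == req.tag_no:
                    q.dep_tag = None
```

A memory access scheduler built on this model would pick freely among `ready()` requests, for example grouping those with the same bank and row address, while dependent requests stay blocked until `complete()` clears their dependency tag.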

以上の方式を採ることにより、メモリコンシステンシ（一貫性）を保持しつつ、ＳＤＲＡＭ、ＤＤＲ−ＳＤＲＡＭの各バンクに対して、効率のよい同一Ｒｏｗアドレスへのアクセスをまとめてアクセスすることが可能となり、メモリ２へのアクセス効率を向上させることが可能となる。このアクセス効率とＩ／Ｏ専用キャッシュ１４による効果でマルチメディア処理はメモリ２によるボトルネックを最小限に抑え、スムーズな処理実行を実現できる。 By adopting the above method, it is possible to efficiently access the same row address to each bank of SDRAM and DDR-SDRAM while maintaining memory consistency (consistency). The access efficiency to the memory 2 can be improved. Due to the access efficiency and the effect of the I / O dedicated cache 14, the multimedia processing can minimize the bottleneck caused by the memory 2 and realize smooth processing execution.

次に、図１８を用いて、本実施の形態のマルチメディアマイコンを用いたマルチメディア端末の一例を説明する。図１８は、マルチメディアマイコンを用いたマルチメディア端末を示す構成図である。 Next, an example of a multimedia terminal using the multimedia microcomputer of this embodiment will be described with reference to FIG. FIG. 18 is a block diagram showing a multimedia terminal using a multimedia microcomputer.

マルチメディア端末としては、近年、携帯電話やＰＤＡなどの小型の表示機能を持つ携帯端末においても、音楽演奏機能やカメラ機能を持ち、画面に静止画（写真）や動画（ムービー）を表示したりすることが可能となっている。 As for multimedia terminals, in recent years even portable terminals with small displays, such as mobile phones and PDAs, have come to provide music playback and camera functions and can display still images (photos) and moving images (movies) on the screen.

このマルチメディア端末１００は、マルチメディアマイコン１を核として、このマルチメディアマイコン１に、メモリ２、入出力装置である画面３、カメラ４、スピーカ５、及び通信装置６を接続した構成となっている。 The multimedia terminal 100 has a configuration in which a multimedia microcomputer 1 is used as a core, and a memory 2, a screen 3, which is an input / output device, a camera 4, a speaker 5, and a communication device 6 are connected to the multimedia microcomputer 1. Yes.

このマルチメディアマイコン１は、画面３、カメラ４、スピーカ５、通信装置６と接続するインタフェースを持つと共に、画面表示制御、画像入力制御、音声出力制御、通信送受信制御を行うアクセラレータを持つ。これにより、カメラ４で撮影された映像を画面３への表示や、通信装置６を介した外部と映像を高速に送受信することが可能となる。 The multimedia microcomputer 1 has an interface for connecting to the screen 3, the camera 4, the speaker 5, and the communication device 6, and an accelerator that performs screen display control, image input control, audio output control, and communication transmission / reception control. As a result, it is possible to display video captured by the camera 4 on the screen 3 and to transmit / receive video to / from the outside via the communication device 6 at high speed.

次に、図１９及び図２０を用いて、本実施の形態において、さらに別のマルチメディアマイコンの構成及び動作の一例を説明する。図１９は、さらに別のマルチメディアマイコンを示す構成図である。図２０は、キャッシュとＩ／Ｏ専用キャッシュとの使い分けを示す図である。 Next, an example of the configuration and operation of another multimedia microcomputer in this embodiment will be described with reference to FIGS. 19 and 20. FIG. 19 is a block diagram showing still another multimedia microcomputer. FIG. 20 is a diagram showing the proper use of the cache and the I / O dedicated cache.

図１９に示すように、本実施の形態において、さらに別のマルチメディアマイコン１は、マスタとして動作し、内部にキャッシュ１１０を持つＣＰＵ１１と、スレーブとして動作する複数のアクセラレータ１２（１２−１〜１２−ｎ）と、本発明の特徴であるＩ／Ｏ専用キャッシュ１４と、これらを接続するバス１３と、メモリコントローラ１５から構成されている。このマルチメディアマイコン１の外部にはメモリ２が接続されており、メモリ２には、ＣＰＵ１１で実行される一連の手続き処理を記述したプログラム２１と、ワークエリア２２と、さらに、各アクセラレータ１２が処理するデータを格納するデータ領域２３（２３−１〜２３−ｎ）がある。 As shown in FIG. 19, in this embodiment, another multimedia microcomputer 1 operates as a master, and has a CPU 11 having a cache 110 therein and a plurality of accelerators 12 (12-1 to 12-12) operating as slaves. -N), an I / O dedicated cache 14 which is a feature of the present invention, a bus 13 for connecting them, and a memory controller 15. A memory 2 is connected to the outside of the multimedia microcomputer 1, and a program 21 describing a series of procedural processes executed by the CPU 11, a work area 22, and each accelerator 12 are processed in the memory 2. There is a data area 23 (23-1 to 23-n) for storing data to be processed.

キャッシュ１１０及びＩ／Ｏ専用キャッシュ１４はメモリ２の内容を一時的に保持するキャッシュとしての機能を持っており、キャッシュ１１０は、ＣＰＵ１１がメモリ２にアクセスする際のアクセス効率の向上を、また、Ｉ／Ｏ専用キャッシュ１４は、ＣＰＵ１１及びアクセラレータ１２がメモリ２にアクセスする際のアクセス効率の向上を実現させる。 The cache 110 and the I / O dedicated cache 14 have a function as a cache that temporarily holds the contents of the memory 2, and the cache 110 improves the access efficiency when the CPU 11 accesses the memory 2, The I / O dedicated cache 14 improves the access efficiency when the CPU 11 and the accelerator 12 access the memory 2.

図２０を用いて、キャッシュ１１０とＩ／Ｏ専用キャッシュ１４との使い分けについて説明する。なお、以下、キャッシュ１１０はコピーバック方式を採るものとし、スヌープ機能を用いて、アクセラレータ１２からのメモリ２へのアクセスを監視し、キャッシュ１１０と、メモリ２及びＩ／Ｏ専用キャッシュ１４との間で、キャッシュコヒーレンシを保つ機能を持つ。また、キャッシュがメモリ２からデータをラインサイズ分読み込むことをフィード、メモリ２に対してラインサイズ分書き込むことをパージと称する。 How the cache 110 and the I/O dedicated cache 14 are used for different purposes will be described with reference to FIG. 20. In the following, the cache 110 is assumed to use a copy-back scheme and to have a snoop function that monitors accesses from the accelerator 12 to the memory 2, thereby maintaining cache coherency between the cache 110 and both the memory 2 and the I/O dedicated cache 14. Reading one line of data from the memory 2 into a cache is referred to as a feed, and writing one line of data to the memory 2 is referred to as a purge.

ＣＰＵ１１が、プログラム２１及びワークエリア２２をアクセスする際には、キャッシュ１１０のみを動作し、Ｉ／Ｏ専用キャッシュ１４はスルーする（１２１）。従って、キャッシュ１１０でキャッシュミスが発生した際には、ＣＰＵ１１のアクセスがリード及びライト（ライトバック時）の両方において、キャッシュ１１０はメモリ２に対して、データをフィード及びパージを行う。 When the CPU 11 accesses the program 21 and the work area 22, only the cache 110 is operated, and the I / O dedicated cache 14 is passed through (121). Therefore, when a cache miss occurs in the cache 110, the cache 110 feeds and purges data to the memory 2 when the CPU 11 accesses both read and write (during write back).

一方、ＣＰＵ１１が、アクセラレータ１２のデータ領域２３をアクセスする際には、キャッシュ１１０及びＩ／Ｏ専用キャッシュ１４がともに動作する（１２２〜１２４）。従って、キャッシュ１１０でキャッシュミスが発生した際には、引き続いてＩ／Ｏ専用キャッシュ１４でもキャッシュ判定が行われる。 On the other hand, when the CPU 11 accesses the data area 23 of an accelerator 12, both the cache 110 and the I/O dedicated cache 14 operate (122 to 124). Therefore, when a cache miss occurs in the cache 110, a cache determination is subsequently performed in the I/O dedicated cache 14 as well.

Ｉ／Ｏ専用キャッシュ１４でキャッシュヒット時、ＣＰＵ１１はＩ／Ｏ専用キャッシュ１４上のデータをアクセスする（１２２）。また、Ｉ／Ｏ専用キャッシュ１４でキャッシュミス時、キャッシュ１１０からのアクセスにより動作が異なる。 When a cache hit occurs in the I / O dedicated cache 14, the CPU 11 accesses the data in the I / O dedicated cache 14 (122). Further, when the I / O dedicated cache 14 has a cache miss, the operation differs depending on the access from the cache 110.

（１）キャッシュ１１０からのキャッシュフィードアクセス（リード） (1) Cache feed access from the cache 110 (read)
Ｉ／Ｏ専用キャッシュ１４は、メモリ２からのリードデータをスルーしてキャッシュ１１０にデータを出力する（１２３）。 The I/O dedicated cache 14 passes the read data from the memory 2 through and outputs it to the cache 110 (123).

（２）キャッシュ１１０からのキャッシュパージアクセス（ライト） (2) Cache purge access from the cache 110 (write)
（２）−（ａ）Ｉ／Ｏ専用キャッシュ１４は、当該パージデータが連携データである場合、Ｉ／Ｏ専用キャッシュ１４に登録する。このとき、キャッシュ１１０のラインサイズが、Ｉ／Ｏ専用キャッシュ１４のラインサイズより小さい場合、メモリ２より、当該パージデータを含むラインをフィード（１２４）した後に、当該パージデータを書き込む。 (2)-(a) When the purge data is cooperative data, the I/O dedicated cache 14 registers it in the I/O dedicated cache 14. At this time, if the line size of the cache 110 is smaller than the line size of the I/O dedicated cache 14, the line containing the purge data is first fed from the memory 2 (124), and then the purge data is written.

（２）−（ｂ）当該パージデータが連携データでない場合には、Ｉ／Ｏ専用キャッシュ１４はスルーしてメモリ２に書き込む（１２３）。 (2)-(b) If the purge data is not linkage data, the I / O dedicated cache 14 passes through and writes to the memory 2 (123).
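The division of roles between the cache 110 and the I/O dedicated cache 14 can be summarized as a small decision function. This Python sketch models only what the I/O dedicated cache does with an access that reaches it from the CPU side; the function name, the string labels, and the argument encoding are hypothetical, and the numbers in the comments refer to the paths 121 to 123 in the text.

```python
def io_cache_action(area, io_cache_hit, access, is_cooperative):
    """Decide how the I/O dedicated cache treats an access from the CPU side.

    area:   "program", "work", or "data" (accelerator data area)
    access: "feed" (read) or "purge" (write) issued by cache 110
    """
    if area in ("program", "work"):
        return "pass_through"            # path 121: only cache 110 is used
    if io_cache_hit:
        return "access_io_cache"         # path 122
    if access == "feed":
        return "pass_through"            # case (1), path 123
    if is_cooperative:
        return "register_in_io_cache"    # case (2)-(a)
    return "pass_through"                # case (2)-(b), path 123
```

Only purges of cooperative data allocate an entry in the I/O dedicated cache on a miss; every other miss goes straight through to the memory 2.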

続いて、図２１〜図２８を用いて、暗号化をＩＰプロトコルレベルで行い、セキュリティ確保を実現するＩＰｓｅｃを用いた通信を高速化するマルチメディアマイコンの具体例について説明する。このＩＰｓｅｃは、ＶＰＮ（ＶｉｒｔｕａｌＰｒｉｖａｔｅＮｅｔｗｏｒｋ）の標準プロトコルとして規定されたものである。 Next, a specific example of a multimedia microcomputer that speeds up communication using IPsec, which provides security by performing encryption at the IP protocol level, will be described with reference to FIGS. 21 to 28. IPsec is specified as a standard protocol for VPNs (Virtual Private Networks).

FIG. 21 shows a specific configuration of the multimedia microcomputer 1. The multimedia microcomputer 1 includes the CPU 11, the accelerator 12, the I/O dedicated cache 14, the bus 13 that connects them, and the memory controller 15. The accelerator 12 comprises a TCP accelerator 12-1, an IPsec accelerator 12-2, and an EtherMAC 12-3. The TCP accelerator 12-1 performs checksum calculation and memory copy, the IPsec accelerator 12-2 performs decryption and authentication processing, and the EtherMAC 12-3 is connected to the LAN 130 and transmits and receives frames to and from the LAN. Here, the LAN 130 is assumed to be Ethernet (registered trademark), the most widely used type of LAN.

FIG. 22 shows the frame structure used for communication in the IPsec transport mode. On LANs and the Internet, TCP/IP is used as the standard protocol, and when the data to be transmitted or received is larger than the size that can be sent in one frame, it is divided into a plurality of TCP packets.

As shown in FIG. 22, in the IPsec transport mode, an IP header is added to an IPsec packet obtained by encrypting a TCP packet, so that the IPsec packet is encapsulated in IP. Because the multimedia microcomputer 1 uses Ethernet (registered trademark) for the LAN, a MAC header is added as the final step. For comparison, FIG. 23 shows the TCP/IP frame structure when IPsec is not used.

An IPsec packet consists of an IPsec header and IPsec data. Because encryption is performed, an ESP header is used as the IPsec header. The IPsec data is formed by appending an ESP trailer, which carries data needed for encryption, to the TCP packet, encrypting the whole, and then adding an ESP authentication value so that tampering can be detected.
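The nesting of the frame described above can be illustrated as follows. This is only a layout sketch: the function names are hypothetical, the header contents are placeholder bytes, and `toy_encrypt` is an XOR stand-in that is not real IPsec encryption. The ESP authentication value is passed in rather than computed.

```python
def toy_encrypt(data: bytes) -> bytes:
    """Placeholder cipher (XOR with 0x5A). NOT real IPsec encryption."""
    return bytes(b ^ 0x5A for b in data)


def build_transport_frame(mac_hdr: bytes, ip_hdr: bytes, esp_hdr: bytes,
                          tcp_packet: bytes, esp_trailer: bytes,
                          esp_auth: bytes) -> bytes:
    """Assemble MAC header | IP header | ESP header | IPsec data."""
    # IPsec data: (TCP packet + ESP trailer) encrypted as a whole,
    # followed by the ESP authentication value.
    ipsec_data = toy_encrypt(tcp_packet + esp_trailer) + esp_auth
    # IPsec packet: the ESP header serves as the IPsec header.
    ipsec_packet = esp_hdr + ipsec_data
    # The IP header encapsulates the IPsec packet; the MAC header is added last.
    return mac_hdr + ip_hdr + ipsec_packet
```

Because the placeholder cipher is its own inverse, applying `toy_encrypt` to the encrypted part of the returned frame recovers the TCP packet and the ESP trailer, which is convenient for checking the layout.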

Next, as cache operations, the following are described in order: reception processing when the I/O dedicated cache is not used (FIG. 24), reception processing when the I/O dedicated cache is used (FIG. 25), and reception processing when the I/O dedicated cache is used and holds only the linkage data portions (FIG. 26).

First, the processing for receiving the Ethernet frame in the IPsec transport mode shown in FIG. 22 when the I/O dedicated cache 14 is not used will be described with reference to FIG. 24.

(1) The multimedia microcomputer 1 receives the Ethernet frame from the Ethernet (registered trademark) LAN 130 and writes it to the data area 23 of the accelerator 12 in the memory 2 (1001, 1011).

(2) The CPU 11 reads the MAC header and the IP header of the Ethernet frame 1011 from the data area 23 of the accelerator 12 and performs Ethernet reception and IP reception processing (1002).

(3) Because the Ethernet frame 1011 contains an IPsec packet, the CPU 11 reads the IPsec header in the Ethernet frame 1011, performs IPsec reception processing, and activates the IPsec accelerator 12-2.

(4) The IPsec accelerator 12-2 reads the IPsec data in the Ethernet frame 1011 from the data area 23 of the accelerator 12, performs authentication and decryption, and writes the result back to the data area 23 of the accelerator 12 as a TCP packet (TCP data) 1012 (1003).

(5) The CPU 11 reads the TCP header from the TCP packet 1012 in the data area 23 of the accelerator 12, performs reception processing, and activates the TCP accelerator 12-1 to calculate the checksum (1004).

(6) The TCP accelerator 12-1 reads the TCP packet 1012 in the data area 23 of the accelerator 12, calculates the checksum, and writes the TCP data to the appropriate position in the received data (the third position from the left in the figure) (1005).

Thus, when the I/O dedicated cache 14 is not used, the memory 2 is accessed five times per Ethernet frame.

Next, the operation when the I/O dedicated cache 14 is used will be described with reference to FIG. 25.

(1') The multimedia microcomputer 1 receives the Ethernet frame from the Ethernet (registered trademark) LAN 130 and writes it to the data area 23 of the accelerator 12 in the memory 2 (1021, 1011). However, because this is a write to the data area 23 of the accelerator 12, the I/O dedicated cache 14 caches the frame (1011'), and no access to the memory 2 actually occurs.

(2') When the CPU 11 reads the MAC header and the IP header of the Ethernet frame 1011 in the data area 23 of the accelerator 12, the access hits in the I/O dedicated cache 14. Therefore, no access to the memory 2 occurs; the MAC header and the IP header of the frame 1011' are read from the I/O dedicated cache 14, and Ethernet reception and IP reception processing are performed (1022).

(3') Because the Ethernet frame 1011' contains an IPsec packet, the CPU 11 reads the IPsec header in the Ethernet frame 1011, performs IPsec reception processing, and activates the IPsec accelerator 12-2. As in (2'), this access also hits in the I/O dedicated cache 14, so the IPsec header of the Ethernet frame 1011' is read and no access to the memory 2 occurs (1022).

(4') The IPsec accelerator 12-2 attempts to read the IPsec data in the Ethernet frame 1011, but the access hits in the I/O dedicated cache 14, and the data is actually read from the Ethernet frame 1011' (1023). The IPsec accelerator 12-2 then performs authentication and decryption and writes the result back to the data area 23 of the accelerator 12 as the TCP packet 1012. However, because this is a write to the data area 23 of the accelerator 12, the I/O dedicated cache 14 caches it (1012'), and no access to the memory 2 actually occurs (1023).

(5') The CPU 11 reads the TCP header from the TCP packet 1012 in the data area 23 of the accelerator 12, but because the access actually hits in the I/O dedicated cache 14, the TCP header of the TCP packet 1012' is read (1024). The CPU 11 then performs TCP reception processing and activates the TCP accelerator 12-1 to calculate the checksum.

(6') The TCP accelerator 12-1 reads the TCP packet 1012 in the data area 23 of the accelerator 12, but because the access hits in the I/O dedicated cache 14, it reads the TCP packet 1012'. The TCP accelerator 12-1 calculates the checksum and writes the TCP data to the appropriate position in the received data (1025).

As described above, by keeping the linkage data accessed by both the accelerator 12 and the CPU 11 in the I/O dedicated cache 14, the number of accesses to the memory 2 can be reduced to zero. In practice, as noted above, images, downloads, and the like are divided into a plurality of Ethernet frames for transmission and reception, so the overhead of accessing the memory 2 has a large effect on communication performance.
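The difference between the two reception flows can be checked with a small counting model. This is a sketch under assumptions, not a description of the hardware: it counts one memory access per numbered transfer (1001 to 1005), which appears to be how the figure of five accesses is obtained, it treats every block as lying in the data area 23, and the names `TRANSFERS` and `count_memory_transfers` are hypothetical.

```python
# Each entry: (reference numeral in FIG. 24, operations on the data area 23).
TRANSFERS = [
    ("1001", [("write", "frame")]),                            # EtherMAC stores the frame
    ("1002", [("read", "frame")]),                             # CPU reads the headers
    ("1003", [("read", "frame"), ("write", "tcp_packet")]),    # IPsec accelerator
    ("1004", [("read", "tcp_packet")]),                        # CPU reads the TCP header
    ("1005", [("read", "tcp_packet"), ("write", "rx_data")]),  # TCP accelerator
]


def count_memory_transfers(transfers, use_io_cache):
    """Count the transfers in which at least one operation reaches the memory 2."""
    cached = set()  # blocks currently held in the I/O dedicated cache
    count = 0
    for _numeral, operations in transfers:
        touched_memory = False
        for kind, block in operations:
            if use_io_cache and kind == "write":
                cached.add(block)      # write to the data area is held in the cache
            elif use_io_cache and block in cached:
                pass                   # read hit in the I/O dedicated cache
            else:
                touched_memory = True  # no cache, or read miss
        if touched_memory:
            count += 1
    return count
```

With `use_io_cache=False` every transfer reaches the memory 2, and with `use_io_cache=True` every write is absorbed and every later read hits, so the model reproduces the reduction from five accesses to zero, assuming the cache is large enough to hold all three blocks.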

The linkage data accessed by both the CPU 11 and the accelerator 12 are the header portions 1031 and 1032. Because the I/O dedicated cache 14 caches this linkage data, the CPU 11 can read the data written by the accelerator 12 from the I/O dedicated cache 14 rather than from the slow memory 2. This greatly reduces the access wait time that would otherwise be overhead and enables high-speed IPsec-based TCP/IP communication.

FIG. 26 shows the configuration in which only the linkage data portions 1031 (MAC header, IP header, IPsec header) and 1032 (TCP header) are held in the I/O dedicated cache 14, and the other data (IPsec data, TCP data) is held in the memory 2. This configuration applies when a plurality of accelerators 12 operate simultaneously and the I/O dedicated cache 14 has no spare capacity.

On the other hand, when the I/O dedicated cache 14 has spare capacity, data other than the linkage data portions can be cached in addition to the linkage data portions 1031 and 1032, as shown in FIG. 25, so that the cache can also be used for data transfer between the accelerators 12. Noting that the accelerators 12 often access consecutive addresses, the key point is that the linkage data 1031 and 1032 must not be evicted from the cache by data transfers between the accelerators 12. The following methods can be used to keep the linkage data preferentially in the I/O dedicated cache 14.

(a) Cache only the linkage data.

(b) Make the cache residence time of the linkage data longer than that of other data (for example, by advancing the LRU counter more slowly for linkage data than for other data).

(c) Provide an in-use bit for linkage data on each line, and clear the in-use bit when the series of processing in the CPU 11 is completed. A line whose bit has been cleared becomes a candidate for cache-out.

Methods (a) and (b) are handled within the I/O dedicated cache 14 and therefore require no intervention by the application, whereas method (c) requires the in-use bit to be managed at the OS or driver/middleware level.

With these methods, the linkage data can remain in the I/O dedicated cache 14 for a long time, which prevents the performance degradation caused by linkage data being evicted from the I/O dedicated cache 14, particularly when a plurality of accelerators operate simultaneously.
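Methods (b) and (c) above can be combined in a small replacement-policy sketch. Everything here is an illustrative assumption: the class name `RetentionCache`, the age-step values, and the choice to set the in-use bit automatically when a linkage line is allocated are not taken from the embodiment. Method (a) is not modeled; it would amount to refusing to allocate non-linkage lines.

```python
class RetentionCache:
    """Toy fully associative cache that favors linkage data on eviction."""

    def __init__(self, capacity, linkage_step=1, other_step=4):
        self.capacity = capacity
        self.linkage_step = linkage_step  # method (b): linkage lines age slowly
        self.other_step = other_step      # other lines age faster
        self.lines = {}  # tag -> {"age": int, "linkage": bool, "in_use": bool}

    def access(self, tag, linkage=False):
        """Access a line; return the tag evicted to make room, or None."""
        for line in self.lines.values():
            line["age"] += self.linkage_step if line["linkage"] else self.other_step
        if tag in self.lines:
            self.lines[tag]["age"] = 0
            return None
        evicted = None
        if len(self.lines) >= self.capacity:
            # Method (c): lines whose in-use bit is set are not candidates.
            candidates = [t for t, l in self.lines.items() if not l["in_use"]]
            if not candidates:
                return None  # every line is in use: the new line is not cached
            evicted = max(candidates, key=lambda t: self.lines[t]["age"])
            del self.lines[evicted]
        self.lines[tag] = {"age": 0, "linkage": linkage, "in_use": linkage}
        return evicted

    def release(self, tag):
        """Clear the in-use bit, as software would after the CPU finishes."""
        if tag in self.lines:
            self.lines[tag]["in_use"] = False
```

In this model the `release` call corresponds to the OS or driver/middleware management that method (c) requires, while the slower aging of linkage lines needs no software involvement.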

FIG. 27 shows the processing for transmitting data encrypted with IPsec in the same manner. The transmission processing is the reverse of the reception processing.

The CPU 11 sets the transmission data in the data area 23 of the accelerator 12 in the memory 2. At this time, the I/O dedicated cache 14 detects that the transmission data is being written to the data area 23 of the accelerator 12 and caches it. FIG. 27 shows the case in which the transmission data is divided into four frames and the third piece of data 1061 is transmitted.

(1) To transmit the third piece of data 1061, the CPU 11 activates the TCP accelerator 12-1.

(2) The TCP accelerator 12-1 cuts out a portion 1061 of a size that can be transmitted in one frame from the transmission data in the data area 23 of the accelerator 12, calculates the checksum, and copies the portion to the TCP data part of the transmission buffer 1062. Because the TCP accelerator 12-1 accesses the data area 23 of the accelerator 12, it actually reads 1061' in the I/O dedicated cache 14 and writes to the TCP data part of 1062' (1051).

(3) The CPU 11 creates a TCP header and writes it to the TCP header of the TCP packet 1062 in the data area 23 of the accelerator 12. In practice, however, it is written to the TCP header part 1071 of the TCP packet 1062' in the I/O dedicated cache 14 (1052).

(4) To encrypt the TCP packet, the CPU 11 activates the IPsec accelerator 12-2. In response, the IPsec accelerator 12-2 reads the TCP packet 1062 and writes the encrypted result to the IPsec data part of the Ethernet frame 1063. In practice, it reads 1062' in the I/O dedicated cache 14 and writes the encrypted data to the IPsec data part of 1063'.

(5) The CPU 11 creates the header part (MAC header, IP header, IPsec header) and writes it to the header part of the Ethernet frame 1063 in the data area 23 of the accelerator 12; in practice, it is written to the header part 1072 of 1063' in the I/O dedicated cache 14 (1053).

(6) When the Ethernet frame 1063 has been created, the CPU 11 issues a transmission request to the EtherMAC 12-3. In response, the EtherMAC 12-3 reads the Ethernet frame 1063 in the data area 23 of the accelerator 12 (actually 1063' in the I/O dedicated cache 14) and outputs it to the Ethernet (registered trademark) LAN 130.

As described above, in the transmission processing as well, both the CPU 11 and the accelerator 12 can operate without being aware of whether the I/O dedicated cache 14 is present.

Even if the reception processing and the transmission processing occur simultaneously, the I/O dedicated cache 14 can be used without any problem because it is a cache.

Next, the processing when the cache 110 in the CPU 11 has a snoop function will be described with reference to FIG. 28.

In step (3) of the transmission processing above, if the CPU 11 creates the TCP header with the cache 110 enabled and in write-back mode, the actual TCP header exists only in the cache 110 and is present neither in 1071 in the I/O dedicated cache 14 nor in the data area 23 of the accelerator 12. The IPsec accelerator 12-2, activated by the CPU 11, then attempts to read the TCP header. When the cache 110 detects this access via the bus 13, it issues an access suspension request to the IPsec accelerator 12-2 and purges the TCP header data in the cache 110 to the TCP packet 1062 in the data area 23 of the accelerator 12. In practice, however, the data is written to the TCP header part 1071 in the I/O dedicated cache 14.

When the purge is complete, the cache 110 cancels the access suspension request to the IPsec accelerator 12-2. In response, the IPsec accelerator 12-2 resumes reading the TCP header and can read the correct TCP header data 1071 as purged from the cache 110.
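The snoop sequence described above (detect, suspend, purge, resume) can be traced with a small model. This is a sketch only: the class `SnoopModel`, the dictionaries standing in for the two caches, and the assumption that a purged line is removed from the cache 110 are hypothetical simplifications.

```python
class SnoopModel:
    """Toy model of a snooping write-back cache 110 in front of the I/O cache 14."""

    def __init__(self):
        self.cpu_cache = {}  # cache 110: address -> dirty data not yet written back
        self.io_cache = {}   # I/O dedicated cache 14: address -> data
        self.events = []     # ordered trace of what happened

    def cpu_write(self, addr, data):
        # In write-back mode the data stays only in the cache 110.
        self.cpu_cache[addr] = data

    def accelerator_read(self, addr):
        if addr in self.cpu_cache:
            # The cache 110 snoops the bus access and holds off the accelerator.
            self.events.append("suspend")
            # The purge is addressed to the data area 23 but lands in the
            # I/O dedicated cache, so the memory 2 is not accessed.
            self.io_cache[addr] = self.cpu_cache.pop(addr)
            self.events.append("purge")
            self.events.append("resume")
        self.events.append("read")
        return self.io_cache.get(addr)
```

A second read of the same address in this model produces only a "read" event, since the up-to-date data is then already in the I/O dedicated cache.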

The point to note here is that, by using the I/O dedicated cache 14 with its short access time, cache coherency between the cache 110 and the memory 2 is maintained through accesses to the I/O dedicated cache 14, without accessing the memory 2 with its long access latency, so the overhead of cache purges can be greatly reduced.

As described above, the present embodiment provides the following effects.

(1) The multimedia microcomputers 1 and 10 employing the I/O dedicated cache 14 can minimize the bottleneck caused by data linkage through memory accesses that occurs when the CPU 11 and the accelerator 12 cooperate in multimedia processing, and can thereby improve multimedia processing performance.

(2) The I/O dedicated cache 14 holds only the data required for data linkage between the CPU 11 and the accelerator 12, and the determination of whether to hold data in the I/O dedicated cache 14 needs to be made only for write accesses to the memory 2. Exploiting these points improves the cache hit rate of the I/O dedicated cache 14 for data linkage and allows the I/O dedicated cache 14 to be implemented more compactly.

(3) Even when a plurality of multimedia accelerators 12 are installed, data linkage can be performed efficiently. It is therefore possible to configure multimedia microcomputers 1 and 10 that can process multiple kinds of multimedia, such as audio, still images, and video, simultaneously and at high speed, as well as a multimedia terminal 100 using such a multimedia microcomputer.

The invention made by the present inventors has been specifically described above based on the embodiment. Needless to say, however, the present invention is not limited to the embodiment, and various modifications are possible without departing from the gist of the invention.

For example, although the embodiment has been described using a wired communication function over Ethernet (registered trademark) as a specific example, the present invention is not limited to this and is equally applicable to (1) wireless communication functions, (2) screen display functions using graphics, MPEG, JPEG (image compression/decompression), and the like, (3) camera processing functions based on image processing such as image rotation and image quality adjustment, and (4) speaker processing functions using music, MP3 (audio compression/decompression), and the like.

Although the embodiment described above shows a configuration example with one CPU, the present invention can also be applied effectively to configurations with a plurality of CPUs.

The present invention described above relates to microcomputers and is applicable in particular to microcomputers that perform communication and multimedia processing and that have auxiliary circuits such as accelerators in addition to processing by a CPU.

FIG. 1 is a configuration diagram showing a multimedia microcomputer according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the memory in an embodiment of the present invention.
FIG. 3 is a configuration diagram showing another multimedia microcomputer in an embodiment of the present invention.
FIG. 4 is a diagram showing the flow of multimedia processing in an embodiment of the present invention.
FIG. 5 is a diagram showing the data flow of multimedia processing (from pre-processing to accelerator processing) in an embodiment of the present invention.
FIG. 6 is a diagram showing the data flow of multimedia processing (from setting of the processing result to post-processing) in an embodiment of the present invention.
FIG. 7 is a diagram showing the configuration of the bus in an embodiment of the present invention.
FIG. 8 is a diagram showing the configuration of the I/O dedicated cache in an embodiment of the present invention.
FIG. 9 is a diagram showing the configuration of the register in an embodiment of the present invention.
FIGS. 10(a) and 10(b) are diagrams showing register access paths in the I/O dedicated cache in an embodiment of the present invention.
FIG. 11 is a diagram showing the flow of processing in the determination circuit in an embodiment of the present invention.
FIG. 12 is a diagram showing the configuration of the address determination circuit in an embodiment of the present invention.
FIG. 13 is a diagram showing the configuration of the cache in an embodiment of the present invention.
FIG. 14 is a diagram showing the operation of the cache in an embodiment of the present invention.
FIG. 15 is a diagram showing the configuration of the memory controller in an application example of an embodiment of the present invention.
FIG. 16 is a diagram showing the configuration of the cache in an application example of an embodiment of the present invention.
FIG. 17 is a diagram showing the data structure of an access request in an application example of an embodiment of the present invention.
FIG. 18 is a configuration diagram showing a multimedia terminal using the multimedia microcomputer in an embodiment of the present invention.
FIG. 19 is a configuration diagram showing still another multimedia microcomputer in an embodiment of the present invention.
FIG. 20 is a diagram showing how the cache and the I/O dedicated cache are used selectively in an embodiment of the present invention.
FIG. 21 is a configuration diagram showing a specific configuration of the multimedia microcomputer in an embodiment of the present invention.
FIG. 22 is a configuration diagram showing the frame structure used for communication in an embodiment of the present invention.
FIG. 23 is a configuration diagram showing another frame structure used for communication in an embodiment of the present invention.
FIG. 24 is a diagram showing the operation of the cache (reception processing when the I/O dedicated cache is not used) in an embodiment of the present invention.
FIG. 25 is a diagram showing the operation of the cache (reception processing when the I/O dedicated cache is used) in an embodiment of the present invention.
FIG. 26 is a diagram showing the operation of the cache (reception processing when the I/O dedicated cache is used and holds only the linkage data portions) in an embodiment of the present invention.
FIG. 27 is a diagram showing the processing for transmitting encrypted data in an embodiment of the present invention.
FIG. 28 is a diagram showing the operation of the cache (when a snoop function is provided) in an embodiment of the present invention.

Explanation of symbols

1 ... multimedia microcomputer, 2 ... memory, 3 ... screen, 4 ... camera, 5 ... speaker, 6 ... communication device, 10 ... multimedia microcomputer, 11 ... CPU, 12 ... accelerator, 13 ... bus, 14 ... I/O dedicated cache, 15 ... memory controller, 21 ... program, 22 ... work area, 23 ... data area, 100 ... multimedia terminal, 110 ... cache, 130 ... LAN, 141 ... register, 142 ... determination circuit, 143 ... cache, 151 ... access control circuit, 152 ... refresh control circuit, 153 ... read access request FIFO, 154 ... write access request FIFO, 155 ... memory access control circuit.

Claims

A microcomputer comprising:
a CPU that operates as a master; and
an accelerator that operates as a slave,
the microcomputer being capable of accessing a memory from the CPU and the accelerator,
wherein the data with which the CPU and the accelerator access the memory consists of first data that the CPU and the accelerator exchange with each other and second data other than the first data, and
wherein the microcomputer has cache means for holding the first data, of the first data and the second data.

The microcomputer according to claim 1,
wherein the microcomputer has a function of determining, when a write access request to the memory is issued from the CPU and the accelerator, whether to hold the data of the write access request.

The microcomputer according to claim 2,
wherein the microcomputer has a function of issuing a holding request to the cache means when performing a write access to the memory.

The microcomputer according to claim 3,
wherein the microcomputer has a function of determining whether to hold the data output from the accelerator in response to the holding request output at the time of a write access from the accelerator to the memory.

The microcomputer according to claim 2,
wherein the cache means has a function of determining whether to hold the data based on an address output from the CPU and the accelerator at the time of a write access to the memory from the CPU and the accelerator.

The microcomputer according to claim 1,
wherein the cache means has a function of outputting the data to the accelerator when, at the time of a read access request from the accelerator to the memory, the cache means holds the data of the read access request.

The microcomputer according to claim 1, further comprising
a memory controller for controlling access to the memory from the CPU and the accelerator,
wherein the memory controller has a priority order for access requests from the CPU and the accelerator, and
has a function of processing the access requests from the CPU and the accelerator according to the priority order.

The microcomputer according to claim 7,
wherein the memory is an SDRAM or a DDR-SDRAM, and
the microcomputer has a function of continuously accessing the same bank and the same row address of the memory in response to access requests from the CPU and the accelerator.

The microcomputer according to claim 8,
wherein the memory controller has a function of managing dependency relationships among accesses to the same address among the access requests from the CPU and the accelerator, and of maintaining consistency of access to the memory.

The microcomputer according to claim 1,
wherein the memory is provided outside the microcomputer.

The microcomputer according to claim 1,
wherein the memory is provided inside the microcomputer.

The microcomputer according to claim 1,
wherein the microcomputer has a cache therein.

The microcomputer according to claim 12,
wherein the microcomputer is connected to an external memory, and
a program or a work area is formed in the external memory.

The microcomputer according to claim 13,
wherein a data area of the accelerator is formed in the external memory.

The microcomputer according to claim 12,
wherein the cache inside the CPU has a snoop function.