
JP2008052518A - CPU system

Info

Publication number: JP2008052518A
Application number: JP2006228362A
Authority: JP
Inventors: Toshiyuki Maekawa; 俊行前川
Original assignee: Digital Electronics Corp
Current assignee: Schneider Electric Japan Holdings Ltd
Priority date: 2006-08-24
Filing date: 2006-08-24
Publication date: 2008-03-06

Abstract

PROBLEM TO BE SOLVED: To reduce the wait time of a CPU during the processing of a branch instruction in a CPU system that uses a high-speed DRAM.

SOLUTION: The CPU system 1 operates on condition that the operating speed of the CPU 2 is not more than the operating speed of an SDRAM 3 during a burst read. When the CPU 2 processes a branch instruction, a comparator 7 determines whether the branch destination instruction is stored in an instruction cache memory 5. If it is, the instruction is read from the instruction cache memory 5. Thus, when the CPU 2 processes the branch instruction, the SDRAM 3 no longer has to be accessed at discontinuous addresses to read instructions. On such a hit in the instruction cache memory 5, no random access to the SDRAM 3 is needed and the CPU 2 does not have to wait.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a CPU system that first reads the instructions of a program stored in a DRAM into a cache and then supplies them to a CPU.

In a CPU system such as a microcomputer built around a CPU (Central Processing Unit), programs read from memory are executed sequentially. In recent years, CPU operating frequencies have been raised year after year to improve the operating speed of such CPU systems.

When a relatively slow DRAM (Dynamic Random Access Memory) is used as the memory to reduce cost, DRAM access is slow relative to CPU operation, so high-speed operation of the CPU system cannot be achieved. For this reason, CPU systems have conventionally been sped up by providing a cache between the CPU and the slow external memory (see, for example, Patent Document 1). Because a cache is a small, fast memory, the programs and data the CPU needs are first read into the cache, and the CPU then reads the portions it needs from the cache. A cache that stores program instructions in particular is called an instruction cache (I-Cache).

If a required instruction is not in the instruction cache (a miss), the CPU enters a wait state until the instruction has been read from the DRAM into the instruction cache, which impairs high-speed operation. For this reason, a technique called prefetch is used, in which frequently needed instructions are read from the DRAM in advance and loaded into the instruction cache by exploiting the sequential and repetitive nature of instructions. With this technique, if the CPU misses when reading a required instruction from the instruction cache, the instruction is read from the DRAM into a prefetch buffer and then from the prefetch buffer into the instruction cache and the CPU. This operation shortens the CPU wait time.

In the conventional method of accessing the instruction cache, the above operation is performed on every CPU access, regardless of whether the instruction is a branch (jump, branch, call, return) instruction.

Patent Document 1: JP 2003-162446 A (published June 6, 2003)

Meanwhile, DRAM speeds have been increasing in recent years, and high-speed DRAMs such as SDRAM (Synchronous DRAM), DDR-SDRAM (Double Data Rate SDRAM), RDRAM (Rambus DRAM), and XDR DRAM (eXtreme Data Rate DRAM) have become widespread. Such high-speed DRAMs are accordingly being adopted in CPU systems as well.

When the above CPU system is fabricated as an ASIC (Application Specific Integrated Circuit), either a full-custom ASIC or a semi-custom ASIC must be chosen. A full-custom ASIC allows high integration, so the CPU operating frequency can be made as high as, for example, 1 GHz, but the design cost is high. A semi-custom ASIC such as a gate array or cell-based design keeps the design cost down by using standard gates and functional blocks prepared in advance, but its degree of integration is lower than that of a full-custom ASIC, so the CPU operating frequency is low, for example 100 MHz.

When a high-speed DRAM is incorporated into a CPU system fabricated as such a semi-custom ASIC, operating the high-speed DRAM in burst read mode can make the CPU operating speed equal to or lower than the burst read speed. Under this condition, when program instructions are read from the DRAM in consecutive order, they are read out by the DRAM's burst read operation and passed to the CPU through the prefetch buffer, so the CPU can read instructions without waiting. In this case, no instruction cache is needed. When the instruction is a branch instruction, however, the DRAM is accessed at a discontinuous address to read the instruction, which breaks the continuity of processing, including the processing inside the CPU (pipeline processing). The DRAM then requires processing such as RAS (Row Address Strobe) and CAS (Column Address Strobe), and the CPU has to wait during this time.

The present invention has been made in view of the above problem, and its object is to shorten the CPU wait time when a branch instruction is processed in a CPU system that uses a high-speed DRAM.

The CPU system according to the present invention comprises: a high-speed DRAM that stores a program and is capable of burst read operation; an instruction cache memory that stores at least the branch destination instruction at the branch destination specified by a branch instruction in the program, together with the address that designates that instruction in the high-speed DRAM; a CPU that processes the instructions of the program read from the instruction cache memory and operates at an operating speed equal to or lower than the operating speed of the high-speed DRAM during burst read operation; determination means for determining, when the CPU processes the branch instruction read from the high-speed DRAM, whether the branch destination instruction specified by that branch instruction is stored in the instruction cache memory; and instruction output means for outputting the branch destination instruction read from the instruction cache memory to the CPU when it is determined that the branch destination instruction is stored in the instruction cache memory.

In the above configuration, the CPU operates at an operating speed equal to or lower than the operating speed of the high-speed DRAM during burst read operation. Under this operating condition, when instructions read from the high-speed DRAM (non-branch instructions) are processed by the CPU in consecutive order, they are read out continuously by the DRAM's burst read operation and passed to the CPU. The CPU can therefore read instructions without waiting, and in this case there is no need to read instructions from the instruction cache memory.

When the CPU processes a branch instruction, on the other hand, the determination means determines whether the branch destination instruction specified by that branch instruction is stored in the instruction cache memory. If it is determined to be stored there, the instruction output means outputs the branch destination instruction read from the instruction cache memory to the CPU. Consequently, as long as the branch destination instruction is stored in the instruction cache memory, it is supplied to the CPU from there, and no wait occurs in CPU operation even when a branch instruction is processed.

In the CPU system, it is preferable that the CPU outputs an address designating the branch instruction, that the CPU system comprises read control means for reading the branch destination instruction from the high-speed DRAM based on that address when it is determined that the branch destination instruction is not stored in the instruction cache memory, and that the instruction cache memory stores the branch destination instruction read from the high-speed DRAM.

In the above configuration, when the result of the determination by the determination means is negative, the read control means reads the branch destination instruction from the high-speed DRAM based on the address output from the CPU, and this branch destination instruction is stored in the instruction cache memory. In this case, the branch destination instruction cannot be read from the instruction cache memory when the CPU processes the branch instruction, so CPU operation has to wait until the branch destination instruction has been transferred from the high-speed DRAM into the instruction cache memory. However, branch destination instructions tend to be read repeatedly, so once one has been stored in the instruction cache memory, it can be read from the instruction cache memory immediately whenever a later branch instruction specifies the same branch destination instruction.

As described above, the CPU system according to the present invention comprises the high-speed DRAM, the instruction cache memory, the CPU, the determination means, and the instruction output means, so that when the CPU processes a branch instruction, the high-speed DRAM no longer has to be accessed at a discontinuous address to read the instruction. This has the effect of maintaining the continuity of processing throughout the CPU system, including the processing inside the CPU.

An embodiment of the present invention is described below with reference to FIGS. 1 to 3.

As shown in FIG. 1, the CPU system 1 includes a CPU 2, an SDRAM 3, an instruction cache memory 5, a prefetch buffer 6, a comparator 7, a selector 8, an octal counter 9, an RS flip-flop 10, AND gates 11 and 12, and an adder 13. The CPU system 1 is fabricated, for example, as a semi-custom ASIC.

In FIG. 1, among the connection lines between the circuits, thick solid lines represent multi-bit signal lines and thin solid lines represent 1-bit signal lines.

The CPU 2 has a CPU core 2a that processes the instructions (program code) of an input program in parallel by pipeline processing. The operating speed (operating frequency) of the CPU 2 is equal to or lower than the operating speed (operating frequency) of the SDRAM 3, described later, during burst read operation. The CPU core 2a outputs a jump destination address JumpAdd and a branch instruction identification signal BranchSig. The jump destination address JumpAdd is the address of the area in the SDRAM 3 that stores the branch destination instruction specified by a branch instruction. The branch instruction identification signal BranchSig indicates whether the instruction being processed by the CPU core 2a is a branch instruction. The CPU core 2a outputs JumpAdd and BranchSig when it executes the branch instruction. That is, BranchSig is "1" when the instruction is a branch instruction and "0" when it is not.

The SDRAM 3 stores programs and data, and reading and writing of the programs and data are controlled by a DRAM controller 4. The SDRAM 3 is provided in the CPU system 1 as the high-speed DRAM, but the DDR-SDRAM, RDRAM, XDR DRAM, or the like mentioned above may be provided in its place.

The DRAM controller 4 is a control circuit that controls the reading and writing of programs and data to and from the SDRAM 3. Here, the reading and writing of programs by the DRAM controller 4 is described.

The DRAM controller 4 receives the jump destination address JumpAdd and the branch instruction identification signal BranchSig as control signals. When the jump destination address JumpAdd is input from the adder 13 while BranchSig is "1", the DRAM controller 4 performs a preparatory operation on the SDRAM 3 for reading the program (program code) and then reads the instruction at that jump destination address. Specifically, the preparatory operation consists of outputting a RAS signal that designates the row address of the memory cell to be accessed, outputting a CAS signal that designates the column address of that memory cell, and so on.

The instruction cache memory 5 is a small-capacity cache memory that can operate faster than the SDRAM 3. It has areas for storing eight steps of instructions (program code) read from the SDRAM 3, the address of the area in the SDRAM 3 where those instructions are stored, and a valid/write bit. The valid/write bit is "1" when valid instructions are stored in the instruction cache memory 5 and "0" when no valid instructions are stored (that is, when instructions can be written). When the branch instruction identification signal BranchSig is "1" and there is no hit, the same instructions that are read from the SDRAM 3 and written to the prefetch buffer 6 are also written to the instruction cache memory 5.
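The three storage areas described above can be pictured as one record. The following is a minimal Python sketch, not the circuit of the embodiment; the class name `BranchTargetLine`, its field names, and the `fill` method are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LINE_STEPS = 8  # the text states that eight steps of instructions are stored


@dataclass
class BranchTargetLine:
    """Hypothetical model of one entry: address, 8 instructions, valid/write bit."""
    address: Optional[int] = None          # SDRAM address of the branch destination
    instructions: List[str] = field(default_factory=list)
    valid: int = 0                         # 1 = valid instructions held, 0 = writable

    def fill(self, address: int, instructions: List[str]) -> None:
        """Store eight instruction steps and mark the entry valid."""
        if len(instructions) != LINE_STEPS:
            raise ValueError("expected exactly 8 instruction steps")
        self.address = address
        self.instructions = list(instructions)
        self.valid = 1


# Usage: an empty entry is writable; after a fill it is marked valid.
line = BranchTargetLine()
line.fill(0x50, [f"insn{i}" for i in range(LINE_STEPS)])
```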

The prefetch buffer 6 is a buffer memory used for prefetching into the instruction cache memory 5 and is implemented, for example, as a FIFO (First In First Out) memory. Instructions read from the SDRAM 3 are written to the prefetch buffer 6 when the branch instruction identification signal BranchSig is "1" and the instruction cache memory 5 is hit.

When the hit determination signal Hit is "0", the selector 8 outputs the instruction read from the prefetch buffer 6, which it receives at input terminal A, from output terminal Y to the CPU 2. When Hit is "1", the selector 8 outputs the instruction read from the instruction cache memory 5, which it receives at input terminal B, from output terminal Y to the CPU 2. The hit determination signal Hit indicates whether the instruction cache memory 5 has been hit: it is "1" on a hit and "0" otherwise.

The comparator 7 compares the jump destination address JumpAdd output from the CPU core 2a with all addresses stored in the instruction cache memory 5, and compares the branch instruction identification signal BranchSig output from the CPU core 2a with the valid/write bit stored in the instruction cache memory 5. As the result of the comparison, the comparator 7 outputs a hit determination signal Hit of "1" when the addresses match and BranchSig and the valid/write bit are both "1"; otherwise it outputs a Hit of "0".
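The hit condition stated above is a simple conjunction, sketched below in Python as a behavioural model only. The function name `hit_signal` and the representation of the cache contents as `(address, valid_bit)` pairs are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def hit_signal(jump_add: Optional[int],
               branch_sig: int,
               entries: Iterable[Tuple[Optional[int], int]]) -> int:
    """Return 1 when an entry's address equals jump_add and both
    branch_sig and that entry's valid/write bit are 1; otherwise 0."""
    if jump_add is None:          # no jump destination address is being output
        return 0
    for address, valid in entries:
        if address == jump_add and branch_sig == 1 and valid == 1:
            return 1
    return 0


entries = [(0x50, 1), (0x90, 0)]  # hypothetical cache contents
```

With these contents, a branch to `0x50` hits, while a branch to `0x90` does not because its valid/write bit is 0.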

The octal counter 9 is provided to detect that eight steps of instructions have been written to the instruction cache memory 5. It counts the pulses that the instruction cache memory 5 outputs each time one step of an instruction is written, and when it has counted eight pulses it outputs a count-up signal ("1") and resets itself.

The RS flip-flop 10 is set when the branch instruction identification signal BranchSig output from the CPU core 2a changes from "0" to "1", changing the value of its output terminal Q from "0" to "1". It is reset when the count-up signal output from the octal counter 9 changes from "0" to "1", changing Q from "1" to "0". In this way, when the instruction cache memory 5 is hit, the value of BranchSig is held until instructions have been read and all eight steps of instructions in the instruction cache memory 5 have been rewritten; once the rewriting is finished, the input to the comparator 7 is held at "0" until the next new BranchSig is output.
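The interplay of the octal counter 9 and the RS flip-flop 10 can be traced with a small behavioural model. This is a sketch under assumptions: the class and method names are hypothetical, edges are modelled as per-step flags, and reset is given priority over set, which the text does not specify.

```python
class OctalCounter:
    """Counts write pulses; emits 1 on the eighth pulse and resets itself."""

    def __init__(self) -> None:
        self.count = 0

    def pulse(self) -> int:
        self.count += 1
        if self.count == 8:
            self.count = 0
            return 1      # count-up signal
        return 0


class RSFlipFlop:
    """Set on a rising BranchSig edge, reset on a rising count-up edge."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, set_edge: int, reset_edge: int) -> int:
        if set_edge:
            self.q = 1
        if reset_edge:    # assumption: reset wins if both arrive together
            self.q = 0
        return self.q


counter = OctalCounter()
ff = RSFlipFlop()
ff.update(set_edge=1, reset_edge=0)   # BranchSig rises: Q becomes 1
trace = []
for _ in range(8):                    # eight instruction steps are written
    count_up = counter.pulse()
    trace.append(ff.update(set_edge=0, reset_edge=count_up))
```

In this model Q stays at 1 for the first seven writes and falls to 0 on the eighth, which corresponds to holding the branch indication until all eight steps have been handled.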

The AND gate 11 outputs the logical AND of the inverted hit determination signal Hit and the output of the RS flip-flop 10. This result is input to the CPU core 2a as the wait signal wait. When a wait signal of "1" is input, the CPU core 2a enters a wait state and stops pipeline processing; when a wait signal of "0" is input, it operates and performs pipeline processing.

The AND gate 12 outputs the logical AND of the hit determination signal Hit from the comparator 7 and data representing "8". Thus, when Hit is "1" (a hit), the input data is output unchanged, whereas when Hit is "0" (no hit), the input data is not output.

The adder 13 adds the jump destination address JumpAdd and the output data of the AND gate 12 and outputs the sum to the DRAM controller 4.
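The gating performed by the AND gates 11 and 12 and the adder 13 reduces to two small expressions, sketched here in Python. The function names are hypothetical, and the sketch assumes that addresses are counted in instruction steps, so that adding 8 skips exactly the eight cached instructions.

```python
def wait_signal(hit: int, ff_q: int) -> int:
    """AND gate 11: wait = (NOT Hit) AND Q of the RS flip-flop."""
    return (1 - hit) & ff_q


def dram_address(jump_add: int, hit: int) -> int:
    """AND gate 12 passes the constant 8 only on a hit; adder 13 adds it
    to JumpAdd before the address goes to the DRAM controller."""
    offset = 8 if hit else 0
    return jump_add + offset
```

On a hit the SDRAM is read from eight addresses past the jump destination while the cache supplies the first eight steps; on a miss the SDRAM is read from the jump destination itself and the CPU waits.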

Next, the operation of the CPU system 1 configured as described above is explained.

First, when the CPU core 2a processes arithmetic instructions other than branch instructions, the instructions read from the SDRAM 3 by the burst read operation are temporarily stored in the prefetch buffer 6 and then supplied to the CPU 2 through the selector 8.

When instructions read from the SDRAM 3 (non-branch instructions) are processed by the CPU 2 in consecutive order, they are read out continuously by the burst read operation of the SDRAM 3 and passed to the CPU 2 through the prefetch buffer 6. The CPU 2 can therefore read instructions without waiting, and in this case there is no need to read instructions from the instruction cache memory 5.

At this time the RS flip-flop 10 is in the reset state, so the comparator 7 judges that the output "0" of the RS flip-flop 10 does not match the valid/write bit ("1"). In addition, the CPU core 2a does not output a jump destination address JumpAdd. The hit determination signal Hit output from the comparator 7 is therefore "0".

The selector 8 therefore connects the prefetch buffer 6 to the CPU 2. Because the output of the RS flip-flop 10 is "0", the wait signal wait is also "0". At this time, the branch instruction identification signal BranchSig output from the CPU core 2a is "0".

As shown in FIG. 2, when the CPU core 2a processes the branch instruction BR50 obtained from the prefetch buffer 6, the branch instruction identification signal BranchSig becomes "1", so the RS flip-flop 10 is set and outputs "1". The jump destination address JumpAdd is also output from the CPU core 2a. At this time, the CPU core 2a suspends processing of the instructions that follow the branch instruction BR50 and stops the pipeline processing.

The sequence of operations leading up to the pipeline stop proceeds as shown in FIG. 3. First, in stage 1, the branch instruction BR50 is read (fetched) at time T1. In the following stage 2, the address calculation instruction Add is read at time T1 and the branch instruction BR50 is decoded at time T2. In stage 3, the load instruction LD is read at time T1, the address calculation instruction Add is decoded at time T2, and the branch instruction BR50 is executed at time T3. The CPU core 2a then stops the pipeline processing and changes the branch instruction identification signal BranchSig from "0" to "1".

Because of the pipeline processing, there is thus a time lag between the moment the branch instruction BR50 is read into the CPU core 2a and the moment the value of the branch instruction identification signal BranchSig changes as described above. There are also conditional branch instructions, which may not jump at all depending on the condition.

Meanwhile, the comparator 7 compares the jump destination address JumpAdd with the addresses registered in the instruction cache memory 5, and compares the output of the RS flip-flop 10 with the valid/write bit. When the output of the RS flip-flop 10 and the valid/write bit are both "1" and the same address as JumpAdd is registered in the instruction cache memory 5, the hit determination signal Hit output from the comparator 7 becomes "1", as shown in FIG. 2, indicating a hit.

The selector 8 therefore connects the instruction cache memory 5 to the CPU 2. The branch destination instruction SUB designated by the jump destination address JumpAdd and the following seven steps of instructions are read in order from the instruction cache memory 5 and supplied to the CPU 2.

Once the eight steps of instructions have been read from the instruction cache memory 5, the instructions that follow them are needed. For this reason, while the instructions are being read from the instruction cache memory 5 as described above, the adder 13 adds the jump destination address JumpAdd and the data "8" output through the AND gate 12, and the resulting address (the address eight locations after JumpAdd) is supplied to the DRAM controller 4. Based on that address, the DRAM controller 4 reads instructions from the SDRAM 3 in order. The instructions read out are temporarily stored in the prefetch buffer 6, where they wait to be read by the CPU 2.

At this time, that is, when the hit determination signal Hit is "1" (a hit), there is no need to load the branch destination instruction into the instruction cache memory 5 to enable a later hit, unlike the no-hit case described below. The instructions read from the SDRAM 3 are therefore not written to the instruction cache memory 5.

If, in the comparison by the comparator 7 described above, the same address as the jump destination address JumpAdd is not registered in the instruction cache memory 5, the hit determination signal Hit output from the comparator 7 is "0" and there is no hit. The wait signal wait output from the AND gate 11 then becomes "1", so the CPU 2 enters the wait state. The jump destination address JumpAdd is stored and registered in the instruction cache memory 5 and is also passed unchanged through the adder 13 to the DRAM controller 4.

Based on the jump destination address JumpAdd, the DRAM controller 4 reads the branch destination instruction and the instructions that follow it from the SDRAM 3 in order. The instructions read out are temporarily stored in the prefetch buffer 6, where they wait to be read by the CPU 2. Because the hit determination signal Hit is "0" at this time, the branch destination instruction must be loaded into the instruction cache memory 5 so that it can be hit later. The instructions read from the SDRAM 3 are therefore written to the instruction cache memory 5 as well.

As a result, in the comparison by the comparator 7, the jump destination address JumpAdd matches the address registered in the instruction cache memory 5, so the hit determination signal Hit becomes “1”. In this case, therefore, the newly written branch destination instruction and the instructions that follow it are read from the instruction cache memory 5 and supplied to the CPU 2 via the selector 8. At this time, since the wait signal wait is “1”, the CPU 2 processes the input instructions in sequence.
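The hit and miss paths described above can be sketched as a small behavioral model. This is a minimal sketch, not the circuit of the embodiment: the class name `BranchTargetCache`, the dictionary used as the address store, the unbounded number of entries, and the list standing in for the SDRAM 3 are all assumptions made for illustration. Only the hit/miss behavior and the 8-step entry length are taken from the description.

```python
from dataclasses import dataclass, field

STEPS_PER_ENTRY = 8  # the description stores 8 steps per branch destination


@dataclass
class BranchTargetCache:
    """Behavioral stand-in for instruction cache memory 5 plus comparator 7."""

    sdram: list  # hypothetical stand-in for SDRAM 3: one instruction per address
    entries: dict = field(default_factory=dict)  # registered address -> instructions
    random_accesses: int = 0  # counts non-sequential SDRAM reads (misses)

    def fetch_branch_target(self, jump_add: int):
        """Return (hit, instructions) for a branch to address jump_add."""
        hit = jump_add in self.entries  # comparator: Hit is 1 on an address match
        if not hit:
            # Miss: register the address and fill the entry from the SDRAM,
            # which costs one random (non-sequential) access.
            self.random_accesses += 1
            self.entries[jump_add] = self.sdram[jump_add:jump_add + STEPS_PER_ENTRY]
        # On a hit, or after the fill, the instructions come from the cache.
        return hit, self.entries[jump_add]


program = [f"insn{i}" for i in range(64)]  # hypothetical program image
cache = BranchTargetCache(sdram=program)
first_hit, first = cache.fetch_branch_target(16)
second_hit, second = cache.fetch_branch_target(16)
print(first_hit, second_hit, cache.random_accesses)
```

In this model the first branch to an address misses and triggers one random access, while a later branch to the same address is served from the cache without touching the SDRAM stand-in.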

As described above, the CPU system 1 of the present embodiment operates under the condition that the operating speed of the CPU 2 is equal to or lower than the operating speed of the SDRAM 3 during a burst read. When the CPU 2 processes a branch instruction, if the branch destination instruction is stored in the instruction cache memory 5, that instruction is read from the instruction cache memory 5. This eliminates the need for the SDRAM 3 to read instructions by accessing discontinuous addresses when the CPU 2 processes a branch instruction, so the continuity of processing, including the processing inside the CPU 2 (pipeline processing), can be maintained. Further, when the instruction cache memory 5 hits as described above, no RAS or CAS processing needs to be performed on the SDRAM 3, and no wait occurs in the operation of the CPU 2.
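The effect on a loop can be illustrated with a rough count of wait cycles. The following is a sketch under stated assumptions: the function name `count_wait_cycles`, the branch trace, and the `miss_penalty` value are hypothetical and are not figures from this description. The model charges the penalty only when a branch destination is not yet registered, and it assumes the cache never evicts an entry.

```python
def count_wait_cycles(branch_targets, miss_penalty):
    """Sum wait cycles over a sequence of branch destination addresses.

    miss_penalty is a hypothetical cost, in CPU cycles, of one random
    SDRAM access (RAS/CAS handling); a hit costs no wait cycles.
    """
    registered = set()  # addresses already held in the instruction cache
    wait_cycles = 0
    for target in branch_targets:
        if target not in registered:
            wait_cycles += miss_penalty  # miss: the CPU waits for the SDRAM
            registered.add(target)       # the destination is now registered
    return wait_cycles


# A loop that branches back to the same address 100 times (assumed trace).
loop_trace = [0x40] * 100
with_cache = count_wait_cycles(loop_trace, miss_penalty=6)
without_cache = len(loop_trace) * 6  # every branch pays the assumed penalty
print(with_cache, without_cache)
```

Under these assumptions only the first pass through the loop waits on the SDRAM; every later branch to the same destination is a hit.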

Further, in the CPU system 1, the instruction cache memory 5 stores eight steps of instructions, namely the branch destination instruction and the instructions that follow it. It therefore does not require a large capacity, unlike a general instruction cache memory that stores reasonably large memory blocks (in 128-byte units).
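The capacity comparison can be made concrete with a short calculation. This is only a sketch: the instruction width of 4 bytes is an assumption (this passage does not state it), the helper name `entry_bytes` is hypothetical, and the storage needed for the registered address of each entry is not counted.

```python
def entry_bytes(steps, instruction_bytes):
    """Bytes of instruction storage needed for one branch destination entry."""
    return steps * instruction_bytes


STEPS = 8              # steps stored per branch destination (from the description)
BLOCK_BYTES = 128      # block size of a general instruction cache (from the description)
INSTRUCTION_BYTES = 4  # assumption: 32-bit instruction word

per_entry = entry_bytes(STEPS, INSTRUCTION_BYTES)
print(per_entry, BLOCK_BYTES // per_entry)
```

With the assumed 4-byte instruction word, one entry holds 32 bytes of instructions, a quarter of a 128-byte block.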

The present invention is not limited to the embodiment described above, and various modifications are possible within the scope of the claims. That is, embodiments obtained by combining technical means modified as appropriate within the scope of the claims are also included in the technical scope of the present invention.

In the CPU system of the present invention, when the CPU processes a branch instruction and the branch destination instruction is stored in the instruction cache memory, that instruction is read from the instruction cache memory, which further improves the continuity of processing. The CPU system is therefore well suited to systems such as microcomputers fabricated as semi-custom ASICs.

FIG. 1 is a block diagram showing the configuration of a CPU system according to an embodiment of the present invention.
FIG. 2 is a timing chart showing the operation of the CPU system.
FIG. 3 is a diagram showing the pipeline operation of the CPU when a branch instruction is processed in the CPU system.

Explanation of symbols

1 CPU system
2 CPU
2a CPU core
3 SDRAM (high-speed DRAM)
4 DRAM controller
5 Instruction cache memory
6 Prefetch buffer
7 Comparator
8 Selector
9 Octal counter
10 RS flip-flop
11, 12 AND gates
13 Adder

Claims

A CPU system comprising:
a high-speed DRAM that stores a program and is capable of a burst read operation;
an instruction cache memory that stores at least a branch destination instruction specified by a branch instruction in the program, and stores an address that specifies that instruction in the high-speed DRAM;
a CPU that processes instructions of the program read from the instruction cache memory and operates at an operating speed equal to or lower than the operating speed of the high-speed DRAM during a burst read operation;
determining means for determining, when the CPU processes the branch instruction read from the high-speed DRAM, whether or not the branch destination instruction specified by the branch instruction is stored in the instruction cache memory; and
an instruction output unit that outputs the branch destination instruction read from the instruction cache memory to the CPU when it is determined that the branch destination instruction is stored in the instruction cache memory.

The CPU system according to claim 1, wherein:
the CPU outputs an address designating the branch instruction;
the CPU system further comprises a read control unit that reads the branch destination instruction from the high-speed DRAM based on the address when it is determined that the branch destination instruction is not stored in the instruction cache memory; and
the instruction cache memory stores the branch destination instruction read from the high-speed DRAM.