JP5480793B2

JP5480793B2 - Programmable controller

Info

Publication number: JP5480793B2
Application number: JP2010275719A
Authority: JP
Inventors: 哲明中三川; 正上脇; 山田　　勉; 雅裕白石; 辰幸大谷
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2010-12-10
Filing date: 2010-12-10
Publication date: 2014-04-23
Anticipated expiration: 2030-12-10
Also published as: CN103262029B; CN103262029A; JP2012123719A; WO2012077516A1

Description

The present invention relates to a programmable controller equipped with a bit arithmetic processor that performs high-speed sequence control of plants, such as steelworks, power plants, and water and sewage systems, and of various machines.

Programmable controllers have traditionally used ladder language, which can describe sequence control efficiently. In ladder language, both inputs and outputs are often 1-bit information: an input may be the open/closed state of a switch, and an output may drive a relay. Programmable controllers are therefore often equipped with a dedicated bit arithmetic processor that performs this 1-bit data processing, characteristic of ladder language, at high speed.

A bit arithmetic processor supports a dedicated instruction set suited to handling 1-bit data. However, operation results are stored in general-purpose memory elements, so every access must match the memory element's access unit size, such as 8 or 16 bits. Writing 1-bit data to a general-purpose memory element therefore requires a read-modify-write operation: the memory is read in units of the access size, part of the read data is changed, and the data is written back in units of the access size. As a result, ladder language processing involves more memory accesses than general-purpose language processing.
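The access pattern behind this can be sketched in Python. This is a minimal model, not the patent's hardware: it assumes a 16-bit access unit and a word-addressed memory that simply counts accesses, and the names `WordMemory` and `store_bit` are hypothetical. Storing a single bit costs one full-word read plus one full-word write.

```python
WORD_BITS = 16  # assumed access unit size (the text mentions 8- or 16-bit units)
WORD_MASK = (1 << WORD_BITS) - 1


class WordMemory:
    """Word-addressed memory that counts its accesses (illustrative model)."""

    def __init__(self, n_words: int) -> None:
        self.words = [0] * n_words
        self.reads = 0
        self.writes = 0

    def read(self, addr: int) -> int:
        self.reads += 1
        return self.words[addr]

    def write(self, addr: int, value: int) -> None:
        self.writes += 1
        self.words[addr] = value & WORD_MASK


def store_bit(mem: WordMemory, addr: int, bit: int, value: int) -> None:
    """Store one bit with a read-modify-write of the whole word."""
    word = mem.read(addr)                               # read: full access unit
    mask = 1 << bit
    word = (word | mask) if value else (word & ~mask)   # modify: one bit only
    mem.write(addr, word)                               # write: full access unit again


mem = WordMemory(2)
mem.words[0] = 0b1010
store_bit(mem, 0, 0, 1)   # set bit 0   -> 0b1011
store_bit(mem, 0, 3, 0)   # clear bit 3 -> 0b0011
print(bin(mem.words[0]), mem.reads, mem.writes)  # 0b11 2 2
```

In this model every stored bit costs two memory accesses, which is the extra traffic the paragraph above attributes to ladder language processing.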

As prior art for speeding up a bit arithmetic processor, Patent Document 1 describes a technique for reducing the number of memory accesses: the word-unit memory access addresses and data of the past several accesses are stored in a buffer that can be accessed at high speed, and when the address of the word containing the bit to be accessed matches an address stored in the buffer, the data in the buffer is used instead of the data in memory.
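A rough Python sketch of the buffer idea attributed to Patent Document 1 follows. The buffer depth, the least-recently-used replacement, and the write-through handling of stores are assumptions made for illustration, and `RecentWordBuffer` is a hypothetical name; the point is only that a repeated access to a recently used word is served from the buffer instead of from memory.

```python
from collections import OrderedDict
from typing import List


class RecentWordBuffer:
    """Holds the (address, word) pairs of the last `depth` word accesses.
    Depth, LRU replacement and write-through stores are assumptions."""

    def __init__(self, memory: List[int], depth: int = 4) -> None:
        self.memory = memory
        self.depth = depth
        self.entries = OrderedDict()   # word address -> word data, oldest first
        self.memory_reads = 0

    def read_word(self, addr: int) -> int:
        if addr in self.entries:       # address match: use the buffered data
            self.entries.move_to_end(addr)
            return self.entries[addr]
        self.memory_reads += 1         # no match: go to memory
        word = self.memory[addr]
        self._remember(addr, word)
        return word

    def write_word(self, addr: int, word: int) -> None:
        self.memory[addr] = word       # write-through keeps memory current
        self._remember(addr, word)

    def read_bit(self, addr: int, bit: int) -> int:
        return (self.read_word(addr) >> bit) & 1

    def _remember(self, addr: int, word: int) -> None:
        self.entries[addr] = word
        self.entries.move_to_end(addr)
        if len(self.entries) > self.depth:
            self.entries.popitem(last=False)   # evict the least recently used
```

Two bit reads from the same word then cost a single memory read, which is the saving the prior-art technique targets.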

In the field of general-purpose computers, on the other hand, cache memory has long been used to speed up memory access. With recent semiconductor miniaturization, however, the internal SRAM (Static Random Access Memory) of a cache memory needs protection against soft errors, and caches are increasingly protected by adding an ECC (Error Check and Correct: error detection and correction code). Because the unit to which ECC is added is often designed to be about 4 bytes, writes to the cache memory are also made in units of 4 bytes or more. General-purpose instruction sets also support 1-byte and 2-byte writes, but a write smaller than the ECC unit requires a read-modify-write on the cache memory.
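The following Python sketch shows why a sub-unit store turns into a read-modify-write. It assumes a 4-byte protection unit and uses a byte-wise XOR as a stand-in check code; real caches use a proper error-correcting code such as SEC-DED, but any code computed over the whole unit forces the same behavior. `EccProtectedArray` is a hypothetical name.

```python
ECC_UNIT = 4  # bytes covered by one check code ("about 4 bytes" in the text)


def check_code(unit: bytes) -> int:
    """Stand-in check code: XOR of all bytes in the unit (not a real ECC).
    Like a real ECC, it depends on every byte of the unit."""
    code = 0
    for b in unit:
        code ^= b
    return code


class EccProtectedArray:
    def __init__(self, n_units: int) -> None:
        self.data = bytearray(n_units * ECC_UNIT)
        self.codes = [0] * n_units      # check code of an all-zero unit is 0
        self.unit_reads = 0
        self.unit_writes = 0

    def store(self, addr: int, payload: bytes) -> None:
        """Store `payload`, which must stay within one ECC unit."""
        u, off = divmod(addr, ECC_UNIT)
        if off + len(payload) > ECC_UNIT:
            raise ValueError("store crosses an ECC unit boundary")
        base = u * ECC_UNIT
        if off == 0 and len(payload) == ECC_UNIT:
            new_unit = bytes(payload)   # full-unit store: no read needed
        else:
            self.unit_reads += 1        # partial store: read the whole unit first
            old = bytes(self.data[base:base + ECC_UNIT])
            new_unit = old[:off] + bytes(payload) + old[off + len(payload):]
        self.data[base:base + ECC_UNIT] = new_unit   # write the whole unit
        self.codes[u] = check_code(new_unit)         # code covers the whole unit
        self.unit_writes += 1


arr = EccProtectedArray(2)
arr.store(0, b"\x01\x02\x03\x04")   # 4-byte store: write only
arr.store(1, b"\xff")               # 1-byte store: read-modify-write
print(arr.unit_reads, arr.unit_writes)  # 1 2
```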

When read-modify-write is performed on a cache memory of simple configuration, the cache memory is occupied for two or more cycles, during which the processor's pipeline processing is interrupted. One technique for keeping read-modify-writes from interrupting the pipeline is to provide a multi-stage store buffer so that store processing can be deferred. Patent Document 2 describes the configuration of a store buffer for a cache memory with ECC. Note that deferring stores with a store buffer relies on performing the store processing while the cache memory is idle, that is, while instructions other than load/store instructions are executing. It therefore does not deterministically eliminate pipeline interruptions; it only reduces their probability as far as possible.

Patent Document 3 describes a technique for efficiently executing instructions that transfer data between memory locations by making the cache memory multi-ported and separating the operand fetch (read) pipeline stage from the operand write pipeline stage. However, it describes neither the operation on a cache miss nor the operation when operand-fetch and write addresses conflict, and it does not mention read-modify-write processing at all.

Patent Document 1: Japanese Patent Laid-Open No. H11-39160
Patent Document 2: International Publication No. WO 2007/088597
Patent Document 3: Japanese Patent Laid-Open No. H4-40524

Pipeline processing and cache memory could also be adopted to speed up a bit arithmetic processor for ladder language processing. However, as noted above, read-modify-writes occur frequently in ladder language processing, so pipeline stalls (interruptions) caused by read-modify-write must be prevented. A multi-stage store buffer with address comparators, as in Patent Document 2, reduces the probability of pipeline interruption. Programs written in ladder language, however, tend to contain a higher proportion of bit-unit store instructions than programs in general-purpose languages, so the store buffer would need more stages, increasing the circuit scale required to implement it.

Meanwhile, the instruction sets of most current general-purpose processors follow the RISC (Reduced Instruction Set Computer) philosophy, so arithmetic processing and load/store processing are executed by different instructions: arithmetic instructions operate between general-purpose registers, and separate instructions load and store between memory and registers. Pipelines for general-purpose processors are therefore designed for register-to-register operations. Ladder language processing, however, centers on operations between memory and an accumulator, so a general-purpose pipeline configuration is not necessarily efficient for it.

The problems that arise when ladder language processing is performed with a prior-art pipeline and memory configuration are illustrated below with a concrete example. FIG. 14 shows the pipeline configuration of a typical RISC processor. PC (Program Counter) is the stage that calculates the instruction address; IF (Instruction Fetch) fetches the instruction; D (Decode) decodes the fetched instruction; EX (Execute) performs a register-to-register operation or calculates a memory address; M (Memory) reads from or writes to memory; and WB (Write Back) writes the operation result or read value back to a register. For the read-modify-write that accompanies a bit data operation, the M stage is split into two stages, M1 and M2: the read is performed in the M1 cycle, and the bit data is merged and written in the M2 cycle.

Consider executing the ladder diagram illustrated in FIG. 7(a) on this prior-art pipeline. FIG. 7(b) is an example program in which the ladder diagram of FIG. 7(a) is converted into an ordinary ladder language instruction sequence, and FIG. 7(c) is an example program in which it is converted into an instruction sequence for execution on the prior-art pipeline.

Variables X0 to X3 represent inputs and variables Y2 to Y3 represent outputs; each is stored as 1-bit data within two words in memory. FIG. 8 shows how these variables are allocated to memory. In FIG. 7(b), LD (Load) loads a value from memory into the accumulator; AND stores in the accumulator the logical product of a value loaded from memory and the accumulator value; ST (Store) stores the accumulator value in memory; and OR stores in the accumulator the logical sum of a value loaded from memory and the accumulator value. In FIG. 7(c), LD loads a value from memory into a register; AND stores the logical product of two register values in a register; ST stores a register value in memory; and OR stores the logical sum of two register values in a register.
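To make the accumulator-style semantics concrete, here is a small Python interpreter. The program and memory layout are assumptions: the text does not spell out FIG. 7(b), so the sketch assumes Y2 = X0 AND X1 and Y3 = X2 OR X3 (consistent with the instruction order described for FIG. 15 below), and the bit positions in the hypothetical `SYMBOLS` table stand in for the allocation shown in FIG. 8.

```python
from typing import Dict, List, Tuple

# Hypothetical (word address, bit position) layout; the real one is in FIG. 8.
SYMBOLS: Dict[str, Tuple[int, int]] = {
    "X0": (0, 0), "X1": (0, 1), "X2": (0, 2), "X3": (0, 3),  # inputs
    "Y2": (1, 2), "Y3": (1, 3),                              # outputs
}

# Assumed FIG. 7(b)-style program: Y2 = X0 AND X1, Y3 = X2 OR X3.
PROGRAM: List[Tuple[str, str]] = [
    ("LD", "X0"), ("AND", "X1"), ("ST", "Y2"),
    ("LD", "X2"), ("OR", "X3"), ("ST", "Y3"),
]


def run(program: List[Tuple[str, str]], memory: List[int]) -> None:
    acc = 0  # 1-bit accumulator
    for op, var in program:
        addr, bit = SYMBOLS[var]
        word = memory[addr]                  # every instruction reads a whole word
        value = (word >> bit) & 1
        if op == "LD":
            acc = value
        elif op == "AND":
            acc &= value
        elif op == "OR":
            acc |= value
        elif op == "ST":                     # read-modify-write of the target word
            memory[addr] = (word & ~(1 << bit)) | (acc << bit)
        else:
            raise ValueError(f"unknown instruction {op}")


memory = [0b0101, 0]    # X0=1, X1=0, X2=1, X3=0
run(PROGRAM, memory)
print(bin(memory[1]))   # Y2=0, Y3=1 -> 0b1000
```

Note that the ST instruction both reads and writes its target word, which is why each output bit costs a read-modify-write.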

FIG. 15 is a time chart showing the operation when the instruction sequence shown in FIG. 7(c) is executed on the pipeline of FIG. 14. As shown in FIG. 15, the first and second LD instructions start in cycles t0 and t1, respectively, proceed without interruption through cycles t5 and t6, and load data into registers R1 and R2. The third instruction, AND, proceeds normally from cycle t2 to t4, but in cycle t5 the data of variable X1 needed by its EX stage has not yet been read, so the pipeline stalls for one cycle (marked "-"). The data of variable X1 becomes available in cycle t6, the cycle after the M stage of the second LD instruction completes, and the AND instruction then proceeds normally through cycle t8. The fourth instruction, ST, starts in cycle t3 but stalls in cycle t5 along with the preceding instruction. It then proceeds through cycles t6 and t7, after which its M stage performs the read-modify-write over two cycles, t8 and t9. Consequently, the fifth instruction, LD, stalls in cycle t9 as well as in cycle t5.

Similarly, the sixth instruction, LD, stalls in cycles t5 and t9, and the seventh (OR) and eighth (ST) instructions stall in cycles t9 and t11. The eighth instruction, ST, performs its read-modify-write over two cycles, t14 and t15. In summary, with the prior-art pipeline, processing that ordinary ladder language describes in 6 instructions requires 8 instructions, and pipeline stalls occur in 3 cycles, so a total of 17 cycles is needed to complete all the processing.
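One way to account for the 17 cycles is shown below. The decomposition is an interpretation, not read off FIG. 15: an ideal in-order pipeline needs one cycle per stage for the first instruction and one more cycle for each following instruction, each stall cycle adds one, and the final ST spends one extra cycle in its two-cycle (M1, M2) memory stage. The last computation applies the same formula to the 6-instruction form of FIG. 7(b) on a 6-stage pipeline, under the assumption (the aim of the invention) that no stalls occur.

```python
def pipeline_cycles(stages: int, instructions: int,
                    stall_cycles: int = 0, extra_tail_cycles: int = 0) -> int:
    """Total cycles for an in-order pipeline: `stages` cycles for the first
    instruction, one more for each following instruction, plus stall cycles
    and any extra cycles the last instruction spends in a multi-cycle stage."""
    return stages + (instructions - 1) + stall_cycles + extra_tail_cycles


# Prior art (FIGS. 14 and 15): 6 stages, 8 instructions, 3 stall cycles,
# and the last ST uses two cycles (M1, M2) in the memory stage.
prior_art = pipeline_cycles(6, 8, stall_cycles=3, extra_tail_cycles=1)

# Same formula for 6 accumulator-style instructions on a 6-stage pipeline,
# assuming no stalls at all.
no_stall = pipeline_cycles(6, 6)

print(prior_art, no_stall)  # 17 11
```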

An object of the present invention is to propose a pipeline configuration that is suitable for a programmable controller having a bit arithmetic processor for processing ladder language and that does not stall pipeline processing because of read-modify-write and the like. A further object is to propose a pipeline and cache memory configuration suitable for a bit arithmetic processor having a cache memory.

To keep pipeline processing from stalling in a bit arithmetic processor that processes ladder language, it suffices to make the result of the EX stage available to the EX stage of the next instruction in the following cycle, and to add a new R (Read) stage that reads in advance the contents of the memory targeted by the read-modify-write. The following EX stage then performs the bit operation and merges the bit data, and the final W (Write) stage stores the result in memory.

To achieve the above objects, the present invention is a programmable controller that reads and writes memory in units of words, each word collecting a plurality of 1-bit data items subject to bit operation processing. The programmable controller includes a bit arithmetic processor that executes, in parallel by a pipeline processing mechanism, a sequence of bit operation instructions included in a program. The pipeline stages of the bit arithmetic processor comprise: a decode stage that decodes each instruction of the bit operation instruction sequence and, when the decoded instruction involves memory access, generates a data address; a read stage, following the decode stage, that reads the data subject to the bit operation from memory in units of words; an operation stage following the read stage; and a write stage, following the operation stage, that writes word data containing the result of the bit operation performed by the operation stage to the same address as the data read in the read stage.
Here, the memory includes an address holding circuit having a first address holding register that holds the address of the data read in the read stage, and a second address holding register that holds the address copied from the first address holding register until the write stage and supplies it as the write destination address of the write stage.

Further, in the present invention, the memory is a set-associative cache memory of at least two ways or a fully associative cache memory, and includes an address holding circuit having:
a first index holding register that holds index information extracted from the address of the cache entry read in the read stage;
a first way holding register that holds information on the way hit in the read stage;
a second index holding register that holds the index information copied from the first index holding register until the write stage; and
a second way holding register that holds the way information copied from the first way holding register until the write stage;
the address holding circuit supplying the combination of the index information in the second index holding register and the way information in the second way holding register as the write destination address of the write stage.
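The idea behind holding the index and way can be sketched in Python as follows. This is an illustrative model under stated assumptions: one 16-bit word per cache line, no miss handling or replacement, and hypothetical names (`SetAssocCache`, `read_stage`, `write_stage`, `bit_store`). The tag comparison happens only in the read stage; the write stage uses the (index, way) pair captured there, so no second hit check is needed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    word: int = 0


class SetAssocCache:
    """Set-associative cache holding one 16-bit word per line (assumed)."""

    def __init__(self, n_sets: int, n_ways: int) -> None:
        self.n_sets = n_sets
        self.sets = [[Line() for _ in range(n_ways)] for _ in range(n_sets)]

    def _split(self, addr: int) -> Tuple[int, int]:
        return addr % self.n_sets, addr // self.n_sets   # (index, tag)

    def fill(self, addr: int, way: int, word: int) -> None:
        """Load a line directly (miss handling and replacement not modeled)."""
        index, tag = self._split(addr)
        self.sets[index][way] = Line(True, tag, word)

    def read_stage(self, addr: int) -> Optional[Tuple[int, int, int]]:
        """R stage: compare tags; on a hit return (index, way, word)."""
        index, tag = self._split(addr)
        for way, line in enumerate(self.sets[index]):
            if line.valid and line.tag == tag:
                return index, way, line.word
        return None

    def write_stage(self, index: int, way: int, word: int) -> None:
        """W stage: write through the held (index, way); no tag compare."""
        self.sets[index][way].word = word


def bit_store(cache: SetAssocCache, addr: int, bit: int, value: int) -> bool:
    hit = cache.read_stage(addr)                    # R stage
    if hit is None:
        return False                                # a real design would refill here
    index1, way1, word = hit                        # first index/way holding registers
    index2, way2 = index1, way1                     # copied to the second registers
    merged = (word & ~(1 << bit)) | (value << bit)  # EX stage bit merge
    cache.write_stage(index2, way2, merged)         # W stage
    return True


cache = SetAssocCache(n_sets=4, n_ways=2)
cache.fill(5, way=1, word=0b1000)                   # address 5 -> index 1, tag 1
bit_store(cache, 5, 0, 1)
print(cache.read_stage(5))                          # (1, 1, 9)
```

Setting `n_sets=1` turns the model into a fully associative cache, in which the index is always 0 and only the way identifies the entry.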

Further, so that pipeline processing does not stall even when instructions that access memory are consecutive, the memory is preferably a memory with two or more ports that can independently perform one or more reads and one or more writes within a single pipeline cycle.

Patent Document 3 describes a pipeline configuration in the order cache read, operation, cache write. However, because its purpose is to perform reads and writes independently, different addresses can be specified for the read and the write, and a cache hit determination is therefore also made at cache write time. In other words, it does not contemplate read-modify-write and requires a three-port memory (two reads and one write), so its object and configuration differ from those of the present invention.

According to the present invention, programs executed on a programmable controller with a bit arithmetic processor do not stall pipeline processing because of read-modify-write, so programs written in ladder language can be processed efficiently. The same holds for a bit arithmetic processor with a cache memory: pipeline processing does not stall, so ladder language programs can be processed efficiently. Furthermore, because the cache hit determination for a store instruction is made in the preceding read stage, no cache replacement processing is needed in the write stage, which simplifies pipeline control.

FIG. 1: Internal configuration and pipeline configuration of the bit arithmetic processor in the programmable controller according to the first embodiment
FIG. 2: Overall configuration of the programmable controller according to the first embodiment
FIG. 3: Block diagram of the configuration when a 2-port memory is used as the data memory
FIG. 4: Detailed configuration of the operation stage of the bit arithmetic processor according to the first embodiment
FIG. 5: Configuration of the cache memory according to the second embodiment
FIG. 6: Detailed configuration of the operation stage of the bit arithmetic processor according to the second embodiment
FIG. 7: Ladder diagram and example programs used to explain the operation of the bit arithmetic processor
FIG. 8: Allocation to memory of the variables used in the programs of FIG. 7
FIG. 9: Format of the instructions executed by the bit arithmetic processor
FIG. 10: Instruction code sequence for the program of FIG. 7 as stored in memory
FIG. 11: Time chart of an example operation of the bit arithmetic processor according to the first and second embodiments
FIG. 12: Time chart of an example operation when a cache miss occurs in the bit arithmetic processor according to the second embodiment
FIG. 13: Pipeline configuration according to the first embodiment, shown in contrast with FIG. 14
FIG. 14: Pipeline configuration of a typical RISC processor
FIG. 15: Time chart of an example operation of a prior-art bit arithmetic processor

Embodiments for carrying out the present invention are described below with reference to FIGS. 1 to 13.
<< First Embodiment >>
FIG. 1 is a diagram showing the internal configuration and pipeline configuration of the bit arithmetic processor included in the programmable controller according to the first embodiment of the present invention. FIG. 13 represents the pipeline configuration of the first embodiment in contrast with the prior-art configuration of FIG. 14. As shown in FIGS. 1 and 13, the pipeline of this embodiment has six stages: (1) a program counter (PC) stage, (2) an instruction fetch (IF) stage, (3) a decode (D) stage, (4) a memory read (R) stage, (5) an operation execution (EX) stage, and (6) a memory write (W) stage.

The PC stage includes an adder 102 that adds either the constant 1 or a designated register value to the value of the PC (program counter) 101, which indicates the preceding instruction address, and a selector 103 that selects either the addition result or a register value indicating a branch destination address. As the output of this stage, it sets the address of the instruction to be executed after the previously selected instruction in the instruction address register 111. The IF stage reads the instruction corresponding to the instruction address in the instruction address register 111 from the instruction buffer 30 and sets it in the instruction register 121.

The D stage includes a decoder 122 that interprets the fetched instruction and an adder 123 that adds a designated register value to the address extracted from the instruction by the decoder 122. When the decoded instruction involves memory access, the D stage generates the data address and sets it in the data address register 131. Although not shown, the D stage also extracts, according to the decoding result, the control information needed to control the subsequent stages, such as register selection and operation function selection.

The R stage reads the data at the data address indicated by the data address register 131 from the data memory 20. The read data is set in the word buffer 141 via the selector 132, which chooses between it and bypass data from the following EX stage. The R stage also sets the data address indicated by the data address register 131 in the address holding circuit 22 of the data memory 20 so that the address can be reused in the W stage.

The EX stage has an ALU (Arithmetic Logic Unit) 142 and a bit merge mechanism 143, and performs the operation specified by the instruction using the data value in the word buffer 141 and/or a designated register value. For a bit data operation, it extracts the bit at the designated bit position from the word data to be operated on, performs the operation, and then uses the bit merge mechanism 143 to embed the resulting bit at that bit position, producing word data. If the result is to be stored in a register, it is written to the register file 152; if it is to be stored in the data memory 20, the result data is written to the write buffer 151.
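The extract, operate, and merge steps of the EX stage can be sketched in Python as follows. The operation names and the `ALU_OPS` table are hypothetical (the actual instruction set is the one in FIG. 9); the sketch assumes 16-bit words and a 1-bit accumulator value `acc`, and returns both the 1-bit result and the merged word.

```python
from typing import Callable, Dict, Tuple

WORD_MASK = 0xFFFF  # 16-bit words, as in the embodiment

# Hypothetical 1-bit operations; the real instruction set is defined in FIG. 9.
ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "MOV": lambda a, b: b,   # pass the accumulator bit through, as a store would
}


def extract_bit(word: int, bit: int) -> int:
    return (word >> bit) & 1


def bit_merge(word: int, bit: int, value: int) -> int:
    """Embed a 1-bit value at position `bit`, keeping the other 15 bits."""
    return ((word & ~(1 << bit)) | ((value & 1) << bit)) & WORD_MASK


def ex_stage(op: str, word: int, bit: int, acc: int) -> Tuple[int, int]:
    """Extract the operand bit, apply the ALU operation with the accumulator
    bit, and merge the result back; returns (1-bit result, merged word)."""
    result = ALU_OPS[op](extract_bit(word, bit), acc)
    return result, bit_merge(word, bit, result)


print(ex_stage("AND", 0xFFFF, 4, 0))  # (0, 65519), i.e. bit 4 cleared
```

Whether the 1-bit result goes to the register file or the merged word goes to the write buffer depends on the instruction, as described above; the sketch simply computes both.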

The W stage stores the data in the write buffer 151 at the data address set in the address holding circuit 22 during the R stage, that is, at the same address in the data memory 20 from which the R stage read the data.

FIG. 2 is a diagram showing the overall configuration of the programmable controller according to the first embodiment. As shown in FIG. 2, the programmable controller 1000 comprises a CPU (Central Processing Unit) module 1, I/O (Input/Output) modules 2A and 2B, an I/O bus 3 connecting them, and a program input device 4 detachably connected to the CPU module 1. Each of the I/O modules 2A and 2B has an I/O bus connection circuit and an I/O interface circuit, and their type and number can be changed according to the required I/O specifications and number of contacts.

The CPU module 1 comprises the bit arithmetic processor 10, a data memory 20, an instruction buffer 30, an I/O bus control circuit 40, a memory controller 50, an external RAM (Random Access Memory) 60, a ROM (Read Only Memory) 70, a general-purpose microprocessor 80, and a communication I/F (Interface) 90.

Data in a predetermined address range of the data memory 20 corresponds to the input or output data exchanged with external devices (not shown) connected to the I/O modules 2A and 2B. The I/O bus control circuit 40 controls the I/O bus 3: it writes input data obtained from the external devices connected to the I/O modules 2A and 2B into the data memory 20, and outputs output data read from the data memory 20 to those external devices via the I/O bus 3.
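A minimal Python sketch of this memory-mapped I/O exchange follows. The address ranges (`INPUT_BASE`, `OUTPUT_BASE`, and their sizes) and the function name `io_transfer` are hypothetical; the text only states that a predetermined address range of the data memory corresponds to I/O data. The function writes input words into the input range and returns the output range for transmission.

```python
from typing import List

# Hypothetical memory map for the I/O area of data memory 20.
INPUT_BASE, INPUT_WORDS = 0x000, 4
OUTPUT_BASE, OUTPUT_WORDS = 0x100, 4


def io_transfer(data_memory: List[int], input_words: List[int]) -> List[int]:
    """One exchange by the I/O bus control circuit (sketch): write the words
    received from the I/O modules into the input range, then return the words
    of the output range to be sent to the I/O modules."""
    if len(input_words) != INPUT_WORDS:
        raise ValueError("unexpected number of input words")
    data_memory[INPUT_BASE:INPUT_BASE + INPUT_WORDS] = input_words
    return data_memory[OUTPUT_BASE:OUTPUT_BASE + OUTPUT_WORDS]


data_memory = [0] * 0x200
data_memory[0x100] = 0xABCD      # an output word written by the ladder program
outputs = io_transfer(data_memory, [1, 2, 3, 4])
```

Because the ladder program sees I/O only as ordinary data memory words, the same bit instructions serve both internal variables and external contacts.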

If the instruction for an instruction address requested by the bit arithmetic processor 10 is held in the instruction buffer 30, the buffer returns it; otherwise, it requests the memory controller 50 to read the instruction from the external RAM 60. The memory controller 50 reads from and writes to the external RAM 60, or reads from the ROM 70, in response to requests from the bit arithmetic processor 10, the instruction buffer 30, the I/O bus control circuit 40, and the general-purpose microprocessor 80. The general-purpose microprocessor 80 controls the programmable controller 1000 as a whole, for example by writing a ladder program loaded from the program input device 4 via the communication I/F 90 into the external RAM 60. The program that runs on the general-purpose microprocessor 80 is stored in the ROM 70.

In this embodiment, the bit arithmetic processor 10, the data memory 20, the instruction buffer 30, the I/O bus control circuit 40, and the memory controller 50 are integrated in a system LSI (Large Scale Integration) 100.

FIG. 3 is a diagram showing the configuration of the data memory 20 of this embodiment when it is implemented with a 2-port memory, which is well suited to this use. As shown in FIG. 3, the data memory 20 comprises an address selector 201, an address holding circuit 22 having address holding registers 221 and 222, a write address selector 215, a write data selector 204, and a memory array 21.

The address selector 201 selects either the value of the data address register 131 (FIG. 1) of the bit arithmetic processor 10 or the I/O data address output by the I/O bus control circuit 40. The address holding registers 221 and 222 hold the data address used in the R stage until the W stage two cycles later: the first-stage address holding register 221 holds the data address as output of the R stage, and that value is copied into the second-stage address holding register 222 as output of the EX stage, making it available to the W stage.
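The interplay of the address holding registers 221 and 222 with the bypass through selector 132 can be sketched as a cycle-level Python model of the R, EX, and W stages. Assumptions: every instruction is a hypothetical bit store (`BitStore`), one instruction enters the R stage per cycle, and a W-stage write is visible to an R-stage read in the same cycle (write-first). That last point is an assumption, since this part of the description only mentions the bypass from the EX stage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BitStore:
    """Hypothetical instruction: write `value` into bit `bit` of word `addr`."""
    addr: int
    bit: int
    value: int


def merge(word: int, bit: int, value: int) -> int:
    return ((word & ~(1 << bit)) | (value << bit)) & 0xFFFF


def run_pipeline(program: List[BitStore], memory: List[int]) -> int:
    """Cycle-level sketch of the R -> EX -> W stages; returns the cycle count
    from the first R stage to the last W stage."""
    in_ex: Optional[BitStore] = None   # instruction currently in EX
    word_buffer = 0                    # word buffer 141
    reg221: Optional[int] = None       # address held after the R stage
    in_w: Optional[BitStore] = None    # instruction currently in W
    write_buffer = 0                   # write buffer 151
    reg222: Optional[int] = None       # address held after the EX stage
    pending = list(program)
    cycles = 0
    while pending or in_ex is not None or in_w is not None:
        cycles += 1
        # W stage: write to the address captured in R (no new address decode).
        if in_w is not None:
            memory[reg222] = write_buffer
        # EX stage: merge the bit into the word read in R; copy 221 -> 222.
        ex_out, next_222, next_w = 0, None, None
        if in_ex is not None:
            ex_out = merge(word_buffer, in_ex.bit, in_ex.value)
            next_222, next_w = reg221, in_ex
        # R stage: read the target word, or take the EX result via selector 132
        # when the instruction in EX targets the same word.
        next_buf, next_221, next_ex = 0, None, None
        if pending:
            ins = pending.pop(0)
            if in_ex is not None and reg221 == ins.addr:
                next_buf = ex_out
            else:
                next_buf = memory[ins.addr]   # sees this cycle's W write (assumed)
            next_221, next_ex = ins.addr, ins
        # Clock edge: latch all pipeline registers at once.
        in_ex, word_buffer, reg221 = next_ex, next_buf, next_221
        in_w, write_buffer, reg222 = next_w, ex_out, next_222
    return cycles


memory = [0, 0]
program = [BitStore(0, 0, 1), BitStore(0, 1, 1), BitStore(0, 2, 1)]
print(run_pipeline(program, memory), bin(memory[0]))  # 5 0b111
```

With three back-to-back stores to the same word, each R stage picks up the previous instruction's merged word through the bypass, and n instructions take n + 2 cycles, i.e. one instruction completes per cycle with no stalls.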

The write address selector 215 selects either the value of the data address register 131 or the value of the address holding register 222. The write data selector 204 selects either the value of the write buffer 151 or the I/O data output from the I/O bus control circuit 40.

The memory array 21 is a 2-port memory whose two ports, read port 1 and write port 2, can both be accessed within a single cycle. The R-stage processing of one pipelined instruction and the W-stage processing of another instruction can therefore be executed in parallel in the same cycle. Although the memory array 21 is a 2-port memory here, it may instead be a memory with three or more ports, provided that one or more read ports and one or more write ports can be accessed within a single cycle. Because reads and writes issued by the I/O bus control circuit 40 are performed independently of each other, the address holding circuit 22 is not used for those memory accesses.

FIG. 4 shows the detailed configuration of the operation stage (the EX stage, including some elements belonging to other stages) of the bit arithmetic processor according to the first embodiment. As shown in FIG. 4, in addition to the ALU 142, the operation stage comprises: selectors 144 and 145, which select the inputs of the ALU 142; a selector 132 (an R-stage element), which selects either one word (16 bits) of data read from the data memory 20 or bypass data, i.e., the operation result of this stage fed back for use by the next instruction; a word buffer 141, which holds the selected word data; a bit merge mechanism 143, which merges the output of the ALU 142 (the operation result) into the input word data; a write buffer 151, which holds the bit-merged data; and a register file 152 (a W-stage element), which holds data such as the accumulator and general-purpose registers. The ALU 142 can operate on either 1 bit or one word (16 bits).
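A minimal sketch of the bit extraction used by LD/AND/OR and of the bit merge performed by the bit merge mechanism 143 is shown below. It assumes bit position 0 is the least significant bit of the 16-bit word (the source does not state the bit numbering); the function names are hypothetical.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1


def extract_bit(word: int, bit_pos: int) -> int:
    """Take one bit out of a 16-bit word held in word buffer 141."""
    if not 0 <= bit_pos < WORD_BITS:
        raise ValueError("bit position must be 0..15")
    return (word >> bit_pos) & 1


def bit_merge(word: int, bit_pos: int, bit: int) -> int:
    """Replace one bit of the word read in the R stage with the 1-bit ALU
    result, leaving the other 15 bits unchanged (value for write buffer 151)."""
    if not 0 <= bit_pos < WORD_BITS:
        raise ValueError("bit position must be 0..15")
    cleared = word & ~(1 << bit_pos) & WORD_MASK
    return cleared | ((bit & 1) << bit_pos)
```

Because the whole word was already read in the R stage, the merge needs no extra memory access in the EX stage; only the final write remains for the W stage.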

FIG. 9 shows the formats of the instructions executed by the bit arithmetic processor 10. As shown in FIG. 9, there are two instruction formats, both a fixed 32 bits long. Instruction format 1, which operates on bit data, consists of a 5-bit instruction code (OP: Operation), a 4-bit bit position field (BA: Bit Address), and a 23-bit word address field (WA: Word Address). Instruction format 2, which operates on word data, consists of a 5-bit instruction code (OP), a 4-bit register designation field (RA: Register Address), and a 23-bit word address field (WA).
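The 32-bit layout can be sketched as an encoder/decoder pair. The source gives only the field widths (5 + 4 + 23 = 32); placing OP in the most significant bits, followed by BA (or RA) and then WA, is an assumption, as are all names and any opcode values used with it.

```python
from dataclasses import dataclass

OP_BITS, BA_BITS, WA_BITS = 5, 4, 23
assert OP_BITS + BA_BITS + WA_BITS == 32


@dataclass(frozen=True)
class Instruction:
    op: int  # 5-bit instruction code (OP)
    ba: int  # 4-bit bit position (BA) in format 1, register number (RA) in format 2
    wa: int  # 23-bit word address (WA)


def encode(ins: Instruction) -> int:
    """Pack the fields into a 32-bit word, OP in the top bits (assumed order)."""
    for value, width in ((ins.op, OP_BITS), (ins.ba, BA_BITS), (ins.wa, WA_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
    return (ins.op << (BA_BITS + WA_BITS)) | (ins.ba << WA_BITS) | ins.wa


def decode(word: int) -> Instruction:
    """Unpack a 32-bit instruction word into its three fields."""
    return Instruction(
        op=(word >> (BA_BITS + WA_BITS)) & ((1 << OP_BITS) - 1),
        ba=(word >> WA_BITS) & ((1 << BA_BITS) - 1),
        wa=word & ((1 << WA_BITS) - 1),
    )
```

With a 4-bit BA field, one instruction can address any of the 16 bits of the 16-bit word selected by WA, which is why the fixed-length format fits the word-organized data memory.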

Next, using the ladder program shown in FIGS. 7 and 8 as an example, the operation of the bit arithmetic processor 10 according to this embodiment will be described with reference to FIG. 1 and the time chart of FIG. 11. FIG. 10 shows the instruction sequence, in instruction format 1, for the ladder program of FIG. 7(b) as stored in the program memory (external RAM 60); the instructions are assumed to be stored consecutively starting at address A.

First, the first LD instruction starts in cycle t0 and executes without interruption through cycle t5. In cycle t0 (PC stage), the address of the LD instruction is calculated and set in the instruction address register 111. In cycle t1 (IF stage), the instruction is read from the instruction buffer 30 and set in the instruction register 121. In cycle t2 (D stage), the instruction is decoded and recognized as an LD instruction. Because the word address to be read is specified directly in this instruction, that word address is set in the data address register 131. In cycle t3 (R stage), the one word of data containing the target bit is read from the data memory 20 and set in the word buffer 141. In cycle t4 (EX stage), the bit of the target variable X0 is extracted and set in the accumulator in the register file 152. No operation is performed in cycle t5 (W stage) (shaded in the figure).

The second instruction, AND, starts in cycle t1 and executes without interruption through cycle t6; no operation is performed in cycle t6 (W stage). The PC through R stages are executed in the same way as for the first LD instruction. In cycle t5 (EX stage), the bit of the target variable X1 is extracted, the ALU 142 computes its logical AND with the accumulator contents, and the result is set back in the accumulator. In this embodiment, the word data containing X1 has already been read in the R stage that precedes the EX stage, so the pipeline does not stall in the EX stage of the AND instruction, unlike in the prior-art example (FIG. 15).

The third instruction, ST, starts in cycle t2 and executes without interruption through cycle t7. The PC through R stages are executed in the same way as for the first LD instruction. A key feature of this embodiment is that the ST instruction, too, reads its data in advance in the R stage. In cycle t6 (EX stage), the accumulator value (1 bit) is read and merged into the value held in the word buffer 141 (1 word), and the result is set in the write buffer 151. In the W stage in cycle t7, the contents of the write buffer 151 are stored in the data memory 20. The destination memory address is taken from the address holding circuit 22, which holds the address used in the R stage. Thus, with the pipeline configuration of this embodiment, no pipeline stall occurs even when 1-bit data is written.
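To see why one read port and one write port are enough, the time chart of the six-instruction example can be generated from the stage order PC, IF, D, R, EX, W. This illustrative model assumes one instruction enters the pipeline per cycle, that every instruction in the example reads the data memory in its R stage, and that only ST writes in its W stage; it is not a reproduction of FIG. 11, and the function names are hypothetical.

```python
from typing import Dict, List

STAGES = ("PC", "IF", "D", "R", "EX", "W")


def timing_table(program: List[str]) -> Dict[int, Dict[str, str]]:
    """Map each cycle to {stage: instruction} for a stall-free pipeline in
    which instruction i (0-based) enters the PC stage in cycle i."""
    table: Dict[int, Dict[str, str]] = {}
    for i, name in enumerate(program):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + s, {})[stage] = f"{i + 1}:{name}"
    return table


def port_demand(table: Dict[int, Dict[str, str]]) -> Dict[int, Dict[str, int]]:
    """Count data-memory reads (any occupied R stage) and writes (W stage
    of an ST instruction) in each cycle."""
    demand: Dict[int, Dict[str, int]] = {}
    for cycle, stages in table.items():
        reads = 1 if "R" in stages else 0
        writes = 1 if stages.get("W", "").endswith(":ST") else 0
        demand[cycle] = {"reads": reads, "writes": writes}
    return demand


program = ["LD", "AND", "ST", "LD", "OR", "ST"]
table = timing_table(program)
demand = port_demand(table)
```

In cycle t7, for example, the fifth instruction (OR) reads in its R stage while the third (ST) writes in its W stage. No cycle ever needs more than one read and one write, so a memory whose read and write ports can both be used in one cycle avoids any structural stall.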

By the same operation, the fourth instruction (LD) executes from cycle t3 to t8, the fifth (OR) from cycle t4 to t9, and the sixth (ST) from cycle t5 to t10, each without a pipeline stall. Thus, according to this embodiment, no pipeline stall occurs at all while the six instructions execute, and the number of cycles needed to execute the processing described by the six instructions of FIG. 7(b) is reduced from 17 cycles in the prior art to 11 cycles.
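The 11-cycle figure follows from the usual fill-time formula for a stall-free pipeline: with k stages and n instructions issued one per cycle, the last instruction leaves the W stage after n + k - 1 cycles, here 6 + 6 - 1 = 11. The sketch below also accepts an optional stall count, under the simplifying assumption that each stall cycle delays every later stage by one cycle; the function name is hypothetical.

```python
def pipeline_cycles(n_instructions: int, n_stages: int = 6, stall_cycles: int = 0) -> int:
    """Cycles from the first PC stage to the last W stage, assuming one
    instruction issues per cycle and each stall cycle delays all later stages."""
    if n_instructions <= 0:
        return 0
    return n_instructions + n_stages - 1 + stall_cycles
```

Under that assumption, the 10-cycle stall from t5 to t14 in the cache-miss example of the second embodiment would give `pipeline_cycles(6, stall_cycles=10)`, i.e. 21 cycles; this number is derived from the model, not stated in the source.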

As described above, the programmable controller with the bit arithmetic processor according to the first embodiment can execute the bit operations peculiar to ladder language processing at a pitch of one instruction per cycle without pipeline stalls, so programs written in ladder language can be executed at high speed.

<< Second Embodiment >>
Next, a second embodiment of the present invention, in which the data memory 20 (see FIG. 2) of the first embodiment is implemented as a cache memory, will be described. FIG. 5 shows the configuration of the cache memory according to the second embodiment. As shown in FIG. 5, the cache memory 20A serving as the data memory 20 is a 2-way set-associative cache comprising an address selector 201A, index holding registers 221A and 222A, a way selector 203, a write data selector 204A, a way 0 tag memory 205, a way 1 tag memory 206, an LRU (Least Recently Used) memory 207, a way 0 data memory 208, a way 1 data memory 209, a hit determination circuit 210, a write-back control circuit 211, a way data selector 212, and way holding registers 213 and 214. The address holding circuit 22 of FIG. 1 is formed by address holding circuit 1 (22A), which has the index holding registers 221A and 222A, and address holding circuit 2 (22B), which has the way holding registers 213 and 214.

The way data memories 208 and 209 each have 256 entries, and each entry holds 16 bytes (8 words) of data. The cache is accessed in units of 16 bytes, and this 16-byte unit of data is called a line. The tag memories 205 and 206 each have 256 entries, and each entry consists of a valid bit V, a dirty bit D, and a tag address (abbreviated "tag" in FIG. 5). The LRU memory 207 has 256 entries, each recording the most recently used way.
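The geometry above implies a 4-bit byte offset (16-byte lines) and an 8-bit index (256 sets); together with the 12-bit tag extracted by the address selector 201A this covers a 24-bit byte address. The sketch assumes the data address is such a 24-bit byte address with the tag in its upper 12 bits, which is consistent with, but not spelled out in, the source; names are hypothetical.

```python
from typing import Tuple

LINE_BYTES = 16   # one line = 16 bytes = 8 sixteen-bit words
SETS = 256        # entries per way
WAYS = 2
OFFSET_BITS = 4   # log2(16)
INDEX_BITS = 8    # log2(256)
TAG_BITS = 12

CAPACITY_BYTES = LINE_BYTES * SETS * WAYS  # total data capacity of the cache


def split_address(byte_addr: int) -> Tuple[int, int, int]:
    """Split an assumed 24-bit byte address into (tag, index, byte offset)."""
    if not 0 <= byte_addr < (1 << (TAG_BITS + INDEX_BITS + OFFSET_BITS)):
        raise ValueError("address outside the assumed 24-bit range")
    offset = byte_addr & (LINE_BYTES - 1)
    index = (byte_addr >> OFFSET_BITS) & (SETS - 1)
    tag = byte_addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

The word within the line is then `offset // 2`, which is what selector 132A needs to pick the target 16-bit word.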

The address selector 201A selects either the value of the data address register 131 (FIG. 1) of the bit arithmetic processor 10 or the I/O data address output from the I/O bus control circuit 40, and extracts a 12-bit tag address and an 8-bit index value from it. The index holding registers 221A and 222A hold the index portion (8 bits) of the data address used in the R stage until the W stage two cycles later. Similarly, the way holding registers 213 and 214 hold the way that hit in the R stage until the W stage.

The way selector 203 outputs the index value stored in the second-stage index holding register 222A and the hit way stored in the second-stage way holding register 214 to the way data memories 208 and 209 as the index and way to be written in the W stage. The write data selector 204A selects either the value of the write buffer 151A (described later) or one line of I/O data output from the I/O bus control circuit 40.

The hit determination circuit 210 compares, for each way, the upper 12 bits of the data address being accessed with the tag value output from the tag memory 205 or 206, and determines a hit or a miss. The way data selector 212 selects the line data of either way 0 or way 1 according to the output of the hit determination circuit 210. Because cache reads and writes issued by the I/O bus control circuit 40 are performed independently of each other, the address holding circuits 1 and 2 (22A, 22B) are not used for those accesses.
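The per-way tag comparison of the hit determination circuit 210 can be sketched as follows. Checking the valid bit V together with the tag is an assumption based on common cache practice (the source mentions only the tag comparison), and the names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class TagEntry(NamedTuple):
    valid: bool  # V bit
    dirty: bool  # D bit
    tag: int     # 12-bit tag address


def hit_way(tag: int, entries: List[TagEntry]) -> Optional[int]:
    """Compare the access tag with the tag read from every way at this index.

    Returns the number of the hitting way (used to drive the way data
    selector and to be held in way holding register 213), or None on a miss.
    """
    for way, entry in enumerate(entries):
        if entry.valid and entry.tag == tag:
            return way
    return None
```

In hardware all ways are compared in parallel; the loop here simply evaluates the same comparisons one after another.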

On a cache miss, the write-back control circuit 211 examines the target entry of the eviction way indicated by the output of the LRU memory 207 (the way other than the most recently used one). If that entry lies in the address range handled in write-back mode and is dirty, that is, its data has been updated, the circuit sends the data, which is read out together with the tag, to the memory controller 50 so that the contents of the external RAM 60 (FIG. 2) are updated. For a predetermined address range corresponding to I/O data addresses, the cache operates in write-through mode and updates the stored contents immediately, so no write-back is performed. A read control circuit (not shown) then reads the line data that caused the cache miss into the evicted entry.
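The miss handling described above can be sketched for one set of the 2-way cache. Assumptions not taken from the source: the LRU memory entry for the set is modeled as a single integer `mru_way`, the write-back address range is supplied as a predicate, the refill data is passed in directly instead of being fetched, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

OFFSET_BITS = 4  # 16-byte lines
INDEX_BITS = 8   # 256 sets


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    data: bytes = bytes(16)


def line_address(tag: int, index: int) -> int:
    """Rebuild the byte address of a line from its tag and set index."""
    return (tag << (INDEX_BITS + OFFSET_BITS)) | (index << OFFSET_BITS)


def handle_miss(
    ways: List[Line],
    mru_way: int,
    index: int,
    new_tag: int,
    new_data: bytes,
    in_write_back_range: Callable[[int], bool],
    write_back: Callable[[int, bytes], None],
) -> int:
    """Refill one set of a 2-way cache on a miss and return the refilled way.

    The victim is the way that is not the most recently used one. A dirty
    victim is written back only if its address lies in the write-back range;
    lines in the write-through range were already written to RAM on store.
    """
    victim = 1 - mru_way
    old = ways[victim]
    if old.valid and old.dirty and in_write_back_range(line_address(old.tag, index)):
        write_back(line_address(old.tag, index), old.data)
    ways[victim] = Line(valid=True, dirty=False, tag=new_tag, data=new_data)
    return victim
```

Because the way hit by an ST instruction in its R stage becomes the most recently used way of its set, a miss by the following instruction at the same index refills the other way under this policy, so the ST's line is still present when its W stage writes it.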

The tag memories 205 and 206 and the way data memories 208 and 209 of the cache memory 20A are preferably memories with two or more ports, so that a read port and a write port can be accessed in parallel within a single cycle; this eliminates pipeline stalls caused by contention for memory access. Although the cache memory 20A is 2-way set-associative here, it may instead be set-associative with three or more ways, or fully associative.

FIG. 6 shows the detailed configuration of the operation stage (the EX stage, including some elements belonging to other stages) of the bit arithmetic processor according to the second embodiment. As shown in FIG. 6, this configuration is the operation stage of the first embodiment (FIG. 4) with the input and output data widened to one line (16 bytes = 128 bits), plus means for holding one line of data and merging the word data of the operation result into it. The other components are the same as in FIG. 4, so their description is not repeated.

The selector 132A (an R-stage element) selects either one word (16 bits) out of the one line (16 bytes) of data read from the cache memory 20A, or bypass data, i.e., the operation result of this stage fed back for use by the next instruction. The line buffer 146 holds the line of data that was read. The word merge mechanism 147 merges the word data of the operation result, output by the bit merge mechanism 143, into the line data held in the line buffer 146. The write buffer 151A holds the merged line of data.
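The R-stage word selection by the selector 132A and the word merge by the mechanism 147 can be sketched with a line modeled as a list of eight 16-bit words. The list representation, the bit numbering (bit 0 = least significant), and the function names are assumptions for illustration.

```python
from typing import List

WORDS_PER_LINE = 8


def select_word(line: List[int], word_index: int) -> int:
    """Selector 132A: pick the target 16-bit word out of a 128-bit line."""
    return line[word_index]


def word_merge(line: List[int], word_index: int, word: int) -> List[int]:
    """Word merge mechanism 147: return a copy of the line held in line
    buffer 146 with one word replaced (value for write buffer 151A)."""
    if len(line) != WORDS_PER_LINE:
        raise ValueError("a line holds exactly 8 words")
    if not 0 <= word_index < WORDS_PER_LINE:
        raise ValueError("word index must be 0..7")
    merged = list(line)
    merged[word_index] = word & 0xFFFF
    return merged


def store_bit(line: List[int], word_index: int, bit_pos: int, bit: int) -> List[int]:
    """Full ST path on a cached line: select word, bit-merge, word-merge."""
    word = select_word(line, word_index)
    word = (word & ~(1 << bit_pos) & 0xFFFF) | ((bit & 1) << bit_pos)
    return word_merge(line, word_index, word)
```

The line read in the R stage is left untouched; a new line is produced for the W stage, mirroring how the line buffer and the write buffer are separate registers.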

As before, the operation of the bit arithmetic processor according to this embodiment is described using the ladder program of FIGS. 7 and 8 as an example. If the cache memory 20A is a 2-port memory that can perform a read and a write in a single cycle, and all target data hit in the cache, the operation follows the same time chart as FIG. 11. This embodiment differs from the first embodiment in that data is read in the R stage and written in the W stage in units of one line, and in that, in the EX stage, the operation is performed on the target word extracted by the selector 132A, after which the resulting word data is merged into the line data held in the line buffer 146 to produce the line data to be written.

Next, the operation when a cache miss occurs on the third instruction (ST) of the ladder program of FIG. 7(b) is described using the time chart of FIG. 12. The first LD instruction and the second AND instruction operate as in FIG. 11. When a cache miss occurs in the R stage of the ST instruction (cycle t5), the pipeline stalls, for example from cycle t5 to cycle t14, while the data in the external RAM 60 is read via the memory controller 50. Once the R stage resumes in cycle t15, however, execution proceeds without further pipeline stalls. Even when a cache miss occurs on an ST instruction, the cache line only needs to be replaced in the R stage, which keeps cache control simple.

Even if the data address accessed by the instruction following the ST instruction has the same index value as that of the ST instruction and that next instruction misses in the cache, a cache that is at least 2-way set-associative ensures that the ST instruction's entry has not been evicted before the ST instruction overwrites its line data in the W stage: that entry is the most recently used one in its set, so the LRU replacement evicts a different way.

As described above, the programmable controller with the bit arithmetic processor incorporating the cache memory according to the second embodiment can execute the bit operations peculiar to ladder language processing at a pitch of one instruction per cycle without pipeline stalls, so programs written in ladder language can be executed at high speed.

This concludes the description of the embodiments of the present invention. The present invention is not limited to these embodiments, and various modifications can be made without departing from the spirit of the invention.

1 CPU module
2A, 2B I/O module
3 I/O bus
4 Program input device
10 Bit arithmetic processor
20 Data memory
20A Cache memory
21 Memory array
22, 22A, 22B Address holding circuit
30 Instruction buffer
40 I/O bus control circuit
50 Memory controller
60 External RAM
70 ROM
80 General-purpose microprocessor
90 Communication I/F
100 System LSI
1000 Programmable controller

Claims

A programmable controller that reads and writes a memory in units of words, each word grouping a plurality of 1-bit data items subject to bit operation processing, the programmable controller comprising:
a bit operation processor that executes a bit operation processing instruction sequence contained in a program in parallel by means of a pipeline processing mechanism, wherein
the pipeline stages of the bit operation processor include: a read stage, which follows a decode stage that decodes each instruction of the bit operation processing instruction sequence and generates a data address when the decoded instruction involves a memory access, and which reads the data to be operated on from the memory in units of words; an operation stage following the read stage; and a write stage, following the operation stage, which writes the word data containing the result of the bit operation computed in the operation stage to the same address as the data read in the read stage;
the memory is
a memory with two or more ports capable of performing one or more reads and one or more writes within the processing cycle time of a single pipeline stage; and
the programmable controller further comprises an address holding circuit having a first address holding register that holds the address of the data read in the read stage, and a second address holding register that holds the address copied from the first address holding register until the write stage and whose value is used as the write destination address of the write stage.

A programmable controller that reads and writes a memory in units of words, each word grouping a plurality of 1-bit data items subject to bit operation processing, the programmable controller comprising:
a bit operation processor that executes a bit operation processing instruction sequence contained in a program in parallel by means of a pipeline processing mechanism, wherein
the pipeline stages of the bit operation processor include: a read stage, which follows a decode stage that decodes each instruction of the bit operation processing instruction sequence and generates a data address when the decoded instruction involves a memory access, and which reads the data to be operated on from the memory in units of words; an operation stage following the read stage; and a write stage, following the operation stage, which writes the word data containing the result of the bit operation computed in the operation stage to the same address as the data read in the read stage;
the memory is
a memory with two or more ports capable of performing one or more reads and one or more writes within the processing cycle time of a single pipeline stage, and is a cache memory that is at least 2-way set-associative or fully associative; and
the programmable controller further comprises an address holding circuit having:
a first index holding register that holds index information extracted from the address of the cache entry read in the read stage;
a first way holding register that holds information on the way that hit in the read stage;
a second index holding register that holds the index information copied from the first index holding register until the write stage; and
a second way holding register that holds the way information copied from the first way holding register until the write stage,
the address holding circuit using the combination of the index information in the second index holding register and the way information in the second way holding register as the write destination address of the write stage.

The programmable controller according to claim 1 or 2, wherein the bit operation processor,
when executing a bit data store instruction in the bit operation processing instruction sequence,
reads the original word data containing the bit to be stored in the read stage and holds it in a storage unit, and
merges the bit data obtained as the result of the bit operation computed in the operation stage into the held original word data.

The programmable controller according to claim 2, wherein cache hit determination is performed in the read stage, and, in the write stage, line data into which the word data containing the result of the bit operation has been merged is written to the cache entry indicated by the index information held in the second index holding register and the way information held in the second way holding register of the address holding circuit.

The programmable controller according to claim 2, wherein the cache memory switches between a write-through mode and a write-back mode according to the address range of the store destination.