JP2006171827A

JP2006171827A - Processor and processing program

Info

Publication number: JP2006171827A
Application number: JP2004359553A
Authority: JP
Inventors: Hiroto Hirabayashi; 裕人平林
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2004-12-13
Filing date: 2004-12-13
Publication date: 2006-06-29

Abstract

PROBLEM TO BE SOLVED: To enable a vector processor to process, at high speed, instructions having a saturation function, while keeping circuit scale from increasing.
SOLUTION: When either an add-subtract shift unit A25 or B26 performs an addition process, a saturation process part 25a or 26a refers to an S-bit in a status register 22. When the S-bit is "1", a decision is made as to whether the result of the addition by the add-subtract shift unit A25 or B26 overflows or underflows. When the result of the addition overflows, "0x7FFFFFFF" is output as the result of the addition. When the result of the addition underflows, "0x80000000" is output as the result of the addition.
COPYRIGHT: (C)2006, JPO & NCIPI

Description

The present invention relates to an arithmetic processing device and an arithmetic processing program, and is particularly suitable for application to a vector pipeline processor used in, for example, an AMR-NB (Adaptive Multi-Rate Narrow Band) codec (hereinafter referred to as the AMR codec).

The AMR codec is one of the speech codecs proposed by 3GPP (3rd Generation Partnership Project). It is characterized by its ability to compress and encode human speech efficiently, and it frequently performs saturation processing, multiply-accumulate operations, and the like at 16-bit precision. In particular, one of the most frequently called functions in the AMR decoder is the multiply-accumulate function L_MAC(). The L_MAC() function calls the L_mult() function and the L_add() function to perform the multiply-accumulate operation, and performs saturation processing depending on whether an overflow occurs.

FIG. 17 shows an example of the source code of the L_MAC() function, and FIG. 18 shows an example of compiling the source code of FIG. 17.
In FIG. 17, when the L_MAC() function is executed, the L_mult() function is called and an ordinary multiplication is performed. If shifting the multiplication result left by one bit does not cause an overflow, the left shift is performed; if it would cause an overflow, the overflow flag is set and the saturation value is substituted as the multiplication result.

Next, the L_add() function is called and an ordinary addition is performed. It is then determined whether the addition result has overflowed; if it has, the overflow flag is set and the saturation value is substituted as the addition result.
Because the L_MAC() function of FIG. 17 performs such complicated processing, compiling the source code of FIG. 17 yields a lengthy function of as many as 36 lines, as shown in FIG. 18.
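The per-call overflow checks described above can be sketched in C as follows. This is a minimal sketch modeled on the general shape of fixed-point basic operators, not the listing of FIG. 17 itself; the global `Overflow` flag, the constants `MAX_32` and `MIN_32`, and the use of a 64-bit intermediate in `L_add` are assumptions made for illustration.

```c
#include <stdint.h>

#define MAX_32 ((int32_t)0x7FFFFFFF)
#define MIN_32 ((int32_t)(-2147483647 - 1))

int Overflow = 0;  /* assumed global overflow flag */

/* 16x16 multiply followed by a one-bit left shift, with saturation.
 * Only (-32768) * (-32768) = 0x40000000 overflows when doubled. */
int32_t L_mult(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    if (p != (int32_t)0x40000000) {
        return p * 2;
    }
    Overflow = 1;
    return MAX_32;
}

/* 32-bit add with saturation, using a 64-bit intermediate sum. */
int32_t L_add(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > MAX_32) { Overflow = 1; return MAX_32; }
    if (s < MIN_32) { Overflow = 1; return MIN_32; }
    return (int32_t)s;
}

/* Multiply-accumulate: acc + 2 * a * b, saturating at each step. */
int32_t L_MAC(int32_t acc, int16_t a, int16_t b)
{
    return L_add(acc, L_mult(a, b));
}
```

Every call passes through at least one data-dependent branch, which is the property that makes the function awkward to vectorize.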

Patent Document 1 discloses a method in which a MAC (Multiply and Accumulate) instruction performs a signed multiplication of 16-bit operands addressed by the contents of general-purpose registers Rm and Rn, adds the 32-bit multiplication result to the contents of the MAC register, and stores the sum in the MAC register; when the S bit is 1, the addition to the MAC register is performed as a saturating operation. In this saturating operation, when an overflow occurs, the LSB of the MACH register is set to 1. When the sum overflows in the negative direction, H"80000000 (the minimum value) is stored in the MAC register, and when the sum overflows in the positive direction, H"7FFFFFFF (the maximum value) is stored in the MAC register. When the S bit is 0, the 42-bit sum is stored in the concatenated MACH/L registers.
Patent Document 1: JP-A-6-150023

However, because the L_MAC() function implements a saturation function, the presence or absence of overflow must be determined for every single operation, and branch processing must be performed according to the result. Consequently, the number of cycles varies from one operation to the next, the L_MAC() function cannot be turned into a vector instruction, and the operation speed cannot readily be increased.
In the method disclosed in Patent Document 1, the saturating operation of the MAC instruction is executed with reference to the S bit. However, adding hardware dedicated to the MAC instruction so that it can be processed at high speed increases the circuit scale.

Accordingly, an object of the present invention is to provide an arithmetic processing device and an arithmetic processing program capable of processing instructions having a saturation function at high speed while suppressing an increase in circuit scale.

To solve the problem described above, an arithmetic processing device according to one aspect of the present invention comprises a vector adder that performs vector addition processing, and a saturation processing unit that is mounted on the vector adder and performs saturation processing based on the addition result obtained by the vector adder.
This allows the saturation processing that accompanies an addition to be carried out in hardware, so that saturation can be performed without writing the overflow determination, or the branch processing that follows from it, in the program. Therefore, even when a saturation function is implemented in the add instruction, the operation for one instruction can be executed in one cycle. The add instruction with the saturation function can thus be vectorized, and addition with saturation can be performed faster.

The arithmetic processing device according to one aspect of the present invention further comprises a status register that stores a saturation designation flag for enabling the saturation processing, and the saturation processing unit performs saturation processing based on the result of referring to the saturation designation flag.
This makes it possible to specify from the program whether saturation processing is to be executed. Therefore, even when the saturation processing accompanying an addition is implemented in hardware, saturation is not applied uniformly to every addition result, and a single add instruction suffices to provide both addition with the saturation function and addition without it.

An arithmetic processing device according to one aspect of the present invention comprises a vector shift unit that performs vector shift processing, and a saturation processing unit that is mounted on the vector shift unit and performs saturation processing based on the shift result obtained by the vector shift unit.
This allows the saturation processing that accompanies a shift operation to be carried out in hardware, so that saturation can be performed without writing the overflow determination, or the branch processing that follows from it, in the program. Therefore, even when a saturation function is implemented in the shift instruction, the operation for one instruction can be executed in one cycle. The shift instruction with the saturation function can thus be vectorized, and shift processing with saturation can be performed faster.

In the arithmetic processing device according to one aspect of the present invention, when the shift operation performed by the vector shift unit is an arithmetic shift, the saturation processing unit unconditionally executes saturation processing according to the shift result obtained by the vector shift unit; when the shift operation performed by the vector shift unit is a logical shift, the saturation processing unit does not execute saturation processing, regardless of whether the shift result obtained by the vector shift unit overflows.

Thus, writing an arithmetic shift instruction in the program provides a left shift with the saturation function, and writing a logical shift instruction in the program provides a left shift without it. Therefore, even when the saturation processing accompanying a shift operation is implemented in hardware, saturation is not applied uniformly to every shift result. The arithmetic and logical shift instructions are redundant in that they yield the same result for a left shift; by using the two selectively, shift processing with the saturation function and shift processing without it can both be realized.
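The distinction between the two left shifts can be illustrated with a short C model. This is a sketch under stated assumptions, not the circuit of the embodiment: the function names `sat_shl_arith` and `shl_logical` are hypothetical, the shift count is assumed to lie in 0..31, and the overflow flag is assumed to be set only when saturation actually occurs.

```c
#include <stdint.h>

/* Arithmetic left shift: the result is clamped to the signed 32-bit range
 * when significant bits would otherwise be lost. *vv models an overflow flag. */
int32_t sat_shl_arith(int32_t x, unsigned n, int *vv)
{
    /* n is assumed to be in 0..31, so the product fits in 64 bits. */
    int64_t wide = (int64_t)x * ((int64_t)1 << n);
    if (wide > INT32_MAX) { *vv = 1; return INT32_MAX; }
    if (wide < INT32_MIN) { *vv = 1; return INT32_MIN; }
    return (int32_t)wide;
}

/* Logical left shift: bits shifted out of the top are simply discarded. */
uint32_t shl_logical(uint32_t x, unsigned n)
{
    return x << n;  /* n is assumed to be in 0..31 */
}
```

When no overflow occurs the two functions produce the same bit pattern, which is why the choice of instruction can double as the choice of whether to saturate.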

An arithmetic processing device according to one aspect of the present invention comprises a vector multiplier that performs vector multiplication processing; a vector adder that performs vector addition processing; a first saturation processing unit that is mounted on the vector adder and performs saturation processing based on the addition result obtained by the vector adder; a vector shift unit that performs vector shift processing; and a second saturation processing unit that is mounted on the vector shift unit and performs saturation processing based on the shift result obtained by the vector shift unit.

Thus, the L_MAC() function can be executed by executing three instructions, namely a multiply instruction, a shift instruction, and an add instruction, without writing the overflow determination, or the branch processing that follows from it, in the program. Therefore, the L_MAC() function can be turned into vector instructions without adding hardware dedicated to the L_MAC() function, and the L_MAC() function can be executed at high speed while suppressing an increase in circuit scale.

In the arithmetic processing device according to one aspect of the present invention, the saturation processing unit sets an overflow flag in the status register based on whether saturation processing has been performed.
Thus, by referring to the value of the overflow flag set in the status register, the presence or absence of saturation can be detected after the operation has finished, and processing can branch depending on whether saturation occurred.

An arithmetic processing program according to one aspect of the present invention causes a computer to execute a vector multiply instruction without a saturation function, a vector add instruction with a saturation function, and a vector shift instruction with a saturation function.
This makes it possible to execute the L_MAC() function without writing the overflow determination, or the branch processing that follows from it, in the program. Therefore, the L_MAC() function can be turned into vector instructions without adding hardware dedicated to the L_MAC() function, and the L_MAC() function can be executed at high speed while suppressing an increase in circuit scale.
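The decomposition of L_MAC() into three vector instructions can be modeled in C as three loops. This is a minimal software sketch, assuming a vector length of 8 and assuming that the shift step is an arithmetic left shift by one bit; the names `vmul`, `vshl1_sat`, `vadd_sat`, and `vec_L_MAC` are hypothetical and do not come from the patent.

```c
#include <stddef.h>
#include <stdint.h>

#define VLEN 8  /* assumed number of elements per vector */

/* Clamp a wide intermediate value to the signed 32-bit range. */
int32_t clamp32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Instruction 1: vector multiply without saturation
 * (a 16x16 product always fits in 32 bits). */
void vmul(int32_t dst[VLEN], const int16_t a[VLEN], const int16_t b[VLEN])
{
    for (size_t i = 0; i < VLEN; i++)
        dst[i] = (int32_t)a[i] * (int32_t)b[i];
}

/* Instruction 2: arithmetic left shift by one bit, with saturation. */
void vshl1_sat(int32_t v[VLEN])
{
    for (size_t i = 0; i < VLEN; i++)
        v[i] = clamp32((int64_t)v[i] * 2);
}

/* Instruction 3: vector add, with saturation. */
void vadd_sat(int32_t acc[VLEN], const int32_t v[VLEN])
{
    for (size_t i = 0; i < VLEN; i++)
        acc[i] = clamp32((int64_t)acc[i] + (int64_t)v[i]);
}

/* Element-wise acc[i] = sat(acc[i] + sat(2 * a[i] * b[i])). */
void vec_L_MAC(int32_t acc[VLEN], const int16_t a[VLEN], const int16_t b[VLEN])
{
    int32_t t[VLEN];
    vmul(t, a, b);
    vshl1_sat(t);
    vadd_sat(acc, t);
}
```

The calling code in `vec_L_MAC` contains no overflow test of its own; in this model the clamping lives entirely inside the shift and add steps, which stand in for the saturation processing units.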

The arithmetic processing program according to one aspect of the present invention causes a computer to execute an instruction that specifies whether the saturation function of the vector add instruction is on or off.
This makes it possible to specify from the program whether saturation processing is to be executed. Therefore, even when the saturation processing accompanying an addition is implemented in hardware, saturation is not applied uniformly to every addition result, and a single add instruction suffices to provide both addition with the saturation function and addition without it.

The arithmetic processing program according to one aspect of the present invention causes a computer to execute a detection function that detects whether the addition result of the vector add instruction or the shift result of the vector shift instruction has overflowed.
Thus, by writing the detection function in the program, the presence or absence of saturation can be detected after the operation has finished, and processing can branch depending on whether saturation occurred.
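Checking for saturation once, after a whole vector operation has finished, might look like the following C sketch. The `Status` type, the sticky behavior of its `vv` field, and the function name `overflow_detected` are assumptions for illustration; the patent does not give the signature of the detection function.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned vv : 1;  /* overflow flag, assumed to stay set once raised */
} Status;

/* Saturating vector add that records saturation in the status flag. */
void vadd_sat(Status *st, int32_t *dst,
              const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int64_t s = (int64_t)a[i] + (int64_t)b[i];
        if (s > INT32_MAX) {
            s = INT32_MAX;
            st->vv = 1;
        } else if (s < INT32_MIN) {
            s = INT32_MIN;
            st->vv = 1;
        }
        dst[i] = (int32_t)s;
    }
}

/* Hypothetical detection function: nonzero if any element saturated. */
int overflow_detected(const Status *st)
{
    return st->vv;
}
```

A caller would run the vector operation first and branch on `overflow_detected` afterwards, so the branch is taken once per vector rather than once per element.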

An arithmetic processing device according to an embodiment of the present invention is described below with reference to the drawings. The following embodiment is described using, as an example, a case in which vector pipeline processing is performed in the computer system of FIG. 1 by operating the arithmetic units of the vector processor 100 of FIG. 2.
FIG. 1 is a block diagram showing a schematic configuration of a computer system according to an embodiment of the present invention.

In FIG. 1, the computer system includes a vector processor 100 having vector arithmetic units, a main memory 110 that stores a control program for the vector processor 100 and the like in a predetermined area in advance, an input unit 120 serving as a human interface through which data can be entered, an output unit 130 such as a display that can output data, and a communication unit 140 that communicates with external devices via a network or the like.

The main memory 110 has a program text area 111 that stores programs, an initialized data area 112 that stores data such as constants in advance, an uninitialized data area 113 reserved in advance for storing data such as constants, a heap area 114 and a stack area 115 that are allocated dynamically during program execution, and other logically partitioned storage areas.

The control program is written in a low-level language (for example, machine language) that the vector processor 100 can execute directly. Assembly source code 200 written in a high-level language (for example, C) is compiled into the low-level language by an instruction code generation system consisting of an assembler 210 and a linker 220, and is generated as an executable program 230. The generated control program is stored in an auxiliary storage device such as a hard disk; when the vector processor 100 executes it, a program loader 240 places it in the program text area 111 of the main memory 110, where it is ready for execution. The assembler 210, the linker 220, and the program loader 240 can generally be implemented in software.

FIG. 2 is a block diagram showing a schematic configuration of the vector processor 100 of FIG. 1.
In FIG. 2, the vector processor 100 includes an instruction decoder 21 that decodes instructions read from the main memory 110; a status register 22 that stores information on the status of the vector processor 100; a register file 23 containing scalar registers used for scalar operations and vector registers used for vector operations; a multiplication unit 24 that performs multiplication; an add-subtract/shift unit A25 that performs addition, subtraction, and shift operations; an add-subtract/shift unit B26 that performs addition, subtraction, and shift operations; a MOVE unit A27 that performs logical operations such as shift, AND, and OR; a MOVE unit B28 that performs logical operations such as shift, AND, and OR; an ALU 29 that performs arithmetic processing; an address generation unit X30 that generates load addresses; an address generation unit Y31 that generates load addresses; and an address generation unit Z32 that generates store addresses.

The instruction decoder 21 is provided with a branch control unit 21a that performs branch control. The status register 22 is assigned an S bit, which stores a saturation designation flag for enabling saturation processing during addition and subtraction, and a VV bit, which stores an overflow flag for addition, subtraction, and shift operations. In addition, saturation processing units 25a to 28a are mounted on the add-subtract/shift unit A25, the add-subtract/shift unit B26, the MOVE unit A27, and the MOVE unit B28, respectively.

The saturation processing unit 25a refers to the S bit of the status register 22. When the S bit is "1", it executes saturation processing according to the addition or subtraction result obtained by the add-subtract/shift unit A25, and when it has executed saturation processing, it sets the VV bit of the status register 22 to "1". Likewise, the saturation processing unit 26a refers to the S bit of the status register 22. When the S bit is "1", it executes saturation processing according to the addition or subtraction result obtained by the add-subtract/shift unit B26, and when it has executed saturation processing, it sets the VV bit of the status register 22 to "1".

When the shift operation performed by the MOVE unit A27 is an arithmetic shift, the saturation processing unit 27a unconditionally executes saturation processing according to the shift result obtained by the MOVE unit A27 and sets the VV bit of the status register 22 to "1". When the shift operation performed by the MOVE unit A27 is a logical shift, the saturation processing unit 27a does not execute saturation processing, regardless of whether the shift result obtained by the MOVE unit A27 overflows. Likewise, when the shift operation performed by the MOVE unit B28 is an arithmetic shift, the saturation processing unit 28a unconditionally executes saturation processing according to the shift result obtained by the MOVE unit B28 and sets the VV bit of the status register 22 to "1". When the shift operation performed by the MOVE unit B28 is a logical shift, the saturation processing unit 28a does not execute saturation processing, regardless of whether the shift result obtained by the MOVE unit B28 overflows.

The instruction decoder 21 is connected to the program bus PB and the program address bus PAB, and is connected via buses B1 to B3 to the register file 23, the multiplication unit 24, the add-subtract/shift unit A25, the add-subtract/shift unit B26, the MOVE unit A27, the MOVE unit B28, the ALU 29, the address generation unit X30, the address generation unit Y31, and the address generation unit Z32. The register file 23 is connected to the X-bus XB, the Y-bus XY, the ZX-bus ZXB, and the ZY-bus ZYB, and is connected via bus B4 to the multiplication unit 24, the add-subtract/shift unit A25, the add-subtract/shift unit B26, the MOVE unit A27, the MOVE unit B28, the ALU 29, the address generation unit X30, the address generation unit Y31, and the address generation unit Z32. The address generation unit X30 is connected to the X-address bus XAB, the address generation unit Y31 is connected to the Y-address bus YAB, and the address generation unit Z32 is connected to the ZX-address bus ZXAB and the ZY-address bus ZYAB.

FIG. 3 is a block diagram showing a schematic configuration of the register file 23 of FIG. 2.
In FIG. 3, scalar registers and vector registers are arranged in the register file 23. The scalar registers can consist of, for example, 16 storage areas SR0 to SR15, each storing 32 bits of data.
If, for example, a vector has 8 elements, one vector register can consist of eight storage areas VR0[0] to VR0[7], each storing 32 bits of data. Eight vector registers can then be provided by, for example, 64 storage areas, each storing 32 bits of data: VR0[0] to VR0[7], VR1[0] to VR1[7], VR2[0] to VR2[7], VR3[0] to VR3[7], VR4[0] to VR4[7], VR5[0] to VR5[7], VR6[0] to VR6[7], and VR7[0] to VR7[7].

FIG. 4 is a diagram showing the data structure of a vector instruction.
In FIG. 4, a vector instruction contains an opcode field opecode, which defines the kind of instruction, such as multiplication or addition, and a repeat amount field rptamt, which defines the number of times the vector operation is executed. A vector instruction can also specify a destination register dst to be written and source registers src1 and src2 to be read. In the data structure of FIG. 4, setting the repeat amount rptamt to 0 allows the same data structure to hold both scalar instructions and vector instructions.
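The instruction fields named above can be pictured as a C structure. This is only a sketch: the field widths, the `Opcode` values, and the helper `is_scalar` are assumptions, and treating a repeat amount of 0 as marking a scalar instruction is one reading of the text rather than something the patent states outright.

```c
#include <stdint.h>

/* Illustrative subset of operation kinds; the actual encoding is not given. */
typedef enum { OP_MUL, OP_ADD, OP_SHIFT } Opcode;

typedef struct {
    Opcode  opecode;  /* kind of operation (field name as in FIG. 4) */
    uint8_t rptamt;   /* repeat amount: number of element operations */
    uint8_t dst;      /* destination register index */
    uint8_t src1;     /* first source register index */
    uint8_t src2;     /* second source register index */
} Instruction;

/* Assumed convention: rptamt == 0 marks a scalar instruction. */
int is_scalar(const Instruction *in)
{
    return in->rptamt == 0;
}
```

Sharing one layout between scalar and vector forms means a decoder only has to inspect `rptamt` to decide how many element operations to issue.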

FIG. 5 is a diagram showing vector addition processing in the vector processor 100 of FIG. 2.
In FIG. 5, suppose that in a vector instruction the opcode opecode specifies addition, the repeat amount rptamt specifies 8, the vector register VR0 of FIG. 3 is specified as the destination register dst, and the vector registers VR1 and VR2 of FIG. 3 are specified as the source registers src1 and src2, respectively. In this case, the add-subtract/shift unit A25 or the add-subtract/shift unit B26 of FIG. 2 is selected as the adder A1, and the elements a0 to a7 stored in the vector register VR1 and the elements x0 to x7 stored in the vector register VR2 are sent to the adder A1 in sequence. The adder A1 adds the elements pair by pair, and the results are stored in the vector register VR0.
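The vector addition of FIG. 5 can be modeled in C on top of a register file laid out as in FIG. 3. This is a software sketch under stated assumptions: the `RegisterFile` type and the function `exec_vadd` are hypothetical, the elements are processed one after another in a loop to mirror the sequential feed into the adder, and saturation is deliberately left out so that overflow simply wraps.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VREGS 8  /* VR0..VR7 */
#define VLEN      8  /* elements per vector register */

typedef struct {
    int32_t sr[16];               /* scalar registers SR0..SR15 */
    int32_t vr[NUM_VREGS][VLEN];  /* vector registers VR0..VR7 */
} RegisterFile;

/* VR[dst][i] = VR[src1][i] + VR[src2][i] for i = 0 .. rptamt-1. */
void exec_vadd(RegisterFile *rf, unsigned dst, unsigned src1,
               unsigned src2, unsigned rptamt)
{
    assert(dst < NUM_VREGS && src1 < NUM_VREGS && src2 < NUM_VREGS);
    assert(rptamt <= VLEN);
    for (unsigned i = 0; i < rptamt; i++) {
        /* Add in unsigned arithmetic so that overflow wraps instead of
         * being undefined; the conversion back to int32_t yields the
         * two's-complement value on common compilers. */
        uint32_t sum = (uint32_t)rf->vr[src1][i] + (uint32_t)rf->vr[src2][i];
        rf->vr[dst][i] = (int32_t)sum;
    }
}
```

With `dst = 0`, `src1 = 1`, `src2 = 2`, and `rptamt = 8`, the call corresponds to the instruction described above.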

FIG. 6 is a block diagram showing a schematic configuration of the saturation processing units 25a and 26a of FIG. 2.
In FIG. 6, the saturation processing units 25a and 26a can be connected to the S bit and the VV bit of the status register 22. Assume that "0x7FFFFFFF" is set as the saturation value for overflow and "0x80000000" is set as the saturation value for underflow. When the add-subtract/shift unit A25 or the add-subtract/shift unit B26 performs an addition, the corresponding saturation processing unit 25a or 26a refers to the S bit of the status register 22 and, when the S bit is "1", determines whether the addition result obtained by that unit has overflowed or underflowed.

When the addition result obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 has neither overflowed nor underflowed, the addition result is output as it is. When the addition result has overflowed, “0x7FFFFFFF” is output as the addition result and the VV bit of the status register 22 is set to “1”. When the addition result has underflowed, “0x80000000” is output as the addition result and the VV bit of the status register 22 is set to “1”.

On the other hand, when the saturation processing units 25a and 26a refer to the S bit of the status register 22 and find that the S bit is “0”, they output the addition results obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 as they are.
As a result, the saturation processing that accompanies the addition operation of the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 can be performed in hardware, and saturation processing can be executed without writing, in the program, the check for overflow or underflow and the branch that depends on the result of that check. Consequently, even when a saturation function is implemented in the addition instruction, the operation for one instruction can be executed in one cycle, the addition instruction with the saturation function can be vectorized, and addition processing with the saturation function can be speeded up.
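The behavior of the saturation processing units 25a and 26a described above may be sketched as the following behavioral model. The class and function names are hypothetical, and the two's-complement wraparound used when the S bit is “0” is an assumption about what "output as it is" means for a 32-bit adder.

```python
INT32_MAX = 0x7FFFFFFF    # saturation value for overflow
INT32_MIN = -0x80000000   # saturation value for underflow (bit pattern 0x80000000)


class StatusRegister:
    """Hypothetical model holding only the S and VV bits."""

    def __init__(self, s=0):
        self.S = s    # saturation designation flag
        self.VV = 0   # set to 1 when saturation has occurred


def wrap32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def add_with_saturation(a, b, sr):
    total = a + b  # exact sum; Python integers do not overflow
    if sr.S == 1:
        if total > INT32_MAX:
            sr.VV = 1
            return INT32_MAX
        if total < INT32_MIN:
            sr.VV = 1
            return INT32_MIN
        return total
    # S bit is 0: no saturation; wraparound is an assumption of this sketch.
    return wrap32(total)
```

Because the clamp is part of the add itself, the caller's code contains no overflow test and no branch, which is the property that allows the addition to be issued as a vector instruction.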

Further, by assigning the S bit to the status register 22, the program can specify whether or not saturation processing is to be executed. Consequently, even when the saturation processing that accompanies the addition operation is implemented in hardware, saturation processing is not applied uniformly to every addition result, and a single addition instruction suffices to realize both addition with the saturation function and addition without it.

Furthermore, by assigning the VV bit to the status register 22, the programmer can refer to the value of the VV bit after the operation has finished to detect whether saturation occurred, which is useful for debugging and similar work.
FIG. 7 is a block diagram showing a schematic configuration of the saturation processing units 27a and 28a of FIG. 2.
In FIG. 7, the saturation processing units 27a and 28a can be connected to the VV bit of the status register 22. It is assumed that “0x7FFFFFFF” is set as the saturation value for overflow and “0x80000000” as the saturation value for underflow. When the shift operation performed in the MOVE unit A27 or the MOVE unit B28 is an arithmetic shift, the corresponding saturation processing unit 27a or 28a determines whether the left-shift result obtained by that unit has overflowed or underflowed.

When the left-shift result obtained by the MOVE unit A27 or the MOVE unit B28 has neither overflowed nor underflowed, the left-shift result is output as it is. When the left-shift result has overflowed, “0x7FFFFFFF” is output as the left-shift result and the VV bit of the status register 22 is set to “1”. When the left-shift result has underflowed, “0x80000000” is output as the left-shift result and the VV bit of the status register 22 is set to “1”.

On the other hand, when the shift operation performed in the MOVE unit A27 or the MOVE unit B28 is a logical shift, the saturation processing units 27a and 28a output the shift results obtained by the MOVE unit A27 and the MOVE unit B28 as they are.
As a result, the saturation processing that accompanies the shift operation of the MOVE unit A27 and the MOVE unit B28 can be performed in hardware, and saturation processing can be executed without writing, in the program, the check for overflow and the branch that depends on the result of that check. Consequently, even when a saturation function is implemented in the shift instruction, the operation for one instruction can be executed in one cycle, the shift instruction with the saturation function can be vectorized, and shift processing with the saturation function can be speeded up.

Further, writing an arithmetic shift instruction in the program realizes shift processing with the saturation function, while writing a logical shift instruction in the program realizes shift processing without the saturation function. Consequently, even when the saturation processing that accompanies the shift operation is implemented in hardware, saturation processing is not applied uniformly to every shift result. By choosing between the two kinds of shift instructions, which are otherwise redundant in that they yield equal results, both shift processing with the saturation function and shift processing without it can be realized.
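The distinction between the two kinds of left shift may be sketched as follows. The function names and the dictionary used for the VV bit are hypothetical, and the wraparound in the logical shift is an assumption about a 32-bit datapath.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def wrap32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def arithmetic_left_shift_sat(value, shamt, status):
    """Arithmetic left shift: the result is clamped and VV is set on saturation."""
    shifted = value << shamt  # exact result before clamping
    if shifted > INT32_MAX:
        status["VV"] = 1
        return INT32_MAX
    if shifted < INT32_MIN:
        status["VV"] = 1
        return INT32_MIN
    return shifted


def logical_left_shift(value, shamt):
    """Logical left shift: the result is output as it is (no saturation)."""
    return wrap32(value << shamt)
```

For inputs that do not overflow, both functions return the same value, which is why the two instructions are redundant except for the saturation behavior.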

Furthermore, by assigning the VV bit to the status register 22, the programmer can refer to the value of the VV bit after the operation has finished to detect whether saturation occurred, which is useful for debugging and similar work.
FIG. 8 is a block diagram showing pipeline processing of the vector processor 100 of FIG. 2.

In cycle C1 of FIG. 8, the instruction fetch IF of a vector instruction is performed, and the fetched vector instruction is sent to the instruction decoder 21 of FIG. 2. In cycle C2, the decode RD is performed in the instruction decoder 21, and the vector instruction is decoded. By decoding the vector instruction, the instruction decoder 21 extracts the operation code opecode and the repeat amount rptamt, and can thereby determine what kind of vector operation the instruction performs and how many times.

Here, assuming that addition is designated by the operation code opecode of the vector instruction and that 4 is set in the repeat amount rptamt, the execution EXE corresponding to the first addition of the vector instruction is performed in cycle C3 by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26. When the execution EXE is performed, the saturation processing units 25a and 26a refer to the S bit of the status register 22 and, when the S bit is “1”, determine whether the addition result obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 has overflowed or underflowed.

When the addition result obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 has overflowed or underflowed, the saturation value is output as the addition result and the VV bit of the status register 22 is set to “1”.
Next, in cycle C4, the storage MEM of the first addition result of the vector instruction into the vector register is performed, and the execution EXE corresponding to the second addition of the vector instruction is performed. When the execution EXE is performed, the saturation processing units 25a and 26a refer to the S bit of the status register 22 and, when the S bit is “1”, determine whether the addition result obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 has overflowed or underflowed.

When the addition result obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 has overflowed or underflowed, the saturation value is output as the addition result and the VV bit of the status register 22 is set to “1”.
Next, in cycle C5, the storage MEM of the second addition result of the vector instruction into the vector register is performed, and the execution EXE corresponding to the third addition of the vector instruction is performed. When the execution EXE is performed, the saturation processing units 25a and 26a refer to the S bit of the status register 22 and, when the S bit is “1”, determine whether the addition result obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 has overflowed or underflowed.

When the addition result obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 has overflowed or underflowed, the saturation value is output as the addition result and the VV bit of the status register 22 is set to “1”.
Next, in cycle C6, the storage MEM of the third addition result of the vector instruction into the vector register is performed, and the execution EXE corresponding to the fourth addition of the vector instruction is performed. When the execution EXE is performed, the saturation processing units 25a and 26a refer to the S bit of the status register 22 and, when the S bit is “1”, determine whether the addition result obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 has overflowed or underflowed.

When the addition result obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 has overflowed or underflowed, the saturation value is output as the addition result and the VV bit of the status register 22 is set to “1”.
Next, in cycle C7, the storage MEM of the fourth addition result of the vector instruction into the vector register is performed, which completes the processing of the vector instruction fetched in cycle C1.
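The cycle-by-cycle sequence of FIG. 8 may be summarized by the following sketch, which lists the stages active in each cycle for one vector instruction. The function name and the stage labels with numeric suffixes are assumptions introduced for illustration.

```python
def pipeline_schedule(rptamt):
    """Return {cycle number: [active stages]} for one vector instruction."""
    schedule = {1: ["IF"], 2: ["RD"]}  # fetch in C1, decode in C2
    for n in range(1, rptamt + 1):
        exe_cycle = 2 + n          # the n-th addition executes in cycle C(2+n)
        mem_cycle = exe_cycle + 1  # its result is stored one cycle later
        schedule.setdefault(exe_cycle, []).append(f"EXE{n}")
        schedule.setdefault(mem_cycle, []).append(f"MEM{n}")
    return schedule


schedule = pipeline_schedule(4)
for cycle in sorted(schedule):
    print(f"C{cycle}: {' '.join(schedule[cycle])}")
```

With rptamt set to 4, the storage of each result overlaps the execution of the next addition, so the instruction completes in cycle C7; the saturation check adds no cycle because it is performed within EXE.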

FIG. 9 is a diagram showing a hand-assembled example of the L_MAC() function according to an embodiment of the present invention.
When the addition/subtraction/shift unit A25, the addition/subtraction/shift unit B26, the MOVE unit A27, and the MOVE unit B28 are not equipped with the saturation processing units 25a to 28a, the presence or absence of overflow must be checked each time one operation is performed, and a branch must be taken according to the result of that check. For this reason, instructions such as conditional branches must be inserted into the source code of the L_MAC() function, as shown in FIG. 17, and the compiled form of the source code of FIG. 17 becomes a very large function of 36 lines, as shown in FIG. 18.

In contrast, when the addition/subtraction/shift unit A25, the addition/subtraction/shift unit B26, the MOVE unit A27, and the MOVE unit B28 are equipped with the saturation processing units 25a to 28a, the L_MAC() function can be realized with three instructions, as shown in FIG. 9: the MPH instruction, which performs multiplication; the SAW instruction, which performs a shift with the saturation function; and the ADDW instruction, which performs addition with the saturation function.
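The three-instruction decomposition may be sketched as follows. The shift amount of 1 is an assumption taken from the 3GPP reference operator L_mult(), which doubles the 16-bit product with saturation; the operands actually used in FIG. 9 are not reproduced here, and all function names are hypothetical.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def saturate32(x, status):
    """Clamp x to the signed 32-bit range and record saturation in VV."""
    if x > INT32_MAX:
        status["VV"] = 1
        return INT32_MAX
    if x < INT32_MIN:
        status["VV"] = 1
        return INT32_MIN
    return x


def l_mac(acc, a, b, status):
    """Model of L_MAC as multiply, saturating shift, saturating add."""
    product = a * b                              # step 1: multiply (MPH)
    doubled = saturate32(product << 1, status)   # step 2: saturating shift (SAW)
    return saturate32(acc + doubled, status)     # step 3: saturating add (ADDW)
```

The only input pair for which the shift saturates is a = b = -32768, whose product 0x40000000 doubles to a value just beyond the 32-bit maximum.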

FIG. 10 is a diagram showing a usage example of the L_MAC() function converted into vector instructions according to an embodiment of the present invention, and FIG. 11 is a diagram showing a hand-assembled example of the source code of FIG. 10.
When the addition/subtraction/shift unit A25, the addition/subtraction/shift unit B26, the MOVE unit A27, and the MOVE unit B28 are not equipped with the saturation processing units 25a to 28a, vector instructions cannot be used. Executing the L_MAC() function 40 times, for example, therefore requires the 36-line function shown in FIG. 18 to be repeated 40 times.

In contrast, when the addition/subtraction/shift unit A25, the addition/subtraction/shift unit B26, the MOVE unit A27, and the MOVE unit B28 are equipped with the saturation processing units 25a to 28a, an addition with the saturation function and a shift with the saturation function can each be executed in one cycle, as shown in FIG. 8. The L_MAC() function can therefore be converted into vector instructions, as shown in FIG. 10. For example, even when the L_MAC() function is to be executed 40 times, it suffices to issue vector instructions of length 8 five times, as shown in FIG. 11. As a result, in the example of FIG. 10, a speed improvement of about 20 times was observed compared with executing the L_MAC() function without vector instructions.
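The division of 40 executions into five vector instructions of length 8 may be sketched as follows. The function name is hypothetical, and the sketch shows only how the iterations are partitioned, not the instruction sequence of FIG. 11.

```python
VECTOR_LENGTH = 8  # length of each vector instruction in the example


def split_into_vector_instructions(count, vector_length=VECTOR_LENGTH):
    """Return (start_index, rptamt) pairs that together cover `count` iterations."""
    return [
        (start, min(vector_length, count - start))
        for start in range(0, count, vector_length)
    ]


issues = split_into_vector_instructions(40)
```

Each pair corresponds to one issued vector instruction: the first value is the index of its first element and the second is its repeat amount.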

As a result, the L_MAC() function can be speeded up with as few instruction changes as possible while maintaining consistency with the instruction set of the existing vector processor, and the AMR codec can be realized on the vector processor 100.
In the example of FIG. 11, four commented-out NOPs are written each time; for a unit that is not subject to interlock, other instructions can be written in their place. For example, the product-sum value may be checked continuously during the product-sum operation, and processing may shift immediately to another routine once the value has saturated.

FIG. 12 is a diagram showing an example of a method of specifying the saturation function of the status register according to an embodiment of the present invention.
To turn the saturation function on or off, an instruction such as that shown in FIG. 12 can be inserted into the source code. This allows the programmer to specify, within the source code, whether the saturation function is on or off. Although the example of FIG. 12 uses inline assembler, the instruction may instead be written directly in an assembler file.

FIG. 13 is a diagram showing an example of a VV bit detection function according to an embodiment of the present invention, and FIG. 14 is a diagram showing an example of an overflow detection method according to an embodiment of the present invention.
In FIG. 13, source code defining an instruction called Test_Overflow() is written in a header file in order to check for overflow. Then, as shown in FIG. 14, the inline assembler of FIG. 13 is called at the point where overflow is to be checked. With this method, “Overflow” is set to 1 when the global variable “Overflow” is already 1 or when the VV bit is 1, so overflow can be detected without conflicting with the variable Overflow of the original source code.

FIG. 15 is a diagram showing a list of addition/subtraction instructions with the saturation function that can be executed by the addition/subtraction/shift unit A25 and the addition/subtraction/shift unit B26 of FIG. 2.
In FIG. 15, the ADDW instruction adds the words of the register src1 and the register src2. The ADDW instruction can also apply an arithmetic right shift to the addition result by the number of bits specified by shamt. The ADD2H instruction adds the registers src1 and src2 separately for the upper and lower halfwords. The ADD2H instruction can also apply an arithmetic right shift to the addition result by the number of bits specified by shamt.

The SUBW instruction subtracts the word of the register src2 from the word of the register src1. The SUBW instruction can also apply an arithmetic right shift to the subtraction result by the number of bits specified by shamt. The SUB2H instruction subtracts the register src2 from the register src1 separately for the upper and lower halfwords. The SUB2H instruction can also apply an arithmetic right shift to the subtraction result by the number of bits specified by shamt.

The ADSB2H instruction performs addition and subtraction on the registers src1 and src2 separately for the upper and lower halfwords: addition is performed on the lower halfword and subtraction on the upper halfword. The ADSB2H instruction can also apply an arithmetic right shift to the results by the number of bits specified by shamt. The SBAD2H instruction likewise operates separately on the upper and lower halfwords of the registers src1 and src2: subtraction is performed on the lower halfword and addition on the upper halfword. The SBAD2H instruction can also apply an arithmetic right shift to the results by the number of bits specified by shamt.

The ADDWI instruction adds the sign-extended 10-bit value immediate to the word of the register src1. The ADD2HI instruction adds the sign-extended 10-bit value immediate to each of the upper and lower halfwords of the register src1.
The ADD2W, ADD4H, SUB2W, SUB4H, ADSB4H, SBAD4H, ADD2WI, and ADD4HI instructions issue the ADDW, ADD2H, SUBW, SUB2H, ADSB2H, SBAD2H, ADDWI, and ADD2HI instructions, respectively, as dual-unit instructions.

Here, for the instructions marked 1), when the S bit of the status register 22 is “1”, the pre-shift operation result obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 is interpreted as a signed 32-bit value. When the pre-shift operation result has overflowed, it saturates to “0x7FFFFFFF”; when it has underflowed, it saturates to “0x80000000”. When saturation processing has been performed, the VV bit of the status register 22 is set to “1”. The instructions marked 1)′ do not perform a shift.
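The ordering described above, in which saturation is applied to the pre-shift result and the arithmetic right shift follows, may be sketched for the ADDW instruction as below. The function signature and the status dictionary are hypothetical, and wraparound when the S bit is “0” is an assumption of this sketch.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def wrap32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def addw(src1, src2, shamt, status):
    """Word add, optional saturation of the pre-shift sum, then right shift."""
    total = src1 + src2
    if status["S"] == 1:
        if total > INT32_MAX:
            total, status["VV"] = INT32_MAX, 1
        elif total < INT32_MIN:
            total, status["VV"] = INT32_MIN, 1
    else:
        total = wrap32(total)  # assumed behavior without saturation
    return total >> shamt      # Python's >> on integers is an arithmetic shift
```

Saturating before the shift means that an overflowed sum shifted right by 4 yields 0x07FFFFFF rather than a value derived from the wrapped sum.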

For the instructions marked 2), when the S bit of the status register 22 is “1”, the pre-shift operation result obtained by the addition/subtraction/shift unit A25 or the addition/subtraction/shift unit B26 is interpreted as a signed 16-bit value in both the upper and lower halfwords. When the pre-shift operation result has overflowed, it saturates to “0x7FFF”; when it has underflowed, it saturates to “0x8000”. When saturation processing has been performed, the VV bit of the status register 22 is set to “1”. The instructions marked 2)′ do not perform a shift.
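Per-halfword saturation may be sketched for the ADD2H instruction as follows, with registers modeled as unsigned 32-bit bit patterns. The helper names are hypothetical, the optional shift by shamt is omitted for brevity, and wraparound when the S bit is “0” is an assumption.

```python
INT16_MAX = 0x7FFF
INT16_MIN = -0x8000  # bit pattern 0x8000


def to_signed16(x):
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def saturate16(x, status):
    """Clamp x to the signed 16-bit range and record saturation in VV."""
    if x > INT16_MAX:
        status["VV"] = 1
        return INT16_MAX
    if x < INT16_MIN:
        status["VV"] = 1
        return INT16_MIN
    return x


def add2h(src1, src2, status):
    """Add upper and lower halfwords independently; return a 32-bit pattern."""
    upper = to_signed16(src1 >> 16) + to_signed16(src2 >> 16)
    lower = to_signed16(src1) + to_signed16(src2)
    if status["S"] == 1:
        upper = saturate16(upper, status)
        lower = saturate16(lower, status)
    # Masking to 16 bits wraps each half when saturation is disabled.
    return ((upper & 0xFFFF) << 16) | (lower & 0xFFFF)
```

Each halfword is clamped on its own, so saturation in the upper half leaves the lower half unaffected, while the single VV bit records that at least one half saturated.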

FIG. 16 is a diagram showing a list of shift instructions with the saturation function that can be executed by the MOVE unit A27 and the MOVE unit B28 of FIG. 2.
In FIG. 16, the SAW instruction arithmetically shifts the word of the register src1 by the number of bits specified by shamt. The SA2H instruction arithmetically shifts the upper and lower halfwords of the register src1 separately by the number of bits specified by shamt. The SAVW instruction arithmetically shifts the word of the register src1 by the number of bits specified by the register src2. The SA2W instruction arithmetically shifts each of the two words of two registers by the number of bits specified by shamt. The SA4H instruction arithmetically shifts each of the upper and lower halfwords of two registers by the number of bits specified by shamt. The SAV2W instruction arithmetically shifts, in two parallel operations, the word of the register src1 by the number of bits specified by the register src2.

Here, for the instructions marked 1), in the case of a left shift, when the 32-bit value of the operation result obtained by the MOVE unit A27 or the MOVE unit B28 has overflowed, it saturates to “0x7FFFFFFF”; when it has underflowed, it saturates to “0x80000000”. When saturation processing has been performed, the VV bit of the status register 22 is set to “1”.

For the instructions marked 2), in the case of a left shift, when the upper or lower 16-bit value of the operation result obtained by the MOVE unit A27 or the MOVE unit B28 has overflowed, it saturates to “0x7FFF”; when the upper or lower 16-bit value has underflowed, it saturates to “0x8000”. When saturation processing has been performed, the VV bit of the status register 22 is set to “1”.

FIG. 1 is a block diagram showing a schematic configuration of a computer system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a schematic configuration of the vector processor 100 of FIG. 1.
FIG. 3 is a block diagram showing a schematic configuration of the register file 23 of FIG. 2.
FIG. 4 is a diagram showing the data structure of a vector instruction according to an embodiment of the present invention.
FIG. 5 is a diagram showing vector addition processing of the vector processor 100 of FIG. 2.
FIG. 6 is a block diagram showing a schematic configuration of the saturation processing units 25a and 26a of FIG. 2.
FIG. 7 is a block diagram showing a schematic configuration of the saturation processing units 27a and 28a of FIG. 2.
FIG. 8 is a block diagram showing pipeline processing of the vector processor 100 of FIG. 2.
FIG. 9 is a diagram showing a hand-assembled example of the L_MAC() function according to an embodiment of the present invention.
FIG. 10 is a diagram showing a usage example of the L_MAC() function converted into vector instructions.
FIG. 11 is a diagram showing a hand-assembled example of the L_MAC() function converted into vector instructions.
FIG. 12 is a diagram showing an example of a method of specifying the saturation function of the status register.
FIG. 13 is a diagram showing an example of a VV bit detection function according to an embodiment of the present invention.
FIG. 14 is a diagram showing an example of an overflow detection method according to an embodiment of the present invention.
FIG. 15 is a diagram showing a list of addition/subtraction instructions with the saturation function according to an embodiment of the present invention.
FIG. 16 is a diagram showing a list of shift instructions with the saturation function according to an embodiment of the present invention.
FIG. 17 is a diagram showing an example of the source code of the L_MAC() function.
FIG. 18 is a diagram showing a compiled example of the L_MAC() function.

Explanation of symbols

100: vector processor; 21: instruction decoder; 21a: branch control unit; 22: status register; 23: register file; 24: multiplication unit; 25: add-subtract/shift unit A; 26: add-subtract/shift unit B; 27: MOVE unit A; 28: MOVE unit B; 25a to 28a: saturation processing units; 29: ALU; 30: address generation unit X; 31: address generation unit Y; 32: address generation unit Z; B1 to B4: buses; 110: main memory; 111: program text area; 112: initialized data area; 113: uninitialized data area; 114: heap area; 115: stack area; 120: input unit; 130: output unit; 140: communication unit; 200: assembly source code; 210: assembler; 220: linker; 230: executable program; 240: program loader; A1: adder; VR0 to VR2: vector registers

Claims

An arithmetic processing apparatus comprising:
a vector adder that performs vector addition processing; and
a saturation processing unit that is mounted on the vector adder and performs saturation processing based on an addition result obtained by the vector adder.

The arithmetic processing apparatus according to claim 1, further comprising a status register that stores a saturation designation flag for enabling the saturation processing,
wherein the saturation processing unit performs the saturation processing based on a result of referring to the saturation designation flag.

An arithmetic processing apparatus comprising:
a vector shift computing unit that performs vector shift computing processing; and
a saturation processing unit that is mounted on the vector shift computing unit and performs saturation processing based on a shift operation result obtained by the vector shift computing unit.

The arithmetic processing apparatus according to claim 3, wherein, when the shift operation performed by the vector shift computing unit is an arithmetic shift, the saturation processing unit unconditionally executes saturation processing according to the shift operation result obtained by the vector shift computing unit, and, when the shift operation performed by the vector shift computing unit is a logical shift, the saturation processing unit does not execute saturation processing regardless of whether the shift operation result obtained by the vector shift computing unit overflows.

An arithmetic processing apparatus comprising:
a vector multiplier that performs vector multiplication processing;
a vector adder that performs vector addition processing;
a first saturation processing unit that is mounted on the vector adder and performs saturation processing based on an addition result obtained by the vector adder;
a vector shift computing unit that performs vector shift computing processing; and
a second saturation processing unit that is mounted on the vector shift computing unit and performs saturation processing based on a shift operation result obtained by the vector shift computing unit.

The arithmetic processing apparatus according to claim 1, wherein the saturation processing unit sets an overflow flag in the status register according to whether the saturation processing has been performed.

An arithmetic processing program for causing a computer to execute:
a vector multiplication instruction in which a saturation function is not implemented;
a vector addition instruction in which a saturation function is implemented; and
a vector shift instruction in which a saturation function is implemented.

The arithmetic processing program according to claim 7, further causing the computer to execute an instruction that designates on/off of the saturation function of the vector addition instruction.

The arithmetic processing program according to claim 7 or 8, further causing the computer to execute a detection function that detects whether the result of addition by the vector addition instruction or the result of the shift operation by the vector shift instruction has overflowed.