KR20120030332A

KR20120030332A - DSP engine with implicit mixed operands

Info

Publication number: KR20120030332A
Application number: KR1020117022549A
Authority: KR
Inventors: Michael I. Catherwood; Settu Duraisamy
Original assignee: Microchip Technology Incorporated
Priority date: 2009-05-27
Filing date: 2010-05-21
Publication date: 2012-03-28
Also published as: US8495125B2; EP2435906A1; EP2435906B1; CN102356378A; CN102356378B; KR101667977B1; US20100306292A1; WO2010138417A1

Abstract

A processor includes at least one multiplier unit that can be controlled to operate in a signed, an unsigned, or a mixed sign mode, and a multiplier unit mode decoder that is coupled with the multiplier unit and receives location information of a first operand and location information of a second operand. When in the mixed sign mode, the multiplier unit mode decoder controls the multiplier unit, depending on the location information, to operate in a signed mode, an unsigned mode, or a combined signed/unsigned mode.

Description

DSP ENGINE WITH IMPLICIT MIXED OPERANDS

The present invention relates to digital signal processing engines of digital signal processors (DSPs) and/or central processing units (CPUs) of microprocessors or microcomputers.

DSP engines must perform arithmetic calculations quickly. However, when the DSP engine needs high precision for certain calculations, a compromise has to be made. For example, a 16-bit DSP engine is generally limited to 16-bit arithmetic operations. 32-bit operations may nevertheless be supported by hardware or implemented by suitable programming. For this reason, many 16-bit DSP engines provide very large accumulators, such as 40-bit accumulators, and other hardware that supports higher precision. Combined with the multiplier, these hardware structures can be used to perform wider multiplications, such as a 32 x 32 bit multiplication, on a 16-bit DSP engine. Nevertheless, such operations can slow processing considerably, in particular when many high-precision multiplications are required. Fast Fourier Transform (FFT) operations, for example, need many such multiplications and can therefore take a long time to process. A dedicated 32-bit multiplier would require a large share of the chip area and would therefore increase cost. New instructions would also be needed to operate such additional hardware.

There is therefore a need to improve the arithmetic capabilities of conventional DSP cores without changing the instruction set and with only minimal changes to the existing hardware.

According to an embodiment, a processor may comprise at least one multiplier unit that can be controlled to operate in a signed, an unsigned, or a mixed sign mode, and a multiplier unit mode decoder that is coupled with the multiplier unit and receives location information of a first operand and location information of a second operand. When in the mixed sign mode, the multiplier unit mode decoder controls the multiplier unit, depending on the location information, to operate in a signed mode, an unsigned mode, or a combined signed/unsigned mode.

According to a further embodiment, the multiplier unit may comprise an n-bit multiplier that is controllable to perform a signed, unsigned, or mixed sign multiplication on two input operands. According to a further embodiment, the multiplier unit may comprise a multiplier data preprocessor, which is coupled to the multiplier unit and independently sign-extends or zero-extends two input operands, and a signed multiplier. According to a further embodiment, the signed multiplier may be an n+1 bit multiplier. According to a further embodiment, the processor may further comprise a control register for selecting the signed mode, the unsigned mode, or the mixed sign mode, in which a signed, unsigned, or combined signed/unsigned multiplication is selected automatically. According to a further embodiment, the location information may indicate whether a register among a plurality of working registers is an odd or an even register. According to a further embodiment, the first operand and the second operand may be supplied by a data memory, and the location information may indicate whether a memory address is an odd or an even address. According to a further embodiment, the first operand may be selected from a first set of two consecutive registers and the second operand from a second set of two consecutive registers. According to a further embodiment, the processor may further comprise a barrel shifter that is at least wide enough to accommodate the result produced by the multiplier unit. According to a further embodiment, the processor may further comprise at least one accumulator and an adder coupled with the barrel shifter, wherein the multiplier unit, the accumulator, and the barrel shifter may be part of a digital signal processing (DSP) engine. According to a further embodiment, the processor may further comprise a result extension unit coupled between the multiplier unit and the barrel shifter, and a zero-backfill unit coupled with the result extension unit. According to a further embodiment, the processor may further comprise round logic coupled with the accumulator. According to a further embodiment, the DSP engine may be a 16-bit DSP engine having a plurality of 16-bit registers, wherein the barrel shifter and the accumulator each comprise 40 bits. According to a further embodiment, the processor may further comprise a microcontroller unit, wherein at least the multiplier unit is shared by the microcontroller unit and the DSP engine in order to execute arithmetic microcontroller instructions. According to a further embodiment, in the signed mode the multiplier data preprocessor sign-extends all inputs; in the unsigned mode the multiplier data preprocessor zero-extends all inputs; and in the mixed sign mode the multiplier mode decoder directs the multiplier data preprocessor to sign-extend an input whose source is an odd register number or an odd memory address and to zero-extend an input whose source is an even register number or an even memory address.

According to another embodiment, a method of performing a multiplication in a processor may comprise: providing a first n-bit operand from a first location to a multiplier unit that can be controlled to operate in a signed, an unsigned, or a combined signed/unsigned mode; providing a second operand from a second location to the multiplier unit; and decoding the locations of the first operand and the second operand and controlling the multiplier unit to operate in a mixed mode in which a signed, unsigned, or combined signed/unsigned multiplication is performed depending on those locations.

According to a further embodiment of the method, the first operand and the second operand may be stored in registers, and the location may indicate whether a register among a plurality of working registers is an odd or an even register. According to a further embodiment of the method, the first operand and the second operand may be supplied by a data memory, and the location may indicate whether a memory address is an odd or an even address. According to a further embodiment of the method, the first operand may be selected from a first set of two consecutive registers and the second operand from a second set of two consecutive registers. According to a further embodiment of the method, a control register may determine whether the multiplier unit operates in the signed, the unsigned, or the mixed mode. According to a further embodiment of the method, in the signed mode the first operand and the second operand are sign-extended; in the unsigned mode the first operand and the second operand are zero-extended; and in the mixed mode each operand is sign-extended if it is supplied by an odd register number or an odd memory address and zero-extended if it is supplied by an even register number or an even memory address.

According to yet another embodiment, a method of performing a 2n-bit multiplication using four n-bit data words may comprise: storing a first operand for the 2n-bit multiplication in a first set of two consecutive registers or two consecutive memory locations; storing a second operand for the 2n-bit multiplication in a second set of two consecutive registers or two consecutive memory addresses; performing a first multiplication with a controllable multiplier unit using the first register or memory address of the first set and the first register or memory address of the second set, and shifting the associated first result; performing a second multiplication with the controllable multiplier unit using the first register or memory address of the first set and the second register or memory address of the second set to generate an associated second result; performing a third multiplication with the controllable multiplier unit using the first register or memory address of the second set and the second register or memory address of the first set to generate an associated third result; and adding the first result, the second result, and the third result to produce a final result and storing the final result in registers or memory. For each multiplication, the multiplier unit is automatically controlled, depending on the location of the register or memory address, to operate in a signed, an unsigned, or a combined signed/unsigned mode.

According to a further embodiment of the method, the location may indicate whether a register among a plurality of working registers is an odd or an even register. According to a further embodiment of the method, the location may indicate whether a memory address is an odd or an even address. According to a further embodiment of the method, a control register may determine whether the multiplier unit operates in the signed, the unsigned, or the mixed sign mode. According to a further embodiment of the method, in the signed mode all inputs to the multiplier unit are sign-extended, and in the mixed sign mode an input to the multiplier unit is sign-extended if it is provided by an odd register number or an odd memory address and zero-extended if it is provided by an even register number or an even memory address. According to a further embodiment of the method, the second result and the third result are shifted, and the method of performing a 2n-bit multiplication using four n-bit data words further comprises performing a fourth multiplication with the controllable multiplier unit using the second register or memory address of the first set and the second register or memory address of the second set to generate an associated fourth result, wherein the fourth result is added to the first result, the second result, and the third result to produce the final result. According to a further embodiment of the method, a control register may determine whether the multiplier unit operates in the signed, the unsigned, or the mixed sign mode. According to a further embodiment of the method, the multiplier unit comprises a signed multiplier; in the signed mode all inputs to the multiplier unit are sign-extended; in the unsigned mode all inputs to the multiplier unit are zero-extended; and in the mixed sign mode an input to the multiplier unit is sign-extended if it is provided by an odd register number or an odd memory address and zero-extended if it is provided by an even register number or an even memory address.

The present invention is therefore well adapted to carry out the objects and to attain the ends and advantages mentioned, as well as others inherent therein. Numerous changes may be made by those skilled in the art, and such changes are encompassed within the spirit of the invention as defined by the appended claims.

A more complete understanding of the present invention and its advantages may be obtained by referring to the following description taken in conjunction with the accompanying drawings.
FIG. 1 is a block diagram of a DSP engine according to the present invention.
FIG. 2 is a block diagram of an implementation of the multiplier/scaler unit.
FIG. 3 shows the principal operations of a 32-bit multiplication using a 16-bit multiplier.
FIG. 4 shows an embodiment of the preprocessor of FIG. 2.
FIGS. 5 to 7 show tables of multiplier operands and result formats.
FIG. 8 is a block diagram of an embodiment of the barrel shifter.
FIG. 9 is a table showing barrel shifter mode, direction, and scale control.
FIGS. 10 and 11 are tables showing the barrel shifter mux configuration matrix.
FIG. 12 is a table showing a data accumulator mux configuration matrix.
FIG. 13 is another table showing a data accumulator mux configuration matrix.
FIG. 14 is a table showing examples of overflow and saturation operations.
FIG. 15 is a table showing saturation and overflow modes.
FIG. 16 is a block diagram of the round and data bus saturation logic.
FIG. 17 is a table showing round mux encoding and function.
FIG. 18 is a table showing conventional convergent rounding modes.
FIG. 19 is a block diagram of the Find First instruction hardware.
While embodiments of the invention have been depicted, described, and defined by reference to preferred embodiments, such references do not imply a limitation on the invention, and no such limitation is to be inferred. The subject matter disclosed is capable of considerable modification, alteration, and equivalents in form and function, as will occur to those of ordinary skill in the art having the benefit of this disclosure. The depicted and described embodiments are examples only and do not exhaustively define the scope of the invention.

According to the teachings of this disclosure, a DSP can process 32-bit multiplications without dedicated hardware by splitting each 32-bit operand into two 16-bit operands. A plurality of multiplications, shifts, and additions is then performed to obtain the respective 32-bit or 64-bit result. A DSP generally has an n-bit multiplier, where n < 32, for example a 17-bit multiplier. Such a multiplier is configurable to perform different types of multiplication. Depending on the operands, different types of multiplication have to be performed, and the multiplier must be configured for each of them. For example, a multiplication in which both operands are signed requires a different multiplier configuration than an operation in which both operands are unsigned or in which only one operand is signed. As explained in more detail below, such a configuration can be accomplished in various ways.

FIG. 3 shows a simple example of a 32-bit multiplication using 16-bit registers and a 16-bit multiplier that can perform signed, unsigned, and mixed sign multiplications. As shown, four multiplications of different types are needed to obtain a 64-bit result. Operation 350 performs a multiplication in which both operands 310, 330 are signed, because these operands represent the most significant bits (MSBs), or upper halves, of the 32-bit operands. For a 64-bit result, the result of this operation is shifted left by 32 bits before it is fed to adder 390. For other precisions, the result may instead be shifted left by 8, 16, or 24 bits, depending on the implementation, before it is fed to adder 390. Operation 360 multiplies the unsigned 16-bit portion 320, which represents the lower half, or least significant bits (LSBs), of the first 32-bit operand, by the signed portion 330, which represents the MSBs of the second operand. Similarly, operation 370 multiplies the unsigned 16-bit portion 340, which represents the lower half of the second 32-bit operand, by the signed portion 310, which represents the MSBs of the first operand. In these two cases a mixed-type operation must therefore be performed, in which one operand is treated as signed and the other as unsigned. For a 64-bit result, the results of operations 360 and 370 are shifted left by 16 bits before they are fed to adder 390. For other precisions, different shift values apply accordingly. Finally, depending on the precision, the lower halves 320, 340 of the two operands have to be multiplied by operation 380. After being shifted appropriately, the results of the individual operations are summed by adder 390 to obtain the proper result. Additional shifting may be applied to the final result.
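The decomposition of FIG. 3 can be checked numerically. The following is a minimal Python sketch, not the patented hardware: it assumes a full 64-bit result, models the four partial products with ordinary integers, and uses hypothetical helper names (`to_signed`, `mul32_from_16bit_parts`) chosen only for illustration. The numbers in the comments refer to the reference numerals used in the description of FIG. 3.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mul32_from_16bit_parts(a, b):
    """Multiply two signed 32-bit integers using four 16 x 16 partial products."""
    # Split each operand into a signed upper half and an unsigned lower half.
    a_hi, a_lo = to_signed(a >> 16, 16), a & 0xFFFF  # portions 310 and 320
    b_hi, b_lo = to_signed(b >> 16, 16), b & 0xFFFF  # portions 330 and 340

    p350 = a_hi * b_hi  # signed x signed
    p360 = a_lo * b_hi  # unsigned x signed (mixed)
    p370 = b_lo * a_hi  # unsigned x signed (mixed)
    p380 = a_lo * b_lo  # unsigned x unsigned

    # Shift amounts for a 64-bit result: 32, 16, 16, and 0 bits.
    return (p350 << 32) + ((p360 + p370) << 16) + p380


if __name__ == "__main__":
    a, b = 123456789, -987654321
    print(mul32_from_16bit_parts(a, b) == a * b)
```

Each partial product needs a different sign treatment of its inputs, which is why a multiplier with a fixed mode would have to be reconfigured between the steps.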

To perform operations 350, 360, 370, and 380, each multiplication may require reconfiguring the multiplier, the operands, or both. A separate step is then needed to configure the multiplier or to convert the operands, and these additional steps lengthen the computation considerably.

According to the teachings of this disclosure, the association of registers or memory locations that together represent a 32-bit word can be used to control the operating mode of the multiplier, or to control a preprocessor that prepares the operands as the multiplier requires. According to various embodiments, the operands of a DSP are generally stored in specific registers or memory locations. For example, a DSP engine may use four general-purpose registers or four defined memory locations to store the operands of a particular operation, such as a 2n-bit multiplication. Each combination of the registers used to perform the 2n-bit multiplication can then automatically trigger a particular operating mode of the multiplier unit. This is particularly useful if 32-bit load operations are always performed on predefined areas of the register file or memory. Thus, for example, in a DSP that uses the four working registers W5, W6, W7, and W8 to store operands, a 32-bit word is always stored in an odd register followed by an even register, for example registers W5 (310) and W6 (320). Similarly, when memory is used, a 32-bit word may always be stored starting at an even or at an odd address, depending on the implementation. Under this scheme, W5 and W7 (odd registers), or even addresses in memory, can always be regarded as signed values, and W6 and W8 (even registers), or odd addresses in memory, as unsigned values. Some of these registers or memory locations may be used when performing those operations of the DSP engine that require the multiplier unit to be configured.

A specific configuration register used to configure the operating mode of the multiplier can therefore be used to set the respective mode. For example, the configuration register may include a setting for signed multiplication and a setting for unsigned multiplication. According to various embodiments, a third setting, the mixed mode, is included, which enables automatic selection depending on the registers or memory addresses assigned to the first and second operands. Where the registers of FIG. 3 are assigned as shown in parentheses, an instruction that uses registers W5 and W7 as its first and second operands automatically selects the signed mode. In an instruction that performs a multiplication, selecting registers W6 and W8 as the first and second operands automatically selects the unsigned mode. Selecting registers W5 and W8 results in a combined signed and unsigned mode in which W5 is treated as signed and W8 as unsigned, and selecting registers W6 and W7 results in the corresponding mode in which W7 is treated as signed and W6 as unsigned. A similar decoding takes place when memory locations are used; here an even address can be used to indicate a signed value and an odd address an unsigned value. No reconfiguration of the multiplier is therefore necessary, and the DSP engine produces the correct result for each step needed to perform a 32-bit calculation on a "lower-bit" DSP engine.
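The register-based decoding described above can be illustrated with a short Python sketch. It is an assumption-laden model rather than the actual decoder: the function name `select_operand_signs` and the mode strings `"signed"`, `"unsigned"`, and `"mixed"` are hypothetical stand-ins for the configuration register settings, and only the register case (odd register = signed, even register = unsigned) is modeled.

```python
def select_operand_signs(control_mode, reg_a, reg_b):
    """Return (a_is_signed, b_is_signed) for a multiply of working registers.

    control_mode is a hypothetical stand-in for the configuration register:
    "signed" and "unsigned" force both operands, "mixed" decides per operand
    from the parity of the working register number.
    """
    if control_mode == "signed":
        return (True, True)
    if control_mode == "unsigned":
        return (False, False)
    if control_mode == "mixed":
        # Odd registers (W5, W7) hold signed upper halves,
        # even registers (W6, W8) hold unsigned lower halves.
        return (reg_a % 2 == 1, reg_b % 2 == 1)
    raise ValueError(f"unknown control mode: {control_mode!r}")


if __name__ == "__main__":
    for reg_a, reg_b in [(5, 7), (6, 8), (5, 8), (6, 7)]:
        signs = select_operand_signs("mixed", reg_a, reg_b)
        print(f"W{reg_a} x W{reg_b}: signed operands = {signs}")
```

In mixed mode the four register pairs used for a 32-bit multiplication select all four sign combinations without any change to the control setting between instructions.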

The various implementations make signed extended-precision algorithms easier to execute. For example, once the multiplier has been configured to set the operand mode automatically, extended-precision multiply-accumulate (MAC) class instructions and cross-multiplies can be executed in sequence without concern for the operand type.

The following typical DSP instructions can be used with this specifically configurable DSP engine.

Instruction    Algebraic operation
ED             A = (x - y)²
EDAC           A = A + (x - y)²
MAC            A = A + (x * y)
MPY            A = x * y
MPY.N          A = -x * y
MSC            A = A - x * y
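The algebraic operations listed for these instructions can be written out as a small Python model. This is only an illustrative sketch: it uses unbounded integers and ignores accumulator width, saturation, rounding, and fractional formats, and the names `DSP_OPERATIONS` and `execute` are hypothetical.

```python
# Each entry maps an instruction mnemonic to its algebraic effect on the
# accumulator A, given the current accumulator value and operands x and y.
DSP_OPERATIONS = {
    "ED": lambda acc, x, y: (x - y) ** 2,
    "EDAC": lambda acc, x, y: acc + (x - y) ** 2,
    "MAC": lambda acc, x, y: acc + x * y,
    "MPY": lambda acc, x, y: x * y,
    "MPY.N": lambda acc, x, y: -(x * y),
    "MSC": lambda acc, x, y: acc - x * y,
}


def execute(mnemonic, acc, x, y):
    """Return the new accumulator value after one instruction."""
    return DSP_OPERATIONS[mnemonic](acc, x, y)


if __name__ == "__main__":
    acc = execute("MPY", 0, 3, 4)
    acc = execute("MAC", acc, 5, 6)
    print(acc)
```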

Other instructions may also benefit from the additional operating mode. For example, a combined processor core with both DSP and microcomputer or microprocessor functionality may apply the same concept to non-DSP instructions as well, or to all instructions, as explained in more detail below. In another embodiment of such a processor, the mechanism may be provided for DSP instructions only, whereas certain types of microcontroller or microprocessor instructions may require a manual setting.

By adding to the DSP engine a mode that associates specific registers with a signed or an unsigned data type, the registers can be selected on the basis of 32-bit data alignment. A multiply operation is then implicitly signed or unsigned according to its data source. According to an embodiment, four main elements may be used to realize a particular preferred embodiment of this disclosure.

1) 4 x 16-bit CPU registers

2) A 17 x 17-bit multiplier (may also be a 16 x 16-bit multiplier with signed/unsigned mode control)

3) A multiplier data preprocessor used to sign- or zero-extend the input data

4) A DSP engine multiplier mode decoder

According to one embodiment, the DSP engine multiplier mode decoder decodes user control bits to select signed, unsigned or mixed-sign operation. In this embodiment, an (n+1)-bit multiplier that always operates in signed mode is used, and the preprocessor converts the input operands from n bits to n+1 bits, where the most significant bit serves as the sign. In signed mode, the decoder directs the multiplier data preprocessor to sign-extend all input data to 17 bits: the MS bit of the operand is copied into the 17th bit. In unsigned mode, it directs the preprocessor to zero-extend all input data to 17 bits: the 17th bit is simply set to 0, so the 17-bit multiplier always sees the operand as a positive value. In mixed-sign mode, it directs the preprocessor to sign-extend an input if the source is an odd register number or odd/even memory address, and to zero-extend an input if the source is an even register number or even/odd memory address.
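The extension rule described above can be sketched in a few lines. The names `extend_to_17`, `as_signed17` and `preprocess`, and the string mode constants, are hypothetical; the sketch only covers the register-number case of mixed-sign mode (odd register signed, even register unsigned).

```python
SIGNED, UNSIGNED, MIXED = "signed", "unsigned", "mixed"  # illustrative mode names

def extend_to_17(word16, sign_extend):
    """Return the 17-bit pattern for a 16-bit word."""
    word16 &= 0xFFFF
    if sign_extend and (word16 & 0x8000):
        return word16 | 0x10000   # copy the MS bit into the 17th bit
    return word16                 # 17th bit stays 0 (zero extension)

def as_signed17(pattern):
    """Interpret a 17-bit two's-complement pattern as a Python int."""
    return pattern - 0x20000 if pattern & 0x10000 else pattern

def preprocess(word16, mode, reg_num):
    """Value that an always-signed 17-bit multiplier would see for this operand."""
    if mode == SIGNED:
        sign_extend = True
    elif mode == UNSIGNED:
        sign_extend = False
    else:                          # mixed-sign: decided by the operand location
        sign_extend = (reg_num % 2 == 1)
    return as_signed17(extend_to_17(word16, sign_extend))
```

With this model the word 0xFFFF is seen as -1 when it comes from an odd register in mixed-sign mode and as 65535 when it comes from an even register.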

According to one embodiment, 32-bit data is loaded into the CPU registers (or memory) in an aligned manner, with the LS word in an even register (or memory address) and the MS word in an odd register (or memory address). As a result, in mixed-sign mode the sign of every 16-bit cross-multiply needed to complete a 32 x 32-bit multiplication is selected automatically, without user intervention, which significantly speeds up execution (for example, by eliminating the need to constantly switch DSP engine operating modes).
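To illustrate why the even/odd alignment is sufficient, the following sketch builds a signed 32 x 32-bit product from four 16-bit cross-multiplies whose signedness is chosen purely from the register number. The register assignment (W4/W5 for the first operand, W6/W7 for the second) and the helper names are assumptions for illustration, and the partial products are summed with unbounded Python integers rather than a 40-bit accumulator.

```python
def split32(value):
    """Split a signed 32-bit value into (LS word, MS word) 16-bit patterns."""
    value &= 0xFFFFFFFF
    return value & 0xFFFF, value >> 16

def operand(word16, reg_num):
    """Mixed-sign rule: odd register -> signed word, even register -> unsigned word."""
    if reg_num % 2 == 1 and (word16 & 0x8000):
        return word16 - 0x10000
    return word16

def mul32x32(a, b):
    """Signed 32 x 32 multiply assembled from four implicit mixed-sign 16-bit multiplies."""
    regs = {}
    regs[4], regs[5] = split32(a)   # LS word in even W4, MS word in odd W5
    regs[6], regs[7] = split32(b)   # LS word in even W6, MS word in odd W7

    def mpy(rx, ry):
        return operand(regs[rx], rx) * operand(regs[ry], ry)

    return (mpy(4, 6)                          # unsigned x unsigned
            + ((mpy(4, 7) + mpy(5, 6)) << 16)  # mixed-sign cross terms
            + (mpy(5, 7) << 32))               # signed x signed
```

No mode switch appears between the four multiplies: each one picks up its signedness from where its operands live.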

FIG. 1 shows a preferred DSP engine that may be used in accordance with the present disclosure. Two 40-bit accumulators 110, 115 may be provided. They are coupled via multiplexers 120, 125, 150, 155, 190 to an adder 145, round logic 130 and a barrel shifter 160. A further multiplexer 135 couples the outputs of round logic 130 and barrel shifter 160 to the X data bus. As shown in FIG. 1, adder 145 can negate one input and saturate its result. Barrel shifter 160 can also receive data from the controllable multiplier/scaler unit 185 via an additional multiplexer 175, sign extension unit 165 and multiplexer 155. A zero-backfill unit 170 couples the X data bus to the second input of multiplexer 175. The controllable 16-bit multiplier/scaler unit 185 can be configured via mode register 180 to operate as signed, unsigned or mixed signed/unsigned, and can also transfer data to and from the register array. The output of the 16-bit multiplier/scaler unit 185 is coupled to the first input of multiplexer 175. Mode decoder 195 provides automatic control of the operating mode of the controllable multiplier/scaler 185. To this end, the mode decoder may receive the numbers of the registers holding the first and second operands, or the addresses of the memory locations of the first and second operands. Alternatively, only the information whether the address or register is odd or even may be supplied to the mode decoder. The mode decoder may use a matrix to switch the configuration mode of multiplier/scaler 185 according to this information. This automatic mode selection is under program control via mode register 180. When the respective bits of mode register 180 are set, multiplier/scaler 185 receives its mode information directly from mode decoder 195. Otherwise, a specific fixed mode (signed/unsigned) is selected via mode register 180. For this purpose, mode decoder 195 may also receive control signals from mode register 180, as shown by the dotted line in FIG. 1.

Another embodiment of the combination of multiplier/scaler unit 185 and mode decoder 195 is shown in FIG. 2. This embodiment may include a 17 x 17-bit signed multiplier 220 that receives two 17-bit input words and outputs a 32-bit result to a fractional scaler 210, which can shift the result by one bit when the operands are fractional. To this end, fractional scaler 210 may be controlled by a bit of mode control register 240. A mixed-mode operand preprocessor 230 receives two 16-bit operands from the working registers (or memory) and generates the two signed 17-bit operands that are fed to signed multiplier 220. The results of multiplier 220 may be written back to the register array or memory. The mixed-mode operand preprocessor converts its input data either statically or automatically, depending on the location of the incoming data. The term location is to be understood as information about where the actual data is stored. For example, when two data words are stored in two consecutive registers or memory addresses, one word is always at an odd location and the next at an even location, or vice versa. If the two words are always stored with the same alignment, this even/odd information can be used to distinguish reliably between the MS word and the LS word. Here this information is supplied to the mixed-mode operand preprocessor, as indicated by the mixed-mode select signal.

FIG. 4 shows one embodiment of the preprocessor of FIG. 2. A sign extension unit 410 and a zero-fill unit 420 are provided to perform the operand modifications on the first and/or second operand. A compare-and-select unit determines which of units 410, 420 is used for the first and second operands. The compare-and-select unit may receive the register address or memory location and determine whether it is odd or even. Alternatively, this information may be provided directly, for example by the respective address bit or by other means operable to forward or generate it.

The embodiments shown in the figures represent a 16-bit DSP with 16-bit registers, 16-bit or 17-bit multipliers and a barrel shifter. These values are exemplary. The invention is equally applicable to other n-bit processors that have a plurality of n-bit registers and a multiplier that cannot directly process operands of size 2n.

According to various implementations, DSP engine 100 may be coupled to a microcontroller unit (MCU) and may be a hardware block that is fed data from the W register array but contains its own dedicated result registers. In other embodiments, however, data may also be supplied from memory. DSP engine 100 may be driven by the same single-issue instruction decoder that directs the MCU arithmetic logic unit (ALU). In addition, all operand effective addresses (EAs) may be generated in the W register array. According to one embodiment, concurrent operation with the MCU instruction flow is therefore not possible, although both the MCU ALU and the DSP engine may be used simultaneously by the same instruction (for example, the ED and EDAC instructions).

As shown in FIGS. 1 and 2, the DSP engine consists of a high-speed 17-bit x 17-bit multiplier 220, a barrel shifter 160, a 40-bit adder/subtractor 145 with two target registers 110, 115, and round/saturation logic 130. DSP engine 100 may be essentially one large asynchronous block, with only the accumulator result registers 110, 115 being clocked. Data input to DSP engine 100 may come from:

1. Directly from the W array registers:

- for example W4, W5, W6 or W7 for the MAC class of instructions

- any W register for the MUL.xx class of instructions (targeting accumulator A or B)

2. From the X-bus for all other DSP instructions

3. From the X-bus for all MCU instructions that use barrel shifter 160

Data output from the DSP engine may be written to:

1. The target accumulator 110, 115, as defined by the DSP instruction being executed

2. The X-bus, for MAC, MSA, CLRAC and MOVSAC accumulator write-back, where the EA is the W13 register or [W13]+=2 (note that MPY{N}, SQR{AC} and ED{AC} do not offer an accumulator write-back option)

3. The X-bus, for MCU instructions that use barrel shifter 160

4. A 32-bit aligned register-pair write bus from multiplier 220 to the W array, to support MCU multiply instructions

The DSP engine may also have the ability to perform inherent accumulator-to-accumulator operations that require no additional data. These instructions are ADDAB, SUBAB and NEGAB.

The block diagram of DSP engine 100 shown in FIG. 1 is conceptual in nature and is intended as an aid to understanding the data flow required by the instructions that exercise it. The block diagram of an actual implementation may differ considerably. The various units shown in FIGS. 1 and 2 and their functionality are described below for one or more specific implementations. The preferred and specific implementations described below do not limit the scope of the invention.

Multiplier

The 17 x 17-bit multiplier 220 is capable of signed operation and can multiplex its output through a scaler to support either 1.31 fractional or 32-bit integer results. The MAC/MSA, MPY{N}, ED{AC} and SQR{AC} operations are typically signed, but the DSP engine may be configured for unsigned or mixed-sign operation.

Three control bits (IF, US<1:0>) in the 16-bit CPU core control register 240 (CORCON) determine integer/fractional and unsigned/signed/mixed-sign operation, respectively, for DSP instructions and for MCU multiply instructions that target an accumulator. MCU multiply instructions that target the W array may always be treated as integer operations. Scaler 210 shifts the multiplier result one bit to the left during fractional operation only.

Integer/Fractional (IF) Control Bit

According to one embodiment, the state of the CORCON<IF> bit in register 240 controls the operand type for DSP instructions and MCU multiply instructions that target accumulators 110, 115. If CORCON<IF>=0, multiply operands are treated as fixed-point 1.15 fractional values. If CORCON<IF>=1, multiply operands are treated as integer values.

When MCU instructions target accumulators 110, 115, they are assumed to be used in conjunction with DSP instructions and should therefore inherit the same operand type. If that is not the case, the user must manipulate the CORCON<IF> bit manually.

Signed/Unsigned Control Bits

If CORCON<US[1:0]>=2'b00, both operands of the MAC/MSA, MPY{N}, ED{AC} and SQR{AC} instructions are treated as signed values and are always sign-extended into the 17th bit of the multiplier input. The result is also sign-extended prior to any operation with the accumulator (and will always effectively be signed).

If CORCON<US[1:0]>=2'b01, both operands of the MAC/MSA, MPY{N}, ED{AC} and SQR{AC} instructions are treated as unsigned values and are always zero-extended into the 17th bit of the multiplier input. The result is also zero-extended prior to any operation with the accumulator (and will always effectively be signed).

If CORCON<US[1:0]>=2'b1x, the operands of the MAC/MSA, MPY{N}, ED{AC} and SQR{AC} instructions are treated as signed or unsigned values depending on the W register source. If the W register source is odd (W5, W7), the operand is assumed to be signed. If the W register source is even, the operand is assumed to be unsigned. If one or both operands are signed, the result is sign-extended; otherwise it is zero-extended prior to any operation with the accumulator (and will always effectively be signed).

According to one embodiment, the CORCON<US[1:0]> bits do not affect the MCU multiply instructions, which determine for themselves whether the operation is signed or unsigned.

MCU Multiply Instructions

The same multiplier may be used to support the MCU multiply instructions, which include integer 16-bit signed, unsigned and mixed-sign multiplications. As shown in FIG. 2, additional data paths allow these instructions to write their result back to the W array and (via the W array) to the X data bus. These paths are located ahead of data scaler 210. These instructions may also target accumulators 110, 115. Because the operands may then be fractional (CORCON<IF>=0), those paths are located after data scaler 210; that is, the result is scaled as normal according to the state of the IF bit. All MCU multiply operations explicitly specify signed or unsigned operation. Depending on the instruction's destination field encoding, an MCU multiply instruction may write the full 32-bit result to an even-aligned W register pair, or only the LS 16 bits of the result to a single (even) W register.

According to some embodiments, for 32-bit results the destination register pair of an MCU multiply must be aligned (i.e. odd:even), where the odd register holds the MS result word and the even register holds the LS result word. In such an embodiment, W3:W2 is allowed, but W4:W3 is not and will be flagged as an error by the assembler. Similarly, for 16-bit results the destination register must be even. For example, W6 is allowed, but W7 is not and will be flagged by the assembler.

According to one embodiment, the unsigned multiply instruction may be directed to use byte- or word-sized operands. The destination may always be the W3:W2 register pair in the W array. Byte operands direct a 16-bit result to W2 (W3 is unchanged), and word operands direct a 32-bit result to W3:W2.

Simple data preprocessing logic, for example as shown in FIG. 4, zero- or sign-extends the operands to 17 bits so that unsigned, signed or mixed-sign multiplications can all be carried out as signed multiplications.

According to one embodiment, all unsigned operands are always zero-extended into the 17th bit of the multiplier input, and all signed operands are always sign-extended into the 17th bit. For signed 16-bit multiplies, multiplier 220 produces 30 bits of data and 2 bits of sign, which are passed to scaler 210. If the instruction operates in integer mode, the result is not modified and leaves the multiplier block as a 32-bit signed number. If the instruction operates in fractional mode (DSP operations, and MCU multiply operations that target a DSP accumulator, when CORCON<IF>=0), the result is shifted left by 1 bit, leaving 1 bit of sign. For fractional multiplies, bit 0 of the result is always 0. For 16-bit mixed-mode (signed/unsigned) multiplies, the multiplier produces 31 bits of data and 1 bit of sign. For unsigned 16-bit multiplies, the multiplier produces a 32-bit unsigned result.
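The integer and fractional result formats for a signed 16-bit multiply can be modeled as follows. This is a sketch under stated assumptions: `multiply_block` and `to_signed16` are hypothetical names, the function returns the raw 32-bit pattern, and only the signed case is covered.

```python
def to_signed16(word16):
    """Interpret a 16-bit pattern as a two's-complement value."""
    word16 &= 0xFFFF
    return word16 - 0x10000 if word16 & 0x8000 else word16

def multiply_block(x16, y16, fractional):
    """Signed 16 x 16 multiply followed by the fractional scaler.

    fractional=True corresponds to CORCON<IF>=0 (1.15 operands, 1.31 result);
    fractional=False corresponds to integer mode (32-bit signed result).
    """
    product = to_signed16(x16) * to_signed16(y16)  # 30 data bits + 2 sign bits
    if fractional:
        product <<= 1                              # drop the redundant sign bit
    return product & 0xFFFFFFFF                    # 32-bit result pattern
```

For example, 0x4000 is 0.5 in 1.15 format, and `multiply_block(0x4000, 0x4000, True)` yields 0x20000000, which is 0.25 in 1.31 format.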

FIGS. 5 to 7 are tables showing how the operands are treated for each multiply type, together with the resulting format. For MCU multiply instructions that target only a single (even) W register, the LS word of the result, R<15:0>, is written to the target register and the remaining MS bits are discarded.

Barrel Shifter

FIG. 8 is a block diagram of the 40-bit barrel shifter 160, which can perform up to a 16-bit arithmetic right shift or up to a 16-bit left shift in a single cycle. The source may be either of the two DSP accumulators 110, 115 or the X-bus (to support multi-bit shifts of register or memory data). To support the differing requirements of the DSP and MCU multi-bit shift instructions, shifter 160 may be a custom design that specifically includes two modes of operation. The operating mode is controlled by the BIDIR signal.

In the first mode, which may be used by all MCU shift instructions (i.e. all shift instructions except SFTAC and SFTACK), barrel shifter 160 accepts a 5-bit shift magnitude value, SFTNUM<4:0>, and a direction signal, L_R. If L_R=0, shifter 160 shifts the input operand left by the number of bits defined by SFTNUM<4:0>. If L_R=1, shifter 160 shifts the input operand right by the number of bits defined by SFTNUM<4:0>. FIG. 9 is a table showing the direction and magnitude control as well as the shift range for each setting of the control signals.

There are two classes of MCU multi-bit shift instruction, one with a constant shift value and the other with a variable shift value. The constant shift instructions (ASRK, LSRK, SLK) contain a 4-bit magnitude literal field, which restricts the shift range to between 0 and 15. The variable shift instructions (ASRW, LSRW, SLW) use a W register as the source of the shift magnitude. The shift operation is organized so that the shift result is correct for all values. Because shifting a 16-bit data value by more than 15 bits is equivalent to clearing or setting it, the useful shift range is between 0 and 15.

In the second mode, which may be used by the remaining DSP shift instructions SFTAC and SFTACK, which operate only on the 40-bit DSP accumulators, the shifter is directed by a 6-bit two's-complement signed shift value that expresses both the magnitude and the direction of the shift. The shift direction and sign are consistent with those required for data normalization.

In this mode, the L_R signal becomes the sign bit (and consequently still indicates the shift direction), and SFTNUM<4:0> holds the LS 5 bits of the shift value, representing the shift magnitude. For positive values (L_R=0, right shift), the shifter shifts the target accumulator right by the number of bits defined by SFTNUM<4:0>, from 0 up to a maximum of 16. For negative values (L_R=1, left shift), the shifter shifts the target accumulator left by the two's complement of the number of bits defined by SFTNUM<4:0>, from 0 up to a maximum of 16. The table of FIG. 9 again shows the direction and magnitude control and the shift range for each setting of the control signals.
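The second-mode decoding can be sketched as below: a 6-bit two's-complement shift value is split into L_R and SFTNUM<4:0>, and the accumulator is shifted accordingly. The function names are hypothetical, the `ValueError` is only a stand-in for whatever range handling the hardware or assembler performs, and saturation is not modeled.

```python
MASK40 = (1 << 40) - 1

def decode_shift(value):
    """Split a signed shift value into (L_R, SFTNUM<4:0>)."""
    if not -16 <= value <= 16:
        raise ValueError("shift value outside the -16..16 range")
    field = value & 0x3F              # 6-bit two's-complement encoding
    l_r = (field >> 5) & 1            # MS bit: sign, hence direction
    sftnum = field & 0x1F             # LS 5 bits
    return l_r, sftnum

def shift_accumulator(acc40, value):
    """Shift a 40-bit accumulator pattern; positive = right, negative = left."""
    l_r, sftnum = decode_shift(value)
    signed = acc40 - (1 << 40) if acc40 & (1 << 39) else acc40
    if l_r == 0:
        result = signed >> sftnum     # arithmetic right shift by SFTNUM
    else:
        count = (-sftnum) & 0x1F      # two's complement of the 5-bit field
        result = signed << count
    return result & MASK40
```

A shift value of -3 is encoded as L_R=1 with SFTNUM=29, whose 5-bit two's complement is 3, giving a left shift by 3.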

In this mode there are two classes of DSP multi-bit shift instruction, one with a constant shift value (SFTACK) and the other with a variable shift value (SFTAC). The assembler will prevent any attempt to use an SFTACK instruction with a shift value greater than 16 or less than -16. For the SFTAC instruction, the shift magnitude is limited by hardware to the same range. An attempt to execute an SFTAC instruction with a shift value outside the valid range causes a math error trap, and the attempted shift result is not written to the target accumulator. An attempt to execute an SFTACK instruction with a shift value outside the valid range (for example, by manually manipulating the instruction literal field) is not prevented: the instruction executes but does not produce a correct result. The SFTAC range is restricted because the target DSP accumulator is 40 bits wide, so a shift of more than 16 bits could still produce a meaningful result (i.e. the result would not necessarily be zero or all 1s, as is the case for the MCU multi-bit shift operations). In other words, the MCU variable multi-bit shift instructions can accept any shift magnitude and still return a correct result, but this is not the case for the SFTAC instruction without extending the range of shifter 160.

It is also possible to shift a signed value in an accumulator such that the sign is destroyed. If the barrel shifter result were then passed through the saturation logic as normal, it would produce an incorrect saturation result based on the new sign. Shifter 160 therefore includes logic that examines the bits shifted left beyond bit 39, so that a catastrophic overflow is recognized and saturation is applied correctly, based on the sign of the original data value. For example:

; Assume Q31 saturation is enabled

; and AccA = 0x0078AA0000

SFTAC A, #9

; 0x007FFFFFFF -> AccA, SA = 1

; AccA<<9 = 0xF554000000, but the bit 39 overflow is detected, so SA is set and the result saturates to the maximum positive (original AccA[39]=0) Q31 value.
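The effect of this overflow handling can be reproduced with a small behavioral model. The sketch below does not inspect the shifted-out bits the way the hardware logic is described to; instead it computes the exact shifted value and compares it against the Q31 range, which gives the same outcome of saturating according to the original sign. The function and constant names are assumptions.

```python
MASK40 = (1 << 40) - 1
Q31_MAX = 0x007FFFFFFF   # largest positive Q31 value in a 40-bit accumulator
Q31_MIN = 0xFF80000000   # most negative Q31 value in a 40-bit accumulator

def shift_left_saturate(acc40, count):
    """Left-shift a 40-bit accumulator with Q31 saturation.

    Returns (new accumulator pattern, SA flag).
    """
    signed = acc40 - (1 << 40) if acc40 & (1 << 39) else acc40
    exact = signed << count              # exact result, no bits lost
    if exact > 0x7FFFFFFF:               # positive overflow
        return Q31_MAX, 1
    if exact < -0x80000000:              # negative overflow
        return Q31_MIN, 1
    return exact & MASK40, 0
```

Applied to the example, `shift_left_saturate(0x0078AA0000, 9)` returns the saturated value 0x007FFFFFFF with the SA flag set, even though the raw shifted pattern would have bit 39 set.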

The multi-bit MCU shift instructions ASRK, LSRK and SLK supply an unsigned 4-bit shift value from the instruction literal field. The value is zero-extended to 5 bits to form SFTNUM<4:0>, and the shift direction of the instruction determines L_R. The multi-bit MCU shift instructions ASRW, LSRW and SLW take an unsigned 4-bit shift value from the LS 4 bits of W register Wb. This value is likewise zero-extended to 5 bits to form SFTNUM<4:0>, and the shift direction of the instruction determines L_R. The shifter mode signal BIDIR is cleared for these instructions. If any of the remaining MS 12 bits of Wb are set (indicating a shift value greater than 15), the shift result is forced to zero, or to all 1s for ASRW when the MS bit of the original operand is set.
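The handling of out-of-range values for the variable MCU shifts can be sketched as follows. The function name `mcu_variable_shift` is hypothetical; the instruction mnemonics are taken from the text, and the model works on 16-bit patterns only.

```python
def mcu_variable_shift(instruction, operand16, wb):
    """Model ASRW, LSRW and SLW on a 16-bit operand with the shift count in Wb."""
    operand16 &= 0xFFFF
    wb &= 0xFFFF
    if wb & 0xFFF0:                      # any of the MS 12 bits set: shift > 15
        if instruction == "ASRW" and operand16 & 0x8000:
            return 0xFFFF                # negative operand: forced to all 1s
        return 0x0000                    # otherwise forced to zero
    count = wb & 0xF                     # LS 4 bits give the shift magnitude
    if instruction == "SLW":
        return (operand16 << count) & 0xFFFF
    if instruction == "LSRW":
        return operand16 >> count
    if instruction == "ASRW":
        signed = operand16 - 0x10000 if operand16 & 0x8000 else operand16
        return (signed >> count) & 0xFFFF
    raise ValueError("unknown instruction: " + instruction)
```

Forcing the result in this way is what lets these instructions return a correct value for any shift magnitude held in Wb.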

The multi-bit DSP shift instruction SFTACK provides a two's complement signed 6-bit shift value from an instruction literal field. The MS bit of this value is assigned to L_R, and the remaining 5 bits are assigned to SFTNUM<4:0>. The shifter mode signal BIDIR is set for this instruction. Shift values outside the valid range are not trapped in hardware, because they are not detected by the assembler.

The multi-bit DSP shift instruction SFTAC extracts a signed 6-bit shift value from the LS 6 bits of the W register Wn. The MS bit of this value is assigned to L_R, and the remaining 5 bits are assigned to SFTNUM<4:0>. The shifter mode signal BIDIR is set for this instruction. To prevent an attempt to shift beyond the maximum range of the barrel shifter, Wn is checked to confirm that the shift value is valid. If the shift value is greater than 16 or less than -16, an arithmetic error trap is generated and the shift result is not written to the target accumulator.
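The field decoding described for SFTAC can be sketched as follows. This is a minimal Python model, assuming that only the LS 6 bits of Wn take part in the range check (the text does not say whether the upper bits of Wn are examined); the function name and the returned tuple layout are hypothetical.

```python
def decode_sftac(wn):
    """Split the LS 6 bits of Wn into L_R and SFTNUM<4:0> (illustrative).

    Returns (l_r, sftnum, signed_value, trap).
    """
    field = wn & 0x3F                    # signed 6-bit shift value
    l_r = (field >> 5) & 1               # MS bit of the field
    sftnum = field & 0x1F                # remaining 5 bits, taken as-is
    value = field - 64 if l_r else field  # two's complement interpretation
    trap = value > 16 or value < -16      # out of range: arithmetic error trap
    return l_r, sftnum, value, trap
```

With this model, a shift value of -9 decodes to `L_R = 1` and `SFTNUM = 0b10111`, while 17 or -32 would raise the trap condition and leave the target accumulator unwritten.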

The barrel shifter 160 is 40 bits wide to accommodate the width of the accumulators. It is used for both DSP and MCU shift operations. Result data is taken from BSout40 for DSP shift operations and from BSout16 for MCU shift operations.

Data is routed to and from the barrel shifter through a series of multiplexers that make up the data path, so that MCU shifts can be accomplished. The multiplexers that perform this data selection are referred to as M1 through M4. FIG. 10 is a table showing how the multiplexers are configured for each DSP engine instruction. FIG. 11 shows an alternative mapping in which the instructions are grouped into common control blocks to demonstrate the potential for decoding efficiency. Where possible, 'don't care' states have been used to reduce the decoding requirements. Redundant signals are shown in parentheses.

Data input to the barrel shifter is controlled by multiplexers M1, M2 and M3 (shown combined as mux N6 in FIG. 1) and can come from one of the following sources:

1. The accumulator

2. The output of the multiplier sign-extension unit

3. Zero

Data from the X-bus is presented to the barrel shifter in bit positions 16 to 31 for right shifts and in bit positions 0 to 15 for left shifts. A full-range operation (16 bits left to 15 bits right) is therefore available for all multi-bit arithmetic or logical shift operations on data memory or registers.

Data accumulator and adder/subtractor

The data accumulator block 100 includes a 40-bit adder/subtractor 145 with automatic result (zero or sign) extension logic for the multiplier result. The 40-bit adder/subtractor 145 can select either of the two accumulators 110, 115 (A, B) as its pre-accumulation source and post-accumulation destination. As shown in FIG. 1, for the ADDAC and LAC instructions, the data to be accumulated or loaded can optionally be scaled by the barrel shifter 160 before accumulation.

The data accumulator block 100 shown in FIG. 1 is conceptual in nature and is intended only as an aid to understanding the data flow required by the instructions that exercise it. The block diagram of an actual implementation may differ considerably.

The data path selection and function control signals mapped for each of the DSP instructions are shown in FIG. 12. An alternative mapping is shown in FIG. 13, in which the instructions are grouped into common control blocks to demonstrate the potential for decoding efficiency. Where possible, 'don't care' states have been used to reduce the decoding requirements (the control signal has no effect on the operation and is ignored). Redundant signals are shown in parentheses.

Result extension block

When the DSP engine operates in signed mode (US=0), the result extension block 165 sign-extends the presented 32-bit number to 40 bits. When the DSP engine operates in unsigned mode (US=1), the result extension block 165 zero-extends the presented 32-bit number to 40 bits. According to one embodiment, the barrel shifter 160 will also need to be able to force zero extension of the result. This can, however, be achieved in a variety of ways.
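The two extension behaviours can be sketched in a few lines of Python. The function name and the boolean `unsigned_mode` parameter are assumptions for illustration; they stand in for the mode selection rather than modelling any particular control bit encoding.

```python
def extend_result(value32, unsigned_mode):
    """Extend a 32-bit multiplier result to 40 bits (illustrative only)."""
    value32 &= 0xFFFFFFFF
    if not unsigned_mode and value32 & 0x80000000:
        # Signed mode: replicate bit 31 into bits 32..39.
        return value32 | 0xFF00000000
    # Unsigned mode, or a non-negative signed value: upper 8 bits are zero.
    return value32
```

The same 32-bit pattern `0x80000000` therefore becomes `0xFF80000000` in signed mode and `0x0080000000` in unsigned mode.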

Zero back-fill

To simplify the system description, the zero back-fill block 170 always concatenates 16 least significant zeros onto the word read from the X-bus. The barrel shifter 160 is shown as having the same capability. This is again implementation-dependent and can be achieved in a variety of ways.

Adder/subtractor, overflow and saturation

The adder/subtractor 145 is a 40-bit adder with an optional zero input on one side and either true or complemented data on the other. The adder 145 also receives a carry-in signal, which can be high or low, and it must generate two overflow status bits that are latched and routed to the status register control block.

An overflow from bit 39 can be used as a catastrophic overflow, in which the sign of the accumulator is destroyed. An overflow into bits 31 to 39 can be used as a recoverable overflow. This bit is set whenever these bits are not all the same. It indicates that the data value written to the accumulator 110, 115 can no longer be represented as a 1.31 fractional value.
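The recoverable-overflow condition ("bits 31 to 39 are not all the same") can be expressed directly. The Python sketch below is illustrative; the function name is hypothetical.

```python
def guard_overflow(acc):
    """True when bits 31..39 of a 40-bit accumulator are not all equal,
    i.e. the value no longer fits a 1.31 fractional format."""
    top9 = (acc >> 31) & 0x1FF  # bits 39..31 as a 9-bit field
    return top9 not in (0x000, 0x1FF)
```

For example, `0x007FFFFFFF` and `0xFF80000000` are the largest and smallest patterns for which the check is false.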

When selected, the adder 145 has an additional saturation block that controls accumulator data saturation. To determine when to saturate, and to what value, the saturation block uses the adder result, the overflow status bits described above, and the SATNB and ACCSAT mode control bits. The adder/subtractor 145 and the saturation block are referred to below as the DSP AU (arithmetic unit). According to one embodiment, when the DSP engine operates in unsigned mode (US=1), the OA, OB, SA and SB status bits (and the associated saturation operation) carry no meaning, but they are not inhibited. If enabled, saturation occurs according to the same rules regardless of the signed/unsigned operating mode of the DSP engine 100.

Six status register bits can be added to support saturation and overflow. They are:

1. OA: AccA fractional overflow into the guard bits (the value can no longer be represented as a 1.31 fractional value).

OB: AccB fractional overflow into the guard bits (the value can no longer be represented as a 1.31 fractional value).

OA and OB may be R/W bits in the CORCON mode operation register 180.

2. SA: a) Normal saturation enabled: SA is set if AccA overflows into the guard bits. AccA is saturated to a 1.31 value.

b) Super saturation enabled: SA is set if AccA overflows into the sign (AccA<39>). AccA is saturated to a 9.31 value.

c) Saturation disabled: SA is set if AccA overflows into the sign (AccA<39>). AccA holds the (overflowed) result of the operation. If COVTE is set, an arithmetic error trap is generated. The trap handler can then take appropriate action to deal with the catastrophic overflow.

SA may also be an R/W bit in the CORCON register 180.

3. SB: a) Normal saturation enabled: SB is set if AccB overflows into the guard bits. AccB is saturated to a 1.31 value.

b) Super saturation enabled: SB is set if AccB overflows into the sign (AccB<39>). AccB is saturated to a 9.31 value.

c) Saturation disabled: SB is set if AccB overflows into the sign (AccB<39>). AccB holds the (overflowed) result of the operation. If COVTE is set, an arithmetic error trap is generated. The trap handler can then take appropriate action to deal with the catastrophic overflow.

SB may also be an R/W bit in the CORCON register 180.

4. OAB: the logical OR of OA and OB.

5. SAB: the logical OR of SA and SB.

When operating in normal saturation mode (1.31), bit 31 is the sign bit of the 1.31 fraction. The remaining upper bits of the accumulator have no real function and are always sign-extended from bit 31. OA/OB will never be set. When operating in super saturation mode (or with no saturation), bit 31 becomes one of the guard bits, which together provide the integer part of the full 40-bit signed fractional value. The guard bits therefore occupy bits 31 to 38, and bit 39 becomes the sign bit. A fractional overflow is detected whenever the guard bits and the sign bit (that is, bits 31 to 39) are not all the same.

The OA and OB bits can be modified each time data passes through the DSP AU. When set, they indicate that the most recent operation overflowed into the accumulator guard bits. When set, the OA and OB bits can also optionally generate an arithmetic error trap, provided the corresponding overflow trap flag enable bit (OVATE, OVBTE) in the INTCON1 register is set. This allows the user to take immediate action, for example to correct the system gain.

OA/OB is updated based on the data value at the DSP AU output (that is, after saturation, if any). As a result, if Q31 saturation mode is selected, OA/OB will never be set, even though the adder may indicate that an overflow occurred, because bits 31 to 39 of the output are always the same.

OA/OB is updated only following a DSP AU operation. All DSP instructions pass data through the DSP AU (and therefore update OA/OB), but writes to the accumulator SFRs do not pass through the DSP engine, so they do not update OA/OB.

The SA and SB bits can be set each time data passes through the DSP AU, but they are cleared only by the user or by the CLRAC instruction (that is, they are 'sticky' in nature). When set, the SA and SB bits indicate that the accumulator has overflowed its maximum range (bit 31 for 32-bit saturation, bit 39 for 40-bit saturation) and will be saturated (if saturation is enabled). If saturation is not enabled, SA and SB default to bit 39 overflow and thus indicate that a catastrophic overflow has occurred. If the COVTE bit in the INTCON1 register is set and saturation is disabled, the SA and SB bits generate an arithmetic error trap.

The SA and SB status bits are 'sticky'. Once set, they cannot be cleared by the saturation logic, whatever the results of any subsequent accumulator-based operations; they are cleared only by user code, for example CLRAC. The accumulator contents themselves, however, are not 'sticky'. All subsequent operations continue to accumulate new results, whether or not the accumulator has already saturated. This results either in continued saturation (for example, the accumulator is saturated to the maximum positive value and a new accumulation attempts to add to that value) or in a change to the accumulator contents (for example, the accumulator is saturated to the maximum positive value and a new accumulation subtracts from that value, reducing the accumulator contents by that amount).

The OA and OB bits are not 'sticky'; they reflect the evaluation of each accumulator-based operation.

The overflow and saturation status bits can optionally be viewed in the status register as the logical OR of OA and OB (in bit OAB) and as the logical OR of SA and SB (in bit SAB). This allows the programmer to check a single status register bit to determine whether either accumulator has overflowed, or a single bit to determine whether either accumulator has saturated.

SAB and OAB are not latched or 'sticky'. They read as one whenever OA or OB (for OAB), or SA or SB (for SAB), is set. Whenever both associated bits are clear, they read as zero. However, although SAB itself is not 'sticky' and OAB is a read-only bit, the 'sticky' nature of SA and SB makes SAB appear 'sticky'.
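The relationship between the sticky SA/SB bits, the non-sticky OA/OB bits and the derived OAB/SAB bits can be modelled with a small Python class. This is a hedged sketch: the class, its method names and the way an operation reports its overflow and saturation outcome are all assumptions made for illustration.

```python
class AccStatus:
    """Illustrative model of the OA, OB, SA, SB, OAB and SAB status bits."""

    def __init__(self):
        self.oa = self.ob = False   # reflect the most recent operation
        self.sa = self.sb = False   # sticky until cleared by user code

    def update(self, acc, overflowed, saturated):
        """Record the outcome of one DSP AU operation on accumulator 'A' or 'B'."""
        if acc == "A":
            self.oa = overflowed               # overwritten every operation
            self.sa = self.sa or saturated     # can only be set here
        else:
            self.ob = overflowed
            self.sb = self.sb or saturated

    @property
    def oab(self):
        return self.oa or self.ob   # combinational OR, not latched

    @property
    def sab(self):
        return self.sa or self.sb   # appears sticky because SA/SB are

    def clear_sticky(self):
        """Stands in for user code (for example CLRAC) clearing SA and SB."""
        self.sa = self.sb = False
```

After one saturating operation followed by a clean one, `oab` returns to zero while `sab` stays set until `clear_sticky()` is called.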

SAB can be written as a means of providing a signal that clears both SA and SB simultaneously. This clear operation does not clear a latch; instead, it in turn causes SAB to read as clear on the next read.

The device supports three saturation and overflow modes:

1. Bit 39 overflow and saturation: Using the bit 39 overflow status bit from the adder and the bit 39 value after the addition, the correct sign of the 9.31 result can be determined. The saturation logic then loads the maximum positive 9.31 value (0x7FFFFFFFFF) or the maximum negative 9.31 value (0x8000000000) into the target accumulator. The SA or SB bit is set and remains set until cleared by the user. This is referred to as 'super saturation' and provides protection against erroneous data or unexpected algorithm problems (for example, gain calculations).

2. Bit 31 overflow and saturation: Using the bit 31 to 39 overflow status bit from the adder and the bit 39 value after the addition, the correct sign of the 1.31 result can be determined. The saturation logic then loads the maximum positive 1.31 value (0x007FFFFFFF) or the maximum negative 1.31 value (0xFF80000000) into the target accumulator. The SA or SB bit is set and remains set until cleared by the user. When this saturation mode is in effect, the guard bits are not used (so the OA, OB or OAB bits are never set).

3. Bit 39 catastrophic overflow: The bit 39 overflow status bit from the adder is used to set the SA or SB bit, which remains set until cleared by the user. No saturation operation is performed, and the accumulator is allowed to overflow (destroying its sign). If the COVTE bit in the INTCON1 register is set, a catastrophic overflow can initiate a trap exception.
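The three modes can be compared side by side with a short Python sketch. It is a behavioural model under stated assumptions, not the hardware logic: it starts from the exact (unbounded) arithmetic result instead of the adder's overflow status bits, and the mode names `"super"`, `"normal"` and `"none"` are hypothetical labels chosen for this example.

```python
MASK40 = (1 << 40) - 1


def saturate(exact, mode):
    """Apply one of the three saturation/overflow modes (illustrative).

    exact: exact signed integer result of the accumulation.
    mode:  "super" (9.31), "normal" (1.31) or "none" (no saturation).
    Returns (accumulator_40bit, sa_flag).
    """
    if mode == "normal":
        lo, hi = -(1 << 31), (1 << 31) - 1
    else:  # "super" and "none" both watch for overflow out of bit 39
        lo, hi = -(1 << 39), (1 << 39) - 1

    if lo <= exact <= hi:
        return exact & MASK40, False
    if mode == "none":
        # Catastrophic overflow: the accumulator wraps and its sign is lost.
        return exact & MASK40, True
    limit = hi if exact > hi else lo
    return limit & MASK40, True
```

Note that in `"none"` mode the flag is still raised, which matches the description of SA/SB being set even though no saturation is performed.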

For AccA, the saturation and overflow operation for all adder/subtractor modes is summarized in FIG. 14 (the same logic applies to AccB). Several examples are also shown in FIG. 15. According to one embodiment, the subtraction operation is OprB - OprA. The Boolean equations for OV39 are as follows:

OV39 (for addition operations) = (OprA<39> && OprB<39> && !AccA<39>) || (!OprA<39> && !OprB<39> && AccA<39>)

OV39 (for subtraction operations) = (OprB<39> && !OprA<39> && !Result<39>) || (OprA<39> && !OprB<39> && Result<39>)
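A bit-39 overflow check of this kind can be sketched in Python. The sketch assumes the standard two's complement rule (an addition overflows when both operands share a sign that the result does not; a subtraction OprB - OprA overflows when the operands differ in sign and the result's sign differs from OprB); the placement of the negations is that assumption, and the carry-in is ignored. Function names are hypothetical.

```python
MASK40 = (1 << 40) - 1


def bit39(x):
    """Return bit 39 (the sign bit) of a 40-bit pattern."""
    return (x >> 39) & 1


def ov39_add(opr_a, opr_b):
    """Bit 39 overflow for OprA + OprB on 40-bit operands."""
    result = (opr_a + opr_b) & MASK40
    a, b, r = bit39(opr_a), bit39(opr_b), bit39(result)
    return bool((a and b and not r) or (not a and not b and r))


def ov39_sub(opr_b, opr_a):
    """Bit 39 overflow for OprB - OprA on 40-bit operands."""
    result = (opr_b - opr_a) & MASK40
    a, b, r = bit39(opr_a), bit39(opr_b), bit39(result)
    return bool((b and not a and not r) or (a and not b and r))
```

For example, adding 1 to `0x7FFFFFFFFF` overflows, as does subtracting 1 from `0x8000000000`.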

Accumulator 'Write-Back' (AWB)

Some of the MAC class of instructions (the exceptions are MPY, MPYN, ED, EDAC, SQR and SQWC) can optionally write a rounded version of the accumulator that is not targeted by the instruction into data space memory. The write is performed across the X-bus into the combined X and Y address space. The limited instruction decoding space restricts the addressing mode options and also forces the data to be always rounded and never scaled. This feature has nevertheless been found to be particularly advantageous in FFT and LMS algorithms.

The following addressing modes are supported:

1. W13, register direct: the rounded contents of the non-target accumulator are written to W13 as a 1.15 fraction.

2. [W13++], register indirect with post-increment: the rounded contents of the non-target accumulator are written as a 1.15 fraction to the address pointed to by W13. W13 is then incremented by 2 (for a word write).

According to one embodiment, the AWB operation does not change the contents of the source accumulator, and it does not update OA/OB or SA/SB (even if the result data overflows or saturates).

Round logic

The round logic is a combinational block that performs a conventional (biased) or convergent (unbiased) round function during an accumulator write (store). The round mode is determined by the state of the RND bit in the CORCON register 180. As shown in FIG. 16, the round logic generates a 16-bit 1.15 data value that is passed to the data space write saturation logic. If rounding is not indicated by the instruction, a truncated 1.15 data value is stored.

According to one embodiment, the rounding function requires only a 16-bit adder. The MCU ALU is available during all instructions other than ED/EDAC and could therefore be used to perform the rounding addition, saving area. Whether this is practical may depend on factors such as the proximity of the DSP engine.

The two rounding modes are shown in FIG. 18. Conventional rounding takes bit 15 of the accumulator, zero-extends it, and adds it to the MS word (bits 16 to 31), excluding the guard or overflow bits. If the LS word of the accumulator is between 0x8000 and 0xFFFF, the MS word is incremented. If the LS word is between 0x0000 and 0x7FFF, the MS word remains unchanged. A consequence of this algorithm is that, over a succession of random rounding operations, the value tends to be biased slightly positive.

Convergent (or unbiased) rounding operates in the same manner as conventional rounding, except when the LS word equals 0x8000. In that case, the LS bit of the MS word (bit 16 of the accumulator) is examined. If it is 1, the MS word is incremented; if it is 0, the MS word is left unchanged. Assuming that bit 16 is effectively random, this scheme removes any rounding bias that might otherwise accumulate.
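Both rounding modes fit in one short Python function. This is an illustrative sketch only: the function name is hypothetical, it operates on bits 31:0 of the accumulator, and it masks the result to 16 bits without modelling the rounding overflow or data space saturation.

```python
def round_acc(acc, convergent):
    """Round the MS word (bits 31:16) of the accumulator using its LS word.

    convergent=False: conventional (biased) rounding.
    convergent=True:  convergent (unbiased) rounding, ties go to even.
    """
    ms = (acc >> 16) & 0xFFFF
    ls = acc & 0xFFFF
    if ls > 0x8000:
        increment = 1
    elif ls == 0x8000:
        # The tie case is the only place where the two modes differ.
        increment = (ms & 1) if convergent else 1
    else:
        increment = 0
    return (ms + increment) & 0xFFFF
```

With an LS word of exactly `0x8000`, an MS word of 2 stays at 2 under convergent rounding but becomes 3 under conventional rounding, while an MS word of 1 becomes 2 in both modes.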

The SAC and SACR instructions store a truncated (SAC) or rounded (SACR) version of the contents of the target accumulator to data memory over the X-bus (subject to data saturation).

According to one embodiment, the MAC class of instructions and the accumulator write-back operation address the combined MCU (X and Y) data space over the X-bus in the same way (that is, the X and Y data spaces are separate only during the data read portion of the cycle, Q1 and Q2). For this class of instructions, data is always subject to rounding, in the mode determined by the RND bit.

Data space write saturation

In addition to DSP AU saturation, writes to data space can also be saturated, without affecting the contents of the source accumulator. The data space write saturation logic block receives the 16-bit 1.15 fractional value produced from the source accumulator by the round adder. The remaining MS bits of the source accumulator are used to generate an overflow status for the accumulator. As shown in FIGS. 16 and 17, these are combined and used to select the appropriate 1.15 fractional value as the output written to data space memory.

According to one embodiment, this overflow logic is independent of the DSP AU overflow logic. As a result, the contents of the source accumulator are always saturated correctly on a write, regardless of how the data was placed in the accumulator (that is, through the DSP AU or through an SFR write).

When the SATDW bit in the CORCON register 180 is set (the default state), the data (after rounding or truncation) is tested for overflow and adjusted accordingly. For input data greater than 0x007FFF, the data written to memory is forced to the maximum positive 1.15 value, 0x007FFF. For input data less than 0xFF8000, the data written to memory is forced to the maximum negative 1.15 value, 0xFF8000. The MS bit of the source (bit 39) is used to determine the sign of the operand being tested.

A rounding overflow (OV) can be defined as a carry from bit 15 into bit 16 of the round adder.

Because the rounding scheme used here is unidirectional by definition (it adds either 1 or 0), the only overflow that can be allowed is from 0x7FFF to 0x8000. Consequently, an overflow that would be detected from a round-up of 0xFFFF must be prevented. In the proposed implementation shown in FIG. 16, no negative (pre-round) value causes data space saturation to 0x8000 unless the original guard bits (ACC<39>) are not equal to %1111 1111 1.

If the SATDW bit in the CORCON register 180 is not set, the input data is always passed through unchanged under all conditions. All data writes from the DSP engine to data space can thus optionally be saturated.
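The write-saturation behaviour can be sketched as a clamp on the upper 24 bits of the accumulator. The Python model below is hedged in several ways: it uses truncation only (no rounding step), it treats bits 39:16 as a signed value to compare against the 1.15 limits, and the function name and `satdw` parameter are hypothetical stand-ins for the SATDW control bit.

```python
def write_saturate(acc, satdw=True):
    """Return the 16-bit word written to data space (illustrative only)."""
    acc &= (1 << 40) - 1
    signed = acc - (1 << 40) if acc & (1 << 39) else acc  # bit 39 gives the sign
    value = signed >> 16  # truncated bits 39:16 as a signed 24-bit quantity
    if satdw:
        # Clamp to the 1.15 range 0x8000 .. 0x7FFF.
        value = max(-0x8000, min(0x7FFF, value))
    return value & 0xFFFF  # source accumulator itself is left untouched
```

With saturation disabled, the same accumulator value simply has its upper bits discarded, which shows why the clamp matters for large accumulated results.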

Power Conservation

As described above, the DSP engine 100 may be essentially one large asynchronous block of logic in which only the accumulator registers are clocked. As a result, there are many paths down which data can propagate only to reach a dead end, potentially consuming more power than necessary. Several suggested data path blocks are shown on the block diagram to highlight this point, but the final location of these elements will depend on the architectural implementation chosen for the final design.

DSP engine mode selection

The DSP engine has various modes that are selectable through the CPU core configuration register CORCON 180 or the interrupt configuration register INTCON1.

Modes of operation:

1. Fractional or integer

2. Signed or unsigned

3. Conventional or convergent rounding

4. Automatic saturation on/off for AccA

5. Automatic saturation on/off for AccB

6. Automatic saturation on/off for writes to data memory

7. Accumulator saturation mode selection

8. Trap on overflow of AccA on/off

9. Trap on overflow of AccB on/off

10. Trap on catastrophic overflow of AccA and/or AccB on/off

DSP instructions

There are three broad classes of instructions:

1. No (implied) operand, with scaling

2. Single operand, with scaling

3. Dual operand, without scaling

DSP 수행을 증가시키는 명령어에 더하여, 다음의 하드웨어 특징들이 포함될 수 있다.In addition to instructions that increase DSP performance, the following hardware features may be included.

a. A 'REPEAT n' instruction that latches the next instruction in the instruction register and then executes it 'n' times

b. Nested 'DO' loop hardware program loop control, with visible control registers to support nesting

c. "Find First" instructions that determine the first bit set or cleared, starting from the LS or MS bit

d. A modulo addressing mode associated with certain working (address) registers

e. A bit-reversed addressing mode for X data space writes only
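The modulo and bit-reversed addressing modes in items d and e can be sketched as follows. This is an illustrative Python model only: the buffer bounds, the step size, and the function names `modulo_next` and `bit_reverse` are assumptions, and the specification does not state how the hardware computes these addresses.

```python
def modulo_next(addr, step, start, end):
    """Advance addr by step, wrapping inside the circular buffer [start, end]."""
    length = end - start + 1
    return start + (addr - start + step) % length


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index (the usual FFT reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result
```

For an 8-point FFT, writing results at `bit_reverse(i, 3)` for `i = 0..7` visits the indices 0, 4, 2, 6, 1, 5, 3, 7, which is the reordering a bit-reversed write mode is typically used for.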

"Find First" Instructions

There are three variants of the 'Find First' instructions:

1. FF1L: Find the first occurrence of a 1, starting from the left. This instruction may be useful for RTOS task management and other bit polling applications.

2. FF1R: Find the first occurrence of a 1, starting from the right. This instruction may be useful for RTOS task management and other bit polling applications.

3. FBCL: Find the first occurrence of the complement of the MS bit (sign), starting from the left. This instruction is useful for data normalization.

All of these instructions operate in a similar manner. An example for FF1R is shown in FIG. 9.
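The behavior of the three variants can be modeled in a few lines of Python. The return convention used here is an assumption made for illustration: each function returns the 1-based position of the bit counted from the side where the search starts, and 0 when no such bit exists. The specification does not define the result encoding, so actual hardware may differ.

```python
def ff1l(word, bits=16):
    """Position (1 = MS bit) of the first 1 found from the left, or 0."""
    word &= (1 << bits) - 1
    for pos in range(1, bits + 1):
        if (word >> (bits - pos)) & 1:
            return pos
    return 0


def ff1r(word, bits=16):
    """Position (1 = LS bit) of the first 1 found from the right, or 0."""
    word &= (1 << bits) - 1
    for pos in range(1, bits + 1):
        if (word >> (pos - 1)) & 1:
            return pos
    return 0


def fbcl(word, bits=16):
    """Position from the left of the first bit that differs from the sign bit, or 0."""
    word &= (1 << bits) - 1
    sign = (word >> (bits - 1)) & 1
    for pos in range(2, bits + 1):
        if ((word >> (bits - pos)) & 1) != sign:
            return pos
    return 0
```

Under this convention, a nonzero result from `fbcl` minus 2 gives the number of left shifts that normalizes the word. For example, `fbcl(0x0010)` is 12, and shifting `0x0010` left by 10 yields `0x4000`.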

Pseudo-Instructions

According to one embodiment of the microcontroller architecture, all registers, including AccA and AccB, may be mapped into the register file address space. This adds a degree of flexibility to the dual-operand DSP instructions that may not be intuitively obvious. For example, a MAC operation may prefetch the current (previous MAC) accumulator contents as an operand for the next MAC operation.

According to one embodiment, during word or odd byte reads, the 8-bit ACCAH and ACCBH registers (AccA<39:32> and AccB<39:32>) are automatically sign-extended to 16 bits (a read of the MS byte will return the sign-extension byte).
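The sign extension of the upper accumulator byte can be sketched as follows. This is a minimal Python model, assuming the accumulator is held as a 40-bit integer; the function name `read_acc_upper_word` is hypothetical.

```python
def read_acc_upper_word(acc40):
    """Model a 16-bit word read of AccX<39:32>, sign-extended from 8 bits."""
    upper = (acc40 >> 32) & 0xFF        # the 8-bit ACCxH field
    if upper & 0x80:                    # sign bit of the 40-bit accumulator
        upper |= 0xFF00                 # fill the MS byte with the sign
    return upper
```

In this model, the MS byte of the returned word, `(word >> 8) & 0xFF`, is `0xFF` for a negative accumulator and `0x00` otherwise, which corresponds to the sign-extension byte mentioned above.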

Claims

A processor comprising:
at least one multiplier unit that can be controlled to operate in a signed, unsigned, or mixed sign mode; and
a multiplier unit mode decoder coupled to the multiplier unit and receiving position information of a first operand and position information of a second operand,
wherein, when in the mixed sign mode, the multiplier unit mode decoder controls the multiplier unit, according to the position information, to operate in a signed mode, an unsigned mode, or a combined signed/unsigned mode.

The processor of claim 1,
wherein said multiplier unit comprises an n-bit multiplier controllable to perform signed, unsigned, or mixed sign multiplication on two input operands.

The processor of claim 1,
wherein the multiplier unit comprises:
a multiplier data preprocessor, coupled to the multiplier unit, for independently sign-extending or zero-extending two input operands; and
a signed multiplier.

The processor of claim 3,
wherein said signed multiplier is an n+1 bit multiplier.

The processor of claim 1,
further comprising a control register for selecting among the signed mode, the unsigned mode, and the mixed sign mode, the mixed sign mode performing automatic selection of signed, unsigned, or combined signed/unsigned multiplication.

The processor of claim 1,
wherein the position information includes information as to whether a register among a plurality of working registers is an odd register or an even register.

The processor of claim 1,
wherein the first operand and the second operand are supplied by a data memory, and
the position information includes information as to whether an address of the memory is an odd address or an even address.

The processor of claim 2,
wherein the first operand is selected from a first set of two consecutive registers, and
the second operand is selected from a second set of two consecutive registers.

The processor of claim 1,
further comprising a barrel shifter sized to accommodate at least the size of the result produced by the multiplier unit.

The processor of claim 9,
further comprising at least one accumulator and an adder coupled to the barrel shifter,
wherein the multiplier unit, the accumulator, and the barrel shifter are part of a digital signal processing (DSP) engine.

The processor of claim 10, further comprising:
a result extension unit coupled to the multiplier unit and the barrel shifter; and
a zero-backfill unit coupled to the result extension unit.

The processor of claim 10,
further comprising round logic coupled to the accumulator.

The processor of claim 10,
wherein the DSP engine is a 16-bit DSP engine with a plurality of 16-bit registers, and
the barrel shifter and the accumulator each comprise 40 bits.

The processor of claim 10,
further comprising a microcontroller unit,
wherein at least the multiplier unit is shared by the microcontroller unit and the DSP engine to execute arithmetic microcontroller instructions.

The processor of claim 3, wherein:
in signed mode, the multiplier data preprocessor sign-extends all inputs;
in unsigned mode, the multiplier data preprocessor zero-extends all inputs; and
in mixed sign mode, the multiplier mode decoder instructs the multiplier data preprocessor to sign-extend an input if its source is an odd register number or an odd memory address, and to zero-extend the input if its source is an even register number or an even memory address.

A method for performing a multiplication in a processor, comprising:
providing a first n-bit operand from a first position to a multiplier unit that can be controlled to operate in a signed, unsigned, or combined signed/unsigned mode;
providing a second operand from a second position to the multiplier unit; and
decoding the positions of the first operand and the second operand and controlling the multiplier unit to operate in a mixed mode in which signed, unsigned, or combined signed/unsigned multiplication is performed according to the positions.

The method of claim 16,
wherein the first operand and the second operand are stored in registers, and
the position comprises whether a register among a plurality of working registers is an odd register or an even register.

The method of claim 16,
wherein the first operand and the second operand are provided by a data memory, and
the position comprises whether an address of the memory is an odd address or an even address.

The method of claim 17,
wherein the first operand is selected from a first set of two consecutive registers, and
the second operand is selected from a second set of two consecutive registers.

The method of claim 16,
wherein a control register determines whether the multiplier unit will operate in the signed, unsigned, or mixed mode.

The method of claim 20,
wherein, in signed mode, the first operand and the second operand are sign-extended;
in unsigned mode, the first operand and the second operand are zero-extended; and
in mixed mode, an operand is sign-extended if it is supplied by an odd register number or an odd memory address, and zero-extended if it is supplied by an even register number or an even memory address.

A method for performing a 2n-bit multiplication using four n-bit data words, comprising:
storing a first operand for the 2n-bit multiplication in a first set of two consecutive registers or two consecutive memory locations;
storing a second operand for the 2n-bit multiplication in a second set of two consecutive registers or two consecutive memory locations;
performing a first multiplication by a controllable multiplier unit using the first register or memory address of the first set and the first register or memory address of the second set, and shifting an associated first result;
performing a second multiplication by the controllable multiplier unit using the first register or memory address of the first set and the second register or memory address of the second set to produce an associated second result;
performing a third multiplication by the controllable multiplier unit using the second register or memory address of the first set and the first register or memory address of the second set to produce an associated third result; and
adding the first result, the second result, and the third result to produce a final result, and storing the final result in registers or memory,
wherein, for each multiplication, the multiplier unit is automatically controlled, according to the position of the register or memory address, to operate in a signed, unsigned, or combined signed/unsigned mode.

The method of claim 22,
wherein the position comprises whether a register among a plurality of working registers is an odd register or an even register.

The method of claim 22,
wherein the position comprises whether an address of the memory is an odd address or an even address.

The method of claim 22,
wherein a control register determines in which of the signed, unsigned, and mixed sign modes the multiplier unit will operate.

The method of claim 25,
wherein, in signed mode, all inputs to the multiplier unit are sign-extended; and
in mixed sign mode, an input to the multiplier unit is sign-extended if it is provided by an odd register number or an odd memory address, and zero-extended if it is provided by an even register number or an even memory address.

The method of claim 22,
wherein the second result and the third result are shifted,
the method further comprising performing a fourth multiplication by the controllable multiplier unit using the second register or memory address of the first set and the second register or memory address of the second set to produce an associated fourth result,
wherein the fourth result is added to the first result, the second result, and the third result to generate the final result.

The method of claim 27,
wherein a control register determines in which of the signed, unsigned, and mixed sign modes the multiplier unit is to operate.

The method of claim 28,
wherein the multiplier unit comprises a signed multiplier;
in signed mode, all inputs to the multiplier unit are sign-extended;
in unsigned mode, all inputs to the multiplier unit are zero-extended; and
in mixed sign mode, an input to the multiplier unit is sign-extended if it is provided by an odd register number or an odd memory address, and zero-extended if it is provided by an even register number or an even memory address.
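As an informal illustration of the mixed sign scheme recited above, the following Python sketch models a 2n-bit signed multiplication built from n-bit words with n = 16. It assumes that each operand occupies an even/odd register pair with the low word in the even register and the high word in the odd register, so that odd sources are sign-extended and even sources are zero-extended. The register dictionary `regs` and the function names are hypothetical, and the sketch uses all four partial products to form the full 4n-bit result.

```python
def split_words(value, n=16):
    """Split a 2n-bit two's-complement value into (low, high) n-bit words."""
    value &= (1 << (2 * n)) - 1
    return value & ((1 << n) - 1), value >> n


def extend_operand(word, reg_num, n=16):
    """Mixed sign mode: odd register -> sign-extend, even register -> zero-extend."""
    word &= (1 << n) - 1
    if reg_num % 2 and word & (1 << (n - 1)):
        return word - (1 << n)          # negative signed value
    return word


def mul_2n_mixed(regs, a_even, b_even, n=16):
    """Multiply the 2n-bit values held in the pairs (a_even, a_even + 1) and (b_even, b_even + 1)."""
    a_lo = extend_operand(regs[a_even], a_even, n)
    a_hi = extend_operand(regs[a_even + 1], a_even + 1, n)
    b_lo = extend_operand(regs[b_even], b_even, n)
    b_hi = extend_operand(regs[b_even + 1], b_even + 1, n)
    high = (a_hi * b_hi) << (2 * n)              # signed x signed
    cross = (a_hi * b_lo + a_lo * b_hi) << n     # signed x unsigned, unsigned x signed
    low = a_lo * b_lo                            # unsigned x unsigned
    return high + cross + low
```

Because the sign of each partial product follows from the parity of the source register alone, the same multiply operation can be reused for every step without changing a mode setting between steps. Omitting the low-by-low product gives a three-multiplication approximation of the same result.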