WO1996029646A1 - Processor - Google Patents
- Publication number
- WO1996029646A1 PCT/JP1996/000673 JP9600673W WO9629646A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- field
- instruction code
- information
- units
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
- G06F9/3891—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute organised in groups of units sharing resources, e.g. clusters
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
- G06F9/30098—Register arrangements
- G06F9/30105—Register structure
- G06F9/30112—Register structure comprising data of variable length
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/3016—Decoding the operand specifier, e.g. specifier format
- G06F9/30167—Decoding the operand specifier, e.g. specifier format of immediate specifier, e.g. constants
- G06F9/3017—Runtime instruction translation, e.g. macros
- G06F9/30178—Runtime instruction translation, e.g. macros of compressed or encrypted instructions
- G06F9/30181—Instruction operation extension or modification
- G06F9/30189—Instruction operation extension or modification according to execution mode, e.g. mode flag
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
- G06F9/3802—Instruction prefetching
- G06F9/3816—Instruction alignment, e.g. cache line crossing
- G06F9/3818—Decoding for concurrent execution
- G06F9/3822—Parallel decoding, e.g. parallel decode units
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3853—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
Definitions
- the present invention relates to a processor suitable for multimedia processing such as digital moving images and three-dimensional graphics, and more particularly to a processor that realizes processing with high parallelism with a small code size.
- Multimedia support has been progressing, especially in personal computers and workstations.
- The functions required for multimedia support are mainly moving-picture compression/decompression, audio compression/decompression, 3D graphics, and various recognition processes.
- DSP: Digital Signal Processor
- MOPS: million operations per second
- Video decompression requires about 2 GOPS.
- Video compression requires about 50 GOPS.
- Techniques for this include increasing the operating frequency and parallelizing the arithmetic processing.
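To make the trade-off between operating frequency and parallelism concrete, the following minimal sketch computes how many operations must complete per clock cycle to sustain the throughput figures quoted above. The 100 MHz clock and the names `required_ops_per_cycle` and `ASSUMED_CLOCK_HZ` are illustrative assumptions and do not come from the patent.

```python
import math

# Throughput figures quoted in the text (operations per second).
VIDEO_DECOMPRESSION_OPS = 2e9   # about 2 GOPS
VIDEO_COMPRESSION_OPS = 50e9    # about 50 GOPS


def required_ops_per_cycle(target_ops_per_sec: float, clock_hz: float) -> int:
    """Return the minimum number of operations that must complete per clock
    cycle to sustain the target throughput at the given clock frequency."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return math.ceil(target_ops_per_sec / clock_hz)


# ASSUMPTION: 100 MHz is an illustrative clock, not a figure from the source.
ASSUMED_CLOCK_HZ = 100e6

decompression_parallelism = required_ops_per_cycle(
    VIDEO_DECOMPRESSION_OPS, ASSUMED_CLOCK_HZ)
compression_parallelism = required_ops_per_cycle(
    VIDEO_COMPRESSION_OPS, ASSUMED_CLOCK_HZ)

print(decompression_parallelism, compression_parallelism)
```

Under this assumed clock, doubling the frequency halves the parallelism needed, which is why the two techniques are presented as complementary.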
- methods for utilizing multiple arithmetic units include superscalar and VLIW (Very Long Instruction Word).
- the former is a technique mainly used by general-purpose processors, in which the processor performs scheduling for simultaneously executing multiple operations.
- Although this method has the advantage of ensuring object compatibility with existing single-processing processors, its hardware is extremely complicated because the scheduling is performed dynamically by the processor.
- VLIW has a problem in securing compatibility with existing processors, but does not require an instruction decoding circuit and has the advantage of simplifying its hardware.
- the instruction format is composed of fields that directly control the arithmetic unit, making the control by hardware extremely simple.
- An example of a processor having such an instruction format is Japanese Patent Application Laid-Open No. 63-978333, "Operation circuit control method".
- In it, an operation field indicating that the arithmetic microinstruction is an arithmetic instruction and a plurality of control bits for controlling the arithmetic circuit are provided, and each of the control bits directly controls a part of the arithmetic circuit.
- VLIW can realize parallel processing with relatively simple hardware.
- An object of the present invention is to provide a processor having an architecture that can reduce the code size while improving the parallelism of processing for improving performance in order to solve the above-mentioned problems.
- Another object of the present invention is to provide a processor capable of executing many operations with a small number of instruction codes.
- Another object of the present invention is to provide a VLIW-type processor which is premised on static scheduling.
- Another object of the present invention is to provide a VLIW-type processor capable of coping with various applications and increasing the operation rate of each arithmetic unit.
- Another object of the present invention is to provide a processor suitable for multimedia processing that is effective for reducing the instruction code amount of a parallel processor that repeatedly executes the same type of operation such as multimedia processing.
- Another object of the present invention is to provide a superscalar processor that is effective in reducing code size.
- Another object of the present invention is to provide a processor architecture capable of suppressing the number of development steps while increasing the degree of parallelism of processing.
Disclosure of the invention
- The present invention is directed to multimedia processing, in which the same type of operation is often executed simultaneously on a plurality of arithmetic units, and it allows a plurality of arithmetic units to be controlled by one instruction.
- Mode information for this purpose is provided in the instruction format.
- Specifically, in a VLIW processor that has multiple arithmetic units, performs multiple operations with one instruction, and configures one instruction from multiple fields that each control an arithmetic unit, mode information is provided that enables one field to control multiple arithmetic units.
- An instruction decompression circuit that generates a plurality of fields from one field in one instruction is provided, and a plurality of arithmetic units having the same function are arranged so that the generated fields can be executed simultaneously.
- mode information for simultaneously controlling multiple arithmetic units is provided in one instruction.
- an instruction decompression circuit that generates a plurality of instructions from one instruction is provided, and a plurality of arithmetic units having the same function are arranged so that the generated plurality of instructions can be executed simultaneously.
- Designation information for designating the arithmetic units that execute simultaneously is provided so that only the required number of arithmetic units is used. The instruction decompression circuit generates the required number of instruction fields in the VLIW processor, and the required number of instructions in the superscalar processor.
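The instruction decompression and designation information described above can be sketched as follows. This is a minimal software model under stated assumptions, not the patent's circuit: the four-unit configuration, the bitmask encoding of the designation information, and the names `Field`, `CompressedInstruction`, and `decompress` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_UNITS = 4  # ASSUMPTION: four arithmetic units with the same function


@dataclass(frozen=True)
class Field:
    """One operation: an opcode plus register operands."""
    opcode: str
    dst: int
    src1: int
    src2: int


@dataclass(frozen=True)
class CompressedInstruction:
    """A single stored field plus designation information.

    ASSUMPTION: the designation information is modelled as a bitmask in
    which bit i set means "arithmetic unit i executes this field".
    """
    field: Field
    unit_mask: int


def decompress(instr: CompressedInstruction) -> List[Optional[Field]]:
    """Generate one slot per arithmetic unit from a single stored field.

    Designated units receive a copy of the field; the others receive None,
    meaning they stay idle for this instruction.
    """
    return [
        instr.field if (instr.unit_mask >> unit) & 1 else None
        for unit in range(NUM_UNITS)
    ]


add = Field("add", dst=1, src1=2, src2=3)
slots = decompress(CompressedInstruction(add, unit_mask=0b0101))
for unit, slot in enumerate(slots):
    print(unit, slot)
```

In this model the stored code holds one field however many units run it, which is the code-size saving the text is aiming at.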
- The present invention further provides a plurality of operation units, each consisting of an arithmetic unit that simultaneously executes a plurality of operations of the same type, an integer operation unit that mainly reads from memory the operands supplied to that arithmetic unit, and a register file that stores the operands used by these two types of units.
- The present invention provides a processor having a memory for storing instruction codes, instruction code holding means for holding a plurality of instruction codes read from the memory, and a plurality of operation units operable in parallel according to the instruction codes held in the instruction code holding means. The instruction code stored in the memory contains designation information instructing execution of the operation in a plurality of operation units, and analyzing means analyzes the designation information, determines the operation units designated by the instruction code, and inputs the instruction code to those operation units, so that a plurality of operations in the plurality of operation units can be controlled by one instruction code.
- The present invention also provides a processor of the same configuration (memory, instruction code holding means, parallel operation units, designation information, and analyzing means) that is configured to execute, in the plurality of operation units, operations corresponding to a plurality of instructions with a single instruction code.
- The present invention provides a processor having a memory for storing instruction codes, instruction code holding means for holding a plurality of instruction codes read from the memory, and a plurality of operation units operable in parallel according to the instruction codes held in the instruction code holding means. The instruction code stored in the memory has, in addition to an operation code indicating an operation type and an operand, a field specifying an execution mode as designation information. Analysis means analyzes the field and, when the execution mode is valid, inputs at least the operation code and the operand to a plurality of operation units, so that the same type of operation can be executed in the plurality of operation units.
- The present invention also provides a processor of the same configuration in which the instruction code stored in the memory has, in addition to an operation code indicating an operation type and an operand, a field specifying an execution mode as designation information and an operation unit specification field specifying operation units. Analysis means analyzes these fields and, when the execution mode is valid, inputs at least the operation code and the operand to the operation units specified in the operation unit specification field, so that the same type of operation can be executed in the specified operation units.
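As an illustration of an instruction code carrying an operation code, operands, an execution mode field, and an operation unit specification field, the sketch below packs these into a 32-bit word and routes the decoded operation to the specified units. The bit layout, the four-unit mask, and the rule that a word with the mode bit cleared goes only to unit 0 are assumptions made for this example; the patent text does not give an encoding.

```python
from typing import Dict, Tuple

NUM_UNITS = 4  # ASSUMPTION: four operation units

# ASSUMED 32-bit layout (hypothetical, for illustration only):
#   [31:26] opcode   [25:21] dst   [20:16] src1   [15:11] src2
#   [10]    execution mode (1 = multi-unit execution valid)
#   [9:6]   operation unit specification (one bit per unit)
#   [5:0]   unused


def encode(opcode: int, dst: int, src1: int, src2: int,
           mode: int, unit_mask: int) -> int:
    """Pack the fields into one instruction word."""
    return ((opcode & 0x3F) << 26 | (dst & 0x1F) << 21 | (src1 & 0x1F) << 16
            | (src2 & 0x1F) << 11 | (mode & 0x1) << 10 | (unit_mask & 0xF) << 6)


def decode(word: int) -> Dict[str, int]:
    """Unpack an instruction word into its named fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "dst": (word >> 21) & 0x1F,
        "src1": (word >> 16) & 0x1F,
        "src2": (word >> 11) & 0x1F,
        "mode": (word >> 10) & 0x1,
        "unit_mask": (word >> 6) & 0xF,
    }


def dispatch(word: int) -> Dict[int, Tuple[int, int, int, int]]:
    """Return {unit index: (opcode, dst, src1, src2)} for the units that
    receive this instruction."""
    f = decode(word)
    op = (f["opcode"], f["dst"], f["src1"], f["src2"])
    if not f["mode"]:
        # ASSUMPTION: with the mode bit cleared, only unit 0 executes.
        return {0: op}
    return {u: op for u in range(NUM_UNITS) if (f["unit_mask"] >> u) & 1}


print(dispatch(encode(opcode=3, dst=1, src1=2, src2=4, mode=1, unit_mask=0b1010)))
```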
- each of the arithmetic units has a unique register file.
- Each operation unit has a unique register file, and the operand field specifies a register in the register file unique to each operation unit, so that the operation data differs among the operation units. Further, the present invention is characterized in that, in the above processor, the operation units have a common register file.
- The present invention also provides the above processor, wherein the operation units have a common register file, the instruction code has an operand field designating a register number in the register file, and an offset value unique to each designated operation unit is added to the value of the operand field, so that each operation unit uses a different register and operates on different operation data.
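The offset scheme for a common register file can be sketched as below. This is an illustrative model only: the window size of eight registers per unit, the choice of offset as `unit * REGS_PER_UNIT`, and the function names are assumptions, since the text states only that a unit-specific offset is added to the operand field value.

```python
from typing import Iterable, List

NUM_UNITS = 4      # ASSUMPTION: four operation units
REGS_PER_UNIT = 8  # ASSUMPTION: each unit addresses an 8-register window


def physical_register(operand_reg: int, unit: int) -> int:
    """Map an operand-field register number to a register in the common
    file by adding an offset unique to the operation unit."""
    if not 0 <= operand_reg < REGS_PER_UNIT:
        raise ValueError("operand register out of range")
    if not 0 <= unit < NUM_UNITS:
        raise ValueError("unit index out of range")
    return unit * REGS_PER_UNIT + operand_reg


def broadcast_add(regfile: List[int], dst: int, src1: int, src2: int,
                  units: Iterable[int]) -> None:
    """Execute the same 'add dst, src1, src2' in every designated unit.

    Each unit touches only its own window, so the sequential loop gives
    the same result that parallel hardware would.
    """
    for u in units:
        regfile[physical_register(dst, u)] = (
            regfile[physical_register(src1, u)]
            + regfile[physical_register(src2, u)]
        )


# Common register file initialised so that register i holds the value i.
regfile = list(range(NUM_UNITS * REGS_PER_UNIT))
broadcast_add(regfile, dst=0, src1=1, src2=2, units=range(NUM_UNITS))
print([regfile[physical_register(0, u)] for u in range(NUM_UNITS)])
```

One instruction naming registers 0, 1, and 2 therefore operates on four different data sets, one per unit.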
- The present invention is also a processor comprising a memory for storing an instruction code, an instruction code holding unit for holding the instruction code read from the memory, and a plurality of operation units. The instruction code consists of a plurality of fields corresponding to the number of operation units and includes control information indicating that a field controls a plurality of operation units, and each field has field information designating the corresponding operation unit. Analysis means analyzes the field information and the control information, identifies the operation units controlled by each field, and inputs the field to the identified operation units. Because one field in the instruction code controls a plurality of operation units, a plurality of operations can be executed by a short instruction code composed of fewer fields than the number of operations.
- The present invention is also a processor comprising a memory for storing an instruction code, instruction code holding means for holding the instruction code read from the memory, and a plurality of operation units. The instruction code consists of a plurality of fields corresponding to the number of operation units, and the memory stores control information indicating that any one field in the instruction code controls a plurality of operation units, together with header information indicating the number of fields present in the instruction code. Analysis means analyzes the header information and the control information, identifies the operation units controlled by each field, and inputs the field to the identified operation units. One field in the instruction code thus controls a plurality of operation units, and a plurality of operations can be executed by a short instruction code composed of a small number of fields using the header information.
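The role of the header information can be illustrated with a small model of a variable-length instruction stream. It is a sketch under assumptions: the stream is represented as a Python list in which each instruction starts with a header giving its field count, each field carries a unit bitmask standing in for the control information, and units not covered by any field receive a `NOP`. None of these encodings is specified in the text.

```python
from typing import List, Sequence, Tuple, Union

NUM_UNITS = 4  # ASSUMPTION: four operation units
NOP = "nop"    # ASSUMPTION: idle units are filled with a no-operation

# A stored field: (unit bitmask acting as control information, operation).
StoredField = Tuple[int, str]


def expand_stream(stream: Sequence[Union[int, StoredField]]) -> List[List[str]]:
    """Expand short instruction codes into full-width instruction words.

    Each instruction in the stream is a header (the number of fields that
    follow) and then that many stored fields. The header lets the analysis
    step find the instruction boundary even though the length varies.
    """
    words: List[List[str]] = []
    i = 0
    while i < len(stream):
        count = stream[i]
        if not isinstance(count, int):
            raise ValueError("expected a header at position %d" % i)
        slots = [NOP] * NUM_UNITS
        for mask, op in stream[i + 1 : i + 1 + count]:
            for unit in range(NUM_UNITS):
                if (mask >> unit) & 1:
                    slots[unit] = op  # one field drives several units
        words.append(slots)
        i += 1 + count
    return words


stream = [
    2, (0b0011, "mul"), (0b1100, "add"),  # two fields cover four units
    1, (0b1111, "ld"),                    # one field covers four units
    1, (0b0001, "st"),                    # one field, three units idle
]
for word in expand_stream(stream):
    print(word)
```

Here four stored fields expand into three instruction words of four slots each, twelve slots in total.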
- The present invention provides a processor comprising a memory for storing an instruction code, instruction code holding means for holding the instruction code read from the memory, and a plurality of operation units, each controlled by the information held in the instruction code holding means and each including at least one arithmetic unit and a register file for storing operand information of that arithmetic unit. The instruction code consists of a plurality of fields corresponding to the number of operation units, a single instruction code can operate multiple operation units, and at least one arithmetic unit having the same function is provided in every operation unit.
- The present invention also provides a processor of the same configuration in which each operation unit additionally has a special register for holding a data type whose bit width is too wide to be specified by a register in the register file, so that both data types with a bit width that can be specified by a register in the register file and the wide data types held in the special register can be processed.
- The present invention provides a processor comprising: a memory for storing an instruction code having designation information instructing execution in a plurality of operation units; analyzing means for analyzing the designation information contained in the instruction code stored in the memory and determining the plurality of operation units designated by the instruction code; instruction code holding means for holding the instruction code designating the operation units determined by the analyzing means; and a plurality of operation units each executing an operation in parallel according to the instruction code held in the instruction code holding means.
- The present invention also provides a processor comprising: a memory for storing a single instruction code having designation information instructing execution in a plurality of operation units; analysis means for analyzing the designation information contained in the instruction code stored in the memory and determining the plurality of operation units designated by the single instruction code, so that operations corresponding to a plurality of instructions are executed with one instruction code; instruction code holding means for holding the single instruction code designating the operation units determined by the analysis means; and a plurality of operation units each executing an operation in parallel according to the single instruction code held in the instruction code holding means.
- The present invention is characterized in that, in the above processor, each of the plurality of operation units is configured to execute a different type of operation.
- Further, the present invention provides a processor comprising: a memory storing an instruction code having, in addition to an operation code indicating an operation type and an operand, a field specifying an execution mode as designation information; analysis means for analyzing the field of the instruction code read from the memory and inputting at least the operation code and the operand to the plurality of operation units for which the execution mode is valid; instruction code holding means for holding at least the operation code and the operand input by the analysis means; and a plurality of operation units executing the same type of operation in parallel according to at least the operation code and the operand held by the instruction code holding means.
- The present invention also provides a processor comprising: a memory storing an instruction code having an operation code indicating an operation type, an operand, a field specifying an execution mode as designation information, and an operation unit specification field designating operation units; analysis means for analyzing the fields read from the memory and, when the execution mode is valid, inputting at least the operation code and the operand to the operation units specified by the operation unit specification field; instruction code holding means for holding at least the operation code and the operand input by the analysis means; and a plurality of operation units configured to execute the same type of operation in parallel according to at least the operation code and the operand held by the instruction code holding means.
- each of the arithmetic units has a unique register file.
- Each operation unit has a unique register file, and the operand field specifies a register in the register file unique to each operation unit, so that the operation data differs among the operation units. Further, the present invention is characterized in that, in the processor, the operation units have a common register file.
- The present invention also provides the above processor, wherein the operation units have a common register file, the instruction code has an operand field designating a register number in the register file, and an offset value unique to each designated operation unit is added to the value of the operand field, so that each operation unit uses a different register and operates on different operation data.
- The present invention provides a processor comprising: a memory for storing an instruction code that is composed of a plurality of fields corresponding to the number of operation units and that has control information indicating that any one of the fields controls a plurality of operation units and field information designating the operation unit corresponding to each field; analysis means for analyzing the field information and the control information of the instruction code read from the memory, identifying the operation units controlled by each field, and inputting the field to the identified operation units; instruction code holding means for holding the field input by the analysis means; and a plurality of operation units for executing operations in parallel according to the field held by the instruction code holding means. One field in the instruction code controls a plurality of operation units, so that a plurality of operations can be executed by a short instruction code having fewer fields than the number of operations.
- the present invention also includes an instruction code including control information indicating that any one of the fields controls a plurality of operation units, the instruction code including a plurality of fields corresponding to the number of operation units.
- a memory for storing header information indicating the number of fields present in the field, and analyzing the header information and the control information read from the memory to identify an operation unit controlled by the field; Analysis means for inputting the above field to the calculated operation unit, instruction code holding means for holding the field input by the analysis means, and parallel operation according to the field held in the instruction code holding means.
- a plurality of operation units to be executed, and one field in the instruction code controls the plurality of operation units and
- a processor characterized by being capable of executing a plurality of operations with a short instruction code composed of a small number of fields using header information.
- the present invention provides the processor, wherein the analysis means includes instruction extension means for reading out the compressed instruction code from the memory and converting the instruction code into a directly executable extension instruction code.
- the analysis means is characterized by having instruction extension means for reading at least one field of one compressed instruction code from the memory and converting it into a directly executable extended instruction code composed of a plurality of fields.
- the analyzing means may include an instruction buffer for latching the compressed instruction code from the memory, a field controller that analyzes header information indicating the number of fields existing in the instruction code, and a selector that sorts the fields based on the field selection signal and the signal indicating the presence / absence of each field, both produced by the field controller.
- the analyzing means may include a SIMD controller that analyzes the execution mode (S mode) and the SIMD information of each field of the instruction code to determine the copy source field of each field, and a selector that copies the copy source field determined by the SIMD controller and inputs it to each operation unit.
- the present invention includes a memory configured to store instruction codes each including a plurality of fields corresponding to the number of operation units and configured to be able to operate a plurality of operation units with one of the fields, and read out from the memory.
- An instruction code holding means for holding the instruction code, an arithmetic unit having at least one identical function controlled by the information held in the instruction code holding means, and operation information of the arithmetic unit;
- the operation unit consisting of the register file to be stored is duplicated.
- the processor is characterized in that a plurality of arithmetic units are configured to execute the same arithmetic operation.
- the present invention includes a memory for storing an instruction code composed of a plurality of fields corresponding to the number of operation units, and an instruction code holding means for holding the instruction code read from the memory,
- An arithmetic unit having at least one identical function controlled by the information held in the instruction code holding means, a register file storing operand information of the arithmetic unit, and a register in the register file cannot be specified.
- a plurality of operation units composed of special registers for holding data types with a wide bit width are provided. The data types of bit widths that can be specified in the registers in the register file in these operation units are provided.
- a processor characterized in that both types of arithmetic processing of the data type stored in the special register are enabled.
- the present invention provides a memory for storing instruction codes and data, an instruction code holding means for holding a plurality of instruction codes read from the memory, and a plurality of instructions held in the instruction code holding means.
- a processor having a plurality of operation units operable in parallel according to those instruction codes, wherein each operation unit is constituted by a plurality of arithmetic units and a register file having a plurality of access ports,
- a processor characterized in that each of the arithmetic units can read the contents of the register file from its corresponding access port and operate on them, and the plurality of arithmetic units have the same function.
- the present invention provides a memory for storing instruction codes and data, instruction code holding means for holding a plurality of instruction codes read from the memory, and a plurality of instruction codes held in the instruction code holding means.
- a processor having a plurality of operation units operable in parallel according to those instruction codes, wherein each operation unit comprises a plurality of arithmetic units and a register file having a plurality of access ports,
- a processor wherein each of the arithmetic units can read the contents of the register file from its corresponding access port to perform an arithmetic operation, and the plurality of arithmetic units have a subset of the same function.
- the present invention is characterized in that in the processor, at least one arithmetic unit in an arithmetic unit can execute a data transfer instruction for performing data transfer between the memory and the register file.
- one instruction is composed of eight fields.
- when one field has operation information, operand information, and the above-mentioned mode information, and the mode information specifies a simultaneous operation mode for controlling a plurality of operation units, the remaining seven fields do not exist in memory when the instruction is read. Therefore, the instruction decompression circuit generates the remaining seven fields by copying the operation information and the operand information specified in the one field.
- one instruction corresponding to eight fields is thus generated from a code size of one field, and since each arithmetic unit has the same function, the plurality of arithmetic instructions can be executed in parallel without any problem. The code size can be reduced to 1/8.
- when designation information for the arithmetic units is set in the mode information, only the fields corresponding to that information are generated.
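The field-copying expansion described above can be sketched as follows. This is a minimal illustration, not the circuit itself: the dictionary representation of a field, the `targets` parameter, and the choice to fill non-designated positions with NOP are assumptions made for the example.

```python
NOP = {"op": "nop", "operands": ()}  # hypothetical representation of a NOP field


def expand(field, num_fields=8, targets=None):
    """Build a full instruction from the single field stored in memory.

    targets=None models the simultaneous operation mode for every unit.
    Otherwise only the listed positions receive a copy of the field and the
    rest are filled with NOP (an assumption for this sketch).
    """
    if targets is None:
        targets = range(num_fields)
    chosen = set(targets)
    return [dict(field) if i in chosen else dict(NOP) for i in range(num_fields)]


stored = {"op": "add", "operands": (1, 2, 3)}
full = expand(stored)                      # eight identical fields from one
partial = expand(stored, targets=[0, 2])   # only designated units operate
```

Here one stored field stands in for eight executed fields, which is the 1/8 code size mentioned in the text.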
- when one instruction has operation information, operand information, and the above-mentioned mode information, and the mode information specifies the simultaneous operation mode, the instruction decompression circuit generates three further instructions by copying the operation information and the operand information specified in that instruction.
- four instructions can be executed with the code size of one instruction, and the code size can be reduced to 1/4.
- when operation unit designation information is set in the mode information, only the instructions corresponding to that information are newly generated, so providing 2 bits of setting information allows the number of simultaneous operations to be controlled in the range of 2 to 4.
- the code size can be reduced while the degree of parallelism of the simultaneous operation processing is improved.
- the same type of operation has the property of being repeatedly executed multiple times, so that performance can be surely improved by increasing the degree of parallelism of processing.
- the data processed by the integer arithmetic unit in the next cycle can be loaded simultaneously with the processing of the multimedia arithmetic unit. Since the loaded data is stored in the register file in the arithmetic unit, it can be used as an operand for processing by the multimedia arithmetic unit.
- FIG. 1 is a block diagram of a processor showing a first embodiment of the present invention.
- FIG. 2 is a diagram showing an instruction format of the processor.
- FIG. 3 is a diagram showing an example of storing a program in the instruction memory 1.
- Fig. 4 shows the format of the header.
- FIG. 5 is a diagram showing a specific example of the instruction code.
- FIG. 6 is a diagram showing a specific example of the instruction code.
- FIG. 7 is a detailed block diagram of the instruction decompression circuit 2.
- FIG. 8 is a detailed block diagram of the field controller 42.
- FIG. 9 is a diagram showing an example of storing a program.
- FIG. 10 is a diagram showing the operation timing of the field controller 42.
- FIG. 11 is a detailed block diagram of the header analyzer 60.
- FIG. 12 is a detailed block diagram of the partial write controller 62.
- FIG. 13 is a detailed block diagram of the address controller 61.
- FIG. 14 is a detailed block diagram of the select signal generator 63.
- FIG. 15 is a detailed block diagram of the offset generation circuit 120.
- FIG. 16 is a detailed block diagram of the SIMD controller 46.
- FIG. 17 is a block diagram of a processor showing a second embodiment of the present invention.
- FIG. 18 is a view showing an instruction format according to the second embodiment.
- FIG. 19 is a detailed block diagram of the instruction expansion circuit 200.
- FIG. 20 is a detailed block diagram of the field controller 201.
- FIG. 21 is a detailed block diagram of the synchronizer 210.
- FIG. 22 is a detailed block diagram of the select signal generator 211.
- FIG. 23 is a block diagram of a processor showing a third embodiment of the present invention.
- FIG. 24 is a diagram showing an instruction format according to the third embodiment.
- FIG. 25 is a detailed block diagram of the instruction expansion circuit 241.
- FIG. 26 is a diagram showing a truth table for realizing the function of the register adjuster 250.
- FIG. 27 is a block diagram of a processor showing a fourth embodiment of the present invention.
- FIG. 28 is a diagram showing an instruction format showing the fourth embodiment.
- FIG. 29 is a detailed block diagram of the instruction decompression circuit 260.
- FIG. 30 is a block diagram showing an embodiment of the IFG calculator according to the present invention.
- BEST MODE FOR CARRYING OUT THE INVENTION
- FIG. 1 is a block diagram of a VLIW processor to which the present invention is applied.
- 1 is an instruction memory that stores processor instruction codes in compressed form.
- 2 is the main block of the present invention, an instruction expansion circuit that expands the compressed instruction code read from the instruction memory 1 into code that can actually be executed.
- 3 is the address bus of the instruction memory 1.
- 4 is the data bus of the instruction memory 1.
- 5 to 12 are field buses to which the instruction expansion circuit 2 outputs the expanded code.
- 14 to 21 are instruction registers that hold the decompressed codes transferred via the field buses 5 to 12, respectively.
- 22 to 25 are operation units of identical configuration that perform various operations in accordance with the decompressed codes held in the instruction registers 14 to 21.
- 26 is an IFG (Integer Floating Graphics) computing unit that performs complex operations such as multiplication and multimedia operations, which carry out multiple operations in units of 8 bits or 16 bits.
- 27 is an INT (Integer) computing unit that executes simple operations such as logical operations and data transfer instructions, which transfer data between the data memory 30 and the register file.
- 28 is a register file that holds the values to be operated on and the operation results. It consists of 32 64-bit registers and has 4 read ports and 3 write ports.
- 29 is a selection circuit that allows the operation results of the operation units 22 to 25 to be transferred to other operation units.
- 30 is a data memory that can transfer data to and from the register files in the operation units 22 to 25.
- this VLIW processor is integrated into one LSI. Descriptions of a cache memory for temporarily storing instruction codes and the like, and of the LSI pins through which instruction codes and the like are read from outside the processor and operation results are output to the outside, are omitted.
- the feature of the present invention is a configuration in which the arithmetic unit 22 is composed of an IFG arithmetic unit 26, an INT arithmetic unit 27, and a register file 28, and a plurality of identical arithmetic units 23 to 25 are arranged in parallel with it.
- the instruction decompression circuit 2 reads out the compressed instruction code from the instruction memory 1 based on the address information given via the address bus 3, and converts it into a decompressed instruction code that the operation units 22 to 25 can execute directly.
- the decompressed instruction code corresponding to one instruction is composed of eight fields, and each field is transferred to the corresponding instruction register 14 to 21. There are two types of fields, IFG fields and INT fields. The IFG field is transferred to instruction registers 14, 16, 18, and 20, and the INT field is transferred to instruction registers 15, 17, 19, and 21.
- the IFG field stored in the instruction register 14 controls the operation of the IFG operator 26 in the operation unit 22.
- the INT field stored in the instruction register 15 controls the operation relating to the INT operator 27 in the operation unit 22.
- similarly, instruction registers 16 and 17 control the IFG operator and the INT operator in arithmetic unit 23, instruction registers 18 and 19 control those in arithmetic unit 24, and instruction registers 20 and 21 control those in arithmetic unit 25.
- Data to be operated on by the IFG operator 26 and the INT operator 27 are read from the register file 28.
- the operation result is output to the selection circuit 29, and can be written to the register file 28 in any operation unit.
- the code size can be reduced while improving the parallelism of the simultaneous operation processing as described above.
- FIG. 2 shows the formats of the IFG field and the INT field.
- the operation code block of bit 0 to bit 7 (hereinafter referred to as “opcode”) indicates the type of operation, and up to 256 types can be specified.
- the immediate block of bit 8 (hereinafter referred to as “immediate”) indicates the meaning of the contents of the source 1 block of bits 22 to 26 (hereinafter referred to as “source 1”).
- when immediate is 1, source 1 holds an immediate value, and when it is 0, source 1 holds a register number.
- the register number indicates one of the 32 registers in the register file in the operation unit.
- the source 0 block of bits 17 to 21 (hereinafter referred to as “source 0”) indicates one of the 32 registers in the register file in the operation unit.
- the bit 27 S mode block (hereinafter referred to as "S mode") (one bit field) specifies the simultaneous operation mode which is the point of the present invention.
- When this bit (“S mode”) is 0 it indicates normal mode, and when it is 1 it indicates SIMD (Single Instruction Multiple Data stream) mode (simultaneous operation mode).
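The bit positions given so far can be turned into a small decoder. This is only a sketch: it assumes that bit 0 is the least significant bit of a 32-bit field word, which the text does not state, and it decodes only the blocks whose positions are given above.

```python
def bits(word: int, lo: int, hi: int) -> int:
    """Extract bits lo..hi inclusive, assuming bit 0 is the least significant bit."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_field(word: int) -> dict:
    """Decode the blocks whose bit positions are stated in the text."""
    return {
        "opcode": bits(word, 0, 7),      # up to 256 operation types
        "immediate": bits(word, 8, 8),   # 1: source 1 is an immediate value
        "source0": bits(word, 17, 21),   # one of 32 registers
        "source1": bits(word, 22, 26),   # register number or immediate value
        "s_mode": bits(word, 27, 27),    # 0: normal mode, 1: SIMD mode
    }


# Hypothetical field word built from known values for illustration.
word = (1 << 27) | (5 << 22) | (3 << 17) | (1 << 8) | 0x2A
decoded = decode_field(word)
```

If the original numbering counts bit 0 from the most significant end, only the `bits` helper would need to change.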
- the block from bit 9 to bit 11 is written as "dest bank" in normal mode and as "SIMD" in SIMD mode.
- when "S mode" is 1, multiple operation units can be controlled by one IFG field, and the operation result of each operation unit can be written only to the registers in the register file of that operation unit. Therefore, in the SIMD mode, the register to which the operation result is written is specified by one of 32 register numbers in "Destination".
- "SIMD", which shares its field with "dest bank", is used to specify the other operation units to be operated simultaneously.
- SIMD is composed of 3 bits and indicates whether or not each of the other three operation units performs the same operation. 1 indicates execution of the same instruction, and 0 indicates no execution, that is, no operation (hereinafter abbreviated as NOP).
- the correspondence between the three bits and each arithmetic unit depends on the instruction register in which the IFG field is held. In other words, when the field is held in the instruction register 14 corresponding to bank 0, the three bits of "SIMD" correspond to bank 1, bank 2, and bank 3. Therefore, if "SIMD" is 110 (binary), the same instruction is set in instruction registers 14 and 15, 16 and 17, and 18 and 19.
- if the IFG field is held in the instruction register 16 corresponding to bank 1, the three bits of "SIMD" correspond to bank 0, bank 2, and bank 3. If the IFG field is held in the instruction register 18 corresponding to bank 2, the three bits of "SIMD" correspond to bank 0, bank 1, and bank 3. If the IFG field is held in the instruction register 20 corresponding to bank 3, the three bits of "SIMD" correspond to bank 0, bank 1, and bank 2.
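The bank correspondence can be written as a short function. This sketch assumes the three bits are read left to right against the other banks in ascending order, which matches the 110 example for bank 0; the function name and the string representation of the bits are illustrative only.

```python
def simd_targets(own_bank: int, simd_bits: str, num_banks: int = 4) -> list:
    """Return the banks that repeat the instruction held by own_bank.

    simd_bits is a 3-character binary string. Its bits map, left to right,
    onto the other banks in ascending order (1 = same instruction, 0 = NOP).
    """
    others = [b for b in range(num_banks) if b != own_bank]
    return [b for b, bit in zip(others, simd_bits) if bit == "1"]


from_bank0 = simd_targets(0, "110")  # bits map to banks 1, 2, 3
from_bank2 = simd_targets(2, "001")  # bits map to banks 0, 1, 3
```

For bank 0 with 110, the result names banks 1 and 2, so together with bank 0 itself the instruction lands in register pairs 14/15, 16/17, and 18/19.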
- Branch test is for controlling the branching of a program.
- Each arithmetic unit 22 to 25 has six 1-bit branch bank registers for conditional branching. These indicate that branching is performed when 1 is set, and that branching is not performed when 0 is set.
- when branch test is 00 (binary), no branch occurs.
- branch occurs according to the contents of the branch bank register to be executed.
- FIG. 3 shows an example of program storage in the instruction memory 1.
- eight instructions are stored at addresses 0 to 95, and the header is information indicating the presence or absence of a field for each instruction.
- the header is provided in a ratio of one for every four instructions.
- Fields 0, 1, 2, 3, 4, 5, 6, and 7 correspond to instruction registers 14, 15, 16, 17, 18, 19, 20, and 21, respectively.
- No field means a field omitted by using NOP or SIMD mode.
- since NOP fields are not stored, the first instruction consists of fields 0, 1, 4, 6, and 7; the second instruction of fields 0, 1, and 2; the third instruction of fields 0, 1, 2, 4, 6, and 7; the fourth instruction of fields 4 and 7; the fifth instruction of fields 0 and 1; the sixth instruction of fields 2, 3, and 7; the seventh instruction of field 6; and the eighth instruction of field 4.
- FIG. 4 shows the format of the header shown in FIG.
- One header is composed of 32 bits, the same size as one field, and indicates field presence / absence information (4 × 8 = 32 bits) for four consecutive instructions.
- FIG. 5 shows the header 0 shown in FIG. 3 and the corresponding first to fourth decompressed instruction formats. Assume that these four instructions all specify the normal mode. The instruction format after decompression is generated by the instruction decompression circuit 2 from the field presence / absence information in the header: the NOP fields omitted from the instruction memory 1 are created and the fields are sorted into their positions.
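The header-driven decompression in normal mode can be sketched in a few lines. The 8-character presence string with field 0 on the left follows the 11001011 example given later for fields 0, 1, 4, 6, and 7; the string placeholders for fields and NOP are assumptions for illustration.

```python
NOP = "NOP"  # placeholder for a regenerated no-operation field


def decompress(presence: str, stored_fields: list) -> list:
    """Rebuild an eight-field instruction from its header bits.

    presence is an 8-character string, '1' where a field is stored in memory
    (leftmost character = field 0). Missing fields are recreated as NOP.
    """
    it = iter(stored_fields)
    return [next(it) if bit == "1" else NOP for bit in presence]


# First instruction of the example program: fields 0, 1, 4, 6, and 7 are stored.
expanded = decompress("11001011", ["f0", "f1", "f4", "f6", "f7"])
```

The stored fields keep their relative order, so sorting them amounts to walking the header bits from field 0 to field 7.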
- FIG. 6 shows the header 1 shown in FIG. 3 and the corresponding instruction format after the fifth to eighth decompression. It is assumed that "S mode" and "SIMD" of these instructions have the values described in the figure, respectively.
- the SIMD mode is specified in field 4 and “SIMD” is 0 1 (0 is NOP, 1 is the same instruction execution), so its contents are copied to fields 6 and 7.
- FIG. 7 is a block diagram showing details of the instruction decompression circuit 2.
- circuit blocks and signal lines having the same functions as those in the earlier figures are given the same reference numerals. 40 is an instruction buffer for latching the compressed instruction code (32 bytes) from the data bus 4, and 41 (41a to 41h) is a compressed field bus divided into 4-byte (1 field) units.
- there are eight field signal lines that indicate the presence or absence of a field. 49 is the decompressed field bus after field rearrangement, 46 is a SIMD controller that controls the field copy operation of each instruction in SIMD mode, 47 is a SIMD selection signal line that controls the selection of the copy field, and 48 is a dual selector that selects a pair of fields (the IFG field and the INT field).
- the field controller 42 cuts out and analyzes the header information with reference to the information held in the instruction buffer 40. Based on the analysis result, it outputs the address information of the instruction to be fetched into the instruction buffer 40 to the address bus 3, and outputs information enabling fetching into the instruction buffer 40 in 4-byte units to the write enable bus 43.
- the information for selecting field 0 of the instruction from the compressed field bus 41 is output to the field selection signal line 44.
- the selection information of fields 1 to 7 is likewise output to the corresponding selectors 45b to 45h. Also, information indicating the presence or absence of each field is output to the field signal line 67.
- since the instruction buffer 40 is 32 bytes, it cannot hold the longest instruction, consisting of a header and eight fields, at one time. Therefore, in this case, two fetches are required.
- the field controller 42 outputs the information indicating the second fetch cycle to the refetch signal line 13. This signal is sent to the instruction registers 14 to 21 shown in FIG. 1. Since only the information of field 7 is output in the refetch cycle, only the instruction register 21 corresponding to field 7 latches (updates) the field data in this cycle.
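The refetch condition follows from simple arithmetic on the sizes given in the text: a 4-byte header, 4-byte fields, and a 32-byte buffer. The function names below are hypothetical and the check is a sketch of the condition only, not of the refetch signalling.

```python
BUFFER_BYTES = 32  # size of the instruction buffer 40
FIELD_BYTES = 4    # one field
HEADER_BYTES = 4   # one header, same size as one field


def compressed_length(num_fields: int, has_header: bool) -> int:
    """Length in bytes of a compressed instruction, header included if present."""
    return num_fields * FIELD_BYTES + (HEADER_BYTES if has_header else 0)


def needs_refetch(num_fields: int, has_header: bool) -> bool:
    """True when the instruction does not fit in the buffer in one fetch."""
    return compressed_length(num_fields, has_header) > BUFFER_BYTES


longest = compressed_length(8, True)  # header plus eight fields
```

Only a header followed by all eight fields reaches 36 bytes, so the second fetch has to supply just the last field, field 7.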
- the SIMD controller 46 analyzes the “S mode” and “SIMD” of the fields 0, 2, 4, and 6 in the decompressed field bus 49 to determine the copy source field of each field. Then, information for selecting the copy source field is output to the dual selector 48a.
- in the normal mode, the SIMD controller 46 outputs the selection information of the copy source fields 0 and 1 to the dual selector 48a. According to this selection information, the dual selector 48a selects two of the eight fields in the decompressed field bus 49, and outputs field 0 to field bus 5 and field 1 to field bus 6.
- similarly, each of the dual selectors 48b to 48d selects two of the eight fields, outputting field 2 to field bus 7, field 3 to field bus 8, field 4 to field bus 9, field 5 to field bus 10, field 6 to field bus 11, and field 7 to field bus 12.
- the instruction buffer 40 holds the header 0, field 1 of the second instruction, and the like. From this state, the selectors 45a to 45h corresponding to each field generate the first instruction format shown in FIG. 5 according to the analysis information of header 0 in the field controller 42, and output it to the decompressed field bus 49.
- the field controller 42 outputs to the write enable bus 43 information for fetching the following data into the total of 24 bytes of the instruction buffer 40 occupied by header 0 and the five fields of the first instruction.
- the information output to the write enable bus 43 is composed of 8 bits in order to control the writing at each 4-byte boundary.
- the address information of the address bus 3 indicates the head of the next instruction (field 0 of the second instruction in FIG. 3), and 32 bytes of continuous data from that address (addresses 24 to 55) are read onto the data bus 4.
- of these, 24 bytes of data (addresses 32 to 55) are updated according to the information of the write enable bus 43. Therefore, the instruction buffer 40 holds 32 bytes of information: addresses 32 to 55 and addresses 24 to 31.
- the selectors 45a to 45h corresponding to each field generate the second instruction format shown in FIG. 5 according to the analysis information of the header 0 in the field controller 42, and output it to the decompressed field bus 49.
- the field controller 42 outputs to the write enable bus 43 information for fetching new data into the total of 12 bytes of the instruction buffer 40 occupied by the three fields of the second instruction.
- the address information of the address bus 3 indicates the start of the next instruction (field 0 of the third instruction in FIG. 3), and 32 bytes of continuous data from that address (addresses 36 to 67) are read onto the data bus 4. In the next cycle, 12 bytes of data (addresses 56 to 67) are updated according to the information of the write enable bus 43. Therefore, the instruction buffer 40 holds 32 bytes of information: addresses 64 to 67 and addresses 36 to 63. As described above, the field controller 42 controls the instruction buffer 40 via the address bus 3 and the write enable bus 43 so that the instruction buffer 40 is always filled with data. The details of the field controller 42 will be described later.
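The buffer refill described above behaves like a circular buffer of eight 4-byte slots. The sketch below tracks only which memory address each slot holds. It assumes a slot index of (address / 4) mod 8, which is consistent with the address ranges in the text but is not stated there.

```python
SLOTS = 8  # eight 4-byte slots in the 32-byte instruction buffer


def refill(buffer: list, consumed_start: int, consumed_bytes: int) -> list:
    """Overwrite the slots of a consumed instruction with data 32 bytes ahead.

    buffer[i] is the memory address currently held in slot i. Only the slots
    that held the consumed bytes are rewritten, as with the write enable bus.
    """
    for addr in range(consumed_start, consumed_start + consumed_bytes, 4):
        buffer[(addr // 4) % SLOTS] = addr + 32
    return buffer


buf = [4 * i for i in range(SLOTS)]  # initial fetch: addresses 0 to 31
refill(buf, 0, 24)                   # first instruction: header plus 5 fields
after_first = sorted(buf)
refill(buf, 24, 12)                  # second instruction: 3 fields
after_second = sorted(buf)
```

After the first refill the buffer covers addresses 24 to 55, and after the second it covers 36 to 67, with the data for address 64 wrapping around into slot 0.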
- the "S mode" of each field on the decompressed field bus 49 generated as described above is checked by the SIMD controller 46.
- the SIMD controller 46 identifies the copy source fields and the copy destination fields, as shown in FIG.
- the field selection information of the copy source is sent to the dual selector 48 a via the SIMD selection signal line 47.
- the field selection information selects its own field, fields 0 and 1.
- the field selection information for fields 2 and 3, fields 4 and 5, and fields 6 and 7 are similarly sent to the respective dual selectors 48b-d.
- the field buses 5 and 6 are output from the dual selector 48a: field 0 on field bus 5 is selected from fields 0, 2, 4, and 6 of the decompressed field bus 49, while field 1 on field bus 6 is selected from fields 1, 3, 5, and 7 of the decompressed field bus 49. Similarly, fields 2 to 7 on field buses 7 to 12 are generated. The details of the SIMD controller 46 will be described later.
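The cooperation of the SIMD controller and the dual selectors can be sketched as a two-step selection: first decide, per bank, which bank's field pair to use, then pick that pair. The data representation and the handling of conflicting requests (the later bank wins) are assumptions of this sketch, not details from the text.

```python
def copy_sources(s_mode: list, simd_bits: list) -> list:
    """Return, for each of the four banks, the bank whose field pair it executes.

    s_mode[b] is the S mode bit of bank b's IFG field and simd_bits[b] its
    3-character SIMD string (left to right = other banks in ascending order).
    """
    source = list(range(4))  # normal mode: every bank selects its own fields
    for bank in range(4):
        if s_mode[bank]:
            others = [b for b in range(4) if b != bank]
            for target, bit in zip(others, simd_bits[bank]):
                if bit == "1":
                    source[target] = bank
    return source


def select_fields(fields: list, source: list) -> list:
    """Model the four dual selectors: each picks an (IFG, INT) field pair."""
    out = []
    for src in source:
        out += [fields[2 * src], fields[2 * src + 1]]
    return out


# Bank 2 (fields 4 and 5) is in SIMD mode and asks bank 3 to repeat it.
src = copy_sources([0, 0, 1, 0], ["000", "000", "001", "000"])
issued = select_fields(["NOP", "NOP", "NOP", "NOP", "A", "B", "NOP", "NOP"], src)
```

In this example the pair held in fields 4 and 5 also appears in fields 6 and 7 of the issued instruction.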
- circuit blocks and signal lines having the same functions as those in FIG. 7 are given the same reference numerals.
- 60 is a header analyzer for analyzing header information.
- 67 is an 8-bit field signal line indicating the field configuration in the header.
- 68 is a 6-bit instruction length signal line indicating the instruction length (0 to 33) at the time of compression.
- 61 is an address controller for generating the address information to be given to the address bus 3.
- 64 is an instruction address bus for transferring the address information of the instruction being executed.
- 65 is a 2-bit header address bus indicating one of the four instructions included in the header.
- 66 is a header 0 signal line that is asserted when the header is 0.
- 62 is a partial write controller for generating the information to be supplied to the write enable bus 43.
- 63 is a select signal generator for generating the field selection information to be supplied to the field selection signal line 44 and the like. The operation will be described below with a specific example.
- FIG. 9 shows an instruction sequence stored in the instruction memory 1. Here, seven instructions are stored at addresses 0 to 127.
- FIG. 10 is a time chart showing the operation when these instructions are sequentially executed.
- the figure shows the operation for 9 cycles from T0 to T8.
- Each instruction is basically executed in a four-stage pipeline. As for the breakdown of the four stages, IF is the instruction fetch stage, EXP is the instruction decompression stage, EXE is the operation execution stage, and WB is the operation result writing stage. In the figure, the time chart from instruction 1 to instruction 6 is shown.
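The nine-cycle span of the time chart follows from the pipeline depth. The sketch below assumes that one instruction enters the pipeline per cycle with no stalls or refetch cycles, and that instruction 1 is in its IF stage in cycle T0; both are assumptions for illustration.

```python
STAGES = ["IF", "EXP", "EXE", "WB"]  # fetch, decompress, execute, write back


def stage_at(instruction: int, cycle: int):
    """Stage occupied by a 1-based instruction number in cycle T<cycle>, or None."""
    k = cycle - (instruction - 1)
    return STAGES[k] if 0 <= k < len(STAGES) else None


def total_cycles(num_instructions: int) -> int:
    """Cycles needed to complete the given number of instructions without stalls."""
    return num_instructions + len(STAGES) - 1


cycles_for_six = total_cycles(6)
```

Six instructions through four stages finish in nine cycles, matching the T0 to T8 range of the chart.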
- Instruction 1 starts from the initial state, and each signal line in the T0 cycle shows its initial value. Since the instruction address bus 64 is 0, addresses 0 to 31 are read from the instruction memory 1. In addition, the write enable bus 43 is 11111111 (binary), so addresses 0 to 31 are latched into the instruction buffer 40 at the transition from the T0 to the T1 cycle.
- the header analyzer 60 identifies the header information within the information latched in the instruction buffer 40, which is input via the compressed field bus 41, according to the information input from the instruction address bus 64 and the header address bus 65. In other words, since the instruction address bus 64 is 0, it is known that the headers of instructions 1 to 4 exist at addresses 0 to 3, and these 4 bytes are latched. Further, since the header address bus 65 is 0, the header information corresponding to instruction 1 is the 8 bits at address 0.
- from this information, the header analyzer 60 determines that instruction 1 consists of five fields: fields 0, 1, 4, 6, and 7. Therefore, the header information 11001011 (binary) of instruction 1 is output to the field signal line 67 as it is. In addition, the instruction length of instruction 1 including the header is 24 bytes, and 011000 (binary), indicating an instruction length of 24, is output to the instruction length signal line 68. Since the instruction length does not exceed 32, the refetch signal line 13 is not asserted. The address controller 61 outputs a value obtained by adding 32 to the value of the instruction address bus 64 to the address bus 3, except in the initial-state cycle. Therefore, in the T1 cycle, 32 is output to the address bus 3.
- the partial write controller 62 outputs the position information of the fields of instruction 1 stored in the instruction buffer 40 to the write enable bus 43, based on the information of the instruction length signal line 68 and the instruction address bus 64. This position information is managed in units of 4 bytes and consists of 8 bits. In the T1 cycle, since the instruction address bus 64 is 0 and the instruction length signal line 68 is 24, the first 24 bytes of the instruction buffer 40 correspond to instruction 1, and 11111100 (binary) is output to the write enable bus 43.
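The write enable pattern can be derived from the instruction address and instruction length alone. The sketch below assumes the leftmost bit stands for the first 4-byte slot of the buffer and that slots wrap around modulo 8; under that reading, an instruction at address 0 with length 24 gives six leading ones.

```python
def write_enable(instr_addr: int, instr_len: int) -> str:
    """Return the 8-bit write enable pattern as a string (leftmost = slot 0).

    Each bit covers one 4-byte slot of the 32-byte instruction buffer. The
    slots occupied by the current instruction are the ones to be overwritten.
    """
    slots = ["0"] * 8
    for addr in range(instr_addr, instr_addr + instr_len, 4):
        slots[(addr // 4) % 8] = "1"
    return "".join(slots)


first = write_enable(0, 24)    # instruction 1: header plus five fields
second = write_enable(24, 12)  # instruction 2: three fields, wrapping to slot 0
```

For an instruction starting at address 24 with length 12, the enabled slots are the seventh, the eighth, and, after wrapping, the first.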
- the select signal generator 63 generates and outputs the information for selecting each field from the instruction buffer 40, based on the instruction address bus 64, the header 0 signal line 66, and the field signal line 67.
- the selection information of field 0 is output to the field selection signal line 44.
- the selection information of fields 1 to 7 is output to the corresponding field selection signal line.
- the selection information is composed of three bits indicating from which position on the 4-byte boundary the instruction buffer 40 should read. Since the instruction address bus 64 in T1 cycle is 0 and the header 0 signal line 66 is asserted, the field of instruction 1 is held after the second 32 bit boundary of the instruction buffer 40 You can see that it is. Further, which field is present can be known from the information on the field signal lines 67. Therefore, it is known that the field 32 is held at the second 32 bit boundary, and the selection information indicating the field 0 is 1, which is output to the field selection signal line 44. Similarly, the selection information for field 1 is 2, field 4 is 3, field 6 is 4, and field 7 is 5.
- the address bus 3 force, '32 2, the light enable bus 4 3 force 1 1 1 1 1 1 1 1 1 0 0 (binary number)
- addresses 32 to 55 are newly latched, and the information of addresses 24 to 31 already held is held as it is.
- the instruction length signal line 68 is 24, the information on the instruction address bus 64 is added to 24 by adding 0 to 24, and the information on the header address bus 65 is added by 1 Becomes 1
- The header information corresponding to instruction 2 is found to be the 8 bits at address 1 of the information latched in the T1 cycle. From this information, the header analyzer 60 finds that instruction 2 is composed of three fields: fields 0, 1, and 2. Therefore, the header information 11100000 (binary) of instruction 2 is output to the field signal line 67 as it is. Further, instruction 2 has an instruction length of 12 bytes, and the information 001100 (binary), indicating that the instruction length is 12, is output to the instruction length signal line 68. Since the instruction length does not exceed 32, the refetch signal line 13 is not asserted.
- the address controller 61 outputs a value obtained by adding 32 to the value of the instruction address bus 64 to the address bus 3 except for the cycle in the initial state. Therefore, in the T2 cycle, 56 is output to the address bus 3.
- Since the instruction address bus 64 and the instruction length signal line 68 in the T2 cycle are 24 and 12, respectively, the partial write controller 62 finds that the 12 bytes starting from the seventh 4-byte boundary of the instruction buffer 40 correspond to instruction 2, and outputs 10000011 (binary) to the write enable bus 43.
- The fields of instruction 2 are found to be held from the seventh 32-bit boundary of the instruction buffer 40 onward. Which fields are present is known from the information on the field signal line 67. Therefore, field 0 is found to be held at the seventh 32-bit boundary, and the selection information for field 0 is 6, which is output to the field selection signal line 44. Similarly, the selection information for field 1 is 7 and for field 2 is 0.
- Since the address bus 3 is 56 and the write enable bus 43 is 10000011 (binary), addresses 56 to 67 are newly latched, and the information already held at addresses 36 to 55 is retained as it is. At the same time, since the instruction length signal line 68 is 12, the information on the instruction address bus 64 becomes 36 (24 plus 12), and the information on the header address bus 65 becomes 2 (1 plus 1).
- The T2 cycle described above corresponds to the IF stage, and in the T3 cycle the instruction buffer 40 holds instruction 3.
- In the T3 cycle, the EXP stage of instruction 3 is executed.
- The header information corresponding to instruction 3 is found to be the 8 bits at address 2 of the information latched in the T1 cycle. From this information, the header analyzer 60 finds that instruction 3 is composed of eight fields: fields 0, 1, 2, 3, 4, 5, 6, and 7. Therefore, the header information 11111111 (binary) of instruction 3 is output to the field signal line 67 as it is. Further, instruction 3 has an instruction length of 32 bytes, and the information 100000 (binary), indicating that the instruction length is 32, is output to the instruction length signal line 68. Since the instruction length does not exceed 32, the refetch signal line 13 is not asserted.
- The address controller 61 outputs the value obtained by adding 32 to the value of the instruction address bus 64 to the address bus 3, except in the cycle of the initial state. Therefore, in the T3 cycle, 68 is output to the address bus 3. Since the instruction address bus 64 is 36 and the instruction length signal line 68 is 32 in the T3 cycle, the partial write controller 62 finds that the 32 bytes starting from the second 4-byte boundary of the instruction buffer 40 correspond to instruction 3, and outputs 11111111 (binary) to the write enable bus 43.
- The fields of instruction 3 are found to be held from the second 32-bit boundary of the instruction buffer 40 onward. Which fields are present is known from the information on the field signal line 67.
- The selection information for field 0 is therefore 1; similarly, field 1 is 2, field 2 is 3, field 3 is 4, field 4 is 5, field 5 is 6, field 6 is 7, and field 7 is 0.
- Since the address bus 3 is 68 and the write enable bus 43 is 11111111 (binary), addresses 68 to 99 are newly latched in the instruction buffer 40.
- Since the instruction length signal line 68 is 32, the information on the instruction address bus 64 becomes 68 (36 plus 32), and the information on the header address bus 65 becomes 3 (2 plus 1).
- the T3 cycle described above corresponds to the IF stage, and in the T4 cycle, the instruction buffer 40 holds the instruction 4.
- the EXP stage of instruction 4 is executed.
- The header information corresponding to instruction 4 is found to be the 8 bits at address 3 of the information latched in the T1 cycle. From this information, the header analyzer 60 finds that instruction 4 consists of one field, which is field 0. Therefore, the header information 10000000 (binary) of instruction 4 is output to the field signal line 67 as it is. Further, instruction 4 has an instruction length of 4 bytes, and the information 000100 (binary), indicating that the instruction length is 4, is output to the instruction length signal line 68. Since the instruction length does not exceed 32, the refetch signal line 13 is not asserted.
- The address controller 61 outputs the value obtained by adding 32 to the value of the instruction address bus 64 to the address bus 3. Therefore, in the T4 cycle, 100 is output to the address bus 3. Since the instruction address bus 64 is 68 and the instruction length signal line 68 is 4 in the T4 cycle, the partial write controller 62 finds that the 4 bytes starting from the second 4-byte boundary of the instruction buffer 40 correspond to instruction 4, and outputs 01000000 (binary) to the write enable bus 43.
- Since the instruction address bus 64 is 68 in the T4 cycle and the header 0 signal line 66 is not asserted, the select signal generator 63 finds that the field of instruction 4 is held from the second 32-bit boundary of the instruction buffer 40 onward. Which fields are present is known from the information on the field signal line 67. Therefore, field 0 is found to be held at the second 32-bit boundary, and the selection information for field 0 is 1, which is output to the field selection signal line 44.
- The header analyzer 60 finds that the headers of instructions 5 to 8 are located at addresses 72 to 75, and latches this 4-byte information held in the instruction buffer 40. Further, since the header address bus 65 is 0, the header information corresponding to instruction 5 is found to be the 8 bits at address 72. From this information, the header analyzer 60 finds that instruction 5 is composed of eight fields. Therefore, the header information 11111111 (binary) of instruction 5 is output to the field signal line 67 as it is. Further, the instruction length of instruction 5, including the header, is 36 bytes, and the information 100100 (binary), indicating that the instruction length is 36, is output to the instruction length signal line 68. Since the instruction length exceeds 32, the refetch signal line 13 is asserted at the transition from T5 to T6.
- The address controller 61 outputs the value obtained by adding 32 to the value of the instruction address bus 64 to the address bus 3, except in the cycle of the initial state. Therefore, in the T5 cycle, 104 is output to the address bus 3.
- The fields of instruction 5 are found to be held from the fourth 32-bit boundary of the instruction buffer 40 onward. Which fields are present is known from the information on the field signal line 67.
- The selection information for field 0 is therefore 3; similarly, field 1 is 4, field 2 is 5, field 3 is 6, field 4 is 7, field 5 is 0, and field 6 is 1. Since the instruction length exceeds 32, the information of field 7 is found not to be held in the instruction buffer 40, so its selection information is output in the next cycle (T6).
- Since the address bus 3 is 104 and the write enable bus 43 is 11111111 (binary), addresses 104 to 135 are newly latched in the instruction buffer 40.
- Since the instruction length signal line 68 is 36, the information on the instruction address bus 64 becomes 108 (72 plus 36), and the information on the header address bus 65 becomes 1 (0 plus 1).
- In the T6 cycle, the refetch signal line 13 is asserted, indicating that the stage is EXP2.
- In this stage, the header analyzer 60, the address controller 61, the partial write controller 62, and the select signal generator 63 operate in a special manner.
- The address controller 61 outputs to the address bus 3 the value 136, obtained by adding 28 to the value 108 of the instruction address bus 64.
- The header address remains 1, without 1 being added.
- The header analyzer 60 outputs 0 to the instruction length signal line 68. Therefore, the information on the instruction address bus 64 remains 108 at the transition from T6 to T7. Since the instruction address bus 64 is 108 and field 7 of instruction 5 is located at address 104 of the instruction memory 1, the partial write controller 62 finds that field 7 is on the third 4-byte boundary of the instruction buffer 40, and outputs 00100000 (binary) to the write enable bus 43.
- Likewise, since the instruction address bus 64 is 108 and field 7 of instruction 5 is located at address 104 of the instruction memory 1, the select signal generator 63 finds that field 7 is on the third 4-byte boundary of the instruction buffer 40, and outputs 2 to the field selection signal line corresponding to field 7.
- Since the address bus 3 is 136 and the write enable bus 43 is 00100000 (binary), addresses 136 to 139 are newly latched, and the information of addresses 108 to 135 already held is retained as it is.
- The T6 cycle described above corresponds to the IF stage, and in the T7 cycle the instruction buffer 40 holds instruction 6.
- In the T7 cycle, the EXP stage of instruction 6 is executed.
- Since the header address bus 65 is 1, the header analyzer 60 finds that the header information corresponding to instruction 6 is at the second byte boundary of the information latched in the T5 cycle. From this information, the header analyzer 60 finds that instruction 6 consists of three fields: fields 1, 2, and 3. Therefore, the header information 01110000 (binary) of instruction 6 is output to the field signal line 67 as it is. Further, instruction 6 has an instruction length of 12 bytes, and the information 001100 (binary), indicating that the instruction length is 12, is output to the instruction length signal line 68. Since the instruction length does not exceed 32, the refetch signal line 13 is not asserted.
- the address controller 61 outputs a value obtained by adding 32 to the value of the instruction address bus 64 to the address bus 3 except for the cycle in the initial state. Therefore, in the T7 cycle, 140 is output to the address bus 3.
- Since the instruction address bus 64 is 108 and the instruction length signal line 68 is 12 in the T7 cycle, the partial write controller 62 finds that the 12 bytes starting from the fourth 4-byte boundary of the instruction buffer 40 correspond to instruction 6, and outputs 00011100 (binary) to the write enable bus 43.
- Since the instruction address bus 64 is 108 in the T7 cycle and the header 0 signal line 66 is not asserted, the select signal generator 63 finds that the fields of instruction 6 are held from the fourth 32-bit boundary of the instruction buffer 40 onward. Which fields are present is known from the information on the field signal line 67. Therefore, field 1 is found to be held at the fourth 32-bit boundary, and the selection information for field 1 is 3, which is output to the corresponding field selection signal line. Similarly, the selection information for field 2 is 4, and the selection information for field 3 is 5.
- Since the address bus 3 is 140 and the write enable bus 43 is 00011100 (binary), addresses 140 to 151 are newly latched in the instruction buffer 40, and the information of addresses 120 to 139 is retained as it is.
- Since the instruction length signal line 68 is 12, the information on the instruction address bus 64 becomes 120 (108 plus 12), and the information on the header address bus 65 becomes 2 (1 plus 1). Instruction 6 and the subsequent instructions are executed repeatedly in the same sequence.
- The expanded fields of each instruction are latched in the instruction registers 14 to 21. In the EXE stage, the operation units 22 to 25 then execute operations under the control of the instruction registers 14 to 21. The execution result is written to the register specified in the instruction field in the WB stage, and the instruction ends.
- the above is the description of the operation of the field controller 42.
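The buffer updates in the walkthrough above can be replayed with a small model. This is only a sketch under stated assumptions: the function name `latch`, the slot-mapping rule (a word goes to the slot given by address bits 4 to 2), and the initial buffer contents (addresses 0 to 31) are illustrative choices, and the write-enable masks are recomputed from the instruction address and length of each cycle, with the leftmost digit standing for the first 4-byte slot.

```python
def latch(buffer, address_bus, write_enable):
    """Return the instruction buffer after one cycle transition.

    buffer       : list of 8 ints, the memory address held in each 4-byte slot
    address_bus  : byte address on address bus 3 (start of the 32-byte read)
    write_enable : 8-character string, leftmost character = slot 0
    """
    new = list(buffer)
    first_slot = (address_bus >> 2) & 0b111
    for slot in range(8):
        if write_enable[slot] == "1":
            # The read covers address_bus .. address_bus + 31; slot i receives
            # the word whose address bits 4..2 equal i (assumed mapping).
            new[slot] = address_bus + ((slot - first_slot) % 8) * 4
    return new


# (address bus 3, write enable bus 43) at the end of cycles T1 .. T7.
TRACE = [
    (32, "11111100"),
    (56, "10000011"),
    (68, "11111111"),
    (100, "01000000"),
    (104, "11111111"),
    (136, "00100000"),
    (140, "00011100"),
]


def run_trace():
    buffer = [4 * i for i in range(8)]  # assumed initial fill: addresses 0..31
    history = []
    for address_bus, enable in TRACE:
        buffer = latch(buffer, address_bus, enable)
        history.append(sorted(buffer))
    return history


if __name__ == "__main__":
    for cycle, held in enumerate(run_trace(), start=1):
        print(f"after T{cycle}: addresses {held[0]}..{held[-1] + 3}")
```

Under these assumptions the buffer always holds eight consecutive words, which is what allows a 3-bit selection value to locate any field.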
- Next, the detailed configurations of the header analyzer 60, the address controller 61, the partial write controller 62, and the select signal generator 63 will be described.
- FIG. 11 is a block diagram showing details of the header analyzer 60.
- 80 is a selector for selecting one out of eight 4-byte data
- 81 to 83 are latch circuits, each for one byte of data
- 84 is a selector for selecting one out of four 1-byte data
- 85 is a 1-bit, 9-input adder
- 86 is a 4-bit decoder
- 87 is an AND circuit
- 88 is a 1-bit latch circuit
- 89 is an inverting circuit
- 90 is an AND circuit.
- The selector 80 identifies the header position from the three bits IA4 to IA2 of the 32-bit instruction address bus 64 (IA31 to IA0), and selects and outputs four bytes of header information (for four instructions). When the header 0 signal line 66 is asserted, the first instruction header of this header information is output to the selector 84, the second instruction header to the latch circuit 81, the third instruction header to the latch circuit 82, and the fourth instruction header to the latch circuit 83.
- According to the information on the header address bus 65, the selector 84 selects the first header information if it is 0, the header information held in the latch circuit 81 if it is 1, the header information held in the latch circuit 82 if it is 2, and the header information held in the latch circuit 83 if it is 3, and outputs it to the field signal line 67.
- The adder 85 generates instruction length information from the information on the field signal line 67.
- the breakdown of the 9 inputs is 8 bits of the field signal line 67 and 1 bit of the header 0 signal line 66.
- the instruction length can be determined by counting the number of fields on the field signal lines 67. Further, in the cycle in which the header 0 signal line 66 is asserted, since the instruction includes header information, 1 is added to the instruction length generated from the number of fields. Therefore, the addition result of the adder 85 is in the range of 0 to 9, and the instruction length is output as 4-bit information.
- the actual instruction length is the value obtained by multiplying the above addition result by 4, and is 0 to 36 bytes long.
- the decoder 86 is a circuit for detecting an instruction having a length of 36 bytes, and asserts an output when the input information is 9.
- When the output of the decoder 86 is asserted, the latch circuit 88 asserts the refetch signal line 13 at the timing when the cycle changes. When the refetch signal line 13 is asserted, the output of the inverting circuit 89 is negated, and the AND circuit 87 masks the output of the decoder 86.
- As a result, the latch circuit 88 negates the refetch signal line 13 at the timing when the cycle changes. That is, the refetch signal line 13 is always negated in the cycle following its assertion.
- The negated output of the inverting circuit 89 also masks the output of the adder 85, so that the AND circuit 90 outputs information of instruction length 0 to the instruction length signal line 68. When the refetch signal line 13 is not asserted, the instruction length information output from the adder 85 is output to the instruction length signal line 68.
- the header analyzer 60 can output necessary information to each of the instruction length signal line 68, the field signal line 67, and the refetch signal line 13.
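The length and refetch logic described above can be summarized in a few lines. The following is a behavioral sketch only, not the circuit itself: the function name `analyze_header` and the representation of the header as an 8-character string (leftmost character = field 0) are assumptions made for illustration.

```python
def analyze_header(header, header0, refetch=False):
    """Behavioral model of adder 85, decoder 86 and AND circuits 87 and 90.

    header  : 8-character string of "0"/"1", leftmost character = field 0
    header0 : True when the header 0 signal line 66 is asserted
    refetch : True while the refetch signal line 13 is asserted
    Returns (instruction length in bytes, refetch requested for next cycle).
    """
    if len(header) != 8 or set(header) - {"0", "1"}:
        raise ValueError("header must be 8 binary digits")
    # Adder 85: eight field bits plus the header 0 bit, giving 0..9.
    count = header.count("1") + (1 if header0 else 0)
    # Decoder 86 detects 9 (36 bytes); AND circuit 87 masks it during refetch.
    request = (count == 9) and not refetch
    # AND circuit 90 forces the length to 0 while refetch is asserted.
    length = 0 if refetch else 4 * count
    return length, request


if __name__ == "__main__":
    print(analyze_header("11001011", header0=True))   # instruction 1
    print(analyze_header("11111111", header0=True))   # instruction 5
```

The multiplication by 4 reflects the statement that the actual instruction length is the adder result times 4 bytes.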
- FIG. 12 is a block diagram showing a detailed configuration of the partial write controller 62.
- the same reference numerals are given to circuit blocks and signal lines having the same functions as those in FIG.
- 100 is a 4-bit-input, 8-bit-output decoder
- 101 is an 8-bit barrel shifter
- 102 is a 3-bit adder
- 103 is a 3-bit-input, 8-bit-output decoder
- 104 is a selector that selects between the output of the barrel shifter 101 and the output of the decoder 103 and outputs it to the write enable bus 43.
- the decoder 100 generates 8-bit information according to the conversion table shown in FIG.
- This information is shifted by the barrel shifter 101 in accordance with the three bits (IA4 to IA2) of the instruction address bus 64.
- the information output from the barrel shifter 101 indicates where the instruction indicated by the instruction address bus 64 is held in the instruction buffer 40 in units of 4 bytes. That is, this information indicates the location of the instruction buffer 40 to be updated at the time of transition to the next cycle.
- the selector 104 selects this information and outputs it to the write enable bus 43.
- The instruction address bus 64 indicates the start address of the next instruction, and the value obtained by subtracting 4 from that address is the storage address of field 7 of the instruction.
- To determine the storage location of this field 7 in the instruction buffer 40, the adder 102 adds -1 (111, binary) to the three bits (IA4 to IA2) of the instruction address bus 64. From the addition result, the decoder 103 generates the information to be output to the write enable bus 43 according to the conversion table shown in the figure.
- the selector 104 selects the output of the decoder 103 when the refetch signal line 13 is asserted.
- In this way, the partial write controller 62 can output the necessary information to the write enable bus 43.
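As a rough illustration of the two paths through the partial write controller, the sketch below models the decoder 100 and barrel shifter 101 for the normal case and the adder 102 and decoder 103 for the refetch case. The conversion tables themselves are not reproduced in this text, so the mapping used here (n leading ones for a length of n words, capped at 8, and a one-hot output for the refetch case) is an assumption, as are the function name `write_enable` and the convention that the leftmost digit is the first 4-byte slot.

```python
def write_enable(instr_addr, length_bytes, refetch=False):
    """Return the 8-bit write-enable pattern as a string, leftmost = slot 0."""
    word = (instr_addr >> 2) & 0b111           # bits IA4..IA2
    if refetch:
        # Adder 102: add -1 (111 binary) modulo 8; decoder 103: one-hot.
        pos = (word + 0b111) & 0b111
        bits = [1 if slot == pos else 0 for slot in range(8)]
    else:
        # Decoder 100 (assumed table): n words -> n leading ones.
        n = min(length_bytes // 4, 8)
        base = [1] * n + [0] * (8 - n)
        # Barrel shifter 101: rotate so the run of ones starts at `word`.
        bits = [base[(slot - word) % 8] for slot in range(8)]
    return "".join(str(b) for b in bits)


if __name__ == "__main__":
    print(write_enable(0, 24))                   # instruction 1
    print(write_enable(24, 12))                  # instruction 2, wraps around
    print(write_enable(108, 0, refetch=True))    # field 7 of instruction 5
```

The rotation, rather than a plain shift, is what lets an instruction that straddles the end of the buffer wrap around to the first slot.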
- FIG. 13 is a block diagram showing a detailed configuration of the address controller 61. In the figure, the same reference numerals are given to circuit blocks and signal lines having the same functions as those in FIG. 8.
- 110 is a program counter that holds 32-bit instruction address information
- 111 is a 32-bit adder
- 112 is a 2-bit header count register
- 113 is a 2-bit adder
- 114 is a 2-input NOR circuit that outputs the negation of the OR of its inputs
- 115 is a selector that selects either 32 or 28
- 116 is a selector that selects either the output of the selector 115 or 0
- 117 is a 32-bit adder.
- the program counter 110 updates the instruction address information every time a cycle changes.
- the update information is generated by the adder 111 adding the information of the instruction length signal line 68 and the instruction address information of the program counter 110.
- the generated information is output to the instruction address bus 64.
- The header count register 112 holds header address information (0 to 3) and, like the program counter 110, updates the header address information at every cycle transition. The update information is generated by the adder 113 adding 1 to the header address information.
- the generated information is output to the header address bus 65.
- The NOR circuit 114 detects that the header address information is "0" and asserts the header 0 signal line 66.
- The selector 115 selects and outputs "32" when the refetch signal line 13 is not asserted, and "28" when the refetch signal line 13 is asserted.
- The selector 116 selects 0 only in the initial state, and otherwise selects the output of the selector 115.
- The adder 117 outputs to the address bus 3 the result of adding the information on the instruction address bus 64 and the output information of the selector 116.
- the address controller 61 can output necessary information to the instruction address bus 64, the header address bus 65, the header 0 signal line 66, and the address bus 3.
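One cycle of the address controller can be written as a small state-update function. This is a sketch under assumptions: the function name `address_controller_step` and its argument layout are illustrative, and holding the header address during the refetch cycle follows the walkthrough of the EXP2 stage rather than an explicit circuit description.

```python
def address_controller_step(pc, header_addr, length, refetch=False, initial=False):
    """Model one cycle of the address controller.

    pc          : value on the instruction address bus 64 (program counter 110)
    header_addr : value on the header address bus 65 (register 112, 0..3)
    length      : instruction length in bytes from signal line 68
    Returns (address bus 3, header 0 signal, next pc, next header address).
    """
    offset = 0 if initial else (28 if refetch else 32)   # selectors 115 and 116
    address_bus = pc + offset                            # adder 117
    header0 = header_addr == 0                           # NOR circuit 114
    next_pc = pc + length                                # adder 111
    # Adder 113 with 2-bit wrap-around; held during refetch (assumption
    # taken from the EXP2 description).
    next_header = header_addr if refetch else (header_addr + 1) % 4
    return address_bus, header0, next_pc, next_header


def trace(cycles):
    """cycles: list of (length, refetch) pairs, starting at pc=0, header=0."""
    pc, hdr, out = 0, 0, []
    for length, refetch in cycles:
        bus, _, pc, hdr = address_controller_step(pc, hdr, length, refetch)
        out.append((bus, pc, hdr))
    return out


if __name__ == "__main__":
    # Lengths of instructions 1..5, the refetch cycle, then instruction 6.
    for row in trace([(24, False), (12, False), (32, False), (4, False),
                      (36, False), (0, True), (12, False)]):
        print(row)
```

Each printed row is (address bus 3, next instruction address, next header address) for cycles T1 to T7.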
- FIG. 14 is a block diagram showing a detailed configuration of the select signal generator 63. In the figure, the same reference numerals are given to circuit blocks and signal lines having the same functions as those in FIG.
- 120 is an offset generation circuit
- 121 is a 3-bit adder
- 122 to 128 are 3-bit adders
- 129 is a 3-bit adder
- 130 is a selector.
- The offset generation circuit 120 generates 3-bit information indicating the relative positions of fields 1 to 7 when the position of the first field is taken as 0.
- the relative position is a relative position in a compressed state stored in the instruction memory 1. Therefore, the offset generation circuit 120 outputs valid information only for the relative position of the existing field. Details of the offset generation circuit 120 will be described later.
- The adder 121 outputs the result of adding the one-bit information of the header 0 signal line 66 to the three bits (IA4 to IA2) of the instruction address bus 64.
- This addition by the adder 121 is needed to find the position of the first field of the instruction, because in the cycle in which the header 0 signal line 66 is asserted the instruction includes header information.
- the addition result indicates the position of the field 0, and is output to the field selection signal line 44.
- The adder 122 generates the absolute position information of field 1 by adding the relative position information of field 1 output from the offset generation circuit 120 to the first field position information. Similarly, the adders 123 to 128 add the relative position information of fields 2 to 7 output by the offset generation circuit 120 to the first field position information, and generate the absolute position information of fields 2 to 7.
- The selector 130 selects the output of the adder 128 when the refetch signal line 13 is not asserted and the output of the adder 129 when the refetch signal line 13 is asserted, and outputs it as the selection information for field 7.
- Since the instruction address bus 64 indicates the address of the next instruction, the address of field 7 is the result of subtracting 4 from that address. Therefore, the adder 129, like the adder 102, adds -1 (111, binary) to the three bits (IA4 to IA2) of the instruction address bus 64, and thereby generates the selection information for field 7.
- the select signal generator 63 can output necessary selection information to the field selection signal line 44 or the like.
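The selection values quoted in the cycle-by-cycle description can be reproduced with the following sketch of the select signal generator. The function name `field_select`, the dictionary return type, and the header-as-string convention (leftmost character = field 0) are assumptions for illustration; only fields marked present in the header are reported.

```python
def field_select(instr_addr, header, header0, refetch=False):
    """Return {field number: 3-bit selection value}.

    instr_addr : value on the instruction address bus 64
    header     : 8-character string of "0"/"1", leftmost character = field 0
    header0    : True when the header 0 signal line 66 is asserted
    refetch    : True while the refetch signal line 13 is asserted
    """
    word = (instr_addr >> 2) & 0b111                  # bits IA4..IA2
    if refetch:
        # Adder 129 and selector 130: field 7 sits one word before the
        # address of the next instruction.
        return {7: (word + 0b111) & 0b111}
    first = (word + (1 if header0 else 0)) & 0b111    # adder 121
    selection, relative = {}, 0
    for field, bit in enumerate(header):
        if bit == "1":
            # Adders 122..128: first field position plus relative position.
            selection[field] = (first + relative) & 0b111
            relative += 1
    return selection


if __name__ == "__main__":
    print(field_select(0, "11001011", header0=True))     # instruction 1
    print(field_select(24, "11100000", header0=False))   # instruction 2
```

For a 36-byte instruction, the value this sketch reports for field 7 in the normal path is not yet usable, since that field only enters the buffer in the refetch cycle.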
- FIG. 15 is a detailed configuration diagram of the offset generation circuit 120 described above.
- the same circuit blocks and the same signal lines as in FIG. 14 are denoted by the same reference numerals.
- 131 is a 1-bit, 2-input adder
- 132 is a 1-bit, 3-input adder
- 133 is a 1-bit, 4-input adder
- 134 is a 1-bit, 5-input adder
- 135 is a 1-bit, 6-input adder
- 136 is a 1-bit, 7-input adder.
- the relative position information of the field 1 is 1 when the field 0 exists and 0 when the field 0 does not exist, so that it is the information of the field 0 of the field signal line 67 itself.
- The relative position information of field 2 depends on the presence or absence of field 0 and field 1: it is 0 if neither exists, 1 if either one exists, and 2 if both exist. Accordingly, the adder 131 generates the relative position information by adding the 1-bit information of field 0 and field 1.
- The relative position information of field 3 depends on the presence or absence of fields 0 to 2, and is generated by adding the 1-bit information of fields 0 to 2 in the adder 132.
- The relative position information of field 4 depends on the presence or absence of fields 0 to 3, and is generated by adding the 1-bit information of fields 0 to 3 in the adder 133.
- The relative position information of field 5 depends on the presence or absence of fields 0 to 4, and is generated by adding the 1-bit information of fields 0 to 4 in the adder 134.
- The relative position information of field 6 depends on the presence or absence of fields 0 to 5, and is generated by adding the 1-bit information of fields 0 to 5 in the adder 135.
- The relative position information of field 7 depends on the presence or absence of fields 0 to 6, and is generated by adding the 1-bit information of fields 0 to 6 in the adder 136.
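Taken together, the adders 131 to 136 compute a running count of the header bits. A minimal sketch, assuming the header is given as an 8-character string with field 0 leftmost (the function name `relative_positions` is illustrative):

```python
def relative_positions(header):
    """Return the relative position of each of fields 0..7.

    The relative position of field k is the number of fields among
    0 .. k-1 that are present, i.e. a prefix sum of the header bits.
    Entries for absent fields are computed too but are not meaningful.
    """
    positions, count = [], 0
    for bit in header:
        positions.append(count)   # sum of the bits to the left of this field
        count += int(bit)
    return positions


if __name__ == "__main__":
    # Header of instruction 1: fields 0, 1, 4, 6 and 7 are present.
    print(relative_positions("11001011"))
```

In hardware each prefix sum is formed by its own adder in parallel, whereas this loop computes them sequentially; the results are the same.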
- FIG. 16 is a block diagram showing details of the SIMD controller 46. In the figure, the same reference numerals are given to circuit blocks and signal lines having the same functions as those in FIG. 7.
- 140 is an enable analyzer
- 141 to 143 are enable analyzers with the same function as 140
- 144 is a signal generator
- 145 to 147 are signal generators with the same function as 144
- 148 to 150 are 2-input AND circuits
- 151 to 154 and 156 are AND circuits that output 0 when the left input bit is 0 and output the 2-bit information on the right side when it is 1
- 155 is a NOR circuit that outputs the negation of the OR of four 1-bit inputs.
- The enable analyzer 140 detects whether or not the SIMD mode is specified in field 0. It receives four bits of field 0 from the extension field bus 49: bits 9 to 11 ("SIMD") and bit 27 ("S mode") (see FIG. 2).
- When the S mode bit is 0, the SIMD bits 9 to 11 are invalid, and the AND circuits 148, 149, and 150 output 0.
- When the S mode bit is 1, the AND circuit 148 outputs the information of bit 9, the AND circuit 149 outputs the information of bit 10, and the AND circuit 150 outputs the information of bit 11.
- As with the SIMD field shown in FIG. 2, when the information of the AND circuit 148 is 1, it indicates that the contents of fields 0 and 1 are copied to fields 2 and 3; when the information of the AND circuit 149 is 1, the contents of fields 0 and 1 are copied to fields 4 and 5; and when the information of the AND circuit 150 is 1, the contents of fields 0 and 1 are copied to fields 6 and 7.
- Similarly, the enable analyzer 141 detects whether the SIMD mode is specified in field 2, the enable analyzer 142 in field 4, and the enable analyzer 143 in field 6, and each specifies the copy destination.
- The copy instructions from the enable analyzers 140 to 143 are sent to the signal generators. Specifically, the signal generator 144 decides from which fields the contents of fields 0 and 1 are copied: the copy instruction from fields 2 and 3 is sent to the AND circuit 152, the copy instruction from fields 4 and 5 to the AND circuit 153, and the copy instruction from fields 6 and 7 to the AND circuit 154.
- These copy instructions are not sent from multiple enable analyzers at the same time. This is basically guaranteed by the compiler. For the same reason, no copy instruction is sent from the enable analyzer 140 in the signal generator 144, so the input of the AND circuit 151 is fixed at 0.
- When it receives a copy instruction, the AND circuit 152 outputs 1, which is the information for selecting fields 2 and 3 as the copy source. Similarly, the AND circuit 153 outputs 2 and the AND circuit 154 outputs 3. If there is no copy instruction (not in SIMD mode), the NOR circuit 155 detects this, and the AND circuit 156 outputs selection information 0 so that fields 0 and 1 are selected. Finally, the OR circuit 157 outputs the logical sum of the 3-bit information output from the AND circuits 151 to 154 and 156 to the SIMD select signal line 47 as the SIMD selection information for fields 0 and 1.
- The signal generator 145 operates basically in the same manner as the signal generator 144. However, since it outputs the selection information for fields 2 and 3, no copy instruction is sent from the enable analyzer 141, so the corresponding input is fixed at 0; when it detects that there is no copy instruction (not in SIMD mode), it outputs selection information 1 so that fields 2 and 3 are selected.
- The signal generator 146 operates basically in the same manner as the signal generator 144. However, since it outputs the selection information for fields 4 and 5, no copy instruction is sent from the enable analyzer 142, so the corresponding input is fixed at 0; when there is no copy instruction (not in SIMD mode), it outputs selection information 2 so that fields 4 and 5 are selected.
- The signal generator 147 operates basically in the same manner as the signal generator 144. However, since it outputs the selection information for fields 6 and 7, no copy instruction is sent from the enable analyzer 143, so the corresponding input is fixed at 0; when there is no copy instruction (not in SIMD mode), it outputs selection information 3 so that fields 6 and 7 are selected.
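The selection logic of the SIMD controller can be sketched as two small functions. This is a behavioral illustration only: the names `enable_analyzer` and `signal_generator`, the numbering of field pairs 0 to 3 (pair p = fields 2p and 2p+1), and the list-based request format are assumptions, and the sketch relies on the stated compiler guarantee that at most one copy request reaches a signal generator at a time.

```python
def enable_analyzer(s_mode, simd_bits):
    """AND circuits 148 to 150: gate the three SIMD bits with the S mode bit.

    s_mode    : 0 or 1 (bit 27 of the field)
    simd_bits : list of three 0/1 values (bits 9, 10, 11 of the field)
    """
    return [s_mode & bit for bit in simd_bits]


def signal_generator(own_pair, requests):
    """Return the source pair (0..3) to be selected for field pair `own_pair`.

    requests[src] is 1 when source pair `src` asks to be copied into this pair.
    """
    gated = list(requests)
    gated[own_pair] = 0                   # input from the pair's own analyzer is fixed at 0
    no_copy = 0 if any(gated) else 1      # NOR circuit 155
    select = 0
    for src in range(4):
        if gated[src]:
            select |= src                 # AND circuits 151 to 154 pass the constant src
    if no_copy:
        select |= own_pair                # AND circuit 156 passes the pair's own number
    return select                         # OR circuit 157


if __name__ == "__main__":
    # Field 0 requests a copy into fields 2 and 3 (S mode = 1, bit 9 = 1).
    copy_to = enable_analyzer(1, [1, 0, 0])
    print(signal_generator(1, [copy_to[0], 0, 0, 0]))   # pair 1 takes its data from pair 0
    print(signal_generator(2, [copy_to[1], 0, 0, 0]))   # pair 2 keeps its own fields
```

If several requests arrived at once, the OR would merge their source numbers into a meaningless value, which is why the description leaves this to the compiler.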
- the header is also used in the SIMD mode.
- NOP compression is an indispensable technology in consideration of memory usage efficiency.
- the feature of the present embodiment is that overhead can be reduced by utilizing the header used in this technology also in the SIMD mode.
- the SIMD mode is realized by adding 4 bits to each field.
- If the header is not assumed, it is necessary to add 7 bits to each field.
- That is, in addition to the 4 bits used in this embodiment, two bits of field address and one bit of synchronization control are required.
- FIG. 17 is an overall block diagram of the VLIW processor.
- Circuit blocks and signal lines having the same functions as those in FIG. 1 are denoted by the same reference numerals.
- 200 is an instruction decompression circuit different from that of FIG. 1.
- The refetch signal line 13 in FIG. 1 is not required. That is, the EXP2 stage required by instruction 5 shown in FIG. 10 does not exist.
- This point is one of the features of the present embodiment; except for this point and the internal operation of the instruction decompression circuit 200, the configuration is the same as in FIG. 1.
- FIG. 18 shows the instruction format of the present embodiment.
- In the figure, bits 0 to 27 of the INT field and the IFG field are the same as in the first embodiment. Bits 28 and 29 of the IFG field indicate the address of the field. The IFG field is one of fields 0, 2, 4, and 6, and the bit allocation is as shown in the figure. Bit 30 (sync) of the IFG field is a synchronization signal. By inverting the sync bit for each instruction, the boundary between instructions can be recognized. In the figure, the even-numbered instructions have a sync bit of 0, and the odd-numbered instructions have a sync bit of 1. Based on this instruction format, the detailed operation of the instruction decompression circuit 200, which is the main point of the present embodiment, will be described below.
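To illustrate how an alternating sync bit marks instruction boundaries, the sketch below groups a stream of IFG fields into instructions. It is a simplified model under assumptions: each IFG field is represented as a (field number, sync bit) pair rather than decoded from bits 28 to 30, since the exact bit allocation is given only in the figure, and the function name `split_by_sync` is illustrative.

```python
def split_by_sync(fields):
    """Group IFG fields into instructions by watching the sync bit toggle.

    fields : list of (field_number, sync_bit) pairs in memory order, where
             field_number is 0, 2, 4 or 6 and sync_bit is 0 or 1
    Returns a list of instructions, each a list of field numbers.
    """
    instructions, current, last_sync = [], [], None
    for field_number, sync in fields:
        if last_sync is not None and sync != last_sync:
            # The sync bit changed, so a new instruction starts here.
            instructions.append(current)
            current = []
        current.append(field_number)
        last_sync = sync
    if current:
        instructions.append(current)
    return instructions


if __name__ == "__main__":
    stream = [(0, 0), (4, 0), (0, 1), (2, 1), (6, 1), (2, 0)]
    print(split_by_sync(stream))
```

Because consecutive instructions always differ in their sync bit, no header or length field is needed to find where one instruction ends.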
- FIG. 19 is a block diagram showing details of the instruction decompression circuit 200.
- circuit blocks and the same signal lines having the same functions as those in FIGS. 17 and 7 are denoted by the same reference numerals.
- Reference numeral 201 denotes a field controller that generates, from the compressed field bus 41, the information for selecting each field making up one instruction
- Reference numerals 206 to 209 denote selection information lines for transmitting this selection information
- Reference numeral 202 denotes a dual selector for generating fields 0 and 1
- Reference numeral 203 denotes a dual selector for generating fields 2 and 3
- Reference numeral 204 denotes a dual selector for generating fields 4 and 5, and reference numeral 205 denotes a dual selector for generating fields 6 and 7.
- the dual selectors 203 to 205 have the same circuit configuration as the dual selector 202.
- The 32 bytes read simultaneously from the instruction buffer 40 are carried on eight signal lines (41a to 41h), each 4 bytes wide and corresponding to one field. 41a corresponds to the data read from address (32×N), 41b to address (32×N + 4), and so on up to 41h, which corresponds to the data read from address (32×N + 28).
- The dual selector 202 outputs to the field bus 5 the field 0 data selected from the compressed field buses 41a, 41c, 41e, and 41g according to the selection information on the selection information line 206.
- Likewise, the field 1 data selected from the compressed field buses 41b, 41d, 41f, and 41h is output to the field bus 6.
- The selection information on the selection information line 206 consists of 4 bits. Normally one of these bits is asserted. If none is asserted, the field is regarded as a NOP field, and 0, which corresponds to the NOP code, is output. This allows NOP compression in units of two fields.
- The dual selectors 203 to 205 generate and output the data of fields 2 to 7 in the same way.
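The behavior of one dual selector can be sketched as follows. This is an illustrative software model only, not the circuit itself; the function name and the list-based representation of the bus and selection bits are assumptions.

```python
from typing import Sequence, Tuple

NOP = 0  # per the text, a NOP field is output as the value 0


def dual_selector(bus: Sequence[int], select: Sequence[int]) -> Tuple[int, int]:
    """Model one dual selector.

    bus    -- the eight field words on 41a..41h (index 0..7)
    select -- four selection bits, one per pair (41a/b, 41c/d, 41e/f, 41g/h)
    Returns the (even field, odd field) pair driven onto the two field buses.
    """
    if len(bus) != 8 or len(select) != 4:
        raise ValueError("expected 8 bus words and 4 selection bits")
    if sum(1 for bit in select if bit) > 1:
        raise ValueError("at most one selection bit may be asserted")
    for pair, bit in enumerate(select):
        if bit:
            # The even field comes from 41a/c/e/g, the odd field from 41b/d/f/h.
            return bus[2 * pair], bus[2 * pair + 1]
    # No bit asserted: both fields were NOP-compressed away.
    return NOP, NOP
```

For example, with `select = [0, 0, 1, 0]` the selector forwards 41e and 41f, and with all four bits cleared it emits two NOP fields without any NOP having been stored in the instruction buffer.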
- FIG. 20 is a block diagram showing details of the field controller 201.
- Circuit blocks and signal lines having the same functions as those in FIGS. 8 and 19 are denoted by the same reference numerals.
- Reference numeral 210 denotes a synchronizer that generates, from the information on the compressed field bus 41 and the instruction address bus 64, the information to be output to the instruction length signal line 68 and the write enable bus 43, and 211 denotes a select signal generator that generates the selection information for the selection information lines 206 to 209 from the information on the compressed field bus 41 and the write enable bus 43.
- Address controller 61 has basically the same function as the address controller 61 shown in FIG.
- The synchronizer 210 receives the sync bits of 41a, 41c, 41e, and 41g from the compressed field bus 41. By also receiving the instruction address from the instruction address bus 64, it can identify which of these sync bits belongs to the instruction being processed. By examining where the sync bit changes, the instruction length can be determined. Further, it identifies which data on the compressed field bus 41 belong to the instruction, and then outputs information indicating the position to be written to the instruction buffer 40 to the instruction length signal line 68.
- The select signal generator 211 receives the information on the write enable bus 43 and the "SIMD", "S mode", and address information of 41a, 41c, 41e, and 41g on the compressed field bus 41.
- Four bits of position information for field 0 (information indicating which one of 41a, 41c, 41e, or 41g holds it) are output to the selection information line 206. If field 0 is NOP-compressed, all four bits are zero. This is at the same time the selection information for field 1 (information indicating which one of 41b, 41d, 41f, or 41h holds it). Similarly, the four bits of position information for field 2 (information indicating which one of 41a, 41c, 41e, or 41g holds it) are output to the selection information line 207, and likewise for the four bits of position information for field 4.
- FIG. 21 is a block diagram showing details of the synchronizer 210.
- Circuit blocks and signal lines having the same functions as those in the earlier figures are denoted by the same reference numerals.
- Reference numeral 220 denotes a write enable generation circuit that generates the information enabling 41a and 41b to be written to the instruction buffer 40, and 221 to 223 denote write enable generation circuits that have the same function as circuit 220 and generate the information enabling 41c and 41d, 41e and 41f, and 41g and 41h, respectively, to be written to the instruction buffer 40.
- Reference numeral 224 denotes a decoder that decodes two bits, IA4 and IA3, of the information on the instruction address bus 64.
- The four signal lines output by the decoder 224 carry, respectively, a signal indicating 41a and 41b, a signal indicating 41c and 41d, a signal indicating 41e and 41f, and a signal indicating 41g and 41h.
- The write enable generation circuit 220 generates the write valid information for 41a and 41b. Its inputs are the sync information of 41a and 41g on the compressed field bus 41, the decode signal for 41a and 41b from the decoder 224, and the write enable information from the write enable generation circuit 223. When the decode signal from the decoder 224 is asserted, the write enable generation circuit 220 asserts the write valid information. When the decode signal is not asserted, the write valid information is negated unless the output of the write enable generation circuit 223 is asserted. If the decode signal is not asserted and the output of the write enable generation circuit 223 is asserted, the sync bit of 41a is compared with the sync bit of 41g. If they are the same, it is determined that there is no instruction break, and the write valid information is asserted. Conversely, if they differ, it is determined that there is an instruction break, and the write valid information is negated.
- In this way, the write enable generation circuit 220 generates the write valid information for 41a and 41b, which controls whether the instruction buffer 40 is written at the transition to the next cycle. In the same way, the write enable generation circuit 221 generates the write valid information for 41c and 41d, the write enable generation circuit 222 that for 41e and 41f, and the write enable generation circuit 223 that for 41g and 41h, and each outputs it to the write enable bus 43.
- The adder 225 receives the four 1-bit signals of the write enable generation circuits 220 to 223 and adds them. Since each bit represents 8 bytes (two fields), the sum corresponds to a maximum of 32 bytes. The result of the addition is output to the instruction length signal line 68 as instruction length information.
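The write enable chain and the adder can be modeled in a few lines. This is a sketch under assumptions: the pair index is taken to be the 2-bit value formed by IA4 (high) and IA3 (low), which fits 8-byte pairs inside a 32-byte window, and the ring of circuits 220 to 223 is evaluated sequentially starting from the decoded pair. The function names are hypothetical.

```python
from typing import List, Sequence


def write_enables(sync: Sequence[int], ia4: int, ia3: int) -> List[int]:
    """Return the write valid bits for the pairs 41a/b, 41c/d, 41e/f, 41g/h.

    sync -- sync bits of 41a, 41c, 41e and 41g
    The pair selected by the decoder is always enabled. Each following pair
    (wrapping from 41g/h back to 41a/b) is enabled only if the preceding pair
    is enabled and both carry the same sync bit, i.e. there is no instruction
    break between them.
    """
    start = (ia4 << 1) | ia3  # assumed decoding of IA4, IA3
    enables = [0, 0, 0, 0]
    enables[start] = 1
    for step in range(1, 4):
        cur = (start + step) % 4
        prev = (cur - 1) % 4
        if enables[prev] and sync[cur] == sync[prev]:
            enables[cur] = 1
        else:
            break  # instruction break found: later pairs stay disabled
    return enables


def instruction_length_bytes(enables: Sequence[int]) -> int:
    """Adder 225: each asserted bit stands for 8 bytes (two fields)."""
    return 8 * sum(enables)
```

With sync bits `[0, 0, 1, 1]` and a start address that decodes to the first pair, two pairs are enabled and the instruction length is 16 bytes; if all four sync bits match, the length reaches the 32-byte maximum.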
- FIG. 22 is a block diagram showing details of the select signal generator 211.
- Circuit blocks and signal lines having the same functions as those in FIG. 20 are denoted by the same reference numerals.
- Reference numerals 230 to 233 denote destination signal generation circuits having the same function.
- Reference numeral 234 denotes a combinational circuit that realizes the truth table shown in the figure.
- Reference numeral 235 denotes a 2-bit decoder.
- Reference numerals 236 to 239 denote logic circuits having the same function.
- The destination signal generation circuit 230 determines which fields 41a and 41b correspond to.
- Logic circuit 236 corresponds to fields 0 and 1.
- Logic circuit 237 corresponds to fields 2 and 3.
- Logic circuit 238 corresponds to fields 4 and 5.
- Logic circuit 239 corresponds to fields 6 and 7. Therefore, the output of logic circuit 236 is asserted if 41a and 41b correspond to fields 0 and 1, the output of logic circuit 237 if they correspond to fields 2 and 3, the output of logic circuit 238 if they correspond to fields 4 and 5, and the output of logic circuit 239 if they correspond to fields 6 and 7.
- When the instruction in 41a specifies SIMD mode, the destination signal generation circuit 230 asserts a plurality of signals at the same time. For example, when copying to all three other fields, all the output signals of the logic circuits 236 to 239 are asserted.
- The destination signal generation circuit 230 receives the "SIMD", "S mode", and address of 41a from the compressed field bus 41. From these inputs, the combinational circuit 234 generates the outputs sf0 to sf3 according to the truth table shown in the figure. sf0 to sf3 are signals that specify the copy destination fields of 41a when the 41a field specifies the S mode. In the figure, a, b, and c denote the logical values of the three "SIMD" bits. An entry of a, b, or c for sf0 to sf3 therefore means that the logical value is output as it is. sf0 is asserted when the 41a field specifies SIMD mode and field 0 is specified as a copy destination. Similarly, sf1 to sf3 are asserted when fields 2, 4, and 6, respectively, are specified as copy destinations.
- The decoder 235 decodes the address information of the field and indicates which field 41a is. It asserts its output to logic circuit 236 for field 0, to logic circuit 237 for field 2, to logic circuit 238 for field 4, and to logic circuit 239 for field 6.
- The destination signal generation circuit 230 also receives the write enable information from the write enable bus 43.
- Assertion of this information indicates that 41a is a field of the instruction being executed. Therefore, when this signal is not asserted, none of the outputs of the logic circuits 236 to 239 is asserted. Conversely, when it is asserted, the logic circuit that receives the one asserted output line of the decoder 235 asserts its output signal: logic circuit 236 if the address is 00 (binary), logic circuit 237 if the address is 01 (binary), logic circuit 238 if the address is 10 (binary), and logic circuit 239 if the address is 11 (binary).
- In SIMD mode, the destination signal generation circuit 230 additionally asserts the copy destination signals to the corresponding logic circuits (for example, logic circuits 237 and 238), so the output signals of those logic circuits are asserted as well.
- In this way, the destination signal generation circuit 230 determines which instruction field 41a corresponds to, further analyzes the copy destination fields in SIMD mode, and then designates the destination field of 41a to the dual selector 202 via the selection information line 206.
- The destination signal generation circuit 231 corresponds to 41c.
- The destination signal generation circuit 232 corresponds to 41e.
- The destination signal generation circuit 233 corresponds to 41g.
- These circuits designate the destination field numbers of the field information of 41c, 41e, and 41g to each dual selector.
- Of the output signals of the destination signal generation circuits 230 to 233, those whose destination is field 0 are output to the selection information line 206, those whose destination is field 2 to the selection information line 207, those whose destination is field 4 to the selection information line 208, and those whose destination is field 6 to the selection information line 209.
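The routing from the four destination signal generation circuits to the four selection information lines can be sketched as below. The truth table of the combinational circuit 234 is only given in a figure, so this sketch takes its outputs sf0 to sf3 as ready-made inputs rather than deriving them from the "SIMD" bits; the function names and the tuple layout are assumptions.

```python
from typing import List, Sequence, Tuple

Slot = Tuple[int, int, Sequence[int]]  # (write_enable, 2-bit address, sf0..sf3)


def destination_mask(write_enable: int, address: int, sf: Sequence[int]) -> List[int]:
    """Model one destination signal generation circuit.

    Returns four bits, one per destination (fields 0/1, 2/3, 4/5, 6/7).
    The decoder contributes the field named by the address; the sf bits add
    the SIMD copy destinations. Everything is gated by the write enable.
    """
    if not write_enable:
        return [0, 0, 0, 0]
    decoded = [1 if address == k else 0 for k in range(4)]
    return [d | (1 if s else 0) for d, s in zip(decoded, sf)]


def selection_lines(slots: Sequence[Slot]) -> List[List[int]]:
    """Combine the four circuits (for 41a, 41c, 41e, 41g) into selection lines.

    Returns one 4-bit list per selection information line, ordered by
    destination field 0, 2, 4, 6. Bit i of a line is set when bus slot i
    (0 = 41a, 1 = 41c, 2 = 41e, 3 = 41g) feeds that destination.
    """
    masks = [destination_mask(we, addr, sf) for we, addr, sf in slots]
    return [[masks[i][k] for i in range(4)] for k in range(4)]
```

In this model a SIMD instruction shows up as the same bus slot appearing on several selection lines, while a destination whose line is all zeros receives a NOP.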
- The above is the second embodiment.
- The feature of this embodiment is that the SIMD mode is realized without using header information, unlike the first embodiment.
- The advantage of this is that, since the maximum instruction length is 32 bytes, the next instruction to be executed is guaranteed always to be stored in the instruction buffer 40 (32 bytes). In other words, no extra cycle is needed, and pipeline control becomes easier. Eliminating the extra cycle in the first embodiment would require a 64-byte buffer, whereas this embodiment does not need one and therefore requires less hardware.
- Circuit blocks and signal lines having the same functions as those in FIG. 17 are denoted by the same reference numerals.
- Reference numeral 240 denotes a register file common to all operation units.
- Reference numeral 241 denotes an instruction expansion circuit specific to this embodiment.
- The present embodiment is characterized in that the register file 240 is shared by all operation units.
- Accordingly, the register specification in the instruction format is different.
- FIG. 24 shows the specific instruction format of this embodiment. This figure is basically the same as FIG. 18, but differs in the following points. Bits 9 to 11 of the IFG field are valid only in the SIMD mode and have no meaning in the normal mode because the bank designation is not required as a destination. Similarly, bits 11 to 13 of the INT field are also invalid. In this case, a problem occurs in the SIMD mode.
- FIG. 25 is a detailed block diagram of the instruction expansion circuit 241.
- Circuit blocks and signal lines having the same functions as those in FIGS. 23 and 19 are denoted by the same reference numerals.
- Reference numerals 250 to 253 denote register adjusters.
- Register adjuster 250 checks bit 27 of the input field. In normal mode, the register numbers in that field are not changed. In SIMD mode, it checks the address in bits 28 and 29. If the address is field 0, the field is the copy source field, and the register numbers are not changed. If it is any other field, the field is a copy destination field, and the register numbers (bits 12 to 16, 17 to 21, and 22 to 26) are updated. Specifically, offset value 1 is added to each register number for field 3, offset value 2 for field 2, and offset value 3 for field 1.
- The register adjuster 251 operates similarly.
- Register adjuster 251 checks bit 27 of the input field. In normal mode, the register numbers in that field are not changed. In SIMD mode, it checks the address in bits 28 and 29. If the address is field 1, the field is the copy source field, and the register numbers are not changed. If it is any other field, the field is a copy destination field, and the register numbers (bits 12 to 16, 17 to 21, and 22 to 26) are updated. Specifically, offset value 1 is added to each register number for field 0, offset value 2 for field 3, and offset value 3 for field 2.
- The register adjusters 252 and 253 operate similarly.
- Since the register adjusters 250 to 253 perform only the simple operations described above, they can be realized by combinational circuits.
- Fig. 26 shows a truth table for implementing the combinational circuit.
- In the table, the S mode of the input field is bit 27, the address is bits 28 and 29, and register # is the input value (0 to 31) of bits 12 to 16, 17 to 21, and 22 to 26.
- Register # in the output field is the output value of bits 12 to 16, 17 to 21, and 22 to 26. In particular, when the output register # exceeds 31, 32 is subtracted from the value. In this way, the present invention can also be applied to a processor in which the register file is shared by a plurality of arithmetic units.
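The register adjuster can be sketched as a small function. Several points here are assumptions rather than statements from the text: bit 0 is taken as the least significant bit, bit 27 set to 1 is taken to mean SIMD mode, and the offsets listed for adjusters 250 and 251 are generalized to `(unit - source) mod 4`, which reproduces the listed values and is assumed to hold for adjusters 252 and 253 as well.

```python
S_MODE_BIT = 27            # assumed: 1 = SIMD mode, 0 = normal mode
ADDR_SHIFT = 28            # bits 28-29: address of the field
REG_SHIFTS = (12, 17, 22)  # three 5-bit register numbers
REG_MASK = 0x1F


def adjust_registers(field: int, unit: int) -> int:
    """Model register adjuster 250 + unit (unit = 0..3).

    In normal mode, or when the field's own address equals the unit
    (copy source), the field passes through unchanged. Otherwise each
    register number is increased by the offset, wrapping past 31.
    """
    if not (field >> S_MODE_BIT) & 1:
        return field
    source = (field >> ADDR_SHIFT) & 0b11
    offset = (unit - source) % 4
    if offset == 0:
        return field
    out = field
    for shift in REG_SHIFTS:
        reg = (field >> shift) & REG_MASK
        new_reg = (reg + offset) % 32  # subtract 32 when the sum exceeds 31
        out = (out & ~(REG_MASK << shift)) | (new_reg << shift)
    return out
```

The effect is that one SIMD instruction copied to several operation units addresses a different register in the shared register file on each unit, so the copies do not overwrite each other's results.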
- The first, second, and third embodiments are all applied to a VLIW processor that presupposes static scheduling, but the present invention is not limited to this architecture.
- The present invention is also applicable to a superscalar processor that performs scheduling dynamically.
- One instruction of the superscalar processor basically has a fixed length of one field as described in the above embodiments.
- Such a processor incorporates a plurality of operation units and an instruction queue, checks the dependencies of the instructions present in the queue, and executes a plurality of instructions that have no dependencies and are executable.
- The SIMD mode of a superscalar processor can be easily realized by transferring one instruction to multiple operation units.
- A specific overall block diagram is shown in the figure.
- Circuit blocks and signal lines having the same functions as those in FIG. 23 are denoted by the same reference numerals.
- Reference numeral 260 denotes an instruction expansion circuit corresponding to the superscalar architecture.
- Each of the IFG and INT fields is one instruction, so it is necessary to know which format an instruction is in. For this purpose, bit 31 set to 1 indicates that the instruction is in the IFG format, and 0 indicates that it is in the INT format. Apart from this bit, the format is the same as in FIG. 24, except that the bit positions of the "Destination", "Source 0", "Source 1", and spare block of the INT instruction format are different. The instruction decompression circuit 260 extracts a plurality of instructions that can be executed at the same time and inputs them to the operation units.
- The detailed configuration of the instruction decompression circuit 260 is shown in the figure. In the figure, circuit blocks and signal lines having the same functions as those in FIGS. 27 and 25 are denoted by the same reference numerals.
- Reference numeral 270 denotes a dispatcher that schedules instructions and controls the input of instructions to each arithmetic unit.
- Reference numeral 271 denotes an instruction queue.
- The instruction queue 271 can store up to eight instructions. All of these instructions are visible through the compressed field buses 41a to 41h.
- The dispatcher 270 analyzes the contents of each instruction field to determine the dependencies on the internal resources of the processor. It thereby identifies a plurality of mutually independent instructions, and transfers to the dual selectors 202 to 205 the information for inputting those instructions to the appropriate operation units.
- Via the write enable bus 43, the instruction queue 271 is notified of which instructions in the queue have been executed. In addition, the address at which the instruction codes to be held next by the instruction queue 271 reside is output to the address bus 3. For example, when the three instructions 41a to 41c are executed, the instruction queue 271 transfers the five instructions 41d to 41h to the positions 41a to 41e and, at the same time, latches into 41f to 41h the three instructions transferred according to the information on the address bus 3.
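The queue update in this example can be sketched as follows. The text only describes the case where the first three instructions are executed; removing executed instructions from arbitrary positions is a generalization assumed here, and the function and parameter names are hypothetical.

```python
from typing import List, Sequence

QUEUE_SIZE = 8  # the instruction queue holds up to eight instructions


def advance_queue(queue: Sequence[str],
                  executed: Sequence[bool],
                  fetched: Sequence[str]) -> List[str]:
    """Return the queue contents for the next cycle.

    queue    -- current contents of positions 41a..41h
    executed -- per-position flags, as signalled on the write enable bus
    fetched  -- newly fetched instructions, in program order
    Instructions that were not executed move toward 41a, and the freed
    positions at the tail are filled with newly fetched instructions.
    """
    remaining = [ins for ins, done in zip(queue, executed) if not done]
    free = QUEUE_SIZE - len(remaining)
    if len(fetched) < free:
        raise ValueError("not enough fetched instructions to refill the queue")
    return remaining + list(fetched[:free])
```

Applied to the example in the text, executing 41a to 41c moves the old 41d to 41h into 41a to 41e and places three new instructions in 41f to 41h.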
- The function by which the dispatcher 270 analyzes the "SIMD" information and inputs one instruction to a plurality of operation units can be easily realized by referring to the configuration of the field controller 201.
- If, in the operation units 22 to 25, only the instructions that are useful for multimedia support the SIMD mode and instructions such as branch instructions do not, the operation units 22 to 25 need not be exactly identical.
- The advantage in this case is that the operation units can support a larger number of operations.
- With the 8-bit "op code" and the 1-bit "S mode", 256 instructions are supported, and SIMD mode can be specified for all of them.
- In this case, the arithmetic units 22 to 25 can support a maximum of 384 types of instructions.
- For half (128 types) of the instructions specified by the 8-bit "op code", SIMD mode is not specified, so the one "S mode" bit is used as part of the "op code". This makes it possible to support 256 types of instructions that do not support SIMD mode.
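The count of 384 instruction types follows from simple arithmetic, which the sketch below spells out. The function and parameter names are hypothetical; the only inputs taken from the text are the 8-bit "op code", the 1-bit "S mode", and the split into 128 SIMD-capable op codes.

```python
def supported_instruction_types(opcode_bits: int = 8, simd_capable: int = 128) -> int:
    """Count the instruction types encodable with op code plus S mode bit.

    For a SIMD-capable op code the S mode bit selects normal or SIMD
    execution, so it counts as one instruction type. For any other op code
    the S mode bit acts as an extra op code bit, giving two types each.
    """
    total_opcodes = 1 << opcode_bits
    if not 0 <= simd_capable <= total_opcodes:
        raise ValueError("simd_capable must fit in the op code space")
    non_simd_opcodes = total_opcodes - simd_capable
    return simd_capable + 2 * non_simd_opcodes
```

With the defaults this gives 128 + 2 × 128 = 384, and with all 256 op codes SIMD-capable it gives the 256 instructions of the earlier embodiments.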
- The operation units 22 to 25 in FIG. 1 have the same functions. This configuration has the following effects in addition to its use in the SIMD mode.
- The first is the effect of reducing the development man-hours for processors.
- The circuits of the operation units 23 to 25 can be developed by copying the circuit of the operation unit 22.
- The development man-hours relative to the circuit scale are therefore only 25% of the usual.
- Each operation unit is configured to support multiple uses, such as numerical operation instructions and multimedia processing instructions. For an application requiring numerical operations, the IFG operators of the four operation units 22 to 25 each execute numerical operation instructions, and for an application requiring multimedia processing, the IFG operators of the four operation units execute multimedia processing instructions, so that the capabilities of the IFG operators can be fully utilized for various applications.
- The following describes how to configure an IFG operator that can execute both a 32 × 32-bit multiplication instruction used in numerical operations and a division multiplication instruction, used for multimedia, that performs sixteen 8 × 8-bit multiplications simultaneously on 128-bit data.
- The result of a 32 × 32-bit multiplication is obtained by dividing each operand into four 8-bit parts, breaking the multiplication into sixteen 8 × 8-bit multiplications, and summing the results of those multiplications. Sixteen 8-bit multipliers are therefore required. Exploiting this, the sixteen 8-bit multiplications that are often used in multimedia processing can be performed simultaneously. It is therefore possible to realize an IFG arithmetic unit that supports various applications while sharing most of the circuitry.
- The detailed configuration of the IFG arithmetic unit is described below with reference to the figure.
- Circuit blocks and signal names having the same functions as those in FIG. 1 are denoted by the same reference numerals.
- Reference numerals 300 and 301 denote 128-bit registers that hold the operands of the division multiplication instruction used for multimedia, 302 denotes an operand router that distributes the data for a 32-bit multiplication instruction in 8-bit units, 303 denotes a 256-bit two-input selector, 304 denotes an 8-bit multiplier, and 305 denotes an adder that sums the multiplication results.
- The division multiplication instruction computes (a0 × b0 + a1 × b1 + a2 × b2 + a3 × b3 + a4 × b4 + a5 × b5 + a6 × b6 + ... + a15 × b15) for the 16 pieces of 8-bit data (a0 to a15) stored in register 300 and the 16 pieces of 8-bit data (b0 to b15) stored in register 301. Therefore, before this multiplication instruction is executed, the data is first set in the registers 300 and 301.
- The selector 303 selects the data of the registers 300 and 301 and outputs the data to each multiplier.
- The multiplier 304 calculates the term a0 × b0.
- The remaining 15 multipliers similarly calculate a1 × b1, a2 × b2, a3 × b3, ..., a15 × b15.
- Each multiplication result is sent to the adder 305.
- The adder 305 outputs the sum of the 16 multiplication results.
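The data flow of the division multiplication instruction can be sketched as below. The sketch assumes unsigned 8-bit elements and assumes that a0 and b0 occupy the least significant byte of each 128-bit register; the text does not specify either point, and the function names are hypothetical.

```python
from typing import List


def split_bytes(value: int, count: int) -> List[int]:
    """Split an integer into `count` 8-bit elements, least significant first."""
    return [(value >> (8 * i)) & 0xFF for i in range(count)]


def division_multiply(reg300: int, reg301: int) -> int:
    """Model the sum of sixteen 8 x 8-bit products over two 128-bit registers."""
    a = split_bytes(reg300, 16)
    b = split_bytes(reg301, 16)
    # One product per 8-bit multiplier (multiplier 304 and its 15 siblings).
    products = [x * y for x, y in zip(a, b)]
    # Adder 305 sums the sixteen products.
    return sum(products)
```

A usage example: loading register 300 with the bytes 1 to 16 and register 301 with sixteen bytes of value 2 yields 2 × (1 + 2 + ... + 16) = 272.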
- For a 32-bit × 32-bit multiplication instruction, each operand is divided into four pieces of 8-bit data (a0 to a3 and b0 to b3), and the multiplication is broken down into sixteen 8-bit multiplications.
- The operand router 302 outputs the operand data to be supplied to each multiplier so that these sixteen 8-bit multiplications can be performed.
- The selector 303 selects the output of the operand router 302 and outputs it to each 8-bit multiplier. The results of the multiplications are then sent to the adder 305.
- The adder 305 combines the above 16 multiplication results into a single value.
- The result of this calculation is output as the result of the 32-bit × 32-bit multiplication.
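How sixteen 8 × 8-bit products combine into a 32 × 32-bit product can be sketched as follows. The combining formula is not given in the text above; the weighting used here is ordinary positional arithmetic, where the product of byte i of one operand and byte j of the other is shifted left by 8 × (i + j) bits. Unsigned operands are assumed, and the function name is hypothetical.

```python
def multiply_32x32(a: int, b: int) -> int:
    """Multiply two unsigned 32-bit values using only 8 x 8-bit products."""
    if not (0 <= a < 1 << 32 and 0 <= b < 1 << 32):
        raise ValueError("operands must be unsigned 32-bit values")
    # Operand routing: split each operand into four 8-bit pieces (a0..a3, b0..b3).
    a_parts = [(a >> (8 * i)) & 0xFF for i in range(4)]
    b_parts = [(b >> (8 * j)) & 0xFF for j in range(4)]
    result = 0
    for i, ai in enumerate(a_parts):
        for j, bj in enumerate(b_parts):
            # One 8 x 8-bit multiplier per (i, j) pair, sixteen in total.
            # The adder weights each partial product by its byte position.
            result += (ai * bj) << (8 * (i + j))
    return result
```

The same sixteen multipliers serve both instructions: only the operand routing in front of them and the weighting in the adder differ between the two modes.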
- The sixteen 8-bit multipliers, which occupy the majority of the circuit, can thus be shared between ordinary multiplication instructions and division multiplication instructions.
- The present invention is effective in reducing the instruction code amount of a parallel processor that repeatedly executes the same type of operation, as in multimedia processing. Further, since the present invention arranges a plurality of arithmetic units having the same function, the design man-hours for the arithmetic units can be reduced, and hardware with a higher degree of parallelism can be realized easily, simply by increasing the number of arithmetic units.
- The present invention is applicable to processors of various architectures such as VLIW and superscalar.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP52828196A JP3547139B2 (ja) | 1995-03-17 | 1996-03-15 | プロセッサ |
KR1019970706441A KR100325658B1 (ko) | 1995-03-17 | 1996-03-15 | 프로세서 |
US08/913,840 US6401190B1 (en) | 1995-03-17 | 1996-03-15 | Parallel computing units having special registers storing large bit widths |
US10/053,683 US6965981B2 (en) | 1995-03-17 | 2002-01-24 | Processor including a plurality of computing devices |
US11/216,024 US20060053271A1 (en) | 1995-03-17 | 2005-09-01 | Processor |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP5879095 | 1995-03-17 | ||
JP7/58790 | 1995-03-17 |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US08/681,180 Continuation-In-Part US5870618A (en) | 1995-03-17 | 1996-07-22 | Processor and data processor |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US08913840 A-371-Of-International | 1996-03-15 | ||
US10/053,683 Continuation US6965981B2 (en) | 1995-03-17 | 2002-01-24 | Processor including a plurality of computing devices |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1996029646A1 true WO1996029646A1 (fr) | 1996-09-26 |
Family
ID=13094375
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP1996/000673 WO1996029646A1 (fr) | 1995-03-17 | 1996-03-15 | Processeur |
Country Status (4)
Country | Link |
---|---|
US (3) | US6401190B1 (ja) |
JP (1) | JP3547139B2 (ja) |
KR (1) | KR100325658B1 (ja) |
WO (1) | WO1996029646A1 (ja) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009230375A (ja) * | 2008-03-21 | 2009-10-08 | Fujitsu Ltd | 演算装置および演算方法 |
JP2010073197A (ja) * | 2008-09-19 | 2010-04-02 | Internatl Business Mach Corp <Ibm> | 多重プロセッサ・コア・ベクトル・モーフ結合機構 |
WO2010044242A1 (ja) * | 2008-10-14 | 2010-04-22 | 国立大学法人奈良先端科学技術大学院大学 | データ処理装置 |
JP2011086158A (ja) * | 2009-10-16 | 2011-04-28 | Mitsubishi Electric Corp | 並列信号処理装置 |
JP2011145759A (ja) * | 2010-01-12 | 2011-07-28 | Mitsubishi Electric Corp | 並列信号処理プロセッサ |
JP2013545211A (ja) * | 2010-11-01 | 2013-12-19 | クアルコム,インコーポレイテッド | 複数メモリアクセスを伴うdsp/プロセッサ中の記憶バッファをなくすためのアーキテクチャおよび方法 |
Families Citing this family (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3547139B2 (ja) * | 1995-03-17 | 2004-07-28 | 株式会社 日立製作所 | プロセッサ |
JP3790607B2 (ja) * | 1997-06-16 | 2006-06-28 | 松下電器産業株式会社 | Vliwプロセッサ |
US6859870B1 (en) * | 2000-03-07 | 2005-02-22 | University Of Washington | Method and apparatus for compressing VLIW instruction and sharing subinstructions |
AU2001245520A1 (en) * | 2000-03-08 | 2001-09-17 | Sun Microsystems, Inc. | Vliw computer processing architecture having a scalable number of register files |
US6895494B1 (en) * | 2000-06-26 | 2005-05-17 | Texas Instruments Incorporated | Sub-pipelined and pipelined execution in a VLIW |
US7233998B2 (en) * | 2001-03-22 | 2007-06-19 | Sony Computer Entertainment Inc. | Computer architecture and software cells for broadband networks |
US6848074B2 (en) * | 2001-06-21 | 2005-01-25 | Arc International | Method and apparatus for implementing a single cycle operation in a data processing system |
ITMI20022003A1 (it) * | 2002-09-20 | 2004-03-21 | Atmel Corp | Apparecchio e metodo per la decompressione dinamica di programmi. |
US20040128482A1 (en) * | 2002-12-26 | 2004-07-01 | Sheaffer Gad S. | Eliminating register reads and writes in a scheduled instruction cache |
JP2005056311A (ja) * | 2003-08-07 | 2005-03-03 | Matsushita Electric Ind Co Ltd | 情報処理装置及びそれを用いた電子機器 |
JP4283131B2 (ja) * | 2004-02-12 | 2009-06-24 | パナソニック株式会社 | プロセッサ及びコンパイル方法 |
JP4300151B2 (ja) * | 2004-04-19 | 2009-07-22 | Okiセミコンダクタ株式会社 | 演算処理装置 |
WO2006049331A1 (ja) * | 2004-11-05 | 2006-05-11 | Nec Corporation | Simd型並列演算装置、プロセッシング・エレメント、simd型並列演算装置の制御方式 |
AT501213B1 (de) * | 2004-12-03 | 2006-10-15 | On Demand Microelectronics Gmb | Verfahren zum steuern der zyklischen zuführung von instruktionswörtern zu rechenelementen und datenverarbeitungseinrichtung mit einer solchen steuerung |
US7954062B2 (en) * | 2005-01-03 | 2011-05-31 | International Business Machines Corporation | Application status board mitigation system and method |
US7523295B2 (en) * | 2005-03-21 | 2009-04-21 | Qualcomm Incorporated | Processor and method of grouping and executing dependent instructions in a packet |
US20110152392A1 (en) * | 2009-12-17 | 2011-06-23 | Honeywell International Inc. | Catalysts For Polyurethane Foam Polyol Premixes Containing Halogenated Olefin Blowing Agents |
US9405564B2 (en) * | 2006-05-10 | 2016-08-02 | The Mathworks, Inc. | System and method for targeting commands to concurrent computing units executing a concurrent computing process |
US7631168B1 (en) * | 2006-05-10 | 2009-12-08 | The Math Works, Inc. | Graphical interface for grouping concurrent computing units executing a concurrent computing process |
JP4934356B2 (ja) * | 2006-06-20 | 2012-05-16 | 株式会社日立製作所 | 映像処理エンジンおよびそれを含む映像処理システム |
EP1873627B1 (en) * | 2006-06-28 | 2009-05-27 | STMicroelectronics S.r.l. | A clustered SIMD processor architecture |
JP5206240B2 (ja) * | 2008-08-29 | 2013-06-12 | 日本電気株式会社 | 情報処理装置および情報処理方法 |
EP2335149A1 (en) | 2008-09-08 | 2011-06-22 | Bridgeco, Inc. | Very long instruction word processor with multiple data queues |
KR101520624B1 (ko) * | 2008-12-31 | 2015-05-15 | 삼성전자주식회사 | 비트 맵 방식의 영상 인코딩/디코딩 방법 및 장치 |
KR102056730B1 (ko) * | 2013-04-22 | 2019-12-17 | 삼성전자주식회사 | Vliw 프로세서를 위한 명령어 압축 장치 및 방법과, 명령어 인출 장치 및 방법 |
KR102149509B1 (ko) * | 2014-03-27 | 2020-08-28 | 삼성전자주식회사 | 구성 데이터를 압축 및 복원하는 방법 |
US10514928B2 (en) * | 2014-04-17 | 2019-12-24 | Arm Limited | Preventing duplicate execution by sharing a result between different processing lanes assigned micro-operations that generate the same result |
US10346170B2 (en) * | 2015-05-05 | 2019-07-09 | Intel Corporation | Performing partial register write operations in a processor |
JP6694284B2 (ja) * | 2016-01-29 | 2020-05-13 | シナプティクス・ジャパン合同会社 | 画像データ伝送システム、送信回路及び受信回路 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH01133138A (ja) * | 1987-11-19 | 1989-05-25 | Hitachi Ltd | 並列計算機及びその制御方法 |
JPH04299436A (ja) * | 1990-10-05 | 1992-10-22 | Philips Gloeilampenfab:Nv | メモリ回路および機能ユニットのグループを備えた処理装置 |
JPH04308930A (ja) * | 1991-04-05 | 1992-10-30 | Toshiba Corp | 電子計算機 |
JPH0553805A (ja) * | 1991-08-29 | 1993-03-05 | Toshiba Corp | 電子計算機 |
JPH05143333A (ja) * | 1991-11-18 | 1993-06-11 | Toshiba Corp | 並列演算処理装置 |
JPH05233281A (ja) * | 1992-02-21 | 1993-09-10 | Toshiba Corp | 電子計算機 |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6397833A (ja) | 1986-10-13 | 1988-04-28 | Mitsui Eng & Shipbuild Co Ltd | 固体燃料のガス化生成ガスを燃料とするガスタ−ビンプラント |
JPS6398733A (ja) | 1986-10-16 | 1988-04-30 | Fujitsu Ltd | 演算回路制御方式 |
US5057837A (en) * | 1987-04-20 | 1991-10-15 | Digital Equipment Corporation | Instruction storage method with a compressed format using a mask word |
JP2617974B2 (ja) * | 1988-03-08 | 1997-06-11 | 富士通株式会社 | データ処理装置 |
US5313551A (en) * | 1988-12-28 | 1994-05-17 | North American Philips Corporation | Multiport memory bypass under software control |
US5293500A (en) * | 1989-02-10 | 1994-03-08 | Mitsubishi Denki K.K. | Parallel processing method and apparatus |
US5226166A (en) * | 1989-02-10 | 1993-07-06 | Mitsubishi Denki K.K. | Parallel operation processor with second command unit |
US5471593A (en) * | 1989-12-11 | 1995-11-28 | Branigin; Michael H. | Computer processor with an efficient means of executing many instructions simultaneously |
EP0474297B1 (en) * | 1990-09-05 | 1998-06-10 | Koninklijke Philips Electronics N.V. | Very long instruction word machine for efficient execution of programs with conditional branches |
US5301340A (en) * | 1990-10-31 | 1994-04-05 | International Business Machines Corporation | IC chips including ALUs and identical register files whereby a number of ALUs directly and concurrently write results to every register file per cycle |
US5437043A (en) * | 1991-11-20 | 1995-07-25 | Hitachi, Ltd. | Information processing apparatus having a register file used interchangeably both as scalar registers of register windows and as vector registers |
JP2761688B2 (ja) * | 1992-02-07 | 1998-06-04 | Mitsubishi Electric Corp | Data processing device |
US5367650A (en) * | 1992-07-31 | 1994-11-22 | Intel Corporation | Method and apparatus for parallel exchange operation in a pipelined processor |
WO1994027216A1 (en) * | 1993-05-14 | 1994-11-24 | Massachusetts Institute Of Technology | Multiprocessor coupling system with integrated compile and run time scheduling for parallelism |
US5513363A (en) * | 1994-08-22 | 1996-04-30 | Hewlett-Packard Company | Scalable register file organization for a computer architecture having multiple functional units or a large register file |
US5600810A (en) * | 1994-12-09 | 1997-02-04 | Mitsubishi Electric Information Technology Center America, Inc. | Scaleable very long instruction word processor with parallelism matching |
JP3547139B2 (ja) * | 1995-03-17 | 2004-07-28 | Hitachi, Ltd. | Processor |
JP3526976B2 (ja) * | 1995-08-03 | 2004-05-17 | Hitachi, Ltd. | Processor and data processing device |
- 1996-03-15: JP application JP52828196A, published as JP3547139B2 (ja); not active, Expired - Fee Related
- 1996-03-15: WO application PCT/JP1996/000673, published as WO1996029646A1 (ja); not active, Application Discontinuation
- 1996-03-15: US application US08/913,840, published as US6401190B1 (en); not active, Expired - Lifetime
- 1996-03-15: KR application KR1019970706441A, published as KR100325658B1 (ko); not active, IP Right Cessation
- 2002-01-24: US application US10/053,683, published as US6965981B2 (en); not active, Expired - Fee Related
- 2005-09-01: US application US11/216,024, published as US20060053271A1 (en); not active, Abandoned
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009230375A (ja) * | 2008-03-21 | 2009-10-08 | Fujitsu Ltd | Arithmetic device and arithmetic method |
US9513914B2 (en) | 2008-03-21 | 2016-12-06 | Fujitsu Limited | Apparatus and method for processing an instruction that selects between single and multiple data stream operations with register specifier field control |
JP2010073197A (ja) * | 2008-09-19 | 2010-04-02 | Internatl Business Mach Corp <Ibm> | Multiple processor core vector morph coupling mechanism |
WO2010044242A1 (ja) * | 2008-10-14 | 2010-04-22 | Nara Institute of Science and Technology | Data processing device |
JP5279046B2 (ja) * | 2008-10-14 | 2013-09-04 | Nara Institute of Science and Technology | Data processing device |
JP2011086158A (ja) * | 2009-10-16 | 2011-04-28 | Mitsubishi Electric Corp | Parallel signal processing device |
JP2011145759A (ja) * | 2010-01-12 | 2011-07-28 | Mitsubishi Electric Corp | Parallel signal processor |
JP2013545211A (ja) * | 2010-11-01 | 2013-12-19 | Qualcomm Incorporated | Architecture and method for eliminating store buffers in a DSP/processor with multiple memory accesses |
Also Published As
Publication number | Publication date |
---|---|
KR19980703033A (ko) | 1998-09-05 |
US6401190B1 (en) | 2002-06-04 |
US20020099924A1 (en) | 2002-07-25 |
US20060053271A1 (en) | 2006-03-09 |
KR100325658B1 (ko) | 2002-08-08 |
JP3547139B2 (ja) | 2004-07-28 |
US6965981B2 (en) | 2005-11-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO1996029646A1 (fr) | Processor | |
Cronquist et al. | Specifying and compiling applications for RaPiD | |
US10817291B2 (en) | Apparatuses, methods, and systems for swizzle operations in a configurable spatial accelerator | |
US6334176B1 (en) | Method and apparatus for generating an alignment control vector | |
US5996057A (en) | Data processing system and method of permutation with replication within a vector register file | |
US9329866B2 (en) | Methods and apparatus for adapting pipeline stage latency based on instruction type | |
KR100190738B1 (ko) | Data processing system and method | |
US6061780A (en) | Execution unit chaining for single cycle extract instruction having one serial shift left and one serial shift right execution units | |
EP0427245B1 (en) | Data processor capable of simultaneously executing two instructions | |
US20050216706A1 (en) | Executing partial-width packed data instructions | |
JP2002333978A (ja) | VLIW-type processor | |
JP2010532063A (ja) | Method and system for expanding a conditional instruction into an unconditional instruction and a select instruction | |
US7574583B2 (en) | Processing apparatus including dedicated issue slot for loading immediate value, and processing method therefor | |
US20020108026A1 (en) | Data processing apparatus with register file bypass | |
US9021236B2 (en) | Methods and apparatus for storing expanded width instructions in a VLIW memory for deferred execution | |
JP3670668B2 (ja) | Data processing device | |
JPH0728761A (ja) | Asymmetric vector multiprocessor | |
US20030005261A1 (en) | Method and apparatus for attaching accelerator hardware containing internal state to a processing core | |
JP2001515628A (ja) | Very long instruction word (VLIW) processor | |
JPH1165844A (ja) | Data processing device with pipeline bypass function | |
JPH10105402A (ja) | Pipelined processor | |
JP2001027945A (ja) | Floating-point unit using standard MAC units to perform SIMD operations | |
US5815420A (en) | Microprocessor arithmetic logic unit using multiple number representations | |
JP3781519B2 (ja) | Instruction control mechanism of a processor | |
JPH05150979A (ja) | Immediate operand extension method |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; designated state(s): CN JP KR LK SG US VN |
AL | Designated countries for regional patents | Kind code of ref document: A1; designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
WWE | WIPO information: entry into national phase | Ref document number: 08913840; country of ref document: US |
WWE | WIPO information: entry into national phase | Ref document number: 1019970706441; country of ref document: KR |
122 | EP: PCT application non-entry in European phase | |
WWP | WIPO information: published in national office | Ref document number: 1019970706441; country of ref document: KR |
WWR | WIPO information: refused in national office | Ref document number: 1019970706441; country of ref document: KR |