EP1504342A2

EP1504342A2 - Method and arrangement for power efficient control of processors

Info

Publication number: EP1504342A2
Application number: EP03729889A
Authority: EP
Inventors: Wolfram Drescher; Uwe Porst
Original assignee: Philips Semiconductors Dresden AG
Current assignee: NXP BV
Priority date: 2002-05-14
Filing date: 2003-05-13
Publication date: 2005-02-09
Also published as: AU2003240421A1; DE10221530A1; JP2005525637A; AU2003240421A8; US20080215851A1; WO2003096184A2; US20070150701A1; WO2003096184A3; JP4208149B2

Abstract

The invention relates to a method for the functional control of program and/or data flows in digital signal processors and processors which have respective self-contained and separate modules for program and data flow control and which work with parallel arithmetic units. The aim of the invention is to achieve a power-efficient adaptation of the signal processing to the SIMD command type applied in the individual data paths and to minimize the occurrence of NOP commands with which the VLIW architecture of the processor must be supplied. This is achieved by individually controlling the parallel signal processing of the processor in the data paths (DP) belonging respectively to the first and second slice. For this purpose, a "single slice stop" state output by an SSM register bank switches the register clock line depending on the state of the signal processing.

Description

Method and arrangement for the efficient control of processors

The invention relates to a method for the functional control of the program and/or data flow in digital signal processors and processors, each with self-contained and separate modules for program and data flow control, which operate with parallel arithmetic units.

Among digital signal processors (DSPs), processors whose architecture has a slice structure are becoming increasingly important. In this structure, data paths are combined into slices, and the signal processing in a first slice proceeds independently of the signal processing running in parallel in a second slice.

If the SIMD instruction type is used in the parallel arithmetic units of these digital signal processors, the problem arises in the prior art that the algorithms used are often not suitable for parallel signal processing in all slices.

For example, because different algorithms are used in the individual slices, the results can usually only be provided at different times, or after a different number of processor cycles, in each slice. A command-processing regime that conforms with the other SIMD slices can then be implemented either not at all or only with great effort.

On the one hand, this effort arises in software, as additional programs that must be executed to organize the different waiting times of the slices so that the results are provided in parallel.

On the other hand, it arises in hardware, as heavy processor and memory utilization that reduces processor performance. This reduction can be averted, for example, by a memory expansion, which in turn increases the hardware expenditure.

A disadvantage of the prior art is that, in order to adapt the algorithms to the SIMD instruction type during signal processing, the slices with their associated data paths, and the other associated parts of the VLIW architecture of the processor, must be supplied to a considerable extent with no-operation (NOP) commands.

In this way, the performance-increasing effects of using the SIMD instruction type are not only rendered ineffective, but additional hardware and software expenditure is also required for adapting the algorithms.
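The cost of this padding can be illustrated with a small sketch. It assumes, purely for illustration, that each slice needs a fixed number of cycles for its own algorithm and that every slice must receive an instruction slot on every cycle until the slowest slice has finished. The function name `nop_slots` and the cycle counts are hypothetical and are not taken from the patent.

```python
def nop_slots(cycles_per_slice):
    """Count the NOP slots needed to pad every slice to the slowest one.

    cycles_per_slice: cycles each slice needs for its own algorithm
    (hypothetical values, for illustration only).
    """
    longest = max(cycles_per_slice)
    # A slice that finishes early is fed NOPs until the slowest slice is done.
    return [longest - cycles for cycles in cycles_per_slice]


# First slice needs 3 cycles, second slice needs 7 (assumed numbers).
padding = nop_slots([3, 7])
print(padding)
```

Under these assumptions the faster slice spends more cycles on NOPs than on useful work, which is the overhead the invention sets out to reduce.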

The object of the invention is thus to achieve a power-efficient, individual adaptation of the signal processing to the SIMD instruction type used in the individual data paths and, in particular, to minimize the occurrence of NOP instructions with which the VLIW architecture of the processor is supplied.

This object is achieved in that, as a result of SIMD commands implemented by the PCU, the parallel signal processing of the processor in the data path (DP) associated with each of a first and second slice is controlled individually by a "single slice stop" state output by an SSM register bank for each slice.

The controlling effect of the output "single slice stop" state is achieved in that the bits of the SSM register bank assigned to the first and second slice switch the register clock supply via the respectively associated first and second gated clock cell.

As a result, the function of the associated input register and/or accumulator and/or pipeline control register is temporarily stopped, depending on the state of the signal processing in the data path of that slice.

This function is released again only when a further SIMD command is implemented and the "single slice stop" state is no longer output.

Regardless of the "single slice stop" state, the register file unit (RFU) and the memory access register of the processor remain functional. The SSM register bank of the PCU can be written to by the PCU at any time.
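The gating behaviour described above can be sketched in a few lines of Python. This is a minimal behavioural model under stated assumptions, not the patented circuit: the class `GatedRegister`, the list `ssm_bank` and the input values are hypothetical names and numbers chosen for illustration, and index 0 stands for the first slice.

```python
class GatedRegister:
    """Register whose clock passes through a gated clock cell (illustrative model)."""

    def __init__(self):
        self.value = 0
        self.clock_edges = 0  # counts only the edges that actually reach the register

    def tick(self, next_value, stop_bit):
        # While the stop bit is set, the gated clock cell suppresses the edge,
        # so the register keeps its old value and is not clocked.
        if stop_bit:
            return
        self.value = next_value
        self.clock_edges += 1


ssm_bank = [False, False]                  # one "single slice stop" bit per slice
input_regs = [GatedRegister(), GatedRegister()]

ssm_bank[0] = True                         # the PCU writes the bank: stop the first slice only
for cycle in range(1, 4):
    for i, reg in enumerate(input_regs):
        reg.tick(next_value=cycle * 10, stop_bit=ssm_bank[i])

print([r.value for r in input_regs], [r.clock_edges for r in input_regs])
```

The register of the stopped slice receives no clock edges and holds its value, while the other slice keeps running. The number of suppressed edges serves here only as a rough proxy for the switching activity that is saved.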

The aim of this solution is to start the calculations in parallel in the slices of the data paths of the processor in accordance with the SIMD command type.

However, due to the different calculation processes, the intermediate and / or final results are provided in the slices at different times in the pipeline control registers, accumulators or result registers of the associated data paths.

Thus, once the intermediate and/or final results have been provided, further signal processing that is no longer relevant is prevented in the data paths associated with the individual slices.

Signal processing continues in parallel in all data paths of the slices as soon as processing of another SIMD command begins.

A supplementary embodiment of the solution according to the invention is that the clock supply for the VLIW unit is controlled by a software-related status output from the program flow of the processor in such a way that partial instruction words currently available in the VLIW unit are then made available in it for multiple use on the functional units.

This solution is advantageous when adapting the algorithm to the SIMD command type during signal processing requires the data paths, or the associated VLIW architecture of the processor, to be supplied with no-operation (NOP) commands or similar instructions at a high repetition rate. Avoiding the repeated generation of the same VLIW reduces storage space consumption and keeps the processor's computing load low, so that the computing power is efficiently available for the important calculations.
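The effect on instruction generation can be illustrated with a rough sketch. The patent does not specify any encoding, so this example only counts how many instruction words must be generated for a stream when a run of identical words is generated once and then held in the VLIW unit. The function name `generated_words` and the instruction stream are hypothetical.

```python
from itertools import groupby


def generated_words(stream, reuse_held_word):
    """Number of instruction words that must be generated for `stream`.

    With reuse, a run of identical consecutive words is generated once and
    then held; without reuse, every repetition is generated again.
    """
    if not reuse_held_word:
        return len(stream)
    # groupby yields one group per run of equal consecutive words.
    return sum(1 for _word, _run in groupby(stream))


# Hypothetical stream with long NOP runs.
stream = ["MAC", "NOP", "NOP", "NOP", "NOP", "ADD", "NOP", "NOP"]
print(generated_words(stream, False), generated_words(stream, True))
```

In this assumed stream, holding the current word halves the number of words that have to be generated, which is the kind of saving in storage space and computing load the paragraph above refers to.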

An advantageous variant of this supplementary embodiment is that the generation of further VLIWs in the VLIW unit is interrupted in that a VLIW-WAIT command is announced to the PCU via a distant signal line and is applied to the PCU in the next cycle, whereupon the PCU switches the clock supply for the VLIW unit by means of a "VLIW-WAIT" signal line and a third gated clock cell.
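The timing of this variant can be sketched as follows. The model rests on assumptions that the text does not state: gating is taken to begin in the cycle in which the command is applied, and the wait is taken to last a fixed number of cycles. The function name `vliw_clock_trace` and its parameters are hypothetical.

```python
def vliw_clock_trace(n_cycles, announce_at, wait_cycles):
    """Return, per cycle, True if the VLIW unit receives a clock edge.

    Assumptions (not from the patent): the wait lasts `wait_cycles` cycles,
    and gating starts in the cycle in which the command is applied.
    """
    # The command is announced in one cycle and applied to the PCU in the next.
    applied_at = announce_at + 1
    trace = []
    for cycle in range(n_cycles):
        waiting = applied_at <= cycle < applied_at + wait_cycles
        # While waiting, the third gated clock cell blocks the clock edge.
        trace.append(not waiting)
    return trace


trace = vliw_clock_trace(n_cycles=8, announce_at=2, wait_cycles=3)
print(trace)
```

With the announcement in cycle 2, the VLIW unit is clocked in cycles 0 to 2, unclocked in cycles 3 to 5, and clocked again from cycle 6.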

The aim of this solution is to allow debug routines to be implemented during software tests by setting and starting software breakpoints in the program code.

The invention is explained in more detail below on the basis of an exemplary embodiment for the output of a "single slice stop" state. The figure shows a block diagram of the processor, listing the parts and associated functional units that relate to the solution according to the invention.

For the output of the "single slice stop" state to take effect, a SIMD command must first be output by the VLIW unit 2 via the SIMD control bus 12. This single SIMD instruction triggers multiple data processing in the respective data path 14 of the first and second slice 18, 19.

The results are provided in the associated accumulator 8 at different times. In this case, a bit of the SSM register bank 13 assigned to the first or second slice 18, 19 is set.

The signal level of this bit is fed to the data path 14 associated with the first or second slice 18, 19 via the first and/or second gated clock cell 3, 4. It controls the signal processing in the first and second slice 18, 19 individually: if a result is present in a slice, the clock supply to the associated input register, and thus also the signal processing, is blocked.

When another SIMD command is issued on the SIMD control bus 12, for example after the last result generated in one of the slices has been provided, the respective bits of the SSM register bank 13 are reset and all data paths begin the next signal processing by reading, at their input registers, the data provided by the RFU 11.
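The sequence of this embodiment can be traced with a small simulation. It is a sketch under assumptions, not the patented implementation: each slice is taken to need a fixed, positive number of cycles per SIMD command, and the function name `run_simd_command` and the latencies are hypothetical.

```python
def run_simd_command(latencies):
    """Simulate one SIMD command over slices with different latencies.

    latencies: positive cycle counts per slice (assumed, for illustration).
    Returns (cycles until the last result, clocked cycles per slice).
    """
    stop_bits = [False] * len(latencies)   # bits in the SSM register bank
    clocked = [0] * len(latencies)         # clock edges that reach each slice
    cycle = 0
    while not all(stop_bits):
        cycle += 1
        for i, latency in enumerate(latencies):
            if stop_bits[i]:
                continue                   # gated clock cell blocks this slice
            clocked[i] += 1
            if clocked[i] == latency:      # result is now in the accumulator
                stop_bits[i] = True        # set the slice's stop bit
    # The next SIMD command would reset the bits and restart all slices.
    return cycle, clocked


cycles, clocked = run_simd_command([2, 5])
print(cycles, clocked)
```

In this run the faster slice is clocked for only two of the five cycles and then waits with its clock gated, without any NOP instructions being issued to it.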

The signal processing in the individual slices of the data paths 14 is thus advantageously adapted to the requirements of parallel processing of the SIMD commands.

List of reference numerals

1 Processor
2 VLIW unit (very long instruction word)
3 First gated clock cell
4 Second gated clock cell
5 AGU (address generating unit)
6 PCU (process controlling unit)
7 Clock supply line
8 Accumulator
9 Further processing unit (with gated clock cell)
10 Register of the further processing unit
11 RFU (register file unit)
12 SIMD control bus
13 SSM register bank (single slice mode)
14 Data path
15 SIMD data path control line
16 Distant signal line
17 VLIW-WAIT signal line
18 First slice
19 Second slice
20 Third gated clock cell

Claims

1. Method for the functional control of the program and/or data flow in digital signal processors and processors, each with self-contained and separate modules for program and data flow control, which work with parallel arithmetic units, characterized in that, as a result of SIMD commands implemented by the PCU (6), the parallel signal processing of the processor (1) in a data path DP (14) associated with each of the first and second slice (18, 19) is controlled individually by a "single slice stop" state output by an SSM register bank (13), the controlling effect of the output "single slice stop" state being achieved in that the bits of the SSM register bank (13) assigned to each slice switch the register clock supply via the respective first and second gated clock cells (3, 4) and, depending on the state of the signal processing in the DP (14) belonging to the respective slice, the function of the assigned input register and/or accumulator and/or pipeline control register is temporarily stopped and is released again only after the output of the "single slice stop" state as a result of the implementation of a further SIMD command, in that the register file unit (RFU) (11) and the memory access register of the processor (1) remain functional, and in that the SSM register bank (13) of the PCU (6) can be written to by the PCU at any time, regardless of the output "single slice stop" state.

2. Method for the functional control of the program and/or data flow in digital signal processors and processors, each with self-contained and separate modules for program and data flow control, which work with parallel arithmetic units, characterized in that the clock supply for the VLIW unit (2) is controlled by a software-related status output from the program flow of the processor (1) in such a way that partial instruction words currently available in the VLIW unit (2) are then made available in it for multiple use on the functional units.

3. The method according to claim 2, characterized in that the generation of further VLIWs in the VLIW unit (2) is interrupted in that a VLIW-WAIT command is announced to the PCU (6) via a distant signal line (16) and is applied to the PCU (6) in the next cycle, the PCU (6) subsequently switching the clock supply for the VLIW unit (2) by means of a "VLIW-WAIT" signal line (17) and a third gated clock cell (20).