WO2006016451A1

WO2006016451A1 - Arithmetic pipeline, processing method by arithmetic pipeline, semiconductor device, and computer program

Info

Publication number: WO2006016451A1
Application number: PCT/JP2005/011482
Authority: WO
Inventors: Junichi Naoi
Original assignee: Sony Computer Entertainment Inc.
Priority date: 2004-08-10
Filing date: 2005-06-16
Publication date: 2006-02-16
Also published as: JP2006053652A

Abstract

An arithmetic pipeline capable of suppressing power consumption while minimizing the performance degradation of a conventional serial arithmetic pipeline. The arithmetic pipeline (1) comprises a SALP (2), in which SALCs (3) whose component performance can be varied dynamically and re-timing circuits (5), which re-time or pass through the results processed by the SALCs (3), are alternately cascaded, and a pipeline controller (4) that generates, for each re-timing circuit (5), control data that causes that circuit to re-time or pass through. The pipeline controller (4) supplies the control data so that a predetermined number of the re-timing circuits re-time the processed results.

Description

Arithmetic pipeline, processing method by arithmetic pipeline, semiconductor device, computer program

Technical field

The present invention relates to a semiconductor device, and more particularly to a technique for suppressing the power consumption of an arithmetic pipeline in a semiconductor device.

Background of the Invention

In semiconductor devices, elements such as transistors are becoming finer and denser owing to advances in microfabrication technology. As a result, semiconductor devices carrying large-scale, highly functional electronic circuits are being manufactured on smaller chip areas. Because of this larger scale, a greater number of elements than before must be operated. Operating so many elements increases power consumption, and the resulting heat makes such devices difficult to use in consumer products.

Such heat generation due to increased power consumption may be addressed by reducing the operating voltage. When the operating voltage is reduced, the threshold voltage must also be lowered to guarantee operation of the transistor itself. Lowering the threshold voltage, however, raises another problem: increased leakage current. Moreover, as miniaturization is expected to progress further, it is predicted that, under countermeasures that reduce the operating voltage, the leakage current will come to exceed the operating current.

Recently, with advances in substrate bias technology, power consumption has been reduced by changing the substrate bias in addition to lowering the operating voltage. Changing the substrate bias makes it possible to control the threshold voltage and to suppress the leakage current of transistors that are halted by switching the substrate bias. Methods based on voltage control, such as reducing the operating voltage, have revealed various problems as described above, and it is difficult to solve all of them. In addition, even when substrate bias technology is used, what is mainly improved, in architectural terms, are states other than actual operation, such as sleep mode.

The applicant of the present application has filed an application for an invention relating to a serial arithmetic pipeline, as disclosed in Japanese Patent Application Laid-Open No. 2000-296-96. Such a serial arithmetic pipeline is suited to applications in which demand for discrete operations arises suddenly and explosively, such as computer graphics representing frequently moving objects. It is intended to speed up processing by sequentially performing separate tasks such as instruction fetch, instruction decoding, and execution, and is composed of cascade-connected arithmetic units that each handle a small number of instructions. Various operations such as addition/subtraction, floating-point operations, comparison, Boolean algebra, and selection (IF statements) can be realized by appropriately changing the combination of the cascade-connected arithmetic units.

Such serial computation pipelines tend to consume more power because they use a large number of computing units.

An object of the present invention is to provide a technique for suppressing the power consumption of such a conventional arithmetic pipeline while minimizing the deterioration of its performance.

Disclosure of the invention

An arithmetic pipeline according to the present invention that solves the above-described problems includes an arithmetic mechanism in which processing means capable of dynamically changing its processing capability and retiming means for retiming or passing through input data are cascade-connected alternately, and a pipeline controller that supplies, to each retiming means, control data for retiming or passing through the processing result of the processing means connected in the preceding stage. The pipeline controller supplies the control data to each retiming means so that a predetermined number of the retiming means retime the processing results and the remaining retiming means pass the processing results through. The latency of the arithmetic mechanism is thus determined according to the number of retiming means that perform retiming.

Electrical circuits usually consume less power as their processing capability decreases. In the present invention, the number of retiming means that perform retiming is determined by the control data supplied from the pipeline controller to the arithmetic mechanism. Retiming increases latency. Because controlling the number of retiming means that perform retiming controls the latency of the arithmetic mechanism, power consumption can be reduced while minimizing changes in performance.

For example, the pipeline controller is configured to generate the plurality of control data according to the processing capability of each processing means. In this case, the latency of the arithmetic mechanism changes according to the change in the processing capability of the processing means. For example, when the processing means is configured so that its processing capability changes in accordance with changes in the supplied operating voltage value and/or substrate bias value, the pipeline controller generates the plurality of control data according to the operating voltage value and/or the substrate bias value.

In such a configuration, the pipeline controller dynamically changes the latency and the threshold voltage according to the operating voltage value and/or substrate bias value actually supplied to the processing means, so that power consumption can be minimized while suppressing changes in the performance of the processing means.

For example, when each processing means is composed of elements whose performance can be changed, and its processing capability changes dynamically as the performance of the elements changes, the pipeline controller is configured to generate the control data according to the performance of the elements.

For example, when the elements constituting the processing means are configured so that their processing capability changes in accordance with changes in the supplied operating voltage value and/or substrate bias value, the pipeline controller generates the plurality of control data according to the operating voltage value and/or the substrate bias value. In this configuration as well, the pipeline controller dynamically changes the latency and the threshold voltage in accordance with the operating voltage value and/or substrate bias value actually supplied to the elements of the processing means, so that power consumption can be minimized while suppressing changes in performance.

For example, the pipeline controller is configured to generate the plurality of control data as either alternating data or data having a constant value. In this case, the retiming means is configured to retime the processing result when the control data is alternating and to pass the processing result through when the control data takes a constant value.

For example, the retiming means retimes the processing result according to the change of “1” and “0” of the control data.

A processing method using an arithmetic pipeline according to the present invention is a method executed by an apparatus that is connected to an arithmetic mechanism, in which processing devices composed of elements whose performance can be changed, and whose processing capability changes dynamically as those elements change, and retiming circuits that retime or pass through the processing results of the processing devices are cascade-connected alternately, and that supplies each retiming circuit with control data for retiming or passing through the processing result of the processing device connected in the preceding stage. In this method, the apparatus generates the control data for each retiming circuit according to the performance of the elements, so that a predetermined number of the retiming circuits retime the processing results and the remaining retiming circuits pass the processing results through.

The semiconductor device of the present invention is connected to an arithmetic mechanism in which processing devices composed of elements whose performance can be changed, and whose processing capability changes dynamically as those elements change, and retiming circuits that retime or pass through the processing results of the processing devices are cascade-connected alternately, and supplies each retiming circuit with control data for retiming or passing through the processing result of the processing device connected in the preceding stage. The semiconductor device includes means for generating the control data for each retiming circuit according to the performance of the elements, so that a predetermined number of the retiming circuits retime the processing results and the remaining retiming circuits pass the processing results through.

The computer program of the present invention is a program for a computer that is connected to an arithmetic mechanism in which processing devices composed of elements whose performance can be changed, and whose processing capability changes dynamically as those elements change, and retiming circuits that retime or pass through the processing results of the processing devices are cascade-connected alternately, and that supplies each retiming circuit with control data for retiming or passing through the processing result of the processing device connected in the preceding stage. The program causes the computer to execute processing for generating the control data for each retiming circuit according to the performance of the elements, so that a predetermined number of the retiming circuits retime the processing results and the remaining retiming circuits pass the processing results through.

Brief Description of Drawings

FIG. 1 is a diagram showing a configuration example of an arithmetic pipeline to which the present invention is applied.

FIG. 2 is a diagram showing SALP when SALC is connected in an 8-stage cascade.

FIG. 3 is a diagram showing SALP when SALC is connected in a 4-stage cascade.

FIG. 4 is an exemplary diagram of SALC.

FIG. 5 is an illustration of a pipeline controller.

FIG. 6 is a diagram for explaining processing using the arithmetic pipeline.

FIG. 7 is a diagram for explaining processing by SALC.

FIG. 8 is a diagram for explaining processing by SALC.

FIG. 9 is a diagram for explaining processing by SALC.

FIG. 10 is a diagram for explaining processing by SALC.

FIG. 11 is a diagram for explaining processing using the arithmetic pipeline.

FIG. 12 is a diagram for explaining processing using the arithmetic pipeline.

FIG. 13 is a diagram for explaining processing using the arithmetic pipeline.

FIG. 14 is a diagram for explaining processing using the arithmetic pipeline.

FIG. 15 is a diagram for explaining processing using the arithmetic pipeline.

FIG. 16 is a diagram for explaining processing using the arithmetic pipeline.

FIG. 17 is a diagram for explaining processing using the arithmetic pipeline.

FIG. 18 is a diagram for explaining processing using the arithmetic pipeline.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described in detail. In this embodiment, a serial operation pipeline, which is an example of an operation mechanism, is configured with an architecture that is resistant to latency fluctuations, and the latency is dynamically controlled. In addition, the threshold voltage is dynamically controlled by substrate bias technology. By dynamically changing the latency and threshold voltage without changing the operating frequency, power consumption can be controlled while minimizing performance degradation.

The latency of the serial operation pipeline is determined by the number of times retiming is performed on the output of the cascade connected arithmetic units in the serial operation pipeline. For example, if retiming is performed once with cascaded computing units, the latency is 2. The latency is 3 if you retime twice. In the present invention, the latency is varied by dynamically changing the number of times retiming is performed in the serial operation pipeline.

FIG. 1 is a diagram showing a configuration example of an arithmetic pipeline 1 to which the present invention is applied. The arithmetic pipeline 1 is provided with a serial arithmetic pipeline (Serial-ALPipeline, hereinafter referred to as “SALP”) 2, in which a plurality of serial arithmetic logic operation circuits (Serial-ALCell, hereinafter referred to as “SALC”) 3 are cascade-connected, and a pipeline controller 4, which supplies control data for adjusting the number of cascade-connected stages of SALC 3, is connected to SALP 2. SALP 2 is supplied with the operating voltage for each element in SALC 3 from the operating voltage supply device 6 and with the substrate bias for each element in SALC 3 from the substrate bias supply device 7.

In FIG. 1, a single SALP 2 is shown for ease of explanation, but a plurality of SALPs 2 may be provided. When the arithmetic pipeline 1 includes a plurality of SALPs 2, each SALP 2 is supplied with control data from the pipeline controller 4, the operating voltage from the operating voltage supply device 6, and the substrate bias from the substrate bias supply device 7.

SALP 2 implements various multi-bit arithmetic instructions by combining simple serial instructions. SALP 2 has a plurality of cascade-connected SALCs 3 and, after each SALC 3, a retiming circuit 5 for latching and retiming the output data from that SALC 3. The SALCs 3 and the retiming circuits 5 are cascade-connected alternately.

SALP 2 shown in FIG. 1 has nine SALCs 3 that can be connected in a nine-stage cascade, and a retiming circuit 5 is provided after each SALC 3.

The retiming circuit 5 determines, according to the control data supplied from the pipeline controller 4, whether the output data from SALC 3 is latched and retimed or passed through. When retiming the output data of SALC 3, the retiming circuit 5 latches that output data and outputs the latched data at the next operation timing.

When passing the output data of SALC 3 through, the retiming circuit 5 outputs the data as it is, without latching it, at the same operation timing. Any circuit that realizes the same function as a latch circuit can serve as the retiming circuit 5; for example, it can be built from flip-flops (FFs), using the same number of FFs as SALC 3 has output terminals.
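The two behaviors of the retiming circuit described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `RetimingCircuit`, the `step` method, and the initial latched value of 0 are hypothetical and do not appear in the specification, and a single integer stands in for the full set of output lines.

```python
class RetimingCircuit:
    """Behavioral model of one retiming circuit (flip-flops with a bypass)."""

    def __init__(self, initial=0):
        # Value currently held by the flip-flops (initial value is an assumption).
        self.latched = initial

    def step(self, data_in, retime):
        """Advance one operation timing and return what the next stage sees."""
        if not retime:
            # Through state: no latching, output at the same operation timing.
            return data_in
        # Retiming state: emit the value latched at the previous timing,
        # then hold the new input until the next timing.
        data_out = self.latched
        self.latched = data_in
        return data_out


rc = RetimingCircuit()
through = [rc.step(x, retime=False) for x in [5, 6, 7]]
rc = RetimingCircuit()
retimed = [rc.step(x, retime=True) for x in [5, 6, 7]]
print(through)  # [5, 6, 7]
print(retimed)  # [0, 5, 6]
```

In the retiming state every value appears one operation timing later than it was input, which is the source of the added latency.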

When all the retiming circuits 5 are in the through state, SALP 2 consists of nine stages of SALC 3 connected in cascade. In this case, data input to the first-stage SALC 3 is output from the ninth-stage SALC 3 within the same operation timing, so the latency is “1”. That is, when the number of retiming circuits 5 that perform retiming is “0”, the latency is “1”.

When only the retiming circuit 5 provided between the fourth-stage and fifth-stage SALC 3 from the data input side is in the retiming state, SALP 2 consists of a four-stage cascade of SALC 3 followed by a five-stage cascade of SALC 3. Data input to the first-stage SALC 3 is output from the fourth-stage SALC 3, and this output data is latched by the retiming circuit 5. At the next operation timing, the data is input from the retiming circuit 5 to the fifth-stage SALC 3 and is output from the ninth-stage SALC 3. Because two operation timings are required from the input of data to the first-stage SALC 3 until data is output from the ninth-stage SALC 3, the latency is “2”. That is, when the number of retiming circuits 5 that perform retiming is “1”, the latency is “2”.
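The relationship between the retiming states and the latency in the nine-stage example can be checked with a short sketch. The function names `cascade_segments` and `latency` are hypothetical, and the sketch assumes that only the eight retiming circuits lying between adjacent stages are considered (the circuit after the final stage is left out).

```python
def cascade_segments(retime_flags):
    """Split the SALC chain into groups of stages that run in one timing.

    retime_flags[i] is True when the retiming circuit between stage i+1
    and stage i+2 is in the retiming state, and False when it is through.
    """
    segments, run = [], 0
    for flag in retime_flags:
        run += 1                 # one more stage in the current group
        if flag:
            segments.append(run) # a retiming circuit closes the group
            run = 0
    segments.append(run + 1)     # the final stage closes the last group
    return segments


def latency(retime_flags):
    """Latency = number of retiming circuits that retime, plus one."""
    return len(cascade_segments(retime_flags))


all_through = [False] * 8
only_fourth = [False, False, False, True, False, False, False, False]

print(cascade_segments(all_through), latency(all_through))  # [9] 1
print(cascade_segments(only_fourth), latency(only_fourth))  # [4, 5] 2
```

The longest group in the returned list corresponds to the critical path that the elements must complete within one operation timing.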

In this way, by providing a retiming circuit 5 after each SALC 3 and selecting which retiming circuits 5 retime the output data, the number of cascade-connected stages of SALC 3 can be adjusted and the latency controlled freely.

If the number of cascaded stages of SALC 3 is increased so that the interval between the retiming circuits 5 that perform retiming becomes longer (the latency becomes shorter), the critical path is extended and the elements used in SALC 3 need high performance. Conversely, if the number of cascaded stages of SALC 3 is reduced so that the interval between the retiming circuits 5 that perform retiming becomes shorter (the latency becomes longer), the elements used in SALC 3 need only low performance. Power consumption can be reduced by using low-performance elements.

In this embodiment, the performance of SALC 3 is varied by varying the performance of its elements through the operating voltage and/or the substrate bias.

The operating voltage supply device 6 supplies an operating voltage to the elements constituting SALC 3. The supplied operating voltage value is variable and can be set from outside the operating voltage supply device 6. The operating voltage value may be set manually, or it may be set by a control device (not shown) so that element performance appropriate to the content processed by SALP 2 is obtained.

The substrate bias supply device 7 supplies a substrate bias to the elements constituting SALC 3. The supplied substrate bias value is variable and can be set from outside the substrate bias supply device 7. As with the operating voltage supply device 6, the substrate bias value may be set manually, or it may be set by a control device (not shown) so that element performance appropriate to the content processed by SALP 2 is obtained.

The performance of the elements used in SALC 3 can be changed by changing the operating voltage value with the operating voltage supply device 6 or by changing the substrate bias value with the substrate bias supply device 7. Applying the substrate bias in the forward or reverse direction, or changing the substrate bias value, changes the threshold voltage, the amount of leakage current, and so on of the elements used in SALC 3, and thereby changes the element performance.

Applying a substrate bias in the reverse direction raises the threshold voltage and reduces the leakage current, but degrades the gate switching performance. The number of cascaded stages of SALC 3 is therefore reduced so that operation at the same clock frequency remains possible.

Applying a substrate bias in the forward direction lowers the threshold voltage and improves performance, so the number of cascaded stages of SALC 3 can be increased. However, if the forward substrate bias value is made too large, latch-up occurs. The substrate bias value must therefore be kept within a range in which latch-up does not occur.
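The trade-off between element performance and the number of cascaded stages can be made concrete with a small calculation. The sketch below is not taken from the specification: the function names, the clock period, and the per-stage delays are all hypothetical values chosen only to show how slower elements at a fixed clock frequency force fewer stages per operation timing and therefore a longer latency.

```python
def max_cascade_stages(clock_period_ps, stage_delay_ps):
    """Number of SALC stages whose combined delay fits in one clock period."""
    return max(1, clock_period_ps // stage_delay_ps)


def required_latency(total_stages, clock_period_ps, stage_delay_ps):
    """Latency when retiming is performed after every full group of stages."""
    per_cycle = max_cascade_stages(clock_period_ps, stage_delay_ps)
    return -(-total_stages // per_cycle)  # ceiling division


CLOCK_PS = 4000  # hypothetical clock period, identical in both cases

# Hypothetical fast elements (e.g. forward bias): 8 stages fit in one period.
print(max_cascade_stages(CLOCK_PS, 500), required_latency(8, CLOCK_PS, 500))    # 8 1
# Hypothetical slow elements (e.g. reverse bias): only 4 stages fit.
print(max_cascade_stages(CLOCK_PS, 1000), required_latency(8, CLOCK_PS, 1000))  # 4 2
```

With these assumed numbers, the clock frequency stays the same while the latency changes from 1 to 2, which mirrors the 8-stage and 4-stage cascades discussed in the text.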

FIGS. 2 and 3 show the cases where SALC 3 is connected in an 8-stage cascade and in a 4-stage cascade, respectively. In FIG. 2, the retiming circuit 5 is in the retiming state after every eight stages of SALC 3. In FIG. 3, the retiming circuit 5 is in the retiming state after every four stages of SALC 3. The other retiming circuits 5 are in the through state. In FIG. 2, more stages of SALC 3 are cascaded than in FIG. 3, so retiming is performed fewer times and the latency is shorter. Because the retiming interval in FIG. 2 is longer than in FIG. 3, the elements used in SALC 3 in FIG. 2 must have higher performance than those used in SALC 3 in FIG. 3.

FIG. 4 shows an example configuration of SALC 3. As can be seen from FIG. 4, SALC 3 according to this embodiment has three data input terminals D1i, D2i, D3i and three data output terminals D1o, D2o, D3o. Forward lines are formed that carry three lines of data to the next stage (right side of the figure). Since all lines are connected in the forward direction, retiming is not required between every pair of SALCs 3, and as many SALCs 3 as can be processed in one cycle can be cascaded. In the following explanation, the data on the line output from the data output terminal D1o is referred to as “output data”, the data on the line output from the data output terminal D2o as “reference data”, and the data on the line output from the data output terminal D3o as “enable data”.

SALC 3 also has a decoder 31 that decodes the instruction input from the instruction input terminal CON, executes processing according to the decoded result, and selects the line on which the execution result is output. Examples of processing include arithmetic processing such as the four arithmetic operations and logical operations, as well as control processing such as path control, latch control, and conditional instructions. By cooperating with the decoders of other SALCs 3, macro instructions can be executed. Instructions are input from an instruction input device (not shown), which inputs a predetermined instruction to each SALC 3 according to the processing executed in SALP 2.

The decoder 31 is connected to various latch circuits that facilitate the above operations, namely a shift latch circuit 33, an enable latch circuit 34, and a carry latch circuit 32. The shift latch circuit 33 latches the reference data so that the reference data line is delayed by a predetermined time relative to the output line, and outputs it, for example, when the next digit is calculated. The carry latch circuit 32 latches the carry of the calculation result until the next digit is calculated. The enable latch circuit 34 latches the enable data input from the preceding-stage SALC 3. The enable data is data that instructs the decoder 31 to enable or disable the processing it executes.
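The role of a carry latch in serial arithmetic can be illustrated with a generic bit-serial adder. This sketch is not the circuit of FIG. 4: the function name `serial_add` is hypothetical, and the least-significant-bit-first ordering is an assumption, since the text does not state the bit order. It only shows how a carry held from one digit is consumed when the next digit is calculated.

```python
def serial_add(a_bits, b_bits):
    """Add two equal-length bit streams one digit at a time (LSB first, assumed)."""
    carry = 0  # plays the role of the carry latch between digits
    out = []
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        out.append(total & 1)  # sum bit for this digit
        carry = total >> 1     # carry held until the next digit is calculated
    return out, carry


# 3 + 1 over four digits, LSB first: 3 -> [1, 1, 0, 0], 1 -> [1, 0, 0, 0]
bits, carry_out = serial_add([1, 1, 0, 0], [1, 0, 0, 0])
print(bits, carry_out)  # [0, 0, 1, 0] 0  (that is, 4)
```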

The pipeline controller 4 in FIG. 1 receives a clock and latency data for determining the retiming interval of SALP 2. The latency data is determined according to the content of the processing by SALP 2 and the performance of the elements used in SALC 3, for example, according to the substrate bias value supplied from the substrate bias supply device 7 and the operating voltage value supplied from the operating voltage supply device 6. Based on these data, the pipeline controller 4 generates as many control data as there are retiming circuits 5. Each control data is supplied to the corresponding retiming circuit 5.

FIG. 5 is an exemplary diagram of such a pipeline controller 4. The pipeline controller 4 includes a latency register 41 and as many control data generation units 42 as there are retiming circuits 5. Although this embodiment shows an example in which the pipeline controller 4 is implemented in hardware such as a semiconductor device, it may instead be implemented in software by causing a predetermined CPU to execute the computer program of the present invention.

For example, the latency register 41 derives from the latency data a stage value representing the number of cascade-connected stages of SALC 3, and sends the derived stage value to the control data generation units 42. The latency data may also be used as the stage value as it is.

Each control data generation unit 42 includes a subtractor 43, a discriminator 44, a selector 45, and an OR circuit 46, and generates control data from the clock and the stage value sent from the latency register 41.

The subtractor 43 subtracts 1 from its input value. The subtractor 43 of the first-stage control data generation unit 42 receives the stage value from the latency register 41. The subtractor 43 of each control data generation unit 42 in the second and subsequent stages receives the output of the selector 45 of the preceding-stage control data generation unit 42.

The discriminator 44 is configured to discriminate whether or not the subtraction result by the subtractor 43 is “0” and input the discrimination result to the selector 45 and the OR circuit 46.

The selector 45 outputs either the subtraction result from the subtractor 43 or the stage value sent from the latency register 41, according to the discrimination result from the discriminator 44. When the discrimination result indicates that the subtraction result is “0”, the selector 45 outputs the stage value sent from the latency register 41; when it indicates that the subtraction result is not “0”, the selector 45 outputs the subtraction result.

The OR circuit 46 performs an OR operation on the clock and the discrimination result from the discriminator 44 and inputs the result to the retiming circuit 5 as control data. When the discrimination result indicates that the subtraction result by the subtractor 43 is “0”, the OR circuit 46 outputs alternating data as the control data; when it indicates that the subtraction result is not “0”, it outputs control data of a constant value.

For example, if the discriminator 44 outputs the discrimination result “1” when the subtraction result by the subtractor 43 is “0” and outputs “0” otherwise, the selector 45 and the OR circuit 46 operate as follows.

The selector 45 outputs the stage value output from the latency register 41 when the determination result from the determiner 44 is “1”, and outputs the subtraction result from the subtractor 43 when it is “0”.

The OR circuit 46 receives the discrimination result from the discriminator 44 in inverted form. In other words, “1” is input to the OR circuit 46 when the discrimination result is “0”, and “0” is input when the discrimination result is “1”. When the discrimination result from the discriminator 44 is “1”, it is inverted and “0” is input, so the “0” and “1” of the clock are output as control data as they are. When the discrimination result is “0”, it is inverted and “1” is input, so a constant value “1” is output as control data.

With such control data generation units 42, the pipeline controller 4 supplies SALP 2 with control data that cascade-connects SALC 3 in the number of stages corresponding to the stage value. For example, when the stage value is “4”, the control data output from the first to third control data generation units 42 is a constant value, and the control data output from the fourth control data generation unit 42 is alternating data.

The SALP2 retiming circuit 5 is in the through state when the control data is a constant value (for example, “1”), and is in the retiming state when the control data is alternating data. When the stage value is “4”, the control data input to every four retiming circuits 5 is alternating data, and the others are constant control data.
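The chain of control data generation units described above can be modeled in a few lines. This is a behavioral sketch, not the hardware of FIG. 5: the function names are hypothetical, the clock is represented as a single bit (0 or 1), and the discriminator is assumed to output “1” when the subtraction result is “0”, as in the example given in the text.

```python
def control_data_generator(value_in, stage_value, clock):
    """One control data generation unit: subtractor, discriminator, selector, OR."""
    diff = value_in - 1                           # subtractor: input minus 1
    is_zero = 1 if diff == 0 else 0               # discriminator: 1 when result is 0
    value_out = stage_value if is_zero else diff  # selector: reload or pass on
    control = clock | (1 - is_zero)               # OR of clock and inverted result
    return value_out, control, bool(is_zero)


def retiming_pattern(stage_value, num_circuits):
    """Which retiming circuits receive alternating (clock-following) control data."""
    value, pattern = stage_value, []
    for _ in range(num_circuits):
        value, _control, retime = control_data_generator(value, stage_value, clock=0)
        pattern.append(retime)
    return pattern


print(retiming_pattern(4, 9))
# [False, False, False, True, False, False, False, True, False]

# Control data of the first unit stays at 1; that of the fourth follows the clock.
print([control_data_generator(4, 4, clk)[1] for clk in (0, 1)])  # [1, 1]
print([control_data_generator(1, 4, clk)[1] for clk in (0, 1)])  # [0, 1]
```

For a stage value of 4, every fourth unit produces clock-following control data, so every fourth retiming circuit is in the retiming state and the rest are through.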

Next, processing using such an arithmetic pipeline 1 will be described. FIG. 6 shows the SALP 2 of the arithmetic pipeline 1 used here. In this SALP 2, SALC 3 has six stages, the retiming circuit 5 between the fourth-stage and fifth-stage SALC 3 is in the retiming state, and the other retiming circuits 5 are in the through state. Since a retiming circuit 5 in the through state is equivalent to simple wiring, it is omitted from the figure. To distinguish the individual SALCs 3, the first stage is called SALC 3A, and the second and subsequent stages are called SALC 3B, SALC 3C, SALC 3D, SALC 3E, and SALC 3F.

SALC 3 performs the following processes according to the instruction input from the instruction input terminal CON.

i) The decoder 31 adds the data A input to the data input terminal D1i and the reference data B input to the data input terminal D2i, and sends the result to the data output terminal D1o. The enable data C input to the data input terminal D3i is sent to the data output terminal D3o and is latched by the enable latch circuit 34. The decoder 31 performs the above addition according to the enable data C sent to the data output terminal D3o. The reference data B1 latched in the shift latch circuit 33 is sent to the data output terminal D2o, and the new reference data B input to the data input terminal D2i is sent to the shift latch circuit 33 via the decoder 31. The shift latch circuit 33 latches the new reference data B (FIG. 7).

ii) The decoder 31 adds the data A input to the data input terminal D1i and the reference data B input to the data input terminal D2i, and sends the result to the data output terminal D1o. The enable data C input to the data input terminal D3i is sent to the data output terminal D3o. The decoder 31 performs the above addition according to the enable data C1 latched in the enable latch circuit 34. The reference data B1 latched in the shift latch circuit 33 is sent to the data output terminal D2o, and the new reference data B input to the data input terminal D2i is sent to the shift latch circuit 33 via the decoder 31. The shift latch circuit 33 latches the new reference data B (FIG. 8).

iii) The decoder 31 sends the data A input to the data input terminal D1i as it is to the data output terminal D1o. The enable data C input to the data input terminal D3i is sent to the data output terminal D3o. The decoder 31 passes the data A through in this way according to the enable data C1 latched in the enable latch circuit 34. The reference data B1 latched in the shift latch circuit 33 is sent to the data output terminal D2o, and the new reference data B input to the data input terminal D2i is sent to the shift latch circuit 33 via the decoder 31. The shift latch circuit 33 latches the new reference data B (FIG. 9).

iv) The decoder 31 sends the data A input to the data input terminal D1i as it is to the data output terminal D1o. The enable data C input to the data input terminal D3i is sent to the data output terminal D3o and is latched by the enable latch circuit 34. The decoder 31 passes the data A through in this way according to the enable data C sent to the data output terminal D3o. The reference data B1 latched in the shift latch circuit 33 is sent to the data output terminal D2o, and the new reference data B input to the data input terminal D2i is sent to the shift latch circuit 33 via the decoder 31. The shift latch circuit 33 latches the new reference data B (FIG. 10).

In processes i) to iv) above, the decoder 31 either performs the addition or sends the data input to the data input terminal D1i as it is to the data output terminal D1o, according to the enable data at the data output terminal D3o or in the enable latch circuit 34. In the following explanation, when the enable data is “1” the decoder 31 performs the addition, and when it is “0” the decoder 31 sends the data input to the data input terminal D1i as it is to the data output terminal D1o.

Figure 11 shows the data input to the data input terminals D1i, D2i, and D3i. Data “100000” is input to the data input terminal D1i, reference data “010000” to the data input terminal D2i, and enable data “110000” to the data input terminal D3i, each in this order. As a result, the operation (2 × 3 + 1) can be executed. Each of SALC 3A to 3F executes one of the processes i) to iv) above according to the instruction input from the instruction input terminal CON, and the result of the operation is output from the data output terminal D1o of the final-stage SALC 3.
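As a rough illustration of how the three serial inputs line up cycle by cycle, the sketch below zips the three bit strings of Figure 11 into one (data, reference, enable) triple per cycle. The variable names and the least-significant-bit-first reading of the strings are assumptions made for illustration only; under that reading the strings encode 1, 2, and 3, which is consistent with the stated operation 2 × 3 + 1.

```python
# Bit strings from Figure 11, in the order they are fed to the pipeline.
data_bits = "100000"       # data A (first data input terminal)
reference_bits = "010000"  # reference data B (second data input terminal)
enable_bits = "110000"     # enable data C (third data input terminal)

# One (A, B, C) triple enters the first-stage SALC per cycle.
triples = [tuple(int(ch) for ch in bits)
           for bits in zip(data_bits, reference_bits, enable_bits)]


def lsb_first_value(bits):
    """Read a bit string as an integer, first bit least significant (assumed)."""
    return sum(int(ch) << i for i, ch in enumerate(bits))


a = lsb_first_value(data_bits)
b = lsb_first_value(reference_bits)
c = lsb_first_value(enable_bits)

print(triples)      # per-cycle inputs to the first stage
print(b * c + a)    # value of the operation under the assumed bit order
```

The first two triples are the inputs used in the first two cycles of the walk-through that follows; the remaining cycles feed all-zero triples.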

Figure 12 shows the state in which the first data (1,0,1) is input to the first-stage SALC 3A in the first cycle (cycle 0). When the data (1,0,1) is input, SALC 3A executes the process i) according to the instruction input from the instruction input terminal CON. (1,0,1) is output to the data output terminals D1o, D2o, and D3o of SALC 3A. When the data (1,0,1) is input from SALC 3A, SALC 3B executes the process iii) according to the instruction input from the instruction input terminal CON. (1,0,1) is output to the data output terminals D1o, D2o, and D3o of SALC 3B. The third- and fourth-stage SALC 3C and SALC 3D perform the same processing as SALC 3B. In this cycle, no data is sent beyond the retiming circuit 5, and the stages after it remain in their initial state.

Figure 13 shows the state in which the second data (0,1,1) is input to the first-stage SALC 3A in the first cycle. When the data (0,1,1) is input, SALC 3A executes the process ii) according to the instruction input from the instruction input terminal CON. (1,0,1) is output to the data output terminals D1o, D2o, and D3o of SALC 3A. When the data (1,0,1) is input from SALC 3A, SALC 3B executes the process i) according to the instruction input from the instruction input terminal CON. (1,0,1) is output to the data output terminals D1o, D2o, and D3o of SALC 3B. When the data (1,0,1) is input from SALC 3B, SALC 3C executes the process iii) according to the instruction input from the instruction input terminal CON. (1,0,1) is output to the data output terminals D1o, D2o, and D3o of SALC 3C. SALC 3D performs the same processing as SALC 3C.

The retiming circuit 5 is supplied with the data (1,0,1) from SALC 3D. The retiming circuit 5 sends the supplied data to the next-stage SALC 3E according to the control data supplied from the pipeline controller 4. When the data (1,0,1) is input from the retiming circuit 5, SALC 3E executes the process iii) according to the instruction input from the instruction input terminal CON. (1,0,1) is output to the data output terminals D1o, D2o, and D3o of SALC 3E. SALC 3F performs the same processing as SALC 3E. “1” is output from the data output terminal D1o of SALC 3E.

Figure 14 shows the state in which the third data (0,0,0) is input to the first-stage SALC 3A in the second cycle. When the data (0,0,0) is input, SALC 3A executes the process ii) according to the instruction input from the instruction input terminal CON. (0,1,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3A. When the data (0,1,0) is input from SALC 3A, SALC 3B executes the process ii) according to the instruction input from the instruction input terminal CON. (1,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3B. When the data (1,0,0) is input from SALC 3B, SALC 3C executes the process iv) according to the instruction input from the instruction input terminal CON. (1,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3C. When the data (1,0,0) is input from SALC 3C, SALC 3D executes the process iii) according to the instruction input from the instruction input terminal CON. (1,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3D.

In the retiming circuit 5, the output of SALC 3D from the previous cycle is latched as it is. In this case, (1,0,1) is latched. The (1,0,1) latched in the retiming circuit 5 is input to the data input terminals D1i, D2i, and D3i of SALC 3E, the stage following the retiming circuit 5. When the data (1,0,1) is input from the retiming circuit 5, SALC 3E executes the process iii) according to the instruction input from the instruction input terminal CON. (1,0,1) is output to the data output terminals D1o, D2o, and D3o of SALC 3E. When the data (1,0,1) is input from SALC 3E, SALC 3F executes the process iii) according to the instruction input from the instruction input terminal CON. “1” is output from the data output terminal D1o of SALC 3E.

Figure 15 shows the state in which the fourth data (0,0,0) is input to the first-stage SALC 3A in the third cycle. When the data (0,0,0) is input, SALC 3A executes the process ii) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3A. When the data (0,0,0) is input from SALC 3A, SALC 3B executes the process ii) according to the instruction input from the instruction input terminal CON. (0,1,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3B. When the data (0,1,0) is input from SALC 3B, SALC 3C executes the process iii) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3C. When the data (0,0,0) is input from SALC 3C, SALC 3D executes the process iv) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3D.

In the retiming circuit 5, the output of SALC 3D from the previous cycle is latched as it is. In this case, (1,0,0) is latched. The (1,0,0) latched in the retiming circuit 5 is input to the data input terminals D1i, D2i, and D3i of SALC 3E, the stage following the retiming circuit 5. When the data (1,0,0) is input from the retiming circuit 5, SALC 3E executes the process iii) according to the instruction input from the instruction input terminal CON. (1,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3E. When the data (1,0,0) is input from SALC 3E, SALC 3F executes the process iii) according to the instruction input from the instruction input terminal CON. “1” is output from the data output terminal D1o of SALC 3E.

Figure 16 shows the state in which the fifth data (0,0,0) is input to the first-stage SALC 3A in the fourth cycle. When the data (0,0,0) is input, SALC 3A executes the process ii) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3A. When the data (0,0,0) is input from SALC 3A, SALC 3B executes the process ii) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3B. When the data (0,0,0) is input from SALC 3B, SALC 3C executes the process iii) according to the instruction input from the instruction input terminal CON. (0,1,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3C. When the data (0,1,0) is input from SALC 3C, SALC 3D executes the process iii) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3D.

In the retiming circuit 5, the output of SALC 3D from the previous cycle is latched as it is. In this case, (0,0,0) is latched. The (0,0,0) latched in the retiming circuit 5 is input to the data input terminals D1i, D2i, and D3i of SALC 3E, the stage following the retiming circuit 5. When the data (0,0,0) is input from the retiming circuit 5, SALC 3E executes the process iv) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3E. When the data (0,0,0) is input from SALC 3E, SALC 3F executes the process iii) according to the instruction input from the instruction input terminal CON. “0” is output from the data output terminal D1o of SALC 3E.

Figure 17 shows the state in which the sixth data (0,0,0) is input to the first-stage SALC 3A in the fifth cycle. When the data (0,0,0) is input, SALC 3A executes the process ii) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3A. When the data (0,0,0) is input from SALC 3A, SALC 3B executes the process ii) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3B. When the data (0,0,0) is input from SALC 3B, SALC 3C executes the process iii) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3C. When the data (0,0,0) is input from SALC 3C, SALC 3D executes the process iii) according to the instruction input from the instruction input terminal CON. (0,1,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3D.

In the retiming circuit 5, the output of SALC 3D from the previous cycle is latched as it is. In this case, (0,0,0) is latched. The (0,0,0) latched in the retiming circuit 5 is input to the data input terminals D1i, D2i, and D3i of SALC 3E, the stage following the retiming circuit 5. When the data (0,0,0) is input from the retiming circuit 5, SALC 3E executes the process iii) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3E. When the data (0,0,0) is input from SALC 3E, SALC 3F executes the process iv) according to the instruction input from the instruction input terminal CON. “0” is output from the data output terminal D1o of SALC 3E.

Figure 18 shows the state in the sixth cycle, after all data has been input. SALC 3A to SALC 3D have finished processing because there is no further input.

In the retiming circuit 5, the output of SALC 3D from the previous cycle is latched as it is. In this case, (0,1,0) is latched. The (0,1,0) latched in the retiming circuit 5 is input to the data input terminals D1i, D2i, and D3i of SALC 3E, the stage following the retiming circuit 5. When the data (0,1,0) is input from the retiming circuit 5, SALC 3E executes the process iii) according to the instruction input from the instruction input terminal CON. (0,0,0) is output to the data output terminals D1o, D2o, and D3o of SALC 3E. When the data (0,0,0) is input from SALC 3E, SALC 3F executes the process iii) according to the instruction input from the instruction input terminal CON. “0” is output from the data output terminal D1o of SALC 3E.
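The cycle-by-cycle walk-through of Figures 12 to 18 can be replayed with a small simulation. The following is only a sketch under stated assumptions, not the circuit itself: the class name `Salc`, the tables `FRONT` and `BACK`, and the function `run` are hypothetical; data is modeled as single bits; addition is modeled as a 1-bit XOR with no carry handling; and process i), which is defined earlier in the description, is assumed to mirror process iv) with addition in place of pass-through. The per-cycle process assignments are copied from the walk-through.

```python
from dataclasses import dataclass


@dataclass
class Salc:
    """Simplified 1-bit model of one SALC 3 (hypothetical class name)."""
    ref_latch: int = 0     # shift latch circuit 33: latched reference data B1
    enable_latch: int = 0  # enable latch circuit 34: latched enable data C1

    def step(self, process, a, b, c):
        """Run process 'i'..'iv' on inputs (A, B, C); return the three outputs."""
        if process in ("i", "iv"):
            self.enable_latch = c   # assumed: i) and iv) latch incoming enable data
        # Assumption: i) and ii) add, iii) and iv) pass A through unchanged.
        # Addition is a 1-bit XOR here; carry handling is not modeled.
        d1 = (a ^ b) if process in ("i", "ii") else a
        d2 = self.ref_latch         # previously latched reference data goes out
        self.ref_latch = b          # new reference data is latched
        return d1, d2, c            # enable data is forwarded as it is


# Process executed by each SALC in each cycle, as read from the walk-through.
FRONT = {  # SALC 3A to 3D, cycles 0..5
    "A": ["i", "ii", "ii", "ii", "ii", "ii"],
    "B": ["iii", "i", "ii", "ii", "ii", "ii"],
    "C": ["iii", "iii", "iv", "iii", "iii", "iii"],
    "D": ["iii", "iii", "iii", "iv", "iii", "iii"],
}
BACK = {   # SALC 3E and 3F, cycles 1..6
    "E": ["iii", "iii", "iii", "iv", "iii", "iii"],
    "F": ["iii", "iii", "iii", "iii", "iv", "iii"],
}
INPUTS = [(1, 0, 1), (0, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)]


def run():
    cells = {name: Salc() for name in "ABCDEF"}
    latched = None                  # contents of the retiming circuit 5
    stage_d, results = [], []
    for cycle in range(len(INPUTS) + 1):
        if latched is not None:     # back half works on the previous cycle's latch
            data = latched
            for name in "EF":
                data = cells[name].step(BACK[name][cycle - 1], *data)
            results.append(data[0])
        if cycle < len(INPUTS):     # front half works on the new input
            data = INPUTS[cycle]
            for name in "ABCD":
                data = cells[name].step(FRONT[name][cycle], *data)
            stage_d.append(data)
            latched = data          # retimed at the end of the cycle
    return stage_d, results


stage_d, results = run()
print(stage_d)   # SALC 3D outputs for cycles 0..5
print(results)   # serial result bits for cycles 1..6
```

Under these assumptions the SALC 3D outputs and the serial result bits match the values stated for each figure, and the result bits read least-significant first give 7, which agrees with the operation 2 × 3 + 1.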

As described above, the arithmetic pipeline 1 of this embodiment can adjust the number of stages of SALC 3 cascade-connected in the SALP 2 by means of the control data from the pipeline controller 4. Therefore, the latency of the SALP 2 can be changed dynamically. In addition, the threshold voltage of each element constituting the SALC 3 can be changed dynamically by the operating voltage supplied from the operating voltage supply device 6 and the substrate bias supplied from the substrate bias supply device 7, so the performance of each element is variable.
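The relationship between the control data and the latency can be illustrated with a generic model. The sketch below is an assumption-laden illustration, not the SALP 2 itself: `run_pipeline` and `latency` are hypothetical names, each processing stage is an arbitrary function, and each retiming circuit is modeled as either a one-cycle register (retime) or a plain wire (pass through).

```python
def run_pipeline(stages, retime, inputs):
    """Feed inputs through alternating stages and retiming circuits.

    stages: list of functions, one per processing stage
    retime: list of bools; True means the circuit after that stage latches
            the result for one cycle, False means it passes it straight through
    Returns one output per cycle (None while the pipeline is still filling).
    """
    regs = [None] * len(stages)
    outputs = []
    for x in list(inputs) + [None] * sum(retime):   # extra cycles to drain
        value = x
        for k, stage in enumerate(stages):
            if value is not None:
                value = stage(value)
            if retime[k]:
                regs[k], value = value, regs[k]     # store new, emit previous
        outputs.append(value)
    return outputs


def latency(outputs):
    """Number of cycles before the first result appears."""
    return next(i for i, v in enumerate(outputs) if v is not None)


stages = [lambda v: v + 1] * 4                      # four identical toy stages
deep = run_pipeline(stages, [False, True, False, True], [10, 20, 30])
shallow = run_pipeline(stages, [False, False, False, True], [10, 20, 30])
print(latency(deep), deep)
print(latency(shallow), shallow)
```

In this model the results are the same in both configurations and only the number of cycles before they appear changes, which is the sense in which the number of retiming circuits that actually retime sets the latency.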

By appropriately setting the operating voltage, the substrate bias, and the number of SALC 3 stages, power consumption can be controlled while minimizing performance degradation, without changing the operating frequency of the arithmetic pipeline 1.

If the number of cascaded SALC 3 stages is large, an interlock may occur depending on the data dependency between instructions, and operation stalls, causing performance degradation. For example, when a subtraction is performed on the result of an addition and the addition has latency, the subtraction instruction cannot use the addition result immediately, so the subtraction is executed only after waiting for the addition result. However, in a program with little dependency between processes, the effective performance is not affected even if the latency is long. There are also technologies that reduce the impact of latency on performance, such as hyper-threading and instruction interleaving. Based on the above, the latest process technology can be applied without degrading performance by optimally setting the threshold voltage and the pipeline latency (the number of connected SALC 3 stages) according to the conditions of the program.
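The effect of data dependency on effective performance can be sketched with a simple cycle count. This is a minimal model under assumptions that are not part of the description: one instruction is issued per cycle in program order, an instruction waits until every result it depends on is available, and `total_cycles` is a hypothetical helper name.

```python
def total_cycles(deps, latency):
    """Cycles needed to finish a sequence of instructions.

    deps[i] lists the indices of earlier instructions whose results
    instruction i needs; latency is the pipeline latency in cycles.
    """
    issue = []
    for needed in deps:
        earliest = issue[-1] + 1 if issue else 0    # in-order, one per cycle
        for j in needed:
            # Interlock: wait until the producing instruction has finished.
            earliest = max(earliest, issue[j] + latency)
        issue.append(earliest)
    return issue[-1] + latency


chain = [[], [0], [1], [2]]        # each instruction uses the previous result
independent = [[], [], [], []]     # no dependency between instructions

for lat in (1, 3):
    print(lat, total_cycles(chain, lat), total_cycles(independent, lat))
```

With a latency of one cycle both sequences take the same time, while at a latency of three cycles the dependent chain takes twice as long as the independent sequence in this model, which is the trade-off the paragraph above describes.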

Claims

The scope of the claims

1. An arithmetic mechanism in which processing means capable of dynamically changing their processing capability and retiming means for retiming or passing through input data are alternately cascade-connected; and

a pipeline controller that supplies, to each retiming means, control data for retiming or passing through the processing result of the processing means connected in the preceding stage, wherein

the pipeline controller is configured to supply the control data to each retiming means so that a predetermined number of the retiming means retime the processing results and the remaining retiming means pass the processing results through, and

the latency of the arithmetic mechanism is determined according to the number of retiming means that perform retiming.

Arithmetic pipeline.

2. The pipeline controller is configured to generate a plurality of the control data according to the processing capability of each processing means,

The arithmetic pipeline according to claim 1.

3. The processing means is configured such that the processing capability changes according to changes in the supplied operating voltage value and/or substrate bias value,

The arithmetic pipeline according to claim 2, wherein the pipeline controller is configured to generate the plurality of control data in accordance with the operating voltage value and/or the substrate bias value.

4. Each processing means is constituted by elements whose performance can be changed, the processing capability is configured to change dynamically according to the change in the performance of the elements, and the control data is configured to be generated according to the performance of the elements,

The arithmetic pipeline according to claim 2.

5. The elements constituting the processing means are configured such that the processing capability changes according to a change in the supplied operating voltage value and/or substrate bias value, and the pipeline controller is configured to generate the plurality of control data according to the operating voltage value and/or the substrate bias value,

The arithmetic pipeline according to claim 4.

6. The pipeline controller is configured to generate the plurality of control data as alternating data or data having a constant value,

The retiming means is configured to retime the processing result when the control data is alternating data, and to pass the processing result through when the control data takes a constant value,

The arithmetic pipeline according to any one of claims 1 to 5.

7. A method performed by an apparatus that is connected to an arithmetic mechanism in which processors and retiming circuits are alternately cascade-connected, each processor being constituted by elements whose performance can be changed and having a processing capability that changes dynamically according to the change in the elements, and each retiming circuit retiming or passing through the processing result of the processor, the apparatus supplying each retiming circuit with control data for retiming or passing through the processing result of the processor connected in the preceding stage, wherein

the apparatus

generates, in accordance with the performance of the elements, the control data for each retiming circuit so that a predetermined number of the retiming circuits retime the processing results and the remaining retiming circuits pass the processing results through. A processing method using an arithmetic pipeline that includes this step.

8. A semiconductor device that is connected to an arithmetic mechanism in which processors and retiming circuits are alternately cascade-connected, each processor being constituted by elements whose performance can be changed and having a processing capability that changes dynamically according to the change in the elements, and each retiming circuit retiming or passing through the processing result of the processor, the semiconductor device supplying each retiming circuit with control data for retiming or passing through the processing result of the processor connected in the preceding stage,

the semiconductor device comprising means for generating, in accordance with the performance of the elements, the control data for each retiming circuit so that a predetermined number of the retiming circuits retime the processing results and the remaining retiming circuits pass the processing results through.

9. A computer program for a computer that is connected to an arithmetic mechanism in which processors and retiming circuits are cascade-connected, each processor being constituted by elements whose performance can be changed and having a processing capability that changes dynamically according to the change in the elements, and each retiming circuit retiming or passing through the processing result of the processor, the computer supplying each retiming circuit with control data for retiming or passing through the processing result of the processor connected in the preceding stage, the computer program causing the computer

to execute a process of generating, in accordance with the performance of the elements, the control data for each retiming circuit so that a predetermined number of the retiming circuits retime the processing results and the remaining retiming circuits pass the processing results through.

Computer program.