CA1270954A

CA1270954A - Apparatus for arithmetic processing

Info

Publication number: CA1270954A
Application number: CA000535863A
Authority: CA
Inventors: Atsushi Hasebe
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1986-04-30
Filing date: 1987-04-29
Publication date: 1990-06-26

Abstract

PATENT

ABSTRACT OF THE DISCLOSURE

In a high-speed arithmetic processor, a first number that has an absolute value that can exceed one is multiplied by a second number that has an absolute value not exceeding one. If the first number exceeds one, it is divided into an integer and a part having a value less than one. The second number is accumulated as an addend a number of times equal to the integer to produce a sum. The second number and the part of the first number having a value less than one are supplied to a multiplier to produce a partial product. An adder adds the partial product to the sum, thereby obtaining a final product of the first and second numbers. The multiplication is thereby performed in a number of steps which is minimized and never varies, regardless of whether the absolute value of the first number is, for example, less than one, at least one but less than two, or at least two but less than three. This speeds up the arithmetic processing and simplifies the programming therefor.

Description


BACKGROUND OF THE INVENTION

Field of the Invention
This invention relates to arithmetic processors and, in particular, to a novel and highly effective arithmetic processor adapted for use in video image processors and in other high-speed data processors and able to process data more rapidly than arithmetic processors heretofore conventional in such apparatus.

Description of the Prior Art
Video image processing apparatus must process data at high speed. In commercial television, for example, 25 or 30 frames (depending on the system) are displayed per second, each frame including hundreds of lines and each line including hundreds of pixels (picture elements). In advanced image processing apparatus, signals produced by a television camera are typically converted to digital form, stored in an input image memory, processed in a position stationary processor, stored in an output image memory, converted back to analog form, and then recorded by a VTR and/or displayed on a television monitor. Apparatus such as a position variant processor, a control processor and a host computer are provided for controlling data flows, controlling the execution and stopping of processes, and controlling the entire video image processing apparatus.

In such apparatus, the position stationary processor includes a number of arithmetic units that process signals consisting of data signals and a coefficient. Each data signal is multiplied by its coefficient to produce an output. Depending on the magnitude of the signals, the multiplication requires in conventional practice a different number of steps, for example three to five. It is difficult to write a program that infallibly takes the difference in the number of processing steps (and hence in processing time) into account. Typically, therefore, the program allows for the maximum number of steps that may be required, for example five steps. This means that time is wasted in any case where only three or four steps are required for the multiplication. While the time wasted is short in any given instance, the wasted time is accumulated over and over and is quite significant in the aggregate.

OBJECTS AND SUMMARY OF THE INVENTION
An object of the invention is to remedy the problems of the prior art outlined above.
Another object of the invention is to provide a high-speed arithmetic processor that can multiply two numbers in the same (minimum) number of steps regardless, within limits, of the magnitude of the numbers.
More particularly, an object of the invention is to provide an arithmetic processor for multiplying a first number such as a coefficient that has an absolute value that can exceed one by a second number such as a data number that has an absolute value not exceeding one, the multiplication requiring a number of steps that is minimized and always the same, regardless of whether the absolute value of the first number is, for example, less than one, at least one but less than two, or at least two but less than three.
The foregoing and other objects are attained in accordance with a first aspect of the invention by providing an arithmetic processor for multiplying a first number that has an absolute value that can exceed one by a second number that has an absolute value not exceeding one; the processor comprising: a multiplier; control means responsive to an absolute value of the first number exceeding one, for dividing the first number into an integer and a part having a value less than one; accumulating means for accumulating the second number as an addend a number of times equal to the integer to produce a sum; storage means for supplying the second number and the part to the multiplier to produce a partial product thereof; and adder means for adding the partial product to the sum, thereby obtaining a final product of the first and second numbers.
In accordance with a second aspect of the invention, an arithmetic processor comprises a multiplier capable of multiplying two numeric values each having an absolute value not exceeding one; means operative in response to one of the two numeric values exceeding one for dividing the one value into an integer and a part having a value less than one; means for supplying the other of the two numeric values and the part to the multiplier to form a partial product; and means for taking the other numeric value as an addend a number of times equal to the integer to form a sum and adding the sum to the partial product, thereby obtaining a final product of the two numeric values.
In accordance with another aspect of the invention, an arithmetic processor comprises an input register; an arithmetic section; a work memory having a write input; and a selector connected between the input register and the write input for selectively supplying data from the input register to the write input.
In accordance with another aspect of the invention, an arithmetic processor is provided comprising an input register; an arithmetic section; a work memory having a write input and an output; and a selector connected between the input register and the write input and between the output and the write input; data from the input register and the output being selectively supplied to the write input via the selector.
In accordance with another aspect of the invention, an arithmetic processor is provided comprising a multiplier having two input terminals; a coefficient memory producing an output; and means connected to the multiplier and the coefficient memory for supplying the coefficient output to both of the input terminals for calculating the square thereof.
In accordance with another aspect of the invention, an arithmetic processor is provided comprising a multiplier having two input terminals; an arithmetic logic unit producing a logic output; and two selectors respectively connected to the two input terminals and responsive to the logic output; whereby the logic output is supplied to both of the input terminals for calculating the square thereof.

BRIEF DESCRIPTION OF THE DRAWINGS
A better understanding of the objects, features and advantages of the invention may be gained from a consideration of the following detailed description of the preferred embodiments thereof, in conjunction with the appended drawings, throughout which a given reference character always indicates the same element or part, and wherein:
Fig. 1 is a conceptual drawing showing the whole of an image processing apparatus to which the apparatus of the present invention is applicable;
Fig. 2 is a block diagram showing an example of a main portion of the image processing apparatus of Fig. 1;
Fig. 3 is a block diagram of an earlier but not publicly disclosed arithmetic unit for use in the apparatus of Fig. 2;
Figs. 4-9 are block diagrams of respective preferred embodiments of arithmetic units in accordance with the invention that can be substituted for the arithmetic unit of Fig. 3;
Fig. 10 is a block diagram showing the incorporation of the structures of Figs. 4-9 to form a pair of arithmetic units for use in the apparatus of Fig. 2; and
Fig. 11 is a flowchart illustrating the operation of a preferred embodiment of an arithmetic unit in accordance with the invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS
Typical Apparatus Employing Arithmetic Processor
Figs. 1 and 2 show image processing apparatus of a type disclosed in a copending application of Hasebe et al. serial No. 06/932,277, filed November 19, 1986, and assigned to the assignee of the present application. Arithmetic processors according to the present invention are especially adapted for use in apparatus as shown in Figs. 1 and 2.
Fig. 1 shows an example of video image processing apparatus for achieving high-speed data processing. The apparatus comprises an input/output portion 1 (hereinafter called an IOC), a memory portion 2 (hereinafter called a VIM) consisting of an input image memory 2A (hereinafter called a VIMIN) and an output image memory 2B (hereinafter called a VIMOUT), a data processing portion 3 consisting of a position stationary processor 3A (hereinafter called a PIP) mainly for calculating picture element values and a position variant processor system 3B (hereinafter called a PVP) for controlling data flows as by controlling addresses and for adjusting processes to coincide in timing, and a processor 4 (hereinafter called a TC) as a total controller for controlling execution and stopping of processes and exchange of programs. The TC 4 is provided with a host computer 5 (hereinafter called an HC) for controlling the entire video image processing apparatus.
The IOC 1 makes A/D (analog-to-digital) conversion of video signals coming from a video camera or VTR 6, for example, to provide digital image data, writes the digital image data in the VIMIN 2A, reads out processed image data from the VIMOUT 2B, and makes D/A (digital-to-analog) conversion of the processed image data to restore analog video signals, so that they may, for example, be recorded in a VTR 7 or supplied to a monitor receiver 8 to enable monitoring of the video image.
In the present case, the signals supplied as input and output are video signals of the NTSC system or the R-G-B system, and either of these systems is specified by the TC 4. A picture element is provided, for example, by 8-bit data.
The writing and reading of image data into and out of the VIM 2 is performed in large blocks of image data, for example in blocks of a field or a frame. Therefore, each of the VIMIN 2A and the VIMOUT 2B is made up of a plurality of sheets of memories, each having enough capacity for the image data of a field or a frame. For example, 12 sheets of 768 x 512 bytes may be employed as frame memories. In the present example, the use of these 12 sheets of frame memories is not fixed but can be flexibly allocated to either the VIMIN 2A or the VIMOUT 2B according to the purpose of the processing or the picture image as the object of the processing. Two sheets are used as one set, so that when one sheet is written, the other can be read, whereby processing from outside the VIM 2 by the IOC 1 and processing within the VIM 2 by the PIP 3A and the PVP 3B are performed in parallel.
A control mode signal determining whether the plurality of sheets of frame memories of the VIM 2 should come under the control of the IOC 1 or under the control of the PVP 3B is issued from the IOC 1 and supplied to the ...
The data processing portion 3 comprises a processor that reads image data stored in the VIMIN 2A according to its program, processes the data in various ways, and writes the processed data in the VIMOUT 2B.
The data processing portion 3 is made up of the separated systems PIP 3A and PVP 3B operating in parallel. By virtue of this separated arrangement, the processing time consumed in the data processing portion is determined only by whichever of the processing times taken by the two systems is longer. In contrast, in the data processing portions of earlier image processing apparatus, the total processing time was determined by the sum of the processing times. In the present example, data processing is performed at such high rates that video data can be processed on a real-time basis.
The processing portion 3 is made up of one sheet or a plurality of sheets of processors, and the microprograms in their microprogram memories can be exchanged when the scope of the processing is enlarged.

The program exchange is carried out in this way: the microprograms are supplied from the HC 5 to the TC 4 in advance and stored, for example, in a RAM provided therein. Thereafter, when, for example, the user has made a request for exchanging some programs (by turning a switch on), the TC 4 supplies the programs to each of the processors.
The PIP 3A and the PVP 3B are basically of the same architecture. Each comprises an independent processor having a control unit, arithmetic unit, memory unit, and input/output port. Each is arranged in a multiprocessor structure made up of a plurality of unit processors and is constructed so that high-speed processing is achieved chiefly by adoption of a parallel processing technique.
The PIP 3A comprises, for example, 60 sheets of PIP processors and several sheets of subprocessors and processes image data coming from the VIM 2 or generates image data within the PIP 3A itself.
The PVP 3B comprises, for example, 30 sheets of processors and controls flows of image data inward from the VIM 2, such as allocation of the picture element data to the PIP 3A.
More particularly, the PVP 3B generates address data and control signals for the VIM 2 and supplies them to the VIM 2. It also generates input/output control signals and other control signals for the PIP 3A and supplies them to the PIP 3A.
The image data processing is not always conducted in such a manner that the data from a single sheet of a frame of the VIMIN 2A are processed and the processed data are written in the VIMOUT 2B; sometimes data coming from a plurality of sheets of frame memories and extending over a plurality of frames are processed together.
The PIP 3A and PVP 3B employ 16-bit processing as a standard, and a speed is achievable that will enable the arithmetic processing of the image data of one frame within the time period of one frame, namely that will enable real-time processing.
As a matter of course, there are also some processes that require a longer processing time than one frame.
In the present case, the image data processing by the PIP 3A and PVP 3B is performed in synchronism with the video frames. Therefore, a process start timing signal PS in synchronism with each frame is supplied from the IOC 1 to the PVP 3B. The signal PS is ordinarily at a high level and it is brought to a low level at the processing start time.
On the other hand, a signal OK indicating that a process has been finished is supplied from the PVP 3B to the IOC 1. This signal OK is supplied by a processor at the core of the PVP 3B that provides timing control. The process start timing signal PS is generated in the IOC 1 based on a frame start signal indicating the first line of each frame and the process end signal OK.
When the processing is performed on a real-time basis, since the signal OK is always obtained at the end of each frame, the signal PS becomes the same signal as the frame start signal.

On the other hand, when the processing time is longer than one frame, the signal PS does not coincide with the frame period but is obtained at the start of a frame after a signal OK has been supplied as an output.
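The interplay of PS and OK can be expressed as a small timing model. The model rests on several assumptions made for illustration. Processing durations are given in units of one frame period. An OK that arrives exactly at a frame boundary lets PS be issued at that same boundary. The function name is hypothetical. Under these assumptions, real-time jobs produce a PS at every frame start, and a long job pushes the next PS to the first frame start after its OK.

```python
import math


def ps_start_frames(durations: list[float]) -> list[int]:
    """Frame indices at which PS (process start) is issued for successive jobs.

    durations: processing time of each job in frame periods (assumed > 0).
    PS always coincides with a frame start; after a job raises OK, the next
    PS is issued at the first frame start at or after that moment.
    """
    starts = []
    frame = 0
    for d in durations:
        starts.append(frame)
        ok_time = frame + d            # OK is raised when processing ends
        frame = math.ceil(ok_time)     # wait for the next frame start signal
    return starts


print(ps_start_frames([1, 1, 1]))      # real-time: PS at every frame
print(ps_start_frames([0.5, 2.5, 1]))  # the 2.5-frame job delays the next PS
```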
When the processor at the core of the PVP 3B detects that the process start timing signal PS from the IOC 1 has been brought to the low level, this processor starts to run and, according to its controlling program, outputs timing signals to other processors (including the PIP 3A), supplies addresses to the VIM 2, reads the image data from the VIM 2 and causes the same to be processed in the PIP 3A. When the processing has been finished, the same processor generates the signal OK and stops, waiting for issuance of the next process start timing signal PS.
In this case, only the image signal portion, excluding the synchronizing signal and burst signal, is taken as the object of processing, and the data read out from the VIM 2 do not include the synchronizing signal and burst signal. Therefore, the IOC 1 is provided with a ROM generating the synchronizing signal, burst signal, and the vertical blanking signal, and in the case of the NTSC signal, the data from the VIMOUT 2B (after being rearranged, if necessary) are transferred to the D/A converter of the IOC 1 together with the synchronizing signal, burst signal, and vertical blanking signal.
Also in the case of the three primary color signals, an outer synchronizing signal becomes necessary. This signal is also generated in the IOC 1 and supplied to the monitor and other apparatus.
In this parallel processing system using multiprocessors, the TC 4 effects synthetic control according to the three modes mentioned below.
Execution of processes, stopping, and program transfer (exchange) are thus carried out consistently. Also, the transfer and execution are effectively conducted by using a slow clock and a fast clock at the times of the program transfer and the program execution, respectively.
Fig. 2 shows a concrete structure of the PIP 3A. Although the PIP 3A has, in reality, a large number (60 sets, for example) of processors arranged in parallel, only two sets of them are shown in the drawing. In this drawing, digital data from the VIM 2 are supplied to input registers 31-1 to 31-n (hereinafter called the FRA) provided for each of the n processors 30-1 to 30-n, and these registers are controlled by the PVP 3B in accordance with the address read out of the VIM 2 and stored with a predetermined amount of data necessary for each processor.
The data written in these registers 31-1 to 31-n are supplied to arithmetic units 32-1, 33-1 to 32-n, 33-n, respectively. Each of the arithmetic units is provided with an adder/subtractor, multiplier, coefficient memory, data memory, etc., and makes linear and nonlinear data conversion calculations according to a control signal from the control units 34-1 to 34-n. Results of the calculations are obtained at the arithmetic units 33-1 to 33-n, and the arithmetic units 33-1 to 33-n are controlled by the PVP 3B according to write addresses of the VIM 2, whereby the results of the calculations are written in necessary portions in the VIM 2.

The control signals from the control units 34-1 to 34-n are formed according to the microprogram written in the microprogram memories (MPM) 35-1 to 35-n. The microprogram is written from outside through program change controls 36-1 to 36-n.
If the microprogram is formed by the host computer (HC) 5 (Fig. 1), etc., the transfer rate from the HC 5 to each MPM 35-1 to 35-n is limited by the capacity of the line. It is possible to transfer the program only at a rate of, for example, 500 Kbytes/sec or so, and it takes a considerable amount of time to rewrite all of the MPMs 35-1 to 35-n. Since processing in the PIP 3A, etc., is impossible during that time, substantial drawbacks are experienced. Moreover, since the transfer cannot be performed until the processing in the PIP 3A, etc., has been finished, the HC has to wait until it is finished, and the efficiency of usage of the HC is considerably lowered.

Earlier Arithmetic Processor
In the apparatus described above, each arithmetic unit 32, 33 of each processor section 30 constituting the PIP 3A is provided with a so-called multiplier.
FIG. 3 shows a primary portion of an earlier arithmetic unit known to the inventor but not publicly disclosed and not claimed herein, in which data from the FRA 31 and data from a work memory 41 to be described later are supplied via a selector 42 to an input of a multiplier 43, and data from a coefficient memory 44 and data from an arithmetic logic unit (ALU) 46 to be described later are supplied via a selector 45 to another input of the multiplier 43. Output data from the multiplier 43 is supplied to an input of the ALU 46, which delivers output data therefrom to the work memory 41 and via a register 47 to another input of the ALU 46.
In a case where the work memory 41 is not provided, the selector 42 is unnecessary; and, in a case where the output from the ALU 46 is not supplied to the multiplier 43, the selector 45 is unnecessary.
In general, the multipliers used for this kind of digital operation require that the absolute value of each of the two numbers to be multiplied be less than 1. Of course, data to be supplied to the FRA 31 can be adjusted to have an absolute value less than 1, for example, by setting the dynamic range to less than 1. However, the coefficient to be multiplied is required to be at least 1 in some cases.
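The magnitude restriction follows from fractional fixed-point arithmetic. The text does not name a number format. This sketch assumes 16-bit two's-complement Q1.15 values, which cover the range [-1, 1), in keeping with the 16-bit processing mentioned earlier. The constants and function names are hypothetical. A product of two such values again has magnitude below one, apart from (-1) x (-1), which is saturated here. A coefficient such as 1.5, on the other hand, cannot be encoded at all.

```python
Q = 15
ONE = 1 << Q          # 32768 would be +1.0, which does not fit in 16 bits


def to_q15(v: float) -> int:
    """Encode v as a Q1.15 integer; only -1 <= v < 1 is representable."""
    n = round(v * ONE)
    if not -ONE <= n < ONE:
        raise OverflowError(f"{v} is outside the Q1.15 range [-1, 1)")
    return n


def from_q15(n: int) -> float:
    return n / ONE


def q15_mul(p: int, q: int) -> int:
    """Multiply two Q1.15 values; the Q2.30 product is shifted back to Q1.15."""
    r = (p * q) >> Q                  # arithmetic shift: rounds toward -infinity
    return min(r, ONE - 1)            # only (-1) * (-1) = +1 needs saturating


print(from_q15(q15_mul(to_q15(0.5), to_q15(0.25))))
```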
To cope with this situation, in Fig. 3, a coefficient having a value of at least 1 is subdivided into a plurality of coefficients each having a value less than 1. Each coefficient is then multiplied by the input data, and the results are added to obtain the product as a total. For example, in FIG. 3, the data from the FRA 31 and the coefficient less than 1 from the coefficient memory 44 are supplied to the multiplier 43. The resultant product is supplied to an input of the ALU 46, which operates as an adder, and an output from the ALU 46 is delivered via the register 47 to another input of the ALU 46.
In this circuit, assuming the input data and the coefficient to be x and a (|a| < 1), respectively, the arithmetic processing is executed in such a way that the input data x and the coefficient a are supplied to the multiplier 43 in the first step, the product ax is obtained and is loaded in the output register of the multiplier 43 in the second step, and then the product is extracted via the ALU 46.
Consequently, when the absolute value of the coefficient is less than 1, the product can be obtained in three steps.
In contrast, when the absolute value of the coefficient is at least 1 and less than 2, the arithmetic operation is conducted with the coefficients (a + b : |a|, |b| < 1). In this case, four steps are required to obtain the product: the input data x and the coefficient a are supplied to the multiplier 43 in the first step; the input data x and the coefficient b are supplied to the multiplier 43 immediately after the product ax is supplied to the output register of the multiplier 43 in the second step; the product bx is supplied to the output register of the multiplier 43 immediately after the product ax is supplied from the output register via the ALU 46 to the register 47 in the third step; and the ALU 46 adds the bx in the output register of the multiplier 43 to the ax in the register 47, thereby obtaining (a + b)x, in the fourth step. Since four steps are required to obtain the product when the absolute value of the coefficient is at least 1 and less than 2, the required period of time is greater by the time of one step than the period of time required when the absolute value of the coefficient is less than 1.

It can easily be seen that, if the absolute value of the coefficient is at least 2 and less than 3, five steps are required by the apparatus of Fig. 3 to obtain the product.
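The Fig. 3 procedure and its step count can be modelled as follows. The text gives the form a + b but does not say how the parts are chosen. This sketch therefore assumes a split into m = floor(|k|) + 1 equal parts, which keeps every part below 1 in magnitude. For a coefficient of 1, this assumed split yields the (0.5 + 0.5) split mentioned in the description of Fig. 4. The step count follows the text: three steps for a single part and one more step for each additional part. Function names are hypothetical.

```python
import math


def fig3_parts(k: float) -> list[float]:
    """Split k into m equal parts, each with |part| < 1 (one possible split)."""
    m = math.floor(abs(k)) + 1
    return [k / m] * m


def fig3_steps(k: float) -> int:
    """Steps taken by the Fig. 3 unit: 3 for one part, plus 1 per extra part."""
    return 3 + len(fig3_parts(k)) - 1


def fig3_multiply(k: float, x: float) -> float:
    """Sum of partial products, as accumulated by ALU 46 and register 47."""
    total = 0.0
    for part in fig3_parts(k):        # one pass through multiplier 43 per part
        total += part * x
    return total


for k in (0.5, 1.5, 2.5):
    print(k, fig3_steps(k))           # 3, 4 and 5 steps respectively
```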
In a case where the processing time required for a given kind of arithmetic operation varies depending on the numbers to be subjected to the operation, the processing program is designed to accommodate the operations that require the greatest period of time for their execution. This causes some of the time during the execution of other operations to be wasted. In addition, it is not easy to design the processing program to take the variations of the processing time into account. The interval of time required to perform one step described above is quite short; however, such an operation is repeated a tremendous number of times in graphic processing and the like. In such a case, the short interval of time is accumulated over and over, which results in a substantial delay.
In the technique described above, many arithmetic processing steps are required to perform a multiplication with a coefficient of which the absolute value is at least 1, thereby leading to the problem that a substantial delay is caused.
In the apparatus described above, the output result of the arithmetic operation of the ALU 46 is also supplied to the work memory 41, and thereafter arithmetic processing is executed in some cases by using the data written in the work memory 41 and the data latched in the register 47. The amount of data necessary for the processing varies depending on the content of the processing, and the amount of data to be written in the FRA 31 changes greatly, especially when the apparatus is used as a general-purpose processing system. In ordinary processing, it is unnecessary to allocate the capacity of the write data of the FRA 31 according to the maximum amount of the required data; however, the efficiency of the read/write operations may deteriorate in some cases.
In a case where so-called shading processing of a solid spherical image is performed by the apparatus described above, an inner product is calculated from the unit vector of the light source and the normal vector at any given point on the surface of the image to obtain the brightness at that point. In order to obtain the normal vector in this case, it is necessary to perform processing such as look-up table (LUT) processing and squaring of the data from the coefficient memory 44 and the FRA 31.
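The shading computation can be sketched for a sphere. The geometry is an assumption made for illustration: a sphere of radius r centred on the viewing axis and viewed orthographically. With that geometry, the normal at image point (px, py) requires the squares px^2 and py^2 and a square root, the kind of work the text assigns to squaring and LUT processing. Clamping negative inner products to zero (a Lambert-style rule) is likewise an assumption, since the text states only that the inner product gives the brightness. Function names are hypothetical.

```python
import math


def sphere_normal(px: float, py: float, r: float) -> tuple[float, float, float]:
    """Unit normal of the visible hemisphere at image point (px, py).

    Assumes a sphere of radius r centred on the viewing axis;
    requires px^2 + py^2 <= r^2.
    """
    zz = r * r - px * px - py * py     # the squaring step
    if zz < 0:
        raise ValueError("point lies outside the sphere's silhouette")
    pz = math.sqrt(zz)                 # in hardware this could be a LUT
    return (px / r, py / r, pz / r)


def brightness(normal, light) -> float:
    """Inner product of the unit normal and the unit light-source vector,
    clamped at zero for surfaces facing away from the light (assumption)."""
    dot = sum(n * l for n, l in zip(normal, light))
    return max(dot, 0.0)


print(brightness(sphere_normal(0.6, 0.0, 1.0), (1.0, 0.0, 0.0)))
```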
When a squaring of the coefficient is performed in the arithmetic sections 32-1 to 32-n and 33-1 to 33-n of each processor section 30-1 to 30-n (Fig. 2) constituting the PIP 3A (Fig. 1) described above, the coefficient from the coefficient memory 44 (Fig. 3) is supplied to the work memory 41 through the selector 45, the multiplier 43, and the ALU 46, and then the coefficient stored in the work memory 41 is supplied via the selector 42 to an input of the multiplier 43. At the same time, the coefficient from the coefficient memory 44 is supplied via the selector 45 to another input of the multiplier 43, and the obtained product (square of the coefficient) is supplied as an output by the ALU 46.
When a squaring operation is to be performed on data, the data from the FRA 31 is supplied to the work memory 41 and the register 47 through the selector 42, the multiplier 43, and the ALU 46, and then the data stored in the work memory 41 is supplied via the selector 42 to an input of the multiplier 43. At the same time, the data from the register 47 is delivered to another input of the multiplier 43 through the ALU 46 and the selector 45, and then the resultant product (squared value) is supplied as an output by the ALU 46.
In the apparatus of Fig. 3, however, using the work memory 41 for intermediate processing in the arithmetic operation complicates the address generation, and the operating efficiency may deteriorate when, for example, a coefficient is squared in LUT processing.

Arithmetic Processor According to the Invention
In accordance with the present invention, the processing time required for the arithmetic operations described above is significantly reduced.
FIG. 4 shows one preferred embodiment of apparatus constructed in accordance with the invention. In the apparatus of Fig. 4, data x from the FRA 31 and a coefficient a from the coefficient memory 44 (assumed for the moment to have a value less than 1) are supplied to the multiplier 43, and the resultant product ax is delivered to a first input of a selector 48, which may be formed of tri-states.

Data from the FRA 31 is sent directly to a second input of the selector 48, and the data from the FRA 31 is further supplied via a delay register 49 to a third input thereof. The data selected by the selector 48 is supplied to an input of the adder 46. The output of the adder 46 is supplied via the register 47 to another input of the adder 46.
As indicated above, it is assumed that the input data and the coefficient are x and a, respectively. The arithmetic processing is then performed as follows: the input data x and the coefficient a are supplied to the multiplier 43 in the first step; the (partial) product ax is stored in the output register of the multiplier 43 in the second step; and the (final) product ax is obtained through the selector 48 and the adder 46 in the third step.
Consequently, when the absolute value of the coefficient is less than 1, the product is obtained in three steps, just as in the case of the apparatus of Fig. 3.
In contrast, when the absolute value of the coefficient is at least 1 and less than 2, the apparatus of Fig. 4 requires fewer steps than the apparatus of Fig. 3 to complete the calculation. In this case, the calculation is performed with the coefficient (a + 1 : |a| < 1). The input data x and the coefficient a are supplied to the multiplier 43, and, at the same time, the input data x is supplied to the delay register 49 in the first step; the partial product ax is supplied to the output register of the multiplier 43, and, at the same time, the data x supplied to the delay register 49 is delivered to the register 47 via the selector 48 and the adder 46 in the second step; and then the adder 46 adds the partial product ax received from the output register of the multiplier 43 to x received from the register 47, thereby obtaining the final product (1 + a)x in the third step. Consequently, when the absolute value of the coefficient is at least 1 and less than 2, the final product is also obtained in just three steps, in contrast to the four steps required by the apparatus of Fig. 3.
When the absolute value of the coefficient is at least 2 and less than 3, the arithmetic operation is effected with the coefficient (a + 2 : |a| < 1). In this case, the input data x and the coefficient a are supplied to the multiplier 43, and, at the same time, the input data x is delivered to the delay register 49 and via the selector 48 and the adder 46 to the register 47 in the first step; the partial product ax is supplied to the output register of the multiplier 43, and, at the same time, the sum x + x = 2x produced by the adder 46 is delivered to the register 47 in the second step; and the partial product ax of the output register of the multiplier 43 and the sum 2x stored in the register 47 are added in the adder 46, thereby obtaining the final product (2 + a)x, in the third step. Consequently, when the absolute value of the coefficient is at least 2 and less than 3, the final product is also obtained in just three steps, in contrast to the five steps required by the apparatus of Fig. 3.
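The three-step schedule of Fig. 4 can be traced with a register-transfer sketch. This is a hypothetical model, not a timing-accurate description of the circuit. Register names follow the reference numerals. The coefficient is passed as its integer part n (0, 1 or 2) and fractional part a. Only non-negative n is covered, because the text describes only that case.

```python
def fig4_multiply(n: int, a: float, x: float) -> tuple[float, list[str]]:
    """Trace the Fig. 4 unit for a coefficient n + a (n in {0, 1, 2}, |a| < 1).

    Returns the final product and one description string per step.
    """
    if n not in (0, 1, 2) or not abs(a) < 1:
        raise ValueError("model covers n in {0, 1, 2} and |a| < 1 only")
    reg47 = 0.0                       # register 47, fed back to adder 46
    reg49 = 0.0                       # delay register 49
    trace = []

    # Step 1: a and x enter multiplier 43.
    mult_inputs = (a, x)
    step = ["multiplier 43 <- a, x"]
    if n >= 1:
        reg49 = x                     # x is also latched in delay register 49
        step.append("register 49 <- x")
    if n == 2:
        reg47 += x                    # selector 48 passes FRA data through adder 46
        step.append("register 47 <- x")
    trace.append("; ".join(step))

    # Step 2: the partial product a*x reaches the multiplier's output register.
    mult_out = mult_inputs[0] * mult_inputs[1]
    step = ["output register <- a*x"]
    if n >= 1:
        reg47 += reg49                # selector 48 takes register 49: x, or x + x
        step.append(f"register 47 <- {n}x")
    trace.append("; ".join(step))

    # Step 3: selector 48 takes the multiplier output; adder 46 adds register 47.
    reg47 += mult_out
    trace.append("register 47 <- a*x + register 47")
    return reg47, trace


product, steps = fig4_multiply(2, 0.5, 0.25)
print(product, len(steps))
```

Whatever the value of n, the trace has three entries, which is the property the text claims for the Fig. 4 unit. The Fig. 3 unit instead needs three, four or five steps.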
Thus, unlike the apparatus of Fig. 3, the apparatus of Fig. 4 described above can obtain the final product in just three steps (one step for the input and two steps for the processing) whenever the absolute value of the coefficient is less than 3, whether or not it is at least 1. In many applications of the invention, this range encompasses all of the cases of interest. As a consequence, no provision need be made for additional delay time when the absolute value of the coefficient is within that range, and the arithmetic processing time can be minimized and held constant. In addition, the processing program can be created quite easily.
If a detection circuit (not shown) in the selector 48 is supplied with the output of the coefficient memory 44, including the integral portion thereof, the selector 48 can select the input data automatically.
Practically, however, the data contents of the coefficient memory 44 (including only the respective a portions of the coefficients) are stored at the same time as the program for the processor is stored in the microprogram memories 35-1 to 35-n (Fig. 2). Since only the a part of each coefficient is stored, the selector is controlled by the program.
The same is true of the other selectors described below.
In the apparatus of Fig. 4 described above, for arithmetic processing with a coefficient whose value is 1, the input data x from the FRA 31 can be obtained directly through the selector 48 and the adder 46 in the first step, and the processing time can be greatly reduced compared with the conventional case, where the input data x is obtained through the multiplier and the arithmetic operation is executed with the coefficient (0.5 + 0.5).
FIG. 11 is a flowchart illustrating the operation of the embodiment of Fig. 4 where the coefficient is at least two (if the coefficient is less than 2, the flowchart can be shortened). The selector 48 is stepped to its left position, and the coefficient from the memory 44 and data from the FRA 31 are supplied to the multiplier 43. The multiplication output of the multiplier 43 is supplied to the adder 46, and the output of the adder 46 is accumulated a first time in the register 47. The selector 48 is stepped to its center position, and the adder 46 and register 49 receive data from the FRA 31. The adder 46 receives the output of the register 47 and adds it to the data from the FRA 31 to produce a sum that is stored in the register 47. The selector 48 is stepped to its right position, and the adder 46 receives the outputs of the registers 49 and 47 and produces a sum that is accumulated a third time in the register 47. In the light of Fig. 11, those skilled in the art will be able to prepare flowcharts for the other embodiments of arithmetic processors according to the invention on the basis of the description below of their structure and function.
If the absolute value of the coefficient can be restricted to a value less than 2, the register 49 can be omitted, as in FIG. 5. In this case, when the program is designed to make the output register of the multiplier 43 "transparent", the partial product derived by the multiplier 43 can be supplied immediately to the adder 46 in the second step, so that the arithmetic operation can be performed in just two steps (one step for the input and one step for the processing).
According to the present invention, a bypass is established around the multiplier, and hence multiplication by a numeric value whose absolute value is at least 1 can be performed quite easily.
FIG. 6 shows another embodiment of apparatus constructed in accordance with the invention. In this figure, the data from the FRA 31 and the data from the work memory 41, to be described later, undergo a selection in the selector 42 so as to be supplied to an input of the multiplier 43. At the same time, the data from the coefficient memory 44 is delivered to another input of the multiplier 43.
Output data from the multiplier 43 is supplied to an input of the arithmetic logic unit (ALU) 46. The output of the ALU 46 and the data from the FRA 31 are supplied to the selector 50, and the selected data is delivered to a write input of the work memory 41. The output of the ALU 46 is delivered via the register 47 to another input of the ALU 46.
The data supplied to the FRA 31 in this apparatus is supplied via the selector 42 to the multiplier 43 and is then multiplied by a coefficient from the coefficient memory 44. The resultant data is supplied to the ALU 46. The data is further subjected to an operation such as addition to the data from the register 47, and the resultant output of the operation is extracted. At the same time, the output is delivered to the work memory 41 (via the selector 50) and to the register 47, and thereafter the arithmetic processing is executed by use of the data written in the work memory 41 and the data latched in the register 47.
In this apparatus, the data supplied to the FRA 31 is supplied via the selector 50 to the work memory 41.
Consequently, in this apparatus, when the amount of the input data exceeds the capacity of the FRA 31, the excess data can be supplied via the selector 50 to the work memory 41 so as to be stored therein. Even when a great amount of data is to be processed, the FRA 31 need have only a small capacity, since the excess data can be written in the work memory 41. A great amount of data can thus be handled without lowering the efficiency of the FRA 31, which facilitates efficient processing regardless of the amount of data.
The read/write operations in the work memory 41 can be effected concurrently with arithmetic operations such as multiplication, and hence the efficiency of the processing does not deteriorate.
According to the present invention, when the input data exceeds the capacity of the input register, the excess data can be written in the work memory, and hence the data can be processed effectively with a small input register regardless of the amount of data.
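The overflow path through the selector 50 can be sketched as follows. This Python model is illustrative only: the class and method names are hypothetical, the FRA 31 is reduced to a bounded list, and it assumes the earliest samples stay in the FRA while later ones spill into the work memory 41.

```python
class InputStage:
    """Hypothetical model of Fig. 6: a small FRA whose excess input is
    steered by selector 50 into the work memory instead of being dropped."""

    def __init__(self, fra_capacity: int) -> None:
        self.fra_capacity = fra_capacity
        self.fra: list[int] = []          # input register (FRA 31)
        self.work_memory: list[int] = []  # work memory 41

    def push(self, sample: int) -> str:
        """Store one input sample and report where it went."""
        if len(self.fra) < self.fra_capacity:
            self.fra.append(sample)
            return "FRA"
        # Selector 50 routes the excess sample to the work memory write input.
        self.work_memory.append(sample)
        return "work memory"

    def multiply_accumulate(self, coefficients: list[int]) -> int:
        """Sum of products over all stored data, FRA first, then work memory."""
        data = self.fra + self.work_memory   # selector 42 picks either source
        if len(coefficients) != len(data):
            raise ValueError("need one coefficient per stored sample")
        acc = 0                              # register 47
        for c, d in zip(coefficients, data):
            acc += c * d                     # multiplier 43, then ALU 46
        return acc
```

With an FRA capacity of 2, pushing 1, 2, 3 and 4 keeps 1 and 2 in the FRA and writes 3 and 4 to the work memory, yet the sum of products still covers all four samples.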
In this apparatus, the data supplied to the FRA 31 is delivered via the selector 50 to the work memory 41 so as to be written therein. Moreover, the data read from the work memory 41 and the data from the FRA 31 are delivered via the selector 42 to the multiplier 43, which effects a multiplication with the coefficient from the coefficient memory 44, and the resultant data is supplied to the ALU 46. The obtained data and the data from the register 47 are subjected to an operation such as addition to obtain the output of the arithmetic operation, and the output is supplied to the work memory 41 via the selector 50 and to the register 47. Thereafter, the arithmetic operation is performed by using the data written in the work memory 41 and the data latched in the register 47.
In the apparatus described above, when an operation such as so-called filter processing or convolution processing is to be executed, a part of a series of data, or a partial series of data, is written in the work memory 41, and this data and the coefficient from the coefficient memory 44 are subjected to multiplication and addition by use of the multiplier 43 and the ALU 46. In this case, however, a predetermined period of time is necessary to write in the work memory 41 the partial series of data required in the filter processing, and the arithmetic processing cannot be carried out at the same time. As a result, the processing efficiency deteriorates.
In so-called filter processing, the partial series of data used in the arithmetic operation is processed sequentially in an overlapped state; consequently, in many cases, an arbitrary portion of the series of data is used repeatedly by shifting the sequence of the series of data when the processing is next executed.

In the embodiment of Fig. 7, the data from the FRA 31 and the data from the work memory 41, to be described later, are subjected to a selection in the selector 42, and the selected data is supplied to an input of the multiplier 43. At the same time, the data from the coefficient memory 44 is delivered to another input of the multiplier 43. The multiplication output of the multiplier 43 is delivered to an input of the ALU 46, and the output of the ALU 46 is supplied via the register 47 to another input of the ALU 46. The data from the work memory 41 is supplied to the register 51. The data from the register 51, the data from the FRA 31, and the output of the ALU 46 are supplied to the selector 50. The selected data from the selector 50 is delivered to the write input of the work memory 41.
The data supplied to the FRA 31 in this apparatus is supplied via the selector 50 to the work memory 41 so as to be written therein. The data read from the work memory 41 and the data from the FRA 31 are delivered via the selector 42 to the multiplier 43, which effects a multiplication with the coefficient from the coefficient memory 44. The resultant data is delivered to the ALU 46. The output of the multiplier 43 and the data from the register 47 are subjected to an operation such as addition, and the obtained output is supplied to the work memory 41 (via the selector 50) and to the register 47.
Thereafter, arithmetic processing is effected by using the data written in the work memory 41 and the data latched in the register 47.

The data read from the work memory 41 is fed to the register 51, and the data from the register 51 is rewritten in the work memory 41 via the selector 50.
Consequently, in this apparatus, the data written in the work memory 41 is read and subjected to an arithmetic operation. At the same time, the data can be rewritten in the work memory 41 via the selector 50. Thus, any data in the partial series of data that is also to be used in the next processing is latched in a register, and the latched data is rewritten at an address that has undergone the necessary shift; that is, the amount of data to be written can be reduced, and hence the time required to write the data is minimized.
For example, in a case where a one-address shift is to be executed for the next processing, while the data is read and subjected to arithmetic processing, the data is latched in the register 51; and when the next data is read after the processing, the data of the register 51 is rewritten at the address from which the readout has been effected. As a result, the data is shifted and rewritten. At the same time, the system constituted by the register 51 and the work memory 41 is separated from the arithmetic section; consequently, the rewrite operation can be accomplished concurrently with the arithmetic processing, which greatly increases the processing efficiency.
According to the present invention, since the data written in the work memory can be rewritten through the sequential shift operation, the necessary portion of the partial series of data can be rewritten for storage, and thus the amount of data written in the respective processings is reduced, thereby minimizing the write time and improving the processing efficiency.
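The read, latch and rewrite cycle of Fig. 7 can be illustrated with an FIR filter. This is a minimal Python sketch under assumptions not stated in the text: the work memory 41 holds the previous samples with the newest at address 0, the newest sample comes directly from the FRA 31, and the variable `latch` plays the role of the register 51; the function name is hypothetical. Each word is read for the multiply-accumulate and, in the same pass, overwritten by the word latched on the previous read, so the delay line moves up one address without a separate reload phase.

```python
def fir_with_recirculating_shift(samples: list[int], coeffs: list[int]) -> list[int]:
    """FIR filter whose delay line is shifted by read-latch-rewrite (hypothetical)."""
    # Work memory 41: previous len(coeffs) - 1 samples, newest at address 0.
    work_memory = [0] * (len(coeffs) - 1)
    outputs = []
    for s in samples:
        acc = coeffs[0] * s        # tap 0 uses the new sample from the FRA
        latch = s                  # register 51; first write-back is the new sample
        for addr in range(len(work_memory)):
            value = work_memory[addr]          # read for the arithmetic section
            acc += coeffs[addr + 1] * value    # multiply-accumulate
            work_memory[addr] = latch          # rewrite at the address just read
            latch = value                      # latch for the next address
        outputs.append(acc)        # the oldest sample, left in latch, is dropped
    return outputs
```

Feeding the impulse 1, 0, 0, 0 through taps 5, 6, 7 returns 5, 6, 7, 0, i.e. the tap values in order, which confirms that the read-and-rewrite pass shifts the stored samples by exactly one address per output.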
In the embodiment of Fig. 8, the data from the FRA 31 and the data from the work memory 41 are supplied to the selector 42, and the selected data is delivered to an input of the multiplier 43. The data from the coefficient memory 44 is supplied to another input of the multiplier 43. The data from the coefficient memory 44 is also supplied to a register 52. The output of the multiplier 43 is fed to an input of the ALU 46, which delivers its output data to the work memory 41 and, via the register 47, to another input of the ALU 46. The register 52 may alternatively be connected to the other input terminal of the multiplier 43.
In a case where the square of a coefficient is to be calculated by this apparatus, the coefficient from the coefficient memory 44 is supplied to the register 52, and the data from the register 52 is delivered via the selector 42 to an input of the multiplier 43. At the same time, the same coefficient from the coefficient memory 44 is fed to the other input of the multiplier, and the obtained product (the square of the coefficient) is supplied as an output by the ALU 46.
Since the output of the coefficient memory 44 is supplied to both inputs of the multiplier 43, the square is calculated quite simply. With this provision, operations such as the square of a coefficient and the multiplication of two coefficients can be accomplished simply, for example in LUT (look-up table) processing, which considerably increases the efficiency of the arithmetic operation.
According to the present invention, the provision of a circuit for applying the output of the coefficient memory 44 to both inputs of the multiplier 43 facilitates such operations as the squaring of a coefficient in LUT processing and the like.
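The register 52 in Fig. 8 holds one coefficient-memory word for a cycle so that it meets a second word, or the same word again, at the multiplier. The following is a hypothetical Python sketch; the class name, the two-cycle timing and the integer coefficient values are assumptions made for illustration.

```python
class Fig8Datapath:
    """Hypothetical two-cycle model of the register 52 path in Fig. 8."""

    def __init__(self, coefficient_memory: list[int]) -> None:
        self.cm = list(coefficient_memory)  # coefficient memory 44
        self.reg52 = 0                      # register 52

    def multiply_words(self, first_addr: int, second_addr: int) -> int:
        # Cycle 1: the first coefficient is latched in register 52.
        self.reg52 = self.cm[first_addr]
        # Cycle 2: register 52 feeds one multiplier input via selector 42,
        # while the coefficient memory supplies the other input directly.
        return self.reg52 * self.cm[second_addr]

    def square(self, addr: int) -> int:
        """Square of one coefficient: the same word reaches both inputs."""
        return self.multiply_words(addr, addr)
```

The same path serves both uses named above: `square(addr)` for the square of a coefficient and `multiply_words(i, j)` for the product of two coefficients, with no data routed through the FRA or the work memory.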
In the embodiment of Fig. 9, the data from the FRA 31 and the data from the work memory 41 are supplied to the selector 42, and the selected data from the selector 42 is supplied to an input of the multiplier 43. The data from the coefficient memory 44 and the output of the ALU 46 are supplied to the selector 45, and the selected data therefrom is supplied to another input of the multiplier 43. The output of the multiplier 43 is fed to an input of the ALU 46, and the output from the ALU 46 is supplied to the selector 45, the register 53, and the work memory 41, and is further supplied at the same time via the register 47 to another input of the ALU 46. The register 53 may alternatively be connected to the other selector 45.
In a case where the square of data from the FRA 31 is to be calculated, the data from the FRA 31 is fed to the registers 47 and 53 via the selector 42, the multiplier 43, and the ALU 46. Next, the data from the register 53 is delivered via the selector 42 to an input of the multiplier 43; at the same time, the data from the register 47 is fed to the other input of the multiplier 43 via the ALU 46 and the selector 45. The obtained product (the square of the data) becomes the output of the ALU 46.
Since the output from the ALU 46 can be supplied to both inputs of the multiplier 43, the squaring operation can be performed quite easily. In addition, since the output of the ALU 46 can be supplied to either input of the multiplier 43, the output from the ALU 46 can be multiplied arbitrarily by the coefficient from the memory 44, the output of the ALU 46, the data from the FRA 31, or the data from the work memory 41, thereby considerably improving the efficiency of the arithmetic operation.
According to the present invention, respective routes are provided for supplying the output from the ALU to the two inputs of the multiplier, which greatly facilitates arithmetic operations such as the squaring of numeric data.
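The Fig. 9 routing can be traced cycle by cycle in a small Python sketch. Assumptions: the source labels (`"FRA"`, `"TM"`, `"REG53"`, `"CM"`, `"ALU"`) and the class name are hypothetical, the ALU 46 simply passes the product through, and a `"PASS"` setting is assumed so that FRA data can reach the registers 47 and 53 unchanged; the text does not say how the multiplier passes the data in that first cycle. Feeding the ALU output back through the selector 45 lets a value be squared repeatedly or multiplied by a coefficient.

```python
class Fig9Datapath:
    """Hypothetical cycle model of the Fig. 9 selector routing."""

    def __init__(self, coefficient_memory: list[int], work_memory: list[int]) -> None:
        self.cm = list(coefficient_memory)  # coefficient memory 44
        self.tm = list(work_memory)         # work memory 41
        self.alu_out = 0                    # ALU 46 output, also routed to selector 45
        self.reg47 = 0
        self.reg53 = 0

    def cycle(self, sel42: str, sel45: str, fra: int = 0,
              cm_addr: int = 0, tm_addr: int = 0) -> int:
        # Selector 42: FRA 31, work memory 41, or register 53.
        if sel42 == "FRA":
            in1 = fra
        elif sel42 == "TM":
            in1 = self.tm[tm_addr]
        elif sel42 == "REG53":
            in1 = self.reg53
        else:
            raise ValueError(f"unknown selector 42 source: {sel42}")
        # Selector 45: coefficient memory 44 or the ALU 46 output
        # (which here still equals the value latched in register 47).
        if sel45 == "CM":
            product = in1 * self.cm[cm_addr]
        elif sel45 == "ALU":
            product = in1 * self.alu_out
        elif sel45 == "PASS":
            product = in1  # assumed pass-through, not described in the text
        else:
            raise ValueError(f"unknown selector 45 source: {sel45}")
        # ALU 46 passes the product on; it is latched in registers 47 and 53.
        self.alu_out = product
        self.reg47 = self.reg53 = product
        return product
```

Loading 7 from the FRA and then running two `("REG53", "ALU")` cycles produces 49 and then 2401; a following `("REG53", "CM")` cycle multiplies that result by a stored coefficient.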
Fig. 10 shows a preferred embodiment of apparatus according to the invention applied to the arithmetic sections 32-1 to 32-n and 33-1 to 33-n (Fig. 2) of the PIP 3A (Fig. 1) of the digital signal processing system.
In Fig. 10, the arithmetic section of the PIP comprises two systems, part A (on the left side of the figure) and part B (on the right side of the figure). Each part comprises a coefficient memory, a work memory, a multiplier, an ALU and a register to perform the basic arithmetic operations necessary to effect the signal and graphics processing.
Each of the coefficient memories A CM and B CM has a capacity of 1024 x 16 bits, and the memory contents can be exchanged through the program change control 36-1 to 36-n (Fig. 2) of the PIP. However, the contents cannot be read by apparatus on the PIP.
The coefficient memory stores data such as the coefficients necessary for the processing, for example the coefficients of a digital filter and the sine and cosine values for an FFT (fast Fourier transform). The A CM and the B CM are addressed in common.
However, no problem arises, because the contents of the A CM and B CM can be supplied independently by the TC 4. The output from the A CM is supplied as an input to the A1 MUX or A1 REG, and the output from the B CM is supplied as an input to the B1 MUX or B1 REG.
The contents of the A1 REG and B1 REG are delivered to the respective outputs at the next clock pulse CLK.
Each of the multipliers A MPY and B MPY is a 16 bit x 16 bit parallel multiplier. Input x of the A MPY is supplied with the output value of the A CM selected by the A1 MUX or with the output value of the A ALU, whereas input y is supplied with one of the output values of the A1 REG, PL REG, A6 REG, B7 REG, or FRA selected by the A2 MUX. The PL REG is a register circuit in which the PL value of the microprogram is stored. (Refer to the manual of the Advanced Micro Devices AM2910. The microinstructions are stored with condition or jump addresses and can also be the stored data itself.) The A6 REG and B7 REG are register circuits that store the outputs from the work memories A TM and B TM, respectively. The FRA 31 comprises a group of shift registers that have a variable structure and are controlled by the processors other than the PIP 3A (the PVP 3B and the TC 4); the FRA 31 is used as an external input port of the PIP 3A. Its structure can be changed according to the processing, and it can be shifted when necessary. The output of the multiplier A MPY has 32 bits, from which the upper 16 bits (MSB half) and the lower 16 bits (LSB half) can be extracted in different cycles. The 16-bit LSB half may be obtained from the ~ input. The A1 REG enables the squaring of the contents of the A CM and the multiplication of different contents of the A CM. Part B is nearly the same as part A.
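The split of the 32-bit multiplier output into two 16-bit halves can be illustrated with integer arithmetic. This sketch assumes signed two's-complement 16-bit operands, which the text does not state; the function names are hypothetical.

```python
def mpy16(x: int, y: int) -> tuple[int, int]:
    """Signed 16 x 16 -> 32-bit multiply, returned as (MSB half, LSB half)."""
    for v in (x, y):
        if not -0x8000 <= v <= 0x7FFF:
            raise ValueError("operands must be signed 16-bit values")
    product = (x * y) & 0xFFFFFFFF   # 32-bit two's-complement bit pattern
    msb_half = product >> 16         # upper 16 bits, read out in one cycle
    lsb_half = product & 0xFFFF      # lower 16 bits, read out in another cycle
    return msb_half, lsb_half


def join32(msb_half: int, lsb_half: int) -> int:
    """Reassemble the two halves into a signed 32-bit integer."""
    value = (msb_half << 16) | lsb_half
    return value - (1 << 32) if value & 0x80000000 else value
```

For example, 300 x 300 = 90000 gives an MSB half of 0x0001 and an LSB half of 0x5F90, and `join32` restores 90000 from the two halves.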
However, the output of the PL REG cannot be selected by the B2 MUX, which has only four inputs instead of the five of the A2 MUX. Since the FRA 31 has two ports, the same data can be read by parts A and B at the same time.
Each of the A ALU and B ALU is an arithmetic logic unit in which arithmetic and logical operations such as addition, subtraction, OR, and AND can be performed. The A ALU is supplied with the output of the A MPY, the selection output of the A2 MUX, the output of the A2 REG, or the output of the A3 REG. The B ALU is supplied with the output of the B MPY, the selection output of the B2 MUX, the output of the B2 REG, or the output of the B3 REG. More strictly, the MUX selection results either in a selected output or in no selection. The A2 REG and the B2 REG are employed because neither the A MPY nor the B MPY can multiply by a value equal to or greater than one. For example, in a case where an input from the FRA 31 is to be multiplied by a coefficient of 1.5, the multiplier multiplies the input by 0.5.
At the same time, the input data is sent to the A2 REG or the B2 REG and is added by the ALU to the partial product, thereby accomplishing a multiplication by a coefficient equal to or greater than one. The A3 REG and the B3 REG link part A to part B. For example, these registers are used in a case where a sum-of-products operation for a digital filter is performed in parts A and B and both outputs are used to obtain a final result. The output from the A ALU is fed to the A4 MUX, the A1 MUX, and the B3 REG, whereas the output from the B ALU is delivered to the B4 MUX, the B1 MUX, and the A3 REG. The A4 MUX selects one of the outputs from the A ALU, the IN REG, and the FRA 31.
The IN REG is an external input port. The output selected by the A4 MUX is supplied to the A4 REG, the OUT1 REG, the OUT2 REG, and the B4 MUX. The A4 REG stores the input to the work memory A TM. The OUT1 REG and the OUT2 REG are output ports of the PIP and are controlled so that data can be sent to each of them independently. The B4 MUX selects one of the outputs of the B ALU, the A4 MUX, and the C ALU.
The outputs of the A4 REG and the A5 REG undergo a selection by the A5 MUX, and the selected output is stored in the A TM, the A6 REG, the A7 REG, and the A5 REG. Naturally, the data can also be stored in any one of these alone. The A TM has a bidirectional input/output function. When the A TM outputs data, neither of the outputs from the A4 REG and the A5 REG is selected by the A5 MUX, and the output of the A TM is stored in the A5 REG, the A6 REG, and the A7 REG. The A5 REG serves to shift data to a different address of the A TM, which allows the delay processing of a digital filter to be performed efficiently. The A7 REG is a register that sends data from part A to part B; its output is delivered to the B2 MUX. This provision is effective for a shading operation in which data is squared in part A and the resultant data is multiplied by a value in part B.
Since this applies also to part B, the description thereof will be omitted.
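As an illustration of the A3 REG and B3 REG link, one output of a digital filter can be computed as two partial sums of products, one in each part, and combined in the A ALU. This is a hypothetical Python sketch; splitting the taps into a first half and a second half is an assumption, not something the text specifies.

```python
def two_part_sum_of_products(window: list[int], coeffs: list[int]) -> int:
    """One filter output formed as two partial sums (hypothetical tap split)."""
    if len(window) != len(coeffs):
        raise ValueError("window and coefficient lengths differ")
    half = len(coeffs) // 2
    # Part A: A MPY and A ALU accumulate the first half of the taps.
    part_a = sum(c * x for c, x in zip(coeffs[:half], window[:half]))
    # Part B: B MPY and B ALU accumulate the remaining taps in parallel.
    part_b = sum(c * x for c, x in zip(coeffs[half:], window[half:]))
    # The B ALU result reaches part A through the A3 REG; the A ALU adds it.
    a3_reg = part_b
    return part_a + a3_reg
```

Because each part handles roughly half of the taps, the two accumulations can run side by side, and only one extra addition is needed to merge them.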
The C ALU is located at an intermediate point between the arithmetic section and the control section. The data selected by the A3 MUX is supplied as an input to the C ALU and, after undergoing an arithmetic operation in the C ALU, is transmitted to the CM REG, the TM REG, the VECT REG, and the B4 MUX.
The arithmetic function of the C ALU is the same as that of the A ALU and the B ALU. The CM REG is a register circuit that stores the addresses of the coefficient memories A CM and B CM, and the TM REG is a register circuit that stores the addresses of the work memories A TM and B TM. The VECT REG is a register circuit that stores the iteration count of a program loop and the jump destination to be used in the program controller (PRGCNT) of the control section. Through the bus to the B4 MUX, the result of an arithmetic operation in the C ALU can be returned to the processing section. This enables the C ALU also to be used as an auxiliary unit for the A ALU and the B ALU.
With the provision of the CM REG and TM REG, the data of the processing section can be used as addresses of the coefficient memory and the work memory, and hence look-up table processing is facilitated. In a case where FFT (fast Fourier transform) processing is to be effected, the butterfly operation is achieved by use of the A MPY, the A ALU, the B MPY, and the B ALU, and the addresses of the A TM and the B TM storing data and the addresses of the A CM and the B CM containing coefficients (sin, cos) are computed by use of the C ALU. For the butterfly operation, the real part and the imaginary part of each complex number are processed simultaneously in parts A and B, respectively. Since the arithmetic operations on the real and imaginary parts can be accomplished at the same time, the load of the addressing operation for the data and coefficients is reduced; consequently, the overall processing efficiency is improved and the processing speed is increased. This effect is obtained by the provision of two systems, parts A and B. The TM REG and the CM REG comprise four registers, and hence the same address need not be calculated repeatedly in the C ALU, which increases its efficiency.
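The division of labour in the FFT case can be sketched with a radix-2 decimation-in-time FFT whose real parts live in one list (standing in for part A and the A TM) and whose imaginary parts live in another (part B and the B TM). A cos/sin table stands in for the coefficient memories, and an explicit address computation stands in for the C ALU. This is an illustrative Python model under those assumptions, not the PIP microprogram; in particular, how each part obtains the operand held by the other part (presumably through cross-links such as the A7 REG and B7 REG) is assumed.

```python
import math


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i (data address reordering)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_two_parts(re: list[float], im: list[float]) -> tuple[list[float], list[float]]:
    """Radix-2 DIT FFT with real parts in 'A TM' and imaginary parts in 'B TM'."""
    n = len(re)
    if n < 2 or n & (n - 1) or len(im) != n:
        raise ValueError("need two equal-length inputs whose length is a power of two >= 2")
    bits = n.bit_length() - 1
    # Coefficient memories: cosine and sine twiddle tables (A CM / B CM stand-ins).
    cos_table = [math.cos(2 * math.pi * k / n) for k in range(n // 2)]
    sin_table = [math.sin(2 * math.pi * k / n) for k in range(n // 2)]
    # Work memories, loaded in bit-reversed order.
    a_tm = [float(re[bit_reverse(i, bits)]) for i in range(n)]  # part A: real
    b_tm = [float(im[bit_reverse(i, bits)]) for i in range(n)]  # part B: imaginary

    size = 2
    while size <= n:
        half, stride = size // 2, n // size
        for start in range(0, n, size):
            for j in range(half):
                # Address generation (the C ALU's job in the text).
                top, bot, tw = start + j, start + j + half, j * stride
                wr, wi = cos_table[tw], -sin_table[tw]  # W = exp(-2*pi*i*tw/n)
                # Part A forms Re(W * x[bot]), part B forms Im(W * x[bot]);
                # each needs one operand held by the other part.
                tr = wr * a_tm[bot] - wi * b_tm[bot]
                ti = wr * b_tm[bot] + wi * a_tm[bot]
                # Butterfly: both parts update their own memory simultaneously.
                a_tm[top], a_tm[bot] = a_tm[top] + tr, a_tm[top] - tr
                b_tm[top], b_tm[bot] = b_tm[top] + ti, b_tm[top] - ti
        size *= 2
    return a_tm, b_tm
```

In every butterfly the two parts execute the same sequence of multiply and add steps on different data, which is why the real and imaginary halves can proceed in lockstep while a separate unit supplies the addresses.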
In this example, parts A and B are not symmetrical because of physical restrictions such as the size of the circuit board; the circuits may, however, be made symmetrical.
Many modifications of the preferred embodiments of the invention disclosed above will readily occur to those skilled in the art upon consideration of this disclosure. All such modifications are intended to be included within the invention, and the invention is limited only by the appended claims.

Claims

WHAT IS CLAIMED IS:

1. An arithmetic processor for multiplying a first number that has an absolute value that can exceed one by a second number that has an absolute value not exceeding one; said processor comprising:
a multiplier;
control means responsive to an absolute value of said first number exceeding one for dividing said first number into an integer and a part having a value less than one;
accumulating means for accumulating said second number as an addend a number of times equal to said integer to produce a sum;
storage means for supplying said second number and said part to said multiplier to produce a partial product; and adder means for adding said partial product to said sum, thereby obtaining a final product of said first and second numbers.

2. An arithmetic processor according to claim 1; wherein said second number represents data and said first number represents a coefficient of said second number.

3. An arithmetic processor according to claim 1; wherein said accumulating means comprises a selector connected to receive outputs from said multiplier and said storage means and selectively to pass said partial product or said second number to said adder means and a storage register responsive to the output of said adder means.

4. An arithmetic processor according to claim 1; wherein said storage means comprises an input register and a work memory and further comprising a second selector connected to receive outputs from said input register and said adder means and selectively pass said second number or an output of said adder means to said work memory.

5. An arithmetic processor according to claim 4; further comprising a register to receive an output from said work memory and to supply an output to said second selector, thereby enabling calculation by said multiplier and adder means and recirculation of data through said work memory to proceed simultaneously.

6. An arithmetic processor according to claim 1; wherein said storage means comprises a coefficient memory and a register, said coefficient memory being connected to supply an output to a first input of said multiplier and to said register and said register being connected to supply an output to a second input of said multiplier, whereby the same coefficient can be supplied to two inputs of said multiplier for calculating the square thereof.

7. An arithmetic processor according to claim 1; wherein said storage means comprises an input register and said arithmetic processor further comprises an additional register connected to receive data from said input register; said storage register being connected to supply an output to a first input of said multiplier and said additional register being connected to supply an output to a second input of said multiplier, whereby the same data from said input register is supplied to two inputs of said multiplier for calculating the square thereof.

8. An arithmetic processor according to claim 1; wherein said multiplier, control means, accumulating means, storage means and adder means form a first part of said arithmetic processor; further comprising:
a second multiplier; second control means, second accumulating means, second storage means and second adder means respectively corresponding in structure and function to said multiplier, control means, accumulating means, storage means and adder means and forming a second part of said arithmetic processor connected in parallel with said first part;
said first and second parts respectively and simultaneously operating on real and imaginary parts of complex numbers.

9. An arithmetic processor comprising:
a multiplier capable of multiplying two numeric values each having an absolute value not exceeding one;
means operative in response to one of said two numeric values exceeding one for dividing said one value into an integer and a part having a value less than one;
means for supplying the other of said two numeric values and said part to said multiplier to form a partial product; and means for taking said other numeric value as an addend a number of times equal to said integer to form a sum and adding said sum to said partial product, thereby obtaining a final product of said two numeric values.

10. An arithmetic processor comprising:
an input register;
an arithmetic section connected to said input register for performing arithmetic operations on data supplied to said input register;
a work memory having a write input and a selector connected between said input register and said write input for selectively supplying data from said input register to said write input, whereby data from said input register can be transferred to said work memory, thus reducing the required capacity of said input register.

11. An arithmetic processor comprising:
an input register;
an arithmetic section connected to said input register for performing arithmetic operations on data supplied to said input register;
a work memory having a write input and an output, said output also being connected to said arithmetic section; and
a selector connected between said input register and said write input and between said output and said write input;
data from said input register and said output being selectively supplied to said write input via said selector;
whereby data from a first address in said work memory can be supplied to said arithmetic section and simultaneously recirculated through said selector for storing in said work memory at a second address shifted with respect to said first address.

12. An arithmetic processor comprising:
a multiplier having two input terminals;
a coefficient memory producing an output; and means connected to said multiplier and said coefficient memory for supplying said coefficient memory output to both of said input terminals for calculating the square thereof.

13. An arithmetic processor comprising:
a multiplier having two input terminals;
an arithmetic logic unit producing a logic output; and two selectors respectively connected to said two input terminals and responsive to said logic output;
whereby said logic output is supplied to both of said input terminals for calculating the square thereof.

14. An arithmetic processor comprising:
a multiplier having two input terminals;
a coefficient memory producing an output; means connected to said multiplier and said coefficient memory for supplying said coefficient memory output to both of said input terminals of said multiplier so that said multiplier calculates the square thereof; and an arithmetic logic unit provided with the square from said multiplier and for producing a logic output.

15. An arithmetic processor comprising:
a multiplier having two input terminals;
an arithmetic logic unit producing a logic output; and two selectors respectively connected to said two input terminals and responsive to said logic output;
whereby said logic output is supplied to both of said input terminals of said multiplier and for calculating the square thereof.