WO2004003785A1 - DCT processor used to perform the discrete cosine transform (DCT) - Google Patents
DCT processor used to perform the discrete cosine transform (DCT)
- Publication number
- WO2004003785A1 (PCT/JP2003/008222)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- storage
- output
- dct
- input
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/147—Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform
Definitions
- DCT processor used to perform the discrete cosine transform (DCT)
- the present invention relates to a DCT processor used to perform a discrete cosine transform (hereinafter "DCT").
- DCT discrete cosine transform
- Discrete cosine transform is used to convert data expressed as values along the time axis into data divided into frequency components.
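As a minimal illustration of what such a transform computes, the sketch below evaluates a direct (non-fast) DCT. The text does not state which DCT variant or normalization is meant, so the unnormalized DCT-II used here is an assumption, and the name `dct_ii` is hypothetical.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k).

    Assumption: the variant and scaling are chosen for illustration only.
    """
    n_points = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_points * (n + 0.5) * k)
            for n in range(n_points))
        for k in range(n_points)
    ]


# A constant signal sampled at 32 points has only a zero-frequency component.
spectrum = dct_ii([1.0] * 32)
```

This direct form needs on the order of N² multiplications, which is the cost that fast algorithms such as the ones discussed in this document are meant to reduce.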
- the DCT algorithm that realizes the discrete cosine transform is used in a wide range of applications, such as image and audio processing typified by MPEG and MP3, and sub-band filters.
- a DCT processor that executes the DCT algorithm typically requires a large number of add-subtracters, multipliers, and a large number of crossbar switches.
- the number of components required increases rapidly as the number of sampling points increases. The main reason is the complexity of the data permutation.
- the increase in the number of components increases the function block area of the DCT processor, increases the power consumption of the peripheral logic wiring, increases the maximum wiring length of the peripheral logic, and furthermore lowers the processing speed of the processor and increases the output latency.
- a typical DCT algorithm is a so-called LEE algorithm.
- Performing a discrete cosine transform at 32 sampling points using a DCT processor that implements this LEE algorithm requires at least 273 adder-subtracters and 80 multipliers, and an even larger number of crossbar switches. In fact, all these huge resources
- the present invention is intended to solve the above-described conventional problems. It is based on a predetermined, previously developed DCT algorithm, and promotes resource sharing by using a specially designed processing memory developed by focusing on the regularity of data flow in that DCT algorithm, thereby reducing the number of required components, reducing the function block area, reducing power consumption, increasing the processing speed, or reducing the output latency.
- the present invention is directed to a DCT processor used for performing a discrete cosine transform, comprising a storage and processing device for performing permutation on data and a computing device for performing calculations according to a predetermined DCT algorithm,
- characterized in that the data is looped a predetermined number of times between the storage and processing device and the computing device, and a result of the discrete cosine transform is obtained based on the data read from the computing device.
- the present invention also provides a DCT processor used to execute a discrete cosine transform, comprising:
- a storage and processing device having a plurality of input units and a plurality of output units, which performs permutation on a plurality of data input through the plurality of input units and then outputs the data from the plurality of output units;
- a computing device having a plurality of input units connected to the output units of the storage and processing device and a plurality of output units, which calculates the data input from the plurality of input units in accordance with a predetermined DCT algorithm and outputs the data from the plurality of output units;
- a storage device having a plurality of input units connected to the output units of the computing device and a plurality of output units connected to the input units of the storage and processing device, which stores a plurality of data output from the output units of the computing device and outputs the data from the plurality of output units; and
- a rearranging unit provided between the storage device and the storage and processing device, which rearranges the data from the storage device in a predetermined order;
- wherein the plurality of data output from the output units of the storage device is looped a predetermined number of times between the storage and processing device, the computing device, the storage device, and the rearranging unit in this order, and a result of the discrete cosine transform is obtained based on the data read from the output units of the storage device.
- the present invention relates to a 32-point DCT processor for performing a discrete cosine transform on 32 data obtained by sampling at 32 sampling points, comprising:
- a storage and processing device having eight input units and eight output units, capable of inputting and outputting a total of 32 data, eight at a time and four times in total, which outputs to the eight output units, in a predetermined order, the total of 32 data written sequentially to predetermined storage locations through the eight input units;
- two computing devices, each having four input units respectively connected to four of the eight output units of the storage and processing device and four output units, which calculate the data input from the input units, four at a time, according to the CGA-DCT algorithm and output the results from the four output units;
- a storage device having eight input units respectively connected to the total of eight output units of the two computing devices and eight output units respectively connected to the total of eight input units of the storage and processing device, capable of writing and reading a total of 32 data, eight at a time and four times in total, in a first-in first-out manner, which stores the total of 32 data output from the output units of the computing devices; and
- a rearranging means provided between the storage device and the storage and processing device for rearranging the data from the storage device in a predetermined order.
- the above-described DCT processor further includes an input unit for inputting data from outside to inside the DCT processor, immediately before the storage processing unit, or between the storage-processing unit and the computing unit. Alternatively, it may be provided between the computing device and the storage device.
- the storage and processing device may operate in either a first operation mode, in which the association between the input data and the output data is not changed when the data is written to the predetermined storage locations, or a second operation mode, in which that association is changed, and the data from the input means may be processed in the first operation mode.
- a total of 32 data processed by the storage and processing device operating in the first operation mode may be processed in turn by the computing device and the storage device; thereafter, the total of 32 data read from the output units of the storage device may be processed by the rearranging means and then looped through the storage and processing device operating in the second operation mode, the computing device, the storage device, and the rearranging means in this order, so that the result of the discrete cosine transform is obtained based on the data read from the output units of the storage device in the fourth loop.
- the storage and processing device may have a total of 32 storage locations, and the 32 data may be written to and read from any of these 32 storage locations one at a time.
- a write line and a read line may be provided at each of the 32 storage locations, the write line and the read line being wired perpendicular to each other, and the lines need not be shared between writing and reading of data.
- the predetermined rearrangement may be performed by crossing a transmission line between the storage device and the storage / processing device.
- a rearrangement device may be used to output a total of 32 data to the eight output units in a predetermined order.
- the storage and processing device may be an 8 R / W memory circuit
- the computing device may be a DCT circuit
- the storage device may be a FIFO.
- the rearrangement means may be provided in a storage / processing device.
- the present invention is also a storage and processing device used in a DCT processor used to execute a discrete cosine transform, characterized in that data is looped a predetermined number of times between it and a computing device that performs calculations according to a predetermined DCT algorithm, and that it performs permutation on the data so that a result of the discrete cosine transform is obtained based on the data read from the computing device.
- the present invention is also a storage and processing device used in a DCT processor used to execute a discrete cosine transform, wherein the storage and processing device has a plurality of input units and a plurality of output units, and outputs a plurality of data input through the plurality of input units from the plurality of output units after performing permutation; the DCT processor further comprises:
- a computing device having a plurality of input units connected to the output units of the storage and processing device and a plurality of output units, which calculates the data input from the plurality of input units according to a predetermined DCT algorithm and outputs the data from the plurality of output units;
- a storage device having a plurality of input units connected to the output units of the computing device and a plurality of output units connected to the input units of the storage and processing device, which stores a plurality of data output from the output units of the computing device and outputs the data from the plurality of output units; and
- rearranging means provided between the storage device and the storage and processing device for rearranging the data from the storage device in a predetermined order;
- characterized in that the plurality of data output from the output units of the storage device is looped a predetermined number of times between the storage and processing device, the computing device, the storage device, and the rearranging means in this order, and a result of the discrete cosine transform is obtained based on the data read from the output units of the storage device.
- the present invention relates to a storage and processing device used in a 32-point DCT processor which performs a discrete cosine transform on 32 data obtained by sampling at 32 sampling points.
- the storage and processing device has eight input units and eight output units, can input and output a total of 32 data, eight at a time and four times in total, and outputs to the eight output units, in a predetermined order, the total of 32 data written sequentially to predetermined storage locations through the eight input units.
- the DCT processor further comprises: two computing devices, each having four input units respectively connected to four of the eight output units of the storage and processing device and four output units, which calculate the data input from the input units, four at a time, according to the CGA-DCT algorithm and output the results from the four output units;
- a storage device having eight input units respectively connected to the total of eight output units of the two computing devices and eight output units respectively connected to the total of eight input units of the storage and processing device, capable of writing and reading a total of 32 data, eight at a time and four times in total, in a first-in first-out manner, which stores the total of 32 data output from the output units of the computing devices, eight at a time and four times in total; and
- rearranging means provided between the storage device and the storage and processing device for rearranging the data from the storage device in a predetermined order.
BRIEF DESCRIPTION OF THE FIGURES
- FIG. 1 is a block diagram of a DCT processor according to the present invention.
- FIG. 2 is a diagram illustrating each function of a functional element of the DCT circuit.
- FIG. 3 is a block diagram of the 8R/W memory circuit.
- FIG. 4 is a circuit diagram of the 8R/W memory circuit.
- FIG. 5 is a diagram that visually shows the effect obtained by performing permutation.
- FIG. 6 is a diagram showing a sequence flow of the discrete cosine transform process.
- FIG. 7 is a data flow graph showing a data flow of data generated by the discrete cosine transform process.
- FIG. 1 shows a block diagram of the DCT processor 1 according to the first embodiment of the present invention.
- This DCT processor 1 is capable of performing a discrete cosine transform on 32 sampling data obtained by sampling at 32 sampling points.
- the number of bits of the sampling data can be decided freely by the designer of the DCT processor. For example, one word (16 bits) is used.
- the DCT processor 1 includes two circuits (hereinafter, referred to as “DCT circuits”) 3 and 3 ′ that can perform calculations required for the discrete cosine transform in accordance with a predetermined DCT algorithm.
- in addition to these DCT circuits 3, 3', the DCT processor 1 includes a specially structured 8-read/write-port SRAM memory circuit (hereinafter referred to as the "8R/W memory circuit") 5, developed by paying attention to the regularity of data flow in the algorithm used in the DCT circuits.
- DCT circuit 3 and the DCT circuit 3 ′ can be considered to be exactly the same.
- the algorithm used in these DCT circuits 3, 3' is, in particular, the "Constant Geometry Algorithm for DCT (CGA-DCT)" [4] published by Jaakko Astola and David Akopian in 1999 and 2000 (hereinafter referred to as the "CGA-DCT algorithm").
- the 8R/W memory circuit 5, the DCT circuits 3, 3', and the FIFO 7 are connected to each other in a loop in this order. More specifically, the eight output sections 53 of the 8R/W memory circuit 5 are connected to the total of eight input sections 31 of the DCT circuits 3, 3',
- the total of eight output sections 33 of the DCT circuits 3, 3' are connected to the eight input sections 71 of the FIFO 7, and the eight output sections 73 of the FIFO 7 are connected to the eight input sections of the 8R/W memory circuit 5, respectively.
- the data can be looped between them in a desired number of times in this order. In other words, the same processing can be repeatedly performed on the data a desired number of times.
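The loop described above can be pictured as a fixed chain of stages applied repeatedly. The following is only a structural sketch: the stage functions are hypothetical placeholders that pass data through unchanged and record their names, and the loop count of four is an assumption taken from the mention of a "fourth loop" elsewhere in this document.

```python
def run_loop(data, stages, n_loops):
    """Pass `data` through every stage in order, repeated `n_loops` times."""
    for _ in range(n_loops):
        for stage in stages:
            data = stage(data)
    return data


def make_stage(name, log):
    """Hypothetical placeholder stage: records its name and returns data as-is."""
    def stage(data):
        log.append(name)
        return data
    return stage


log = []
stage_names = ("8R/W memory circuit", "DCT circuits", "FIFO", "cross sections")
stages = [make_stage(name, log) for name in stage_names]

# 32 data items circulate through the same four stages on every loop.
result = run_loop(list(range(32)), stages, n_loops=4)
```

In a real model each placeholder would be replaced by the behaviour of the corresponding block; the point here is only that the same hardware is reused on every pass.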
- the arrows in the figure indicate the direction in which data flows.
- between the FIFO 7 and the 8R/W memory circuit 5, some of the data is rearranged (swapped) by the cross sections 4 and 4' (corresponding to the "rearranging means" in the claims), or more specifically, by crossing the transmission lines provided between them at two places. This rearrangement makes permutation possible, as described later.
- the DCT circuits 3 and 3' are each composed of six adder-subtracters 35a to 35f, two multipliers 36a and 36b, two selectors (sel) 37a and 37b, two masks (msk) 38a and 38b, and one ROM 39.
- Four input units 31 and four output units 33 are provided for each of the DCT circuits 3 and 3', so that each of the DCT circuits 3 and 3' can input and output four data at a time.
- the four data input from the input units 31 of each of the DCT circuits 3, 3' are calculated according to the CGA-DCT algorithm and then output as four data from the output units 33.
- the configuration of the DCT circuits 3, 3' is the same as that described in the above paper by Jaakko Astola, David Akopian, et al. It is outlined below; for details, see that paper.
- each of the DCT circuits 3, 3' shown in FIG. 1 can be regarded as realizing, with a single circuit, the functions of the functional elements of modes 0 to 2 shown in (a) to (c) of FIG. 2. Therefore, before explaining the DCT circuit of FIG. 1, each function of the functional elements shown in FIG. 2 is described first.
- the functional element of mode 0 consists of four adder-subtracters 35a-e and two multipliers 36a, 36b.
- the functional element of mode 1 further includes one adder-subtracter 35f in addition to these members, and the functional element of mode 2 further includes one more adder-subtracter 35f.
- the number of adder-subtracters and multipliers provided in the mode 2 functional element is the same as in the DCT circuits 3 and 3' of FIG. 1, but unlike those DCT circuits, the mode 2 functional element is not provided with the selectors 37a and 37b, the masks 38a and 38b, and the ROM 39. This is because these members are mainly used only for selecting the mode.
- the difference A − C is multiplied by a coefficient d(n) corresponding to the value of n by the multiplier 36a, and its output position is then exchanged by the cross section 40d. Likewise, B − D is multiplied by the coefficient d(n) by the multiplier 36b.
- as a result, A + C, B + D, (A − C) × d(n), and (B − D) × d(n) are obtained.
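The four results named above can be written as a small function. This is a sketch under stated assumptions: the function name `butterfly_element` is hypothetical, the coefficient d(n) is defined by an equation not reproduced in this excerpt and is therefore passed in as a plain number, and the exchange of output positions by the cross section is not modelled.

```python
def butterfly_element(a, b, c, d, coeff):
    """Return (A + C, B + D, (A - C) * d(n), (B - D) * d(n)).

    `coeff` stands in for d(n); its actual value is an assumption here.
    """
    return (a + c, b + d, (a - c) * coeff, (b - d) * coeff)


# Two sums, two differences, and two multiplications per call.
outputs = butterfly_element(5.0, 7.0, 1.0, 3.0, coeff=0.5)
```

The operation count (four additions or subtractions and two multiplications) is consistent with the four adder-subtracters and two multipliers listed for the simplest functional element.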
- at the outputs A' to D', A + C, B + D, (A − C) × 2d(n) − (A + C), and (B − D) × 2d(n) − (B + D) are obtained, respectively.
- the DCT circuits 3, 3' realize the functional elements of modes 0 to 2 described above with a single circuit by using two selectors 37a, 37b, two masks 38a, 38b, and one ROM 39.
- the selectors 37a and 37b each select one of the two adder-subtracters 35b and 35d connected to them.
- the mask 38a is used to send or not send a signal to the adder-subtracter 35e connected to it, and the mask 38b is used to send or not send a signal to the adder-subtracter 35f connected to it.
- the ROM 39 stores information necessary for controlling the selectors 37a and 37b and the masks 38a and 38b.
- This information is stored as two tables, namely a coefficient table 41a and a command table 41b.
- the coefficient table 41a stores the calculation formula of equation (1) above. After obtaining the value of n to be used, each multiplier calculates its coefficient using this coefficient table 41a.
- the command table 41b stores which selector or mask is to be selected according to the mode to be selected.
- based on the information in the command table 41b, the DCT circuits 3, 3' know which of the selectors 37a, 37b and which of the masks 38a, 38b should be selected for each mode.
- the command table 41b will be further described.
- the processing units (not shown) of the DCT circuits 3 and 3' operate based on the information in the command table 41b: they command the selector 37a to select the adder-subtracter 35b out of the adder-subtracters 35b and 35d, and command the selector 37b to select the adder-subtracter 35d out of the adder-subtracters 35b and 35d.
- the processing units (not shown) also command the mask 38a not to send the signal from the adder-subtracter 35a to the adder-subtracter 35e connected to it, and command the mask 38b not to send the signal from the adder-subtracter 35c to the adder-subtracter 35f connected to it.
- when functioning as the mode 2 functional element, the DCT circuits 3 and 3' operate in the same manner as in mode 1 for the selector 37a, the selector 37b, and the mask 38a, and command the mask 38b to send the signal from the adder-subtracter 35c to the adder-subtracter 35f connected to it.
- the DCT circuits 3 and 3' command the selector 37a to select the adder-subtracter 35d out of the adder-subtracters 35b and 35d,
- and command the selector 37b to select the adder-subtracter 35b out of the adder-subtracters 35b and 35d.
- the DCT circuits 3 and 3 add the mask 38a to the adder / subtractor 35e connected to it.
- the mask 38b is used to send the signal of the subtractor 35a.
- in this way, the DCT circuits 3 and 3' in FIG. 1 can realize all the functions of the functional elements of modes 0 to 2 in FIG. 2.
- the FIFO 7 is similar to a generally used FIFO, and is a storage device that can write and read data in a first-in first-out manner.
- here, an 8 × 4-depth FIFO 7 is used.
- the number of data that can be written and read at one time by this FIFO 7 is eight; by writing eight data four times in a row, a total of 32 data can be written.
- likewise, a total of 32 data can be read by reading four times in a row.
- a total of 32 data can be stored in the FIFO 7 at one time.
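A behavioural sketch of such a FIFO, eight items wide and four rows deep, is shown below. The class name `WideFifo` and its error handling are assumptions made for illustration; the text only specifies the width, the depth, and the first-in first-out behaviour.

```python
from collections import deque


class WideFifo:
    """FIFO that stores rows of `width` items, up to `depth` rows at once."""

    def __init__(self, width=8, depth=4):
        self.width = width
        self.depth = depth
        self._rows = deque()

    def write(self, row):
        row = list(row)
        if len(row) != self.width:
            raise ValueError("a row must hold exactly %d items" % self.width)
        if len(self._rows) >= self.depth:
            raise OverflowError("FIFO is full")
        self._rows.append(row)

    def read(self):
        # The oldest row leaves first (first-in first-out).
        return self._rows.popleft()

    def stored_items(self):
        return len(self._rows) * self.width


fifo = WideFifo()
for start in range(0, 32, 8):          # four writes of eight data each
    fifo.write(range(start, start + 8))
stored = fifo.stored_items()           # all 32 data are held at once
rows = [fifo.read() for _ in range(4)]  # four reads of eight data each
```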
- the purpose of providing the FIFO 7 is to temporarily store data, in other words, to delay the data to make the operation relatively slow, and to enable processing by the 8 R / W memory circuit 5. Therefore, if the operation of the 8R / W memory circuit 5 is sped up by the advance of technology, it is considered that the FIFO 7 is not always necessary.
- the 8R/W memory circuit 5 is a memory specially developed for the DCT processor 1 of the present invention. However, the 8R/W memory circuit 5 does not simply function as a storage device; its main purpose is to perform processing necessary for executing the discrete cosine transform, that is, permutation.
- the structure of the 8R/W memory circuit can be said to reduce the number of components by sharing the various components (resources) conventionally used, in other words, to promote resource sharing.
- the use of the 8R/W memory circuit greatly reduces the number of components required for the processor, for example, the number of adder-subtracters, multipliers, and crossbar switches.
- like the FIFO 7, the 8R/W memory circuit 5 can input and output eight data at a time; by writing (inputting) eight data four times in a row,
- a total of 32 data can be written, and a total of 32 data can be output by outputting four times in a row. Further, a total of 32 data can be stored in (the memory of) the 8R/W memory circuit 5 at a time.
- FIG. 3 shows a block diagram of the 8R/W memory circuit 5.
- the 8R/W memory circuit 5 of the present invention comprises 32 memory blocks 0 to 31 (corresponding to the "storage locations" in the claims), eight write lines 52a-h for writing data to these memory blocks 0 to 31, eight read lines 53a-h for reading data from these memory blocks 0 to 31, eight transmission lines 54a-h for transmitting data to each of the memory blocks 0 to 31, and four crossbar switches 58a-d (corresponding to the "rearrangement device" in the claims).
- the write lines 52a-h and the read lines 53a-h are connected to a write enable section and a read enable section, respectively.
- FIG. 4 also shows an actual circuit diagram of the 8 R / W memory circuit 5 for reference.
- the crossbar switches 58a-d are not shown in this figure.
- WL 0-7 correspond to the write lines 52a-h, RL 0-7 to the read lines 53a-h, and (0)-(7) to the transmission lines 54a-h, respectively.
- One data can be written to and read from each of the memory blocks 0 to 31.
- Each of the memory blocks 0 to 31 is distinguished by a numeral of 0 to 31. These numbers can be said to indicate storage locations where data is stored.
- block group 59a includes memory blocks 0, 16, 6, 22, 8, 24, 14, 30.
- the transmission lines 54a-h are provided in a fixed direction (the vertical direction in the drawing), allocated eight to each of the block groups 59a-d and one to each of the memory blocks 0 to 31. Data output from the FIFO 7 and the like is transmitted through these transmission lines 54a-h.
- the write lines 52 are provided so that two lines are assigned to each block group 59, eight lines in total for the memory blocks 0 to 31. Each write line 52 intersects four of the eight memory blocks in each block group 59 in a direction orthogonal to the transmission lines 54a-h. These write lines 52 are enabled two at a time by the operation of the write enable section 60. When a signal is present on a transmission line 54 and a write line 52 is enabled, data is written to the memory block at the intersection of those lines. Each write line intersects four memory blocks and two write lines are enabled at a time, so a total of eight data are written to the memory blocks in one write operation.
- Eight read lines 53a-h are provided for the block groups 59a-d in the same direction as the transmission lines 54a-h.
- Each read line 53a-h intersects eight memory blocks (two in each block group 59a-d), but during one read operation it actually acts on only one memory block in each block group 59a-d, and therefore on only four memory blocks in total.
- These read lines 53a-h are enabled two at a time by the read enable section 61, similarly to the write lines 52. Each read line 53 actually acts on four memory blocks during a single read operation and two read lines are enabled at a time, so a total of eight data are read from the memory blocks in one read operation.
- the lines are not shared between writing and reading, and the write lines 52 and the read lines 53 are wired perpendicular to each other.
- the crossbar switches 58a-d are used to rearrange the data read from the memory blocks 0 to 31 appropriately before it is output from the 8R/W memory circuit 5.
- the eight data read by enabling the read line 53a and the read line 53e, in other words, the eight data read in the first phase, [0], [15], [14], [1], [2], [13], [12], [3], are rearranged by the crossbar switches 58a-d into the order [0], [1], [2], [3], [12], [13], [14], [15].
- here, [n] (n is an integer from 0 to 31) indicates the content (value) of the data stored at storage location n, that is, in the corresponding one of the memory blocks 0 to 31.
- in the first read, [0], [1], [2], [3] and [15], [14], [13], [12] are read by enabling the read line 53b and the read line 53e, respectively; these data are rearranged by the crossbar switches 58 into the order [0], [1], [2], [3], [12], [13], [14], [15] and then output from the 8R/W memory circuit 5.
- in the second read, [6], [7], [4], [5] and [9], [8], [11], [10] are read by enabling the read line 53a and the read line 53f, respectively; these data are rearranged by the crossbar switches 58 into the order [4], [5], [6], [7], [8], [9], [10], [11] and then output from the 8R/W memory circuit 5.
- in the third read, [16], [17], [18], [19] and [31], [30], [29], [28] are read by enabling the read line 53d and the read line 53g, respectively, and these data are rearranged by the crossbar switches 58 into an order beginning [16], [17], [18].
- in the fourth read, [22], [23], [20], [21] and [25], [24], [27], [26] are read by enabling the read line 53c and the read line 53h, respectively; these data are rearranged by the crossbar switches 58 into the order [20], [21], [22], [23], [24], [25], [26], [27] and then output from the 8R/W memory circuit 5.
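The four reads listed above can be tabulated as follows. In this sketch the integers are the indices n of the data [n], not data values. Modelling the crossbar switches as an ascending sort of those indices is an assumption: it reproduces the output orders stated for the first, second, and fourth reads, while the output order of the third read is only partly given in the text.

```python
# Indices n of the data [n], in the order they are read in each of the four reads.
RAW_READS = [
    [0, 1, 2, 3, 15, 14, 13, 12],
    [6, 7, 4, 5, 9, 8, 11, 10],
    [16, 17, 18, 19, 31, 30, 29, 28],
    [22, 23, 20, 21, 25, 24, 27, 26],
]


def crossbar(indices):
    """Assumed crossbar behaviour: output the eight items in ascending index order."""
    return sorted(indices)


outputs = [crossbar(read) for read in RAW_READS]
all_indices = sorted(n for read in RAW_READS for n in read)
```

Taken together, the four reads cover each of the 32 storage locations exactly once.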
- in this way, the data can be output in a predetermined order. Furthermore, the order of the data output from the 8R/W memory circuit 5 can be controlled according to which of the memory blocks 0 to 31 of the 8R/W memory circuit 5 the data is stored in.
- data is written to the memory blocks 0 to 31 of the 8R/W memory circuit 5 in two cases: from outside the DCT processor, for example from a CPU (not shown), or in a loop inside the DCT processor via the FIFO 7 and the like.
- as a means for inputting data from outside the DCT processor to the inside of the DCT processor, for example, signal lines (not shown) for transmitting data from outside the DCT processor to the input section 51 of the 8R/W memory circuit 5 are assumed.
- these signal lines need only be electrically connected to the input section 51 of the 8R/W memory circuit 5 between the cross sections 4, 4' (and the FIFO 7) and the 8R/W memory circuit 5.
- for example, they may be connected indirectly via a selecting means such as a selector provided at the intersection between the dashed-dotted line A and the signal line 24.
- the signal line for external data and the loop-shaped signal line 24 are both connected to the selecting means, so that only one of the signal lines is selected.
- the operation of the 8R/W memory circuit differs between processing data from outside the DCT processor, that is, data from the input means, and processing data from inside the DCT processor.
- the former operation mode corresponds to the "first operation mode" in the claims,
- and the latter corresponds to the "second operation mode" in the claims.
- in either mode, the data (from the CPU or the like, or from the FIFO) is written to the 8R/W memory circuit and then output.
- in the first operation mode, the selecting means selects the signal line for external data so that data from the CPU or the like is transmitted to the 8R/W memory circuit.
- in the second operation mode, the selecting means is switched appropriately so that the signal line 24 is selected and the data from the FIFO is transmitted to the 8R/W memory circuit.
- in the first operation mode, the data comes from the input means (not shown) and therefore does not pass through the cross sections 4, 4' (see FIG. 1); it is written to the 8R/W memory circuit without changing the data association.
- in the second operation mode, the data comes from the FIFO 7, and after being rearranged by the cross sections 4 and 4', it is written to the 8R/W memory circuit so that the data association is changed.
- as a result, permutation is performed on the data output from the 8R/W memory circuit.
- the data write position follows the numbers 0 to 31 assigned to the memory blocks 0 to 31. That is, data [0] corresponds to memory block 0, [1] corresponds to memory block 1, [2] corresponds to memory block 2, and so on.
- as a result, the data read in the first phase, [0], [1], [2], [3], [12], [13], [14], [15], correspond to the data [0], [1], [2], [3], [12], [13], [14], [15] before writing; the data read in the second phase, [4], [5], [6], [7], [8], [9], [10], [11], correspond to the data [4], [5], [6], [7], [8], [9], [10], [11] before writing; and the data read in the third phase begins with [16], [17].
- in the first operation mode, therefore, only the order of the data to be read is controlled.
- in the second operation mode, the data is first rearranged by the cross sections 4, 4' (see FIG. 1) provided between the FIFO and the 8R/W memory circuit, and is then written to the 8R/W memory circuit so that the association is changed.
- consequently, the data write position does not always follow the numbers 0 to 31 assigned to the memory blocks. In the second operation mode, therefore, the order of the data to be read is controlled after the association of the data is changed, that is, after the data is subjected to permutation.
- FIG. 5 provides a visual representation of the effect of performing permutation.
- the numbers not surrounded by [] are the memory block numbers 0 to 31.
- the original data [0] to [31] shown on the left side were converted to [0]
- the first-phase data [0], [1], [2], [3], [12], [13], [14], [15] before the permutation are first rearranged by the cross sections 4, 4' into the order [0], [2], [1], [3], [12], [14], [13], [15] (see 56 in Figure 3).
- by enabling the write lines 52a and 52d, the memory blocks 0, 1, 16, 17 and 6, 7, 22, 23 are written, respectively.
- the second-phase data [4], [5], [6], [7], [8], [9], [10], [11] before the permutation are first rearranged by the cross sections 4, 4' into the order [4], [6], [5], [7], [8], [10], [9], [11] (see Fig. 3). Then, by enabling the write lines 52e and 52h, the memory blocks 2, 3, 18, 19 and 4, 5, 20, 21 are written, respectively. These blocks will be read out as [2], [3], [18], [19], [4], [5], [20], [21].
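
The write pattern described for the first two phases of the second operation mode can be traced with a short Python sketch. The rearranged data orders and the memory block numbers are taken from the text; the helper name `write_phases` and the pairing of the i-th datum with the i-th listed block are assumptions made only for illustration, and phases 3 and 4 are left out because their block assignments are not given here.

```python
# Rearranged data orders and target memory blocks, as listed in the text.
# Assumption: the i-th datum of a phase is written to the i-th listed block.
WRITES = {
    1: ([0, 2, 1, 3, 12, 14, 13, 15], [0, 1, 16, 17, 6, 7, 22, 23]),
    2: ([4, 6, 5, 7, 8, 10, 9, 11], [2, 3, 18, 19, 4, 5, 20, 21]),
}


def write_phases(writes):
    """Return a dict mapping memory block number -> data label."""
    memory = {}
    for phase in sorted(writes):
        data, blocks = writes[phase]
        for label, block in zip(data, blocks):
            if block in memory:
                raise ValueError(f"block {block} written twice")
            memory[block] = label
    return memory


memory = write_phases(WRITES)
for block in sorted(memory):
    print(f"block {block:2d} <- data [{memory[block]}]")
```

Under that pairing assumption, the sixteen listed values end up with the even-numbered data in blocks 0 to 7 and the odd-numbered data in blocks 16 to 23, which illustrates why the write position no longer follows the block numbers in this mode.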
- FIG. 6 shows the processing performed by the DCT processor 1 according to the present invention and its peripheral devices (not shown), and FIG. 7 shows a data flow graph representing the flow of data generated by this processing.
- the numbers of the steps (ST) shown at the top of FIG. 7 correspond to those of FIG.
- the processing performed by the DCT processor 1 of the present invention is only Steps 3 to 10 among Steps 1 to 11 shown in FIG.
- the remaining steps 1, 2, and 11 are to be performed by peripheral devices such as a CPU.
- a peripheral device that performs the processing of steps 1, 2, and 11 in addition to the processing by the DCT processor 1 of the present invention is required.
- these processes may be performed by the DCT processor 1 of the present invention by a design change.
- in step 1, the data to be subjected to the discrete cosine transform is sampled at 32 sampling points to obtain 32 sampled data [0] to [31].
- in step 2, these sampled data are rearranged in the order shown in FIG.
- This rearrangement is a process necessary for appropriately performing the subsequent processes.
- the data is rearranged in the order of [0], [1], [31], [30], and so on.
- in step 3, these data are processed in the first operation mode.
- the writing here is performed by, for example, a CPU or the like, rather than by the FIFO 7 (see FIG. 1).
- each data is written to a memory block corresponding to each of the numbers [0], [1], [31], [30], and so on. That is, [0] is the memory block
- in step 4, these data are processed by the two DCT circuits 3, 3' (see FIG. 1) (ST4).
- the eight output sections of the 8 R/W memory circuit 5 are connected to the four input sections of each of the two DCT circuits 3 and 3', so that, of the data read in the first phase by the 8 R/W memory circuit 5, [0], [1], [2], and [3] are processed by the DCT circuit 3, while [12], [13], [14], and [15] are processed by the DCT circuit 3'.
- of the data read in the second phase, [4], [5], [6], and [7] are processed by the DCT circuit 3, while [8], [9], [10], and [11] are processed by the DCT circuit 3'.
- of the data read in the third phase, [16], [17], [18], and [19] are processed by the DCT circuit 3, while [28], [29], [30], and [31] are processed by the DCT circuit 3'.
- of the data read in the fourth phase, [20], [21], [22], and [23] are processed by the DCT circuit 3, while [24], [25], [26], and [27] are processed by the DCT circuit 3'.
- the data range processed by one DCT circuit 3, 3' is boxed.
- four data are processed in each square.
- <n> (n is an integer from 1 to 4) in each square means that those data are processed in phases 1 to 4, respectively.
- the two squares marked <2> mean that the data are processed in the second phase; the data processed at that time are [4] to [7] or [8] to [11], which are processed by the DCT circuit 3 and the DCT circuit 3', respectively.
- the two squares marked <3> indicate that the data are processed in the third phase; the data processed at that time are [16] to [19] or [28] to [31], which are processed by the DCT circuit 3 and the DCT circuit 3', respectively.
- "mode n" (n is an integer from 0 to 2) in each square represents the mode used by each DCT circuit 3, 3', and d(n) (n is an integer from 1 to 31) represents the multiplication coefficient used in the multipliers 36a and 36b (see FIG. 1) of each DCT circuit 3, 3'.
- the DCT circuit 3 functions as a mode 0 functional element (see a in Fig. 2), and the multiplier 36a of the functional element in mode 0 uses the coefficient d(16).
- the multiplier 36b uses the coefficient d(24).
- the DCT circuit 3 functions as a mode 0 functional element (see a in FIG. 2).
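
The phase-by-phase assignment of data to the two DCT circuits described in step 4 can be summarized in a small Python sketch. The phase groups are copied from the text; the names `PHASES` and `split_for_dct`, and the modelling of the memory as a plain list, are hypothetical and serve only to make the data routing visible.

```python
# Memory blocks read in each phase, as listed in step 4 of the text.
PHASES = {
    1: [0, 1, 2, 3, 12, 13, 14, 15],
    2: [4, 5, 6, 7, 8, 9, 10, 11],
    3: [16, 17, 18, 19, 28, 29, 30, 31],
    4: [20, 21, 22, 23, 24, 25, 26, 27],
}


def split_for_dct(read_data):
    """First four values go to DCT circuit 3, the last four to DCT circuit 3'."""
    if len(read_data) != 8:
        raise ValueError("each phase reads exactly eight values")
    return read_data[:4], read_data[4:]


# First operation mode: data [k] is stored in memory block k (identity mapping).
memory = list(range(32))

for phase, blocks in PHASES.items():
    to_dct3, to_dct3_prime = split_for_dct([memory[b] for b in blocks])
    print(f"phase {phase}: DCT 3 <- {to_dct3}, DCT 3' <- {to_dct3_prime}")
```

Each of the 32 values is read exactly once over the four phases, so one pass through both circuits covers the whole data set.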
- in step 5, the eight data processed by the DCT circuits 3, 3' are sequentially written to the FIFO 7 (see FIG. 1) and, after all 32 data have been written, are read out again sequentially.
- the data read at this time are [0], [1], [2], [3], [12], [13], [14], [15],
- the second phase [4], [5], [6], [7],
- in step 6, the data is rearranged by the cross sections 4, 4' (see FIG. 1). Owing to this rearrangement, the data from the FIFO takes the order shown at 56 in FIG. 3. That is, the first-phase data is [0], [2], [1], [3], [12], [14], [13], [15], the second-phase data is [4], [6], [5], [7], [8], [10], [9], [11], and the third-phase data is [16], [18], [17], [19], [28],
- Phase 4 data includes [20], [22],
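
The orders listed for step 6 are consistent with a simple rule: within every group of four values, the two middle values are swapped. The Python sketch below applies that rule. The function name `cross_section` is hypothetical, and the rule itself is inferred from the listed orders rather than stated in the text, so the actual wiring of the cross sections 4, 4' may differ.

```python
def cross_section(values):
    """Swap the two middle entries in every group of four values."""
    if len(values) % 4:
        raise ValueError("length must be a multiple of four")
    out = []
    for i in range(0, len(values), 4):
        a, b, c, d = values[i:i + 4]
        out.extend([a, c, b, d])
    return out


first_phase = cross_section([0, 1, 2, 3, 12, 13, 14, 15])
second_phase = cross_section([4, 5, 6, 7, 8, 9, 10, 11])
print(first_phase)   # [0, 2, 1, 3, 12, 14, 13, 15]
print(second_phase)  # [4, 6, 5, 7, 8, 10, 9, 11]
```

Applying the same function twice returns the original order, so under this assumed rule the rearrangement is its own inverse.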
- in step 7, the data read from the FIFO 7 is returned to the 8 R/W memory circuit 5 (FIG. 1) operating in the second operation mode (in terms of circuit operation, it is looped), where it is written and read.
- permutation is performed, and the first-phase data [0], [1], [2], [3], [12], [13], [14], [15] become [0], [16],
- in step 8, the data subjected to permutation is processed again by the DCT circuits 3, 3'. This process is similar to that described in step 4.
- in step 9, the data from the DCT circuits 3, 3' is processed again by the FIFO 7. This process is similar to that described in step 5.
- in step 10, the rearrangement is performed again by the cross sections 4, 4' (see FIG. 1). This process is similar to that described in step 6.
- the processing of steps 7 to 10 is repeated three more times (in terms of circuit operation, it is "looped").
- the processing of steps 7 to 10 is thus performed a total of four times (however, as is apparent from the following description and Fig. 6, the processing of step 10 in the fourth loop is unnecessary and may be omitted).
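
The control flow of steps 3 to 10 can be listed with a minimal Python sketch. The step numbers and the loop count of four come from the text; the function name `step_sequence` and its parameters are hypothetical, and the sketch only enumerates the order of the steps without modelling the data.

```python
def step_sequence(loops=4, skip_last_rearrangement=True):
    """List the processor steps (3 to 10) in execution order."""
    steps = [3, 4, 5, 6]  # initial pass using the first operation mode
    for loop in range(1, loops + 1):
        steps += [7, 8, 9]  # permutation, DCT circuits, FIFO
        if not (skip_last_rearrangement and loop == loops):
            steps.append(10)  # rearrangement by the cross sections
    return steps


seq = step_sequence()
print(seq)
print("passes through the DCT circuits:", seq.count(4) + seq.count(8))
```

With the default arguments the data passes through the DCT circuits five times in total: once in step 4 and once in each of the four loops (step 8).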
- in step 11, at the end of the fourth loop, the data obtained after the processing of step 9, that is, the data from the FIFO 7 (see Fig. 1), [0], [4],
- the input means (not shown) for inputting data from the outside to the inside of the DCT processor is provided between the 8 R/W memory circuit 5 and the DCT circuits 3 and 3' (at the intersection of the dash-dot line B and the signal line 24).
- in the second embodiment, unlike the first embodiment, data from outside the DCT processor is applied directly to the DCT circuits 3 and 3' without passing through the 8 R/W memory circuit 5. In this case as well, however, the data provided to the DCT circuits 3, 3' must be the same as data that has passed through the 8 R/W memory circuit 5, that is, the same as data processed by the 8 R/W memory circuit in the first operation mode. Therefore, in the second embodiment, it is assumed that the data is processed in advance by a CPU or the like.
- in the second embodiment, the 8 R/W memory circuit 5 need not perform processing in the first operation mode. In other words, in the second embodiment, the 8 R/W memory circuit 5 only has to operate in the above-described second operation mode.
- the 8 R/W memory circuit 5 only has to operate in a single operation mode (the second operation mode), so that the control and configuration of the 8 R/W memory circuit can be simplified.
- although the control and configuration of the 8 R/W memory circuit 5 are slightly more complicated than in the second embodiment, the processing performed by the CPU or the like can be reduced, which is advantageous.
- the input means is provided between the DCT circuits 3, 3' and the FIFO 7 (at the intersection of the dashed line C and the signal line 24).
- data from outside the DCT processor is applied directly to the FIFO 7 without passing through the 8 R/W memory circuit 5 or the DCT circuits 3, 3'.
- the data provided to the FIFO 7 must be the same as data that has passed through the 8 R/W memory circuit 5 and the DCT circuits 3 and 3', in other words, the same as data processed by the 8 R/W memory circuit in the first operation mode and by the DCT circuits 3, 3'. This processing can be performed by a CPU or the like.
- in the third embodiment, as in the second embodiment, the 8 R/W memory circuit 5 need not perform processing in the first operation mode. It therefore has the same advantages and disadvantages as those described for the second embodiment.
- the number of arithmetic units required to calculate one sampling point in the table means the number of arithmetic units shown in FIG. 6, that is, the number of adders, subtracters and multipliers.
- eight arithmetic units are provided in each of the DCT circuits 3 and 3', so that the total is sixteen.
- the “number of intermediate output values generated in each cycle” is the number of values that can be generated by one calculation by the DCT circuits. Since the calculation is divided into four parts, the number is eight.
- the number of register files (data storage devices) means the number of devices for storing the calculation results; in the present invention there are two, the FIFO 7 and the 8 R/W memory circuit 5.
- the “estimated core area of DCT” is the area required for DCT processor 1.
- “Output latency” is the average time to obtain an output result.
- the DCT processor 1 of the present invention requires only about half the area required for a processor based on the Lippen algorithm. Also, output latency
- the DCT processor of the present invention requires only about 1/4 that of Lippen's processor.
- the DCT processor of the present invention exhibits excellent performance.
- the cross section is used to rearrange the data from the FIFO, but, for example, a crossbar switch may be used instead.
- the cross section may be provided in the output section of the FIFO or in the input section of the 8 R/W memory circuit (inside the 8 R/W memory circuit). All that is required is that the rearrangement be performed before writing to the memory blocks of the 8 R/W memory circuit, so that the permutation by the 8 R/W memory circuit can be carried out.
- resource sharing is promoted by the 8 R/W memory circuit, which reduces the complexity of signal permutation (rearrangement); as a result, the required number of components and the function block area can be reduced, the power consumption can be reduced, the processing speed can be increased, or the output latency can be reduced.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP03761831A EP1538532A1 (en) | 2002-06-28 | 2003-06-27 | Dct processor for executing discrete cosine transform (dct) |
AU2003244151A AU2003244151A1 (en) | 2002-06-28 | 2003-06-27 | Dct processor for executing discrete cosine transform (dct) |
US11/023,954 US20050240643A1 (en) | 2002-06-28 | 2004-12-28 | DCT processor used for implementing discrete cosine transform (DCT) |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2002189382A JP2005309474A (ja) | 2002-06-28 | 2002-06-28 | 離散コサイン変換(dct)を実行するために用いるdctプロセッサ |
JP2002-189382 | 2002-06-28 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/023,954 Continuation US20050240643A1 (en) | 2002-06-28 | 2004-12-28 | DCT processor used for implementing discrete cosine transform (DCT) |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2004003785A1 true WO2004003785A1 (ja) | 2004-01-08 |
Family
ID=29996843
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2003/008222 WO2004003785A1 (ja) | 2002-06-28 | 2003-06-27 | 離散コサイン変換(dct)を実行するために用いるdctプロセッサ |
Country Status (6)
Country | Link |
---|---|
US (1) | US20050240643A1 (ja) |
EP (1) | EP1538532A1 (ja) |
JP (1) | JP2005309474A (ja) |
CN (1) | CN1672148A (ja) |
AU (1) | AU2003244151A1 (ja) |
WO (1) | WO2004003785A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101599998B1 (ko) * | 2013-01-08 | 2016-03-04 | 주식회사 엘지화학 | 배터리 팩에 포함된 다수의 배터리 셀에 대한 전압 데이터 관리 장치 및 방법 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06332932A (ja) * | 1993-05-19 | 1994-12-02 | Fujitsu Ltd | 高速フーリエ変換装置 |
JPH07239842A (ja) * | 1994-02-18 | 1995-09-12 | Hoabanteientsuu Guufuun Yuushienkonshii | 離散的コサイン変換及び逆変換のための集積回路プロセッサ |
JPH1049518A (ja) * | 1996-08-06 | 1998-02-20 | Sony Corp | 演算装置および方法 |
JP2002117015A (ja) * | 2000-10-06 | 2002-04-19 | Takuro Sato | 高速フーリエ変換回路 |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4996661A (en) * | 1988-10-05 | 1991-02-26 | United Technologies Corporation | Single chip complex floating point numeric processor |
US5408425A (en) * | 1993-05-25 | 1995-04-18 | The Aerospace Corporation | Split-radix discrete cosine transform |
US5831881A (en) * | 1994-12-02 | 1998-11-03 | Sican Gmbh | Method and circuit for forward/inverse discrete cosine transform (DCT/IDCT) |
US5671169A (en) * | 1995-06-23 | 1997-09-23 | United Microelectronics Corporation | Apparatus for two-dimensional inverse discrete cosine transform |
US6421695B1 (en) * | 1995-10-28 | 2002-07-16 | Lg Electronics Inc. | Apparatus for implementing inverse discrete cosine transform in digital image processing system |
US5909572A (en) * | 1996-12-02 | 1999-06-01 | Compaq Computer Corp. | System and method for conditionally moving an operand from a source register to a destination register |
US6272257B1 (en) * | 1997-04-30 | 2001-08-07 | Canon Kabushiki Kaisha | Decoder of variable length codes |
US6343304B1 (en) * | 1999-03-09 | 2002-01-29 | National Science Council | Apparatus with selective fixed-coefficient filter for performing recursive discrete cosine transforms |
US6996595B2 (en) * | 2001-05-16 | 2006-02-07 | Qualcomm Incorporated | Apparatus and method for consolidating output data from a plurality of processors |
- 2002-06-28: JP application JP2002189382A, published as JP2005309474A (status: Pending)
- 2003-06-27: CN application CNA038180944A, published as CN1672148A (status: Pending)
- 2003-06-27: EP application EP03761831A, published as EP1538532A1 (status: Withdrawn)
- 2003-06-27: AU application AU2003244151A, published as AU2003244151A1 (status: Abandoned)
- 2003-06-27: WO application PCT/JP2003/008222, published as WO2004003785A1 (status: Application Filing)
- 2004-12-28: US application US11/023,954, published as US20050240643A1 (status: Abandoned)
Non-Patent Citations (1)
Title |
---|
ASTOLA J. ET AL.: "Architecture-oriented regular algorithms for discrete sine and cosine transforms", IEEE TRANSACTIONS ON SIGNAL PROCESSING, vol. 47, no. 4, 1999, pages 1109 - 1124, XP000893635 * |
Also Published As
Publication number | Publication date |
---|---|
CN1672148A (zh) | 2005-09-21 |
EP1538532A1 (en) | 2005-06-08 |
US20050240643A1 (en) | 2005-10-27 |
JP2005309474A (ja) | 2005-11-04 |
AU2003244151A1 (en) | 2004-01-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
REEP | Request for entry into the european phase |
Ref document number: 2003761831 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2003761831 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11023954 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 20038180944 Country of ref document: CN |
|
WWP | Wipo information: published in national office |
Ref document number: 2003761831 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: JP |