WO2013031083A1 - Symmetric filter arithmetic device and symmetric filter arithmetic method
- Publication number
- WO2013031083A1 (application PCT/JP2012/004729)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- data string
- string
- filter
- center
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/10—Image enhancement or restoration using non-spatial domain filtering
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03H—IMPEDANCE NETWORKS, e.g. RESONANT CIRCUITS; RESONATORS
- H03H17/00—Networks using digital techniques
- H03H17/02—Frequency selective networks
- H03H17/0202—Two or more dimensional filters; Filters for complex signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
- G06F7/5443—Sum of products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/76—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
- G06F7/766—Generation of all possible permutations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- G06F2207/3804—Details
- G06F2207/3808—Details concerning the type of numbers or the way they are handled
- G06F2207/3828—Multigauge devices, i.e. capable of handling packed numbers without unpacking them
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/28—Indexing scheme for image data processing or generation, in general involving image processing hardware
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
Definitions
- the present invention relates to a symmetric filter calculation device and a symmetric filter calculation method for performing filter calculation using symmetrical filter coefficients.
- the filter operation is an operation of multiplying the pixel value of the image data by a filter coefficient and accumulatively adding the results, and is one of operations used for various image processing.
- a symmetric filter operation device that performs a symmetric image filter operation (hereinafter referred to as a symmetric filter operation), which is a filter operation when the filter coefficients are symmetric, has been proposed (see, for example, Patent Document 1).
- However, the conventional symmetric filter arithmetic device described above is a dedicated arithmetic unit that performs only a fixed symmetric filter operation, and Patent Document 1 does not address the operation or configuration when the operation is performed by a processor. Moreover, the configuration is specialized for a 6-tap symmetric filter operation, and no case with a tap count other than 6 is addressed.
- An object of the present invention is to solve the above-described conventional problems and to provide a symmetric filter calculation device and a symmetric filter calculation method with which a processor can perform symmetric filter calculations for various tap counts.
- A symmetric filter arithmetic device according to the present invention performs a filter operation on a plurality of data stored in a storage unit using left-right symmetric filter coefficients.
- The device includes a left data string extraction unit that reads a first data string, which is a plurality of continuous data, from the storage unit and extracts from it a left data string, which is a plurality of continuous data to be multiplied by a left filter coefficient (a filter coefficient to the left of the center).
- The device also includes a right data string extraction unit that reads a second data string, which is a plurality of continuous data, from the storage unit and extracts from it a right data string, which is a plurality of continuous data to be multiplied by a right filter coefficient (a filter coefficient to the right of the center that has the same value as the left filter coefficient).
- the symmetric filter arithmetic corresponding to various tap numbers can be performed by the processor.
- FIG. 1 is a diagram showing a configuration of a filter arithmetic apparatus according to Embodiment 1 of the present invention.
- FIG. 2 is a diagram showing the configuration of the data shuffler according to the first embodiment of the present invention.
- FIG. 3A is a diagram for explaining the operation of the data shuffler according to Embodiment 1 of the present invention.
- FIG. 3B is a diagram for explaining the operation of the data shuffler according to Embodiment 1 of the present invention.
- FIG. 3C is a diagram for explaining the operation of the data shuffler according to Embodiment 1 of the present invention.
- FIG. 4 is a diagram showing mnemonics and instruction codes for operating the data shuffler in the first embodiment of the present invention.
- FIG. 5 is a flowchart illustrating an example of an operation in which the filter arithmetic device according to Embodiment 1 of the present invention performs a symmetric filter arithmetic operation.
- FIG. 6 is a diagram illustrating an instruction for the filter operation device according to Embodiment 1 of the present invention to perform a symmetric filter operation.
- FIG. 7 is a diagram for explaining an operation in which the filter operation device according to Embodiment 1 of the present invention performs a symmetric filter operation.
- FIG. 8 is a diagram for explaining an operation in which the filter operation device according to Embodiment 1 of the present invention performs a symmetric filter operation.
- FIG. 9A is a diagram illustrating an operation in which the filter operation device according to Embodiment 1 of the present invention performs a symmetric filter operation.
- FIG. 9B is a diagram illustrating an operation in which the filter operation device according to Embodiment 1 of the present invention performs a symmetric filter operation.
- FIG. 10 is a diagram showing the configuration of the filter arithmetic apparatus according to Embodiment 2 of the present invention.
- FIG. 11 is a diagram showing a configuration of the data shuffler according to the second embodiment of the present invention.
- FIG. 12 is a diagram for explaining the operation of the data shuffler according to the second embodiment of the present invention.
- FIG. 13A is a diagram for explaining the operation of the data shuffler according to the second embodiment of the present invention.
- FIG. 13B is a diagram for explaining the operation of the data shuffler according to the second embodiment of the present invention.
- FIG. 13C is a diagram for explaining the operation of the data shuffler according to the second embodiment of the present invention.
- FIG. 14A is a diagram for explaining the operation of the data shuffler according to the second embodiment of the present invention.
- FIG. 14B is a diagram for explaining the operation of the data shuffler according to the second embodiment of the present invention.
- FIG. 15 is a diagram showing a mnemonic and an instruction code for operating the data shuffler according to the second embodiment of the present invention.
- FIG. 16 is a diagram illustrating an operation in which the filter operation device according to Embodiment 2 of the present invention performs a symmetric filter operation.
- FIG. 17 is a diagram illustrating an operation in which the filter operation device according to Embodiment 2 of the present invention performs a symmetric filter operation.
- FIG. 18 is a diagram illustrating an operation for performing a symmetric filter calculation when the number of taps is 48 in the filter calculation device according to the second embodiment of the present invention.
- FIG. 19 is a diagram illustrating an operation for performing a symmetric filter calculation when the number of taps is 49 in the filter calculation device according to the second embodiment of the present invention.
- FIG. 20 is a diagram illustrating the contents of a 6-tap one-dimensional image filter calculation.
- FIG. 21 is a diagram showing a configuration of a conventional symmetric filter arithmetic apparatus.
- FIG. 20 is a diagram showing the contents of a 6-tap one-dimensional image filter operation (filter operation).
- p0 to p8 are pixel values of nine consecutive pixels, and k0 to k5 are filter coefficients used for the filter calculation.
- the pixels p0 to p5 are multiplied by filter coefficients k0 to k5, respectively, and the results are cumulatively added to obtain a filter result q0.
- The same filter operation is performed on the pixels p1 to p6, shifted by one pixel, to obtain the filter result q1. Likewise, performing it on the pixels p2 to p7 gives the filter result q2, and performing it on the pixels p3 to p8 gives the filter result q3.
- In this way, by performing the filter operation while shifting by one pixel at a time, the filter operation can be applied to the entire image.
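The sliding 6-tap operation described above can be sketched as follows. This is a minimal illustration, not code from the patent: the function name `fir_filter` and the numeric pixel and coefficient values are assumptions chosen only to make the example concrete.

```python
def fir_filter(pixels, coeffs):
    """Multiply each window of pixels by the coefficients and accumulate.

    The window is shifted by one pixel per output, as in the q0..q3 example.
    """
    taps = len(coeffs)
    results = []
    for start in range(len(pixels) - taps + 1):
        acc = 0
        for j in range(taps):
            acc += pixels[start + j] * coeffs[j]
        results.append(acc)
    return results


# Illustrative values only (hypothetical, not taken from the patent).
pixels = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # p0..p8
coeffs = [1, 2, 3, 4, 5, 6]                    # k0..k5
q = fir_filter(pixels, coeffs)                 # q0..q3
print(q)  # [910, 1120, 1330, 1540]
```

Nine pixels and six taps yield four results, matching q0 to q3 in the text; each result costs six multiplications.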
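- When the filter coefficients are left-right symmetric about the center (for six taps, k0 = k5, k1 = k4, and k2 = k3), the filter coefficients are said to be symmetric.
- A filter operation using such coefficients is called a symmetric image filter operation (symmetric filter operation).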
- the number of multiplications can be reduced and the processing speed can be increased by pre-adding pixels to be multiplied by the same filter coefficient and then multiplying by the filter coefficient.
- Patent Document 1 discloses the technique.
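The pre-addition idea can be sketched as below for an even number of taps. The names `direct_filter` and `symmetric_filter`, and the sample values, are hypothetical and serve only to show that folding the window halves the multiplications while giving the same result as the direct form.

```python
def direct_filter(pixels, coeffs):
    """Reference form: one multiplication per tap."""
    taps = len(coeffs)
    return [
        sum(pixels[start + j] * coeffs[j] for j in range(taps))
        for start in range(len(pixels) - taps + 1)
    ]


def symmetric_filter(pixels, half_coeffs):
    """Even-tap symmetric form: pre-add mirrored pixels, then multiply once.

    half_coeffs holds only the left half (k0, k1, k2 for six taps); the right
    half is assumed to mirror it (k5 = k0, k4 = k1, k3 = k2).
    """
    half = len(half_coeffs)
    taps = 2 * half
    results = []
    for start in range(len(pixels) - taps + 1):
        acc = 0
        for j in range(half):
            pre_added = pixels[start + j] + pixels[start + taps - 1 - j]
            acc += pre_added * half_coeffs[j]  # one multiply per pixel pair
        results.append(acc)
    return results


# Illustrative values only (not from the patent).
pixels = [3, 1, 4, 1, 5, 9, 2, 6, 5]
full_coeffs = [1, -5, 20, 20, -5, 1]
direct = direct_filter(pixels, full_coeffs)
folded = symmetric_filter(pixels, full_coeffs[:3])
```

For six taps the folded form uses three multiplications per output instead of six.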
- FIG. 21 is a diagram showing a configuration of a conventional symmetric filter arithmetic device disclosed in Patent Document 1.
- Pixels are read from the buffer 300, pixels whose filter coefficients are symmetric are selected by a selector 310, and the filter operation is performed in four filter arithmetic units 321 to 324 (filter arithmetic units #1 to #4).
- the filter calculation units 321 to 324 all have the same configuration, and each of the filter calculation units 321 to 324 performs a 6-tap symmetric filter calculation on the pixel selected by the selector 310.
- In each of the filter calculation units 321 to 324, as shown in (b) of FIG. 20, the pixel values of pixels with symmetric filter coefficients are added in advance, multiplied by the coefficient, and cumulatively added. By performing this filter calculation simultaneously in the four filter calculation units 321 to 324, four filter calculation results can be obtained.
- However, the conventional configuration described above is a dedicated arithmetic unit that performs only a fixed symmetric filter operation, and Patent Document 1 does not address the operation or configuration when the operation is performed by a processor. Moreover, the configuration is specialized for a 6-tap symmetric filter operation, and no case with a tap count other than 6 is addressed.
- A symmetric filter arithmetic device according to the present invention performs a filter operation on a plurality of data stored in a storage unit using left-right symmetric filter coefficients.
- The device includes a left data string extraction unit that reads a first data string, which is a plurality of continuous data, from the storage unit and extracts from it a left data string, which is a plurality of continuous data to be multiplied by a left filter coefficient (a filter coefficient to the left of the center).
- The device also includes a right data string extraction unit that reads a second data string, which is a plurality of continuous data, from the storage unit and extracts from it a right data string, which is a plurality of continuous data to be multiplied by a right filter coefficient (a filter coefficient to the right of the center that has the same value as the left filter coefficient).
- the left data string is extracted from the first data string stored in the storage unit
- the right data string is extracted from the second data string stored in the storage unit. That is, a pair of data strings to be multiplied by the same filter coefficient can be extracted. Therefore, by extracting such pairs of data strings according to the number of taps, a processor can perform symmetric filter calculations for various tap counts.
- The device may further include an addition unit that adds the extracted left data string and right data string to calculate an addition data string, and a multiplication unit that multiplies the calculated addition data string by the left filter coefficient or the right filter coefficient to calculate a multiplication data string.
- the left data string and the right data string are added and multiplied by the left filter coefficient or the right filter coefficient. That is, a pair of data strings to be multiplied by the same filter coefficient is added and multiplied by the filter coefficient. For this reason, by performing the addition and multiplication according to the number of taps, the processor can perform symmetric filter operations corresponding to various tap numbers.
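The add-then-multiply step on whole data strings can be sketched as follows. This is an illustrative model under stated assumptions, not the patent's implementation: `width` (the number of outputs computed at once), the function names, and the separate handling of the center coefficient for an odd tap count are all assumptions introduced here.

```python
def symmetric_filter_strings(data, coeffs, width):
    """Compute `width` outputs at once from pairs of data strings.

    For each coefficient pair, a left string and a right string of `width`
    consecutive data are added element-wise, multiplied by the shared
    coefficient, and accumulated. Assumes len(data) >= len(coeffs) - 1 + width.
    """
    taps = len(coeffs)
    if any(coeffs[i] != coeffs[taps - 1 - i] for i in range(taps // 2)):
        raise ValueError("coefficients must be left-right symmetric")
    acc = [0] * width
    for i in range(taps // 2):
        left = data[i:i + width]                          # left data string
        right = data[taps - 1 - i:taps - 1 - i + width]   # right data string
        added = [l + r for l, r in zip(left, right)]      # addition data string
        acc = [a + s * coeffs[i] for a, s in zip(acc, added)]
    if taps % 2 == 1:
        # Assumption: with an odd tap count the center coefficient has no
        # partner, so the center string is multiplied on its own.
        c = taps // 2
        acc = [a + x * coeffs[c] for a, x in zip(acc, data[c:c + width])]
    return acc


def direct(data, coeffs, width):
    """Reference: plain multiply-accumulate for the same outputs."""
    return [sum(data[n + j] * k for j, k in enumerate(coeffs)) for n in range(width)]


# Illustrative values only.
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
even = symmetric_filter_strings(data, [1, -5, 20, 20, -5, 1], 4)
odd = symmetric_filter_strings(data, [2, 7, 11, 7, 2], 4)
```

Because the loop runs over coefficient pairs, the same routine covers different tap counts by changing only the length of `coeffs`.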
- The left data string extraction unit may read the first data string, composed of a continuous third data string and fourth data string, from the storage unit and extract the left data string.
- The right data string extraction unit may then do either of the following: (a) read from the storage unit the second data string, composed of a continuous fifth data string and sixth data string, such that the data located at the data center (the center between the first data of the third data string and the last data of the fifth data string) is the data multiplied by the center filter coefficient of the left-right symmetric filter coefficients, and extract the right data string; or (b) read from the storage unit the second data string, composed of the continuous fifth data string and sixth data string, such that the data located at the data center (the center between the first data of the third data string and the first data of the sixth data string) is the data multiplied by the center filter coefficient of the left-right symmetric filter coefficients, and extract the right data string.
- According to this, the left data string is extracted from the first data string, composed of the continuous third and fourth data strings, and the right data string is extracted from the second data string, composed of the continuous fifth and sixth data strings, so that the data located at the center between the first data of the third data string and the last data of the fifth data string is the data multiplied by the center filter coefficient.
- Alternatively, the left data string and the right data string are extracted so that the data located at the center between the first data of the third data string and the first data of the sixth data string is the data multiplied by the center filter coefficient.
- That is, the first data string is stored in one buffer and the second data string is stored in the other buffer; the left data string is extracted from the one buffer and the right data string is extracted from the other buffer.
- The conventional configuration shown in FIG. 21 stores the pixel values of all pixels needed for the filter operation in the buffer 300, so the buffer 300 grows as the number of taps of the symmetric filter operation increases. The conventional configuration therefore has the problem that the circuit scale of the symmetric filter arithmetic device 20 increases when it must handle symmetric filter operations with a large number of taps.
- According to the symmetric filter arithmetic device of the present invention, even when the number of taps of the symmetric filter operation is large, only part of the data is stored, divided between two buffers. A symmetric filter operation with a large number of taps can therefore be performed without storing all the data it uses in the buffers, so the processor can perform symmetric filter operations for various tap counts.
- The left data string extraction unit may extract the left data string starting from data to the left of the data center, and the right data string extraction unit may extract the right data string starting from data that is to the right of the data center and symmetric to the head data of the left data string with respect to the data center.
- the left data column and the right data column are extracted so that the top data of the left data column and the top data of the right data column are symmetrical with respect to the data center.
- a pair of data strings to be multiplied by the same filter coefficient can be extracted.
- the symmetrical filter calculation corresponding to various tap numbers can be performed with a processor.
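The relation between the heads of the two strings and the data center can be sketched with simple index arithmetic. The representation below is an assumption made for illustration: the data center is passed as twice its position (`center_twice`) so that a center falling between two data (even tap count) and a center falling on one datum (odd tap count) can both be expressed as integers. The function names are hypothetical.

```python
def mirrored_start(left_start, center_twice):
    """Index symmetric to left_start about the data center.

    center_twice is 2 * (center position): 5 means the center lies between
    indices 2 and 3, and 4 means it lies exactly on index 2.
    """
    return center_twice - left_start


def extract_pair(data, left_start, center_twice, width):
    """Return (left string, right string), each `width` consecutive data."""
    if 2 * left_start >= center_twice:
        raise ValueError("left string must start to the left of the data center")
    right_start = mirrored_start(left_start, center_twice)
    left = data[left_start:left_start + width]
    right = data[right_start:right_start + width]
    return left, right


data = list(range(16))
# Six taps over indices 0..5: center between 2 and 3, so center_twice = 5.
outer_pair = extract_pair(data, 0, 5, 4)   # ([0, 1, 2, 3], [5, 6, 7, 8])
inner_pair = extract_pair(data, 2, 5, 4)   # ([2, 3, 4, 5], [3, 4, 5, 6])
# Five taps over indices 0..4: center on index 2, so center_twice = 4.
odd_pair = extract_pair(data, 0, 4, 4)     # ([0, 1, 2, 3], [4, 5, 6, 7])
```

Both strings run in the same direction from their heads, so element n of each string belongs to the same output position.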
- The third data string and the fourth data string, or the fifth data string and the sixth data string, may be stored in a continuous area of the storage unit; the left data string extraction unit may read the first data string from the storage unit and extract the left data string, and the right data string extraction unit may read the second data string from the storage unit and extract the right data string.
- the third data string and the fourth data string, or the fifth data string and the sixth data string are stored in consecutive number registers.
- the instruction code for performing the symmetric filter operation is simple, and the bit field of the register in the instruction code can be reduced.
- The left data string extraction unit may read the first data string, composed of a continuous third data string and fourth data string, from the storage unit and extract the left data string, and the right data string extraction unit may read the first data string from the storage unit as the second data string and extract the right data string such that the data located at the data center, which is the center of the third data string, is the data multiplied by the center filter coefficient of the symmetric filter coefficients.
- According to this, the left data string and the right data string are extracted from the same data string. That is, a pair of data strings to be multiplied by the same filter coefficient can be extracted.
- In other words, the first data string can be stored in one buffer, and both the left data string and the right data string can be extracted from that one buffer. A pair of data strings corresponding to the number of taps can therefore be extracted easily, and the processor can perform symmetric filter operations for various tap counts.
- The data center may be the center between the head data and the last data of the third data string, or the center between the head data of the third data string and the head data of the fourth data string. In that case, the left data string extraction unit may extract the left data string starting from data to the left of the data center, and the right data string extraction unit may extract the right data string starting from data that is to the right of the data center and symmetric to the head data of the left data string with respect to the data center.
- According to this, with the data center defined in either of these two ways, the left data string and the right data string are extracted so that the head data of the left data string and the head data of the right data string are symmetric with respect to the data center.
- a pair of data strings to be multiplied by the same filter coefficient can be extracted.
- the symmetrical filter calculation corresponding to various tap numbers can be performed with a processor.
- The present invention can be realized not only as such a symmetric filter arithmetic device but also as a symmetric filter arithmetic method whose steps are the characteristic processing performed by the processing units included in the device. It can also be realized as a program that causes a computer to execute the characteristic processing included in the symmetric filter calculation method, or as an integrated circuit. Such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet. Moreover, you may implement
- FIG. 1 is a diagram showing a configuration of a symmetric filter arithmetic device 10 (hereinafter referred to as a filter arithmetic device 10) according to Embodiment 1 of the present invention.
- the filter operation device 10 is a device that performs a filter operation on a plurality of data stored in a storage unit using left and right symmetrical filter coefficients, and includes a data shuffler 160.
- the storage unit is described as being a register file, but the storage unit is not limited to a register file.
- a data memory may be used as the storage unit.
- The filter operation device 10 includes an instruction memory 110, an instruction fetch unit 120, an instruction decoder 130, a register file 140, a memory access unit 150, a data shuffler 160, an adder 170, a multiplier 180, and a data memory 190.
- the instruction memory 110 is a memory for storing an instruction for instructing the operation of the filter arithmetic device 10.
- the instruction fetch unit 120 acquires the next instruction to be executed from the instruction memory 110, and outputs the acquired instruction to the instruction decoder 130.
- The instruction decoder 130 analyzes the instruction output from the instruction fetch unit 120, determines the arithmetic unit that is to execute the instruction, and outputs an execution control signal to one of the memory access unit 150, the data shuffler 160, the adder 170, and the multiplier 180.
- the memory access unit 150 acquires data from the data memory 190 and outputs it to the register file 140 according to the execution control signal from the instruction decoder 130, or acquires data from the register file 140 and outputs it to the data memory 190.
- The data shuffler 160 acquires data from the register file 140, rearranges the data, and outputs the result to the adder 170. Specifically, as a pair of data strings to be multiplied by the same filter coefficient, the data shuffler 160 extracts the left data string, which is multiplied by the left filter coefficient, and the right data string, which is multiplied by the right filter coefficient having the same value as the left filter coefficient.
- Here, the data shuffler 160 rearranges data in order to perform a symmetric filter operation, but it may also have a function of rearranging data for purposes other than the symmetric filter operation.
- the detailed configuration of the data shuffler 160 will be described later.
- the adder 170 acquires data from the data shuffler 160, performs an addition operation, and outputs the result to the register file 140. Specifically, the adder 170 adds the left data string and the right data string extracted by the data shuffler 160 to calculate an added data string.
- The adder 170 has the function of the "addition unit" described in the claims.
- Here, the data shuffler 160 and the adder 170 perform the data rearrangement and the addition as a single processing unit. Alternatively, the data shuffler 160 may output the rearranged data to the register file 140, and the adder 170 may acquire the data from the register file 140 and perform the addition.
- the multiplier 180 acquires data from the register file 140, performs a multiplication operation, and outputs the result to the register file 140. Specifically, the multiplier 180 multiplies the addition data string calculated by the adder 170 and the left filter coefficient or the right filter coefficient to calculate a multiplication data string.
- The multiplier 180 has the function of the "multiplication unit" described in the claims.
- The register file 140 is a set of registers that holds data output from each arithmetic unit in the arithmetic unit group, and is composed of 32 64-bit registers, R0 to R31.
- the data memory 190 is a memory for storing data necessary for calculation in the filter calculation device 10. Note that the instruction memory 110 and the data memory 190 may be mounted in separate memories, or may be mounted in the form of sharing one memory.
- FIG. 2 is a diagram illustrating a configuration of the data shuffler 160 according to the first embodiment of the present invention.
- The data shuffler 160 has two 64-bit input ports A and B and two 64-bit output ports Z1 and Z2, and includes a first data shuffle unit 161 and a second data shuffle unit 162. Note that output data is output from the output ports Z1 and Z2 of the data shuffler 160 to the adder 170, but the output data may instead be output to the register file 140.
- the first data shuffle unit 161 has two 64-bit input ports X1 and Y1, and one 64-bit output port Z1.
- the second data shuffle unit 162 has two 64-bit input ports X2 and Y2, and one 64-bit output port Z2.
- Because the data shuffler 160 has only two 64-bit input ports, data from the input port A of the data shuffler 160 is input to both the input port X1 of the first data shuffle unit 161 and the input port X2 of the second data shuffle unit 162. Similarly, data from the input port B of the data shuffler 160 is input to both the input port Y1 of the first data shuffle unit 161 and the input port Y2 of the second data shuffle unit 162.
- the first data shuffle unit 161 reads two 64-bit data through the input ports X1 and Y1 according to the execution control signal, and rearranges the data in byte units. Then, after the data is rearranged, the first data shuffle unit 161 outputs the 64-bit data as a result of the rearrangement through the output port Z1.
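A byte-unit rearrangement of two 64-bit inputs into one 64-bit output can be modeled as below. This is only a sketch: the `pattern` argument (a list of eight source byte positions), the function name `shuffle_bytes`, and the little-endian byte numbering are assumptions made for illustration; the rearrangements the unit actually supports are not specified in this passage.

```python
def shuffle_bytes(x, y, pattern):
    """Pick 8 bytes from the 16 bytes of x followed by y.

    x, y    : 64-bit unsigned integers (the two input ports).
    pattern : 8 indices into the 16 source bytes; bytes 0-7 come from x and
              bytes 8-15 from y, with byte 0 the least significant byte of x
              (an assumed numbering).
    """
    if len(pattern) != 8 or any(not 0 <= i < 16 for i in pattern):
        raise ValueError("pattern must hold 8 indices in the range 0..15")
    source = x.to_bytes(8, "little") + y.to_bytes(8, "little")
    selected = bytes(source[i] for i in pattern)
    return int.from_bytes(selected, "little")


# Illustrative inputs whose byte values equal their positions 0..15.
x = 0x0706050403020100
y = 0x0F0E0D0C0B0A0908
# Eight consecutive bytes starting at position 3, spanning both inputs.
z = shuffle_bytes(x, y, list(range(3, 11)))
print(hex(z))  # 0xa09080706050403
```

Selecting a run of consecutive positions that crosses the boundary between the two inputs is the case relevant to extracting a data string from two adjacent registers.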
- The first data shuffle unit 161 reads a first data string, which is a plurality of continuous data, from the register file 140, and extracts from the first data string a left data string, which is a plurality of continuous data to be multiplied by a left filter coefficient, that is, a filter coefficient on the left side of the center.
- the first data shuffle unit 161 reads a first data string composed of a continuous third data string and a fourth data string from the register file 140, and extracts the left data string.
- the first data shuffle unit 161 reads the third data string through the input port X1, and reads the fourth data string through the input port Y1. Then, the first data shuffle unit 161 outputs the left data string to the adder 170 through the output port Z1. Details of the processing performed by the first data shuffle unit 161 will be described later.
- the first data shuffle unit 161 has the function of the “left data string extraction unit” described in the claims.
- the second data shuffle unit 162 reads two 64-bit data through the input ports X2 and Y2 according to the execution control signal, and rearranges the data in byte units. Then, after the data is rearranged, the second data shuffle unit 162 outputs the rearranged 64-bit data through the output port Z2.
- The second data shuffle unit 162 reads a second data string, which is a plurality of continuous data, from the register file 140, and extracts from the second data string a right data string, which is a plurality of continuous data to be multiplied by a right filter coefficient, that is, a filter coefficient on the right side of the center that has the same value as the left filter coefficient.
- Specifically, the second data shuffle unit 162 reads the first data string from the register file 140 as the second data string, and extracts the right data string so that the data arranged at the data center, which is the center of the third data string, is the data multiplied by the filter coefficient at the center of the symmetric filter coefficients.
- the second data shuffle unit 162 reads the third data string through the input port X2, and reads the fourth data string through the input port Y2. Then, the second data shuffle unit 162 outputs the right data string to the adder 170 through the output port Z2. Details of the processing performed by the second data shuffle unit 162 will be described later.
- the second data shuffle unit 162 has the function of the “right data string extraction unit” recited in the claims.
- FIGS. 3A to 3C are diagrams for explaining the operation of the data shuffler 160 according to Embodiment 1 of the present invention. Specifically, these drawings show the operation of the data shuffler 160 when processing 64-bit data in which each element is composed of 8 bits.
- FIG. 3A shows data input to the data shuffler 160.
- The 64-bit data [a0, a1, a2, a3, a4, a5, a6, a7] and [b0, b1, b2, b3, b4, b5, b6, b7], each composed of eight consecutive 8-bit elements, are input to the input ports A and B of the data shuffler 160, respectively.
- [a0, a1, a2, a3, a4, a5, a6, a7] is input, as the third data string, to the port X1 of the first data shuffle unit 161 and the port X2 of the second data shuffle unit 162. [b0, b1, b2, b3, b4, b5, b6, b7] is input, as the fourth data string, to the port Y1 of the first data shuffle unit 161 and the port Y2 of the second data shuffle unit 162.
- the first data shuffle unit 161 and the second data shuffle unit 162 rearrange the data according to the execution control signal, and output the rearranged data. This rearrangement of data is performed according to the table shown in FIG. 3C.
- the first data shuffle unit 161 rearranges data according to the execution control signal “0”, and outputs [a0, a1, a2, a3, a4, a5, a6, a7] as the left data string.
- the second data shuffle unit 162 rearranges data according to the execution control signal “0” and outputs [a7, b0, b1, b2, b3, b4, b5, b6] as the right data string.
- When the number of taps of the symmetric filter calculation is even, the data is rearranged according to the execution control signals “0” to “3” shown in FIG. 3C. When the number of taps is odd, the data is rearranged according to the execution control signals “4” to “7”. Details will be described later.
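- The even-tap examples quoted in this description (execution control signals “0” to “3”) are consistent with a sliding-window rule over the 16 elements formed by concatenating ports A and B: the left data string starts at offset s and the right data string starts at offset 7 - s. The Python sketch below models that rule. The function name `shuffle_even` and the offset formula are assumptions inferred from the worked examples, not a transcription of the table in FIG. 3C, and the odd-tap signals “4” to “7” are deliberately not modelled.

```python
def shuffle_even(a, b, signal):
    """Illustrative model of the data shuffler 160 for control signals 0..3.

    a, b: the eight-element third and fourth data strings (ports A and B).
    Returns (left, right), i.e. what would appear on output ports Z1 and Z2.
    The offsets are inferred from the examples in the text (an assumption).
    """
    if len(a) != 8 or len(b) != 8:
        raise ValueError("each input must hold eight elements")
    if signal not in (0, 1, 2, 3):
        raise ValueError("only the even-tap signals 0..3 are modelled")
    window = list(a) + list(b)               # first data string, 16 elements
    left = window[signal:signal + 8]         # first data shuffle unit 161
    right = window[7 - signal:15 - signal]   # second data shuffle unit 162
    return left, right


a = [f"a{i}" for i in range(8)]
b = [f"b{i}" for i in range(8)]
left, right = shuffle_even(a, b, 0)
print(left)
print(right)
```

With signal 0 the sketch reproduces the left and right data strings given above for the execution control signal “0”.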
- FIG. 4 is a diagram showing mnemonics and instruction codes for operating the data shuffler 160 and the adder 170 in the first embodiment of the present invention.
- (a-1) and (a-2) in the figure show mnemonics that take two input registers Ra and Rb as inputs, add the extracted left data string and right data string, and output the result to the output register Rc. (a-3) in the figure shows a mnemonic that takes two input registers Ra and Rb as inputs, adds the extracted left data string and right data string, and outputs the result to two output registers Rc and Rc+1.
- The operands of the instruction mnemonic are two input registers Ra and Rb, an output register Rc, and a 3-bit immediate I3 indicating the shuffle pattern for data rearrangement.
- The output register Rc holds the result obtained by extracting the left data string and the right data string from the input registers Ra and Rb and adding the extracted left data string and right data string in units of 8 bits.
- the addition result of the data element of the left data string and the data element of the right data string is expanded to 16 bits, and the result is output as two output registers Rc and Rc + 1.
- the addition result is a 128-bit data string of eight 16-bit data elements, and two output registers are required.
- two input registers Ra and Rb may be input, and the extracted left data string and right data string may be output as output registers Rc and Rc + 1.
- the operands of the instruction mnemonic are two input registers Ra and Rb, two output registers Rc and Rc + 1, and an immediate 3-bit I3 indicating a data rearrangement shuffle pattern.
- the bit width of elements constituting 64-bit data is expressed by an opcode, and the element width of this instruction is 8 bits. Note that the execution control signal output to the data shuffler 160 of the shuffle calculator is the shuffle pattern value itself.
- This instruction code is 32 bits long and consists of an opcode field indicating that the data shuffler 160 is operated, a shuffle pattern field, an element width field, and register number fields for Ra, Rb, and Rc.
- the bit width of each field is 12 bits for the opcode field, 3 bits for the shuffle pattern field, 2 bits for the element width field, and 5 bits for the register number fields Ra, Rb, and Rc.
- In the element width field, 0b00 indicates 8 bits, 0b01 indicates 16 bits, and 0b10 indicates 32 bits.
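- The field widths listed above (12 + 3 + 2 + 5 + 5 + 5) add up to exactly 32 bits. The Python sketch below packs and unpacks such a word as a quick check. The field order (opcode in the most significant bits, Rc in the least significant bits) and the opcode value `0xABC` are hypothetical placeholders, since the text gives only the widths.

```python
# Field widths as stated in the text; the ORDER is an assumption.
FIELDS = [("opcode", 12), ("pattern", 3), ("width", 2),
          ("ra", 5), ("rb", 5), ("rc", 5)]

# Element width codes as stated in the text.
ELEMENT_WIDTH_CODES = {8: 0b00, 16: 0b01, 32: 0b10}


def encode(values):
    """Pack a dict of field values into one 32-bit instruction word."""
    word = 0
    for name, bits in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def decode(word):
    """Unpack a 32-bit instruction word into a dict of field values."""
    values = {}
    for name, bits in reversed(FIELDS):
        values[name] = word & ((1 << bits) - 1)
        word >>= bits
    return values


# "valnadd.8 R2, R0, R1, 1" with a placeholder opcode (hypothetical value).
fields = {"opcode": 0xABC, "pattern": 1, "width": ELEMENT_WIDTH_CODES[8],
          "ra": 0, "rb": 1, "rc": 2}
word = encode(fields)
print(f"{word:032b}")
print(decode(word))
```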
- A restriction is imposed that the register numbers of the two output registers be consecutive.
- A restriction that the register numbers of the two input registers be consecutive may also be imposed.
- If a register number field is added to the instruction code, the restriction that the register numbers of the two output registers be consecutive is not necessarily required and may be removed.
- Next, a process in which the filter operation device 10 performs a symmetric filter operation using the valnadd.8 instruction will be described with reference to the drawings.
- Note that the valn.8 instruction may be used, or the valnadd.8s or valnadd.8l instruction may be used; in the following, the valnadd.8 instruction is used as an example.
- FIG. 5 is a flowchart showing an example of an operation in which the filter operation device 10 according to Embodiment 1 of the present invention performs a symmetric filter operation.
- FIG. 6 is a diagram illustrating an instruction for the filter operation device 10 according to the first embodiment of the present invention to perform a symmetric filter operation.
- FIGS. 7 to 9B are diagrams for explaining an operation in which the filter operation device 10 according to the first exemplary embodiment of the present invention performs a symmetric filter operation.
- the filter operation device 10 performs a symmetric filter operation on 8 pixels of [p0, p1, p2, p3, p4, p5, p6, p7] shown in FIG.
- the column “number of filter taps” shown in FIG. 6 indicates the number of taps of the filter that performs the symmetric filter calculation.
- The column “input data of R0 and R1” indicates the pixel data that needs to be input to the registers R0 and R1 before the valnadd.8 instruction is executed.
- The pixel data includes p-1, p-2, and p-3, where p-1 is the pixel adjacent to the left of p0, p-2 is the pixel adjacent to the left of p-1, and p-3 is the pixel adjacent to the left of p-2.
- The column “instruction” shows the valnadd.8 instructions that generate the pairs of data strings multiplied by the same filter coefficient in the symmetric filter operation and add each pair.
- the filter calculation device 10 performs a symmetric 6-tap filter calculation corresponding to eight pixels p0 to p7 shown in FIG. 20 to obtain filter calculation results q0 to q7.
- First, a load instruction is issued, and [p-1, p0, p1, p2, p3, p4, p5, p6] is stored in the register R0 and [p7, p8, p9, p10, p11, p12, p13, p14] is stored in the register R1.
- the first data shuffle unit 161 reads the first data string composed of the third data string and the fourth data string that are continuous from the register file 140 and extracts the left data string. (S102).
- “valnadd.8 R2, R0, R1, 1”, which is an instruction for the number of filter taps “6” shown in FIG. 6, is issued.
- The first data shuffle unit 161 reads [p-1, p0, p1, p2, p3, p4, p5, p6] of the register R0 as the third data string and [p7, p8, p9, p10, p11, p12, p13, p14] of the register R1 as the fourth data string.
- Then, the first data shuffle unit 161 extracts [p0, p1, p2, p3, p4, p5, p6, p7], the data output defined for the execution control signal “1”, as the left data string.
- Next, the second data shuffle unit 162 reads the first data string from the register file 140 as the second data string, and extracts the right data string so that the data arranged at the data center, which is the center of the third data string, is the data multiplied by the filter coefficient at the center of the symmetric filter coefficients (S104).
- Specifically, the second data shuffle unit 162 reads, as the second data string, the first data string composed of [p-1, p0, p1, p2, p3, p4, p5, p6] of the register R0 as the third data string and [p7, p8, p9, p10, p11, p12, p13, p14] of the register R1 as the fourth data string. Then, the second data shuffle unit 162 extracts [p5, p6, p7, p8, p9, p10, p11, p12], the data output defined for the execution control signal “1”, as the right data string.
- That is, the second data shuffle unit 162 extracts the right data string so that the data “p2, p3” arranged at the data center D, which is the center of the third data string [p-1, p0, p1, p2, p3, p4, p5, p6], is the data multiplied by the filter coefficients at the center of the symmetric filter coefficients.
- the first data shuffle unit 161 extracts a left data string [p0, p1, p2, p3, p4, p5, p6, p7] starting from the data on the left side of the data center D (p2, p3).
- The second data shuffle unit 162 extracts the right data string [p5, p6, p7, p8, p9, p10, p11, p12] starting from the data “p5”, which is on the right side of the data center D (p2, p3) and is symmetric to the data “p0” at the head of the left data string with respect to the data center D.
- Next, “valnadd.8 R3, R0, R1, 2”, which is the next instruction for the number of filter taps “6” shown in FIG. 6, is issued.
- The first data shuffle unit 161 extracts [p1, p2, p3, p4, p5, p6, p7, p8], the data output for the execution control signal “2” shown in FIG. 3C, as the left data string. Further, the second data shuffle unit 162 extracts [p4, p5, p6, p7, p8, p9, p10, p11] as the right data string.
- That is, the first data shuffle unit 161 extracts the left data string [p1, p2, p3, p4, p5, p6, p7, p8] starting from the data “p1” on the left side of the data center D (p2, p3).
- The second data shuffle unit 162 extracts the right data string [p4, p5, p6, p7, p8, p9, p10, p11] starting from the data “p4”, which is on the right side of the data center D (p2, p3) and is symmetric to the data “p1” at the head of the left data string with respect to the data center D.
- Similarly, the first data shuffle unit 161 extracts the left data string [p2, p3, p4, p5, p6, p7, p8, p9] starting from the data “p2” on the left side of the data center D, and the second data shuffle unit 162 extracts the right data string [p3, p4, p5, p6, p7, p8, p9, p10] starting from the data “p3”, which is symmetric to the data “p2” at the head of the left data string with respect to the data center D.
- the multiplier 180 multiplies the addition data string calculated by the adder 170 and the left filter coefficient or the right filter coefficient to calculate a multiplication data string (S108). Since the left filter coefficient and the right filter coefficient have the same value, the multiplier 180 applies the same multiplication regardless of whether the added data string is multiplied by the left filter coefficient or the added data string is multiplied by the right filter coefficient. A data string can be calculated.
- Specifically, the multiplier 180 multiplies the addition data string R2 by the filter coefficient k0, multiplies the addition data string R3 by the filter coefficient k1, and multiplies the addition data string R4 by the filter coefficient k2.
- the filter operation device 10 outputs the result of the symmetric filter operation by cumulatively adding the multiplication data strings calculated by the multiplier 180 (S110). Specifically, as illustrated in FIG. 8, the filter operation device cumulatively adds the three multiplication results calculated by the multiplier 180, thereby obtaining the filter operation results [q0, q1, q2, q3, q4, q5, q6, q7] can be obtained.
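- The 6-tap walkthrough above (extract a left/right pair, add, multiply by one shared coefficient, accumulate) can be checked against a direct six-multiplication filter. The Python sketch below is a software model under assumptions: the helper names are hypothetical, the offset rule is inferred from the extracted strings quoted in the text, the coefficient values are arbitrary examples, and plain Python integers are used, so register bit widths and overflow are ignored.

```python
def extract_pair(r0, r1, signal):
    """Left/right data strings for even-tap control signals 1..3 (inferred rule)."""
    window = r0 + r1
    return window[signal:signal + 8], window[7 - signal:15 - signal]


def symmetric_6tap(pixels, k0, k1, k2):
    """pixels holds 16 values [p-1, p0, ..., p14]; returns [q0, ..., q7]."""
    r0, r1 = pixels[:8], pixels[8:16]
    acc = [0] * 8
    # One shuffle-and-add per coefficient pair, as in the three instructions.
    for signal, k in ((1, k0), (2, k1), (3, k2)):
        left, right = extract_pair(r0, r1, signal)
        added = [l + r for l, r in zip(left, right)]   # addition data string
        acc = [s + k * x for s, x in zip(acc, added)]  # multiply, accumulate
    return acc


def direct_6tap(pixels, k0, k1, k2):
    """Reference: six multiplications per output, no use of symmetry."""
    coeffs = [k0, k1, k2, k2, k1, k0]
    return [sum(c * pixels[1 + i + j] for j, c in enumerate(coeffs))
            for i in range(8)]


pixels = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]  # p-1 .. p14
print(symmetric_6tap(pixels, 1, -5, 20))
print(direct_6tap(pixels, 1, -5, 20))
```

The symmetric version performs three multiplications per output element instead of six, which is the saving that the shared left and right filter coefficients make possible.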
- When the number of taps is even, if the pixels are loaded so that the center of symmetry of the filter coefficients used to obtain the filter operation result q0 lies between r03 and r04, even-tap symmetric filters of up to 8 taps can be supported.
- When the number of taps is odd, it is sufficient to load the pixels so that the pixel at the center of symmetry of the filter coefficients used to obtain the filter operation result q0 is r04 (data center D shown in FIG. 9B). That is, when the number of taps in the filter operation is odd, the midpoint between the first data “r00” of the third data string and the first data “r08” of the fourth data string is set as the data center D. In this way, odd taps can also be handled with the valnadd.8 instruction.
- In the filter operation device 10, the left data string is extracted from the first data string stored in the register file 140, and the right data string is extracted from the second data string stored in the register file 140. That is, a pair of data strings to be multiplied by the same filter coefficient can be extracted. For this reason, by extracting the pairs of data strings according to the number of taps, the processor can perform symmetric filter operations corresponding to various numbers of taps.
- Further, the first data string is read as the second data string, and the left data string and the right data string are both extracted from within it. That is, a pair of data strings to be multiplied by the same filter coefficient can be extracted.
- Accordingly, the first data string can be stored in one buffer, and the left data string and the right data string can be extracted from that one buffer. For this reason, a pair of data strings corresponding to the number of taps can be easily extracted, and a symmetric filter operation corresponding to various numbers of taps can be performed by the processor.
- Further, when the number of taps of the filter operation is even, the center of the third data string is the data center; when the number of taps is odd, the midpoint between the first data of the third data string and the first data of the fourth data string is the data center. In either case, the left data string and the right data string are extracted so that the first data of the left data string and the first data of the right data string are symmetric with respect to the data center. Thereby, a pair of data strings to be multiplied by the same filter coefficient can be extracted. For this reason, by extracting the pairs of data strings according to the number of taps, the processor can perform symmetric filter operations corresponding to various numbers of taps.
- FIG. 10 is a diagram showing a configuration of a symmetric filter arithmetic device 11 (hereinafter referred to as filter arithmetic device 11) according to Embodiment 2 of the present invention.
- the present filter arithmetic device 11 is characterized by including a data shuffler 200 instead of the data shuffler 160 of the filter arithmetic device 10 of the first embodiment.
- The filter arithmetic device 11 includes an instruction memory 110, an instruction fetch unit 120, an instruction decoder 130, a register file 140, a memory access unit 150, a data shuffler 200, an adder 170, a multiplier 180, and a data memory 190.
- the data shuffler 200 acquires data from the register file 140, rearranges the data, and outputs the result to the adder 170, like the data shuffler 160 of the filter arithmetic apparatus 10 of the first embodiment.
- The data shuffler 200 and the adder 170 perform the data rearrangement and the addition operation as one processing unit. However, the data shuffler 200 may instead write the rearranged data to the register file 140, after which the adder 170 acquires the data from the register file 140 and performs the addition operation.
- the data shuffler 200 may have a function of rearranging data for purposes other than the filter operation. A specific difference between the data shuffler 200 and the data shuffler 160 will be described later.
- FIG. 11 is a diagram showing the configuration of the data shuffler 200.
- As shown in FIG. 11, the data shuffler 200 has four 64-bit input ports A, B, C, and D and two 64-bit output ports Z1 and Z2, and includes a first data shuffle unit 201 and a second data shuffle unit 202.
- the data shuffler 200 may be configured such that the input ports A and B are combined into one 128-bit input port, and the input ports C and D are combined into one 128-bit input port.
- the first data shuffle unit 201 has four 64-bit input ports X10, Y10, X11, and Y11, and one 64-bit output port Z1.
- the second data shuffle unit 202 has four 64-bit input ports X20, Y20, X21, and Y21, and one 64-bit output port Z2.
- Output data is output from the output ports Z1 and Z2 of the data shuffler 200 to the adder 170, but the output data may instead be output to the register file 140.
- Data is input from the input port A of the data shuffler 200 to the input port X10 of the first data shuffle unit 201, and data is input from the input port B of the data shuffler 200 to the input port Y10 of the first data shuffle unit 201.
- data is input from the input port C of the data shuffler 200 to the input port X11 of the first data shuffle unit 201, and data is input from the input port D to the input port Y11.
- the first data shuffle unit 201 may be configured such that the input ports X10 and Y10 are combined into one input port, and the input ports X11 and Y11 are combined into one input port.
- Data is input from the input port A of the data shuffler 200 to the input port X20 of the second data shuffle unit 202, and data is input from the input port B of the data shuffler 200 to the input port Y20 of the second data shuffle unit 202.
- data is input from the input port C of the data shuffler 200 to the input port X21 of the second data shuffle unit 202, and data is input from the input port D to the input port Y21.
- the second data shuffle unit 202 may be configured such that the input ports X20 and Y20 are combined into one input port, and the input ports X21 and Y21 are combined into one input port.
- the first data shuffle unit 201 reads four 64-bit data through the input ports X10, Y10, X11, and Y11 according to the execution control signal, and rearranges the data in units of bytes. Then, after the data is rearranged, the first data shuffle unit 201 outputs 64-bit data as a result of the rearrangement through the output port Z1.
- The first data shuffle unit 201 reads a first data string, which is a plurality of continuous data, from the register file 140, and extracts from the first data string a left data string, which is a plurality of continuous data to be multiplied by a left filter coefficient, that is, a filter coefficient on the left side of the center.
- the first data shuffle unit 201 reads a first data string composed of a continuous third data string and a fourth data string from the register file 140, and extracts the left data string.
- the first data shuffle unit 201 reads the third data string through the input port X10 and reads the fourth data string through the input port Y10. Then, the first data shuffle unit 201 outputs the left data string to the adder 170 through the output port Z1.
- the first data shuffle unit 201 extracts the left data string by determining the data center using the second data string read through the input ports X11 and Y11. Details of the processing performed by the first data shuffle unit 201 will be described later.
- the first data shuffle unit 201 has the function of the “left data string extraction unit” described in the claims.
- Similar to the first data shuffle unit 201, the second data shuffle unit 202 reads four 64-bit data through the input ports X20, Y20, X21, and Y21 according to the execution control signal, and rearranges the data in byte units. Then, after the data is rearranged, the second data shuffle unit 202 outputs the rearranged 64-bit data through the output port Z2.
- The second data shuffle unit 202 reads a second data string, which is a plurality of continuous data, from the register file 140, and extracts from the second data string a right data string, which is a plurality of continuous data to be multiplied by a right filter coefficient, that is, a filter coefficient on the right side of the center that has the same value as the left filter coefficient. More specifically, the second data shuffle unit 202 reads from the register file 140 a second data string composed of a continuous fifth data string and sixth data string, and extracts the right data string so that the data arranged at the data center, which is the center between the first data of the third data string and the first data of the sixth data string, is the data multiplied by the filter coefficient at the center of the symmetric filter coefficients.
- the second data shuffle unit 202 reads the fifth data string through the input port X21 and reads the sixth data string through the input port Y21. Then, the second data shuffle unit 202 outputs the right data string to the adder 170 through the output port Z2.
- the second data shuffle unit 202 extracts the right data string by determining the data center using the first data string read through the input ports X20 and Y20. Details of the processing performed by the second data shuffle unit 202 will be described later.
- the second data shuffle unit 202 has the function of the “right data string extraction unit” recited in the claims.
- the data shuffler 200 of the present embodiment changes the number of registers read from the register file 140 according to the execution control signal.
- the data shuffler 200 reads two 64-bit registers when the execution control signal is 0 to 7 and four 64-bit registers when the execution control signal is 8 to 15 from the register file 140.
- FIG. 12 shows data input to the data shuffler 200 when the execution control signal is 0-7.
- As in the data shuffler 160 of the first embodiment, [a0, a1, a2, a3, a4, a5, a6, a7] is input to the port X10 of the first data shuffle unit 201 and the port X20 of the second data shuffle unit 202.
- [b0, b1, b2, b3, b4, b5, b6, b7] is input to the port Y10 of the first data shuffle unit 201 and the port Y20 of the second data shuffle unit 202.
- The first data shuffle unit 201 and the second data shuffle unit 202 rearrange the data according to the execution control signal. Data rearrangement is performed according to the table shown in FIG. 3C in the same manner as in the data shuffler 160 of the first embodiment.
- FIG. 13A shows data input to the data shuffler 200 when the execution control signal is 8-15.
- [a0, a1, a2, a3, a4, a5, a6, a7] are input to the port X10 of the first data shuffle unit 201, and [b0, b1, b2, b3, b4, b5, b6, b7] are input to the port Y10 of the first data shuffle unit 201.
- [c0, c1, c2, c3, c4, c5, c6, c7] is input to the port X11 of the first data shuffle unit 201, and [d0, d1, d2, d3, d4, d5, d6, d7] is input to the port Y11 of the first data shuffle unit 201.
- [a0, a1, a2, a3, a4, a5, a6, a7] is input to the port X20 of the second data shuffle unit 202, and [b0, b1, b2, b3, b4, b5, b6, b7] is input to the port Y20 of the second data shuffle unit 202.
- [c0, c1, c2, c3, c4, c5, c6, c7] is input to the port X21 of the second data shuffle unit 202, and [d0, d1, d2, d3, d4, d5, d6, d7] is input to the port Y21 of the second data shuffle unit 202.
- the first data shuffle unit 201 and the second data shuffle unit 202 rearrange the data according to the execution control signal. Data rearrangement is performed according to the table shown in FIG. 13C.
- For example, the first data shuffle unit 201 rearranges data according to the execution control signal “8” and outputs [a0, a1, a2, a3, a4, a5, a6, a7] as the left data string.
- the second data shuffle unit 202 rearranges the data according to the execution control signal “8”, and outputs [c7, d0, d1, d2, d3, d4, d5, d6] as the right data string.
- the first data shuffle unit 201 outputs [a1, a2, a3, a4, a5, a6, a7, b0] as the left data string in accordance with the execution control signal “9”.
- The second data shuffle unit 202 outputs [c6, c7, d0, d1, d2, d3, d4, d5] as the right data string according to the execution control signal “9”.
- As another example, the first data shuffle unit 201 rearranges data according to the execution control signal “16” and outputs [a0, a1, a2, a3, a4, a5, a6, a7] as the left data string.
- The second data shuffle unit 202 rearranges data according to the execution control signal “16” and outputs [d0, d1, d2, d3, d4, d5, d6, d7] as the right data string.
- The first data shuffle unit 201 rearranges data according to the execution control signal “17” and outputs [a1, a2, a3, a4, a5, a6, a7, b0] as the left data string. Further, the second data shuffle unit 202 outputs [c7, d0, d1, d2, d3, d4, d5, d6] as the right data string in accordance with the execution control signal “17”.
- When the number of taps of the symmetric filter calculation is even, the data is rearranged according to the execution control signals “8” to “15” shown in FIG. 13C. When the number of taps is odd, the data is rearranged according to the execution control signals “16” to “23”.
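- The four-register examples above (execution control signals “8”, “9”, “16”, and “17”) fit a rule in which the left data string slides over ports A and B while the right data string slides in the opposite direction over ports C and D. The Python sketch below models that rule. The function name `shuffle_four` and the offset formulas are assumptions: only the four quoted signals are confirmed by the text, and the remaining signals in each range are extrapolated rather than taken from FIG. 13C.

```python
def shuffle_four(a, b, c, d, signal):
    """Illustrative model of the data shuffler 200 for control signals 8..23.

    a, b: third and fourth data strings (left window, ports A and B).
    c, d: fifth and sixth data strings (right window, ports C and D).
    Offsets are inferred from the examples for signals 8, 9, 16 and 17.
    """
    if 8 <= signal <= 15:        # even number of taps
        step = signal - 8
        right_start = 7 - step
    elif 16 <= signal <= 23:     # odd number of taps
        step = signal - 16
        right_start = 8 - step
    else:
        raise ValueError("only signals 8..23 are modelled here")
    left_window = list(a) + list(b)
    right_window = list(c) + list(d)
    left = left_window[step:step + 8]                  # output port Z1
    right = right_window[right_start:right_start + 8]  # output port Z2
    return left, right


a, b, c, d = ([f"{name}{i}" for i in range(8)] for name in "abcd")
for signal in (8, 9, 16, 17):
    print(signal, *shuffle_four(a, b, c, d, signal))
```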
- FIG. 15 is a diagram illustrating a mnemonic and an instruction code for operating the data shuffler 200 and the adder 170 according to the second embodiment of the present invention.
- With the instruction mnemonics and instruction codes shown in FIG. 4, the data shuffler 200 and the adder 170 operate in the same manner as the data shuffler 160 and the adder 170 of the first embodiment, so their description is omitted.
- (a-1) and (a-2) in FIG. 15 show mnemonics that take four input registers Ra, Ra+1, Rb, and Rb+1 as inputs, add the extracted left data string and right data string, and output the result to the output register Rc.
- (a-3) in FIG. 15 shows a mnemonic that takes four input registers Ra, Ra+1, Rb, and Rb+1 as inputs, adds the extracted left data string and right data string, and outputs the result to two output registers Rc and Rc+1.
- the operand of the instruction mnemonic is four input registers Ra, Ra + 1, Rb, Rb + 1, an output register Rc, and an immediate three-bit I3 indicating a data rearrangement shuffle pattern.
- The output register Rc holds the result obtained by extracting the left data string and the right data string from the input registers Ra, Ra+1, Rb, and Rb+1 and adding the extracted left data string and right data string in units of 8 bits.
- With the valnpadd instruction variant shown in (a-2) of the figure, saturation processing is applied to an addition result exceeding 8 bits, and the result is output to the output register Rc. More specifically, when the data elements are 8-bit unsigned data, an addition result larger than 255 is saturated to 255.
- When the data elements are 8-bit signed data, an addition result smaller than -128 is saturated to -128, and an addition result larger than 127 is saturated to 127.
- With the valnpadd instruction variant shown in (a-3) of the figure, the addition result of each data element of the left data string and the corresponding data element of the right data string is expanded to 16 bits, and the result is output to the output registers Rc and Rc+1.
- That is, the addition result is a 128-bit data string of eight 16-bit data elements, so two output registers are required.
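- The saturating and widening additions described above differ only in how a sum that no longer fits in 8 bits is handled. The Python sketch below contrasts the two. The function names are hypothetical, and the assignment of the first four sums to Rc and the last four to Rc+1 is an assumption, since the text does not state which half goes to which register.

```python
def add_saturating(left, right, signed=False):
    """Element-wise add with the result clamped to the 8-bit range."""
    low, high = (-128, 127) if signed else (0, 255)
    return [max(low, min(high, l + r)) for l, r in zip(left, right)]


def add_widening(left, right):
    """Element-wise add with each sum kept at full (16-bit) precision.

    Eight 16-bit sums occupy 128 bits, so they are split across two
    64-bit registers. Which half lands in Rc is an assumption here.
    """
    sums = [l + r for l, r in zip(left, right)]
    return sums[:4], sums[4:]   # (Rc, Rc+1)


left = [200, 10, 255, 0, 128, 64, 32, 16]
right = [100, 20, 255, 0, 128, 64, 32, 16]
print(add_saturating(left, right))
print(add_widening(left, right))
```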
- As with the valnp.8 instruction, four input registers Ra, Ra+1, Rb, and Rb+1 may be taken as inputs, and the extracted left data string and right data string may be output to the output registers Rc and Rc+1.
- the operands of the instruction mnemonic are four input registers Ra, Ra + 1, Rb, Rb + 1, two output registers Rc, Rc + 1, and an immediate 3-bit I3 indicating a data rearrangement shuffle pattern.
- A restriction is imposed that the register numbers of the four input registers be consecutive in pairs. That is, the third data string and the fourth data string, and likewise the fifth data string and the sixth data string, are stored in consecutively numbered registers.
- bit width of the elements constituting the 64-bit data is represented by an opcode, and the element width of this instruction is 8 bits.
- the execution control signal output to the data shuffler 200 of the shuffle calculator is a value obtained by adding 8 to the value of the shuffle pattern.
- The instruction code shown in the figure is 32 bits long and consists of an opcode field indicating that the data shuffler 200 is operated, a shuffle pattern field, an element width field, and register number fields for Ra, Rb, and Rc.
- the bit width of each field is 11 bits for the opcode field, 4 bits for the shuffle pattern field, 2 bits for the element width field, and 5 bits for the register number fields Ra, Rb, and Rc.
- the correspondence with the element width is 8 bits for 0b00, 16 bits for 0b01, and 32 bits for 0b10.
- The restriction that the register numbers of the registers be consecutive is not necessarily required and may be removed.
- Next, a process in which the filter operation device 11 performs a symmetric filter operation using the valnadd.8 instruction and the valnpadd.8 instruction will be described with reference to FIGS. 16 and 17.
- Note that the valnp.8 instruction may be used, or the valnpadd.8s or valnpadd.8l instruction may be used; in the following, the valnpadd.8 instruction is used as an example.
- When the valnadd.8 instruction is used, as shown in FIG. 6, the processing is the same as that of the filter arithmetic device 10 of the first embodiment, and thus its description is omitted.
- FIGS. 16 to 19 are diagrams for explaining an operation in which the filter operation device 11 according to the second embodiment of the present invention performs a symmetric filter operation.
- FIGS. 16 and 17 are diagrams for explaining an operation of performing a symmetric filter operation when the number of taps is 16 in the filter operation device 11 according to the second embodiment of the present invention.
- a load instruction is issued, and pixel data [p0, p1, p2, p3, p4, p5, p6, p7] are stored in the register R0, pixel data [p8, p9, p10, p11, p12, p13, p14, p15] in the register R1, pixel data [p8, p9, p10, p11, p12, p13, p14, p15] in the register R2, and pixel data [p16, p17, p18, p19, p20, p21, p22, p23] in the register R3.
- a command “valnpadd.8 R4, R0, R1, R2, R3, 0” which is predetermined as a command for the number of filter taps “16” is issued.
- the first data shuffle unit 201 reads [p0, p1, p2, p3, p4, p5, p6, p7] of the register R0 as the third data string and [p8, p9, p10, p11, p12, p13, p14, p15] of the register R1 as the fourth data string.
- the first data shuffle unit 201 then extracts [p0, p1, p2, p3, p4, p5, p6, p7], the data output for the execution control signal “8” shown in the figure, as the left data string (S102 in FIG. 5).
- the second data shuffle unit 202 reads [p8, p9, p10, p11, p12, p13, p14, p15] of the register R2 as the fifth data string and [p16, p17, p18, p19, p20, p21, p22, p23] of the register R3 as the sixth data string. Then, the second data shuffle unit 202 extracts [p15, p16, p17, p18, p19, p20, p21, p22], the data output for the execution control signal “8” shown in the figure, as the right data string (S104 in FIG. 5).
- that is, so that the data located at the center between the first data “p0” of the third data string and the last data “p15” of the fifth data string are the data multiplied by the center filter coefficients, the second data shuffle unit 202 reads the second data string composed of the consecutive fifth data string and sixth data string from the register file 140 and extracts the right data string.
- the first data shuffle unit 201 extracts the left data string [p0, p1, p2, p3, p4, p5, p6, p7], which starts with the data “p0” on the left side of the data center D (p7, p8).
- the second data shuffle unit 202 extracts the right data string [p15, p16, p17, p18, p19, p20, p21, p22], which starts with the data “p15”, that is, the data on the right side of the data center D (p7, p8) that is symmetric, with respect to the data center D, to the first data “p0” of the left data string.
- a command “valnpadd.8 R5, R0, R1, R2, R3, 1” which is predetermined as the next command in the case of the number of filter taps “16” is issued.
- the first data shuffle unit 201 reads the first data string and extracts [p1, p2, p3, p4, p5, p6, p7, p8], the data output for the execution control signal “9” illustrated in FIG. 13C, as the left data string.
- the second data shuffle unit 202 reads the second data string and extracts [p14, p15, p16, p17, p18, p19, p20, p21], the data output for the execution control signal “9” shown in FIG. 13C, as the right data string.
- that is, the first data shuffle unit 201 extracts the left data string [p1, p2, p3, p4, p5, p6, p7, p8], which starts with the data “p1” on the left side of the data center D (p7, p8).
- the second data shuffle unit 202 extracts the right data string [p14, p15, p16, p17, p18, p19, p20, p21], which starts with the data “p14”, that is, the data on the right side of the data center D (p7, p8) that is symmetric, with respect to the data center D, to the first data “p1” of the left data string.
- the instruction “valnpadd.8 R6, R0, R1, R2, R3, 2” is issued, and the first data shuffle unit 201 extracts [p2, p3, p4, p5, p6, p7, p8, p9] as the left data string.
- the second data shuffle unit 202 extracts [p13, p14, p15, p16, p17, p18, p19, p20] as the right data string. In this way, the two pixel data strings to be multiplied by the filter coefficient k2 are extracted. Finally, the two extracted pixel data strings are added, and the addition result is stored in the register R6.
- an instruction “valnpadd.8 R7, R0, R1, R2, R3, 3” is issued, and the first data shuffle unit 201 extracts [p3, p4, p5, p6, p7, p8, p9, p10] as the left data string.
- the second data shuffle unit 202 extracts [p12, p13, p14, p15, p16, p17, p18, p19] as the right data string. In this way, the two pixel data strings to be multiplied by the filter coefficient k3 are extracted. Finally, the two extracted pixel data strings are added, and the addition result is stored in the register R7.
- an instruction “valnpadd.8 R8, R0, R1, R2, R3, 4” is issued, and the first data shuffle unit 201 extracts [p4, p5, p6, p7, p8, p9, p10, p11] as the left data string.
- the second data shuffle unit 202 extracts [p11, p12, p13, p14, p15, p16, p17, p18] as the right data string. In this way, the two pixel data strings to be multiplied by the filter coefficient k4 are extracted. Finally, the two extracted pixel data strings are added, and the addition result is stored in the register R8.
- an instruction “valnpadd.8 R9, R0, R1, R2, R3, 5” is issued, and the first data shuffle unit 201 extracts [p5, p6, p7, p8, p9, p10, p11, p12] as the left data string.
- the second data shuffle unit 202 extracts [p10, p11, p12, p13, p14, p15, p16, p17] as the right data string.
- in this way, the two pixel data strings to be multiplied by the filter coefficient k5 are extracted.
- the two extracted pixel data strings are added, and the addition result is stored in the register R9.
- the instruction “valnpadd.8 R10, R0, R1, R2, R3, 6” is issued, and the first data shuffle unit 201 extracts [p6, p7, p8, p9, p10, p11, p12, p13] as the left data string. Further, the second data shuffle unit 202 extracts [p9, p10, p11, p12, p13, p14, p15, p16] as the right data string. In this way, the two pixel data strings to be multiplied by the filter coefficient k6 are extracted. Finally, the two extracted pixel data strings are added, and the addition result is stored in the register R10.
- an instruction “valnpadd.8 R11, R0, R1, R2, R3, 7” is issued, and the first data shuffle unit 201 extracts [p7, p8, p9, p10, p11, p12, p13, p14] as the left data string.
- the second data shuffle unit 202 extracts [p8, p9, p10, p11, p12, p13, p14, p15] as the right data string. In this way, the two pixel data strings to be multiplied by the filter coefficient k7 are extracted. Finally, the two extracted pixel data strings are added, and the addition result is stored in the register R11.
- the multiplier 180 multiplies the addition data string calculated by the adder 170 by the left filter coefficient or the right filter coefficient to calculate a multiplication data string (S108 in FIG. 5). Since the left filter coefficient and the right filter coefficient have the same value, the multiplier 180 applies the same multiplication regardless of whether the added data string is multiplied by the left filter coefficient or the added data string is multiplied by the right filter coefficient. A data string can be calculated.
- the multiplier 180 multiplies the addition data string R4 by the filter coefficient k0, the addition data string R5 by the filter coefficient k1, the addition data string R6 by the filter coefficient k2, and so on, up to the addition data string R11, which is multiplied by the filter coefficient k7.
- the filter operation device 10 outputs the result of the symmetric filter operation by cumulatively adding the multiplication data strings calculated by the multiplier 180 (S110 in FIG. 5). Specifically, as illustrated in FIG. 17, the filter arithmetic device 10 cumulatively adds the eight multiplication results calculated by the multiplier 180, thereby obtaining the filter arithmetic results [q0, q1, q2, q3, q4, q5, q6, q7].
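The 16-tap walkthrough above can be modeled in a few lines. The following is a rough sketch, not the device's actual implementation: registers are plain Python lists of eight integers, 8-bit overflow is ignored, and the function names `valnpadd8`, `symmetric_fir16`, and `direct_fir16` are hypothetical. The slice positions follow the extraction patterns listed in the text (pattern n takes the left data string from offset n of R0:R1 and the right data string from offset 7 - n of R2:R3).

```python
def valnpadd8(r_a, r_b, r_c, r_d, pattern):
    """Model one valnpadd.8 issue for an even tap count (patterns 0..7)."""
    first = r_a + r_b    # first data string: third + fourth data strings
    second = r_c + r_d   # second data string: fifth + sixth data strings
    left = first[pattern:pattern + 8]          # left data string
    right = second[7 - pattern:15 - pattern]   # mirrored right data string
    return [a + b for a, b in zip(left, right)]


def symmetric_fir16(p, k):
    """16-tap symmetric filter on 24 pixels using 8 half-coefficients k0..k7."""
    r0, r1, r2, r3 = p[0:8], p[8:16], p[8:16], p[16:24]  # loads from the text
    q = [0] * 8
    for n in range(8):
        added = valnpadd8(r0, r1, r2, r3, n)   # contents of R4..R11
        for j in range(8):
            q[j] += k[n] * added[j]            # multiply, then accumulate
    return q


def direct_fir16(p, k):
    """Reference: the same filter with all 16 coefficients applied directly."""
    coeffs = k + k[::-1]
    return [sum(coeffs[i] * p[j + i] for i in range(16)) for j in range(8)]
```

Because each pair of pixel data strings is added before the multiplication, this sketch performs 8 multiplications per output instead of the 16 used by `direct_fir16`, while producing the same results.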
- if the contents of the register R0 are written as [r00, r01, r02, r03, r04, r05, r06, r07] and the contents of the register R2 as [r20, r21, r22, r23, r24, r25, r26, r27], a symmetric filter with a large number of filter taps can be handled by loading the pixels so that, for example when the filter calculation result q0 is obtained, the pixels at positions symmetric about the center position of the filter coefficients are stored in R0 and R2.
- in this case, R1 must be loaded with the data that continue to the right of the pixel data stored in R0.
- likewise, R3 must be loaded with the data that continue to the right of the pixel data stored in R2. Even when the number of filter taps is so large that the pixels necessary for the filter operation do not fit into the four registers R0, R1, R2, and R3, the filter can be handled by the same procedure, by issuing valnpadd.8 instructions while sequentially loading into registers the pixels at positions symmetric about the center position of the filter coefficients.
- FIG. 18 is a diagram illustrating an operation for performing a symmetric filter calculation when the number of taps is 48 in the filter calculation device 11 according to the second embodiment of the present invention.
- when the number of taps of the filter operation is an even number, 48, the filter operation device 11 extracts the left data string and the right data string, two data strings whose first data are symmetric about the center between the first data of the third data string and the last data of the fifth data string. Since the number of taps of the filter operation is an even number, data is extracted according to the execution control signals “8” to “15” shown in FIG. 13C.
- the filter calculation device 11 extracts the left data string and the right data string in three stages (a), (b), and (c) in FIG. 18 and performs a symmetric filter calculation.
- the first data shuffle unit 201 reads the first data string composed of the consecutive third data string RA and fourth data string RB from the register file 140, and extracts the left data string starting from data on the left side of the data center.
- the first data shuffle unit 201 extracts the left data sequence [a0, a1, a2, a3, a4, a5, a6, a7] starting from the data “a0” of the third data sequence RA.
- so that the data located at the center (the last data of the data string RC and the first data of the data string RD) are the data multiplied by the center coefficients of the symmetric filter coefficients, the second data shuffle unit 202 reads the second data string composed of the consecutive fifth data string RF and sixth data string RG from the register file 140 and extracts the right data string.
- the second data shuffle unit 202 extracts the right data string that starts with the data on the right side of the data center that is symmetric, with respect to the data center, to the first data of the left data string.
- the second data shuffle unit 202 extracts the right data string [c7, d0, d1, d2, d3, d4, d5, d6] starting from the last data “c7” of the fifth data string RF.
- next, the first data shuffle unit 201 extracts the left data string [a1, a2, a3, a4, a5, a6, a7, b0], and the second data shuffle unit 202 extracts the right data string [c6, c7, d0, d1, d2, d3, d4, d5].
- data extraction continues in the same way until, finally, the first data shuffle unit 201 extracts the left data string [a7, b0, b1, b2, b3, b4, b5, b6] and the second data shuffle unit 202 extracts the right data string [c0, c1, c2, c3, c4, c5, c6, c7].
- in the second stage, (b), the first data shuffle unit 201 reads the first data string composed of the consecutive third data string RB and fourth data string RC from the register file 140 and extracts, for example, the left data string [a0, a1, a2, a3, a4, a5, a6, a7] starting from the data “a0” of the third data string RB.
- the second data shuffle unit 202 reads the second data string composed of the consecutive fifth data string RE and sixth data string RF from the register file 140 and extracts, for example, the right data string [c7, d0, d1, d2, d3, d4, d5, d6] starting from the last data “c7” of the fifth data string RE.
- extraction continues until the first data shuffle unit 201 extracts the left data string [a7, b0, b1, b2, b3, b4, b5, b6] and the second data shuffle unit 202 extracts the right data string [c0, c1, c2, c3, c4, c5, c6, c7].
- in the third stage, (c), the first data shuffle unit 201 reads the first data string composed of the consecutive third data string RC and fourth data string RD from the register file 140 and extracts, for example, the left data string [a0, a1, a2, a3, a4, a5, a6, a7] starting from the data “a0” of the third data string RC.
- the second data shuffle unit 202 reads the second data string composed of the consecutive fifth data string RD and sixth data string RE from the register file 140 and extracts, for example, the right data string [c7, d0, d1, d2, d3, d4, d5, d6] starting from the last data “c7” of the fifth data string RD.
- extraction continues until the first data shuffle unit 201 extracts the left data string [a7, b0, b1, b2, b3, b4, b5, b6] and the second data shuffle unit 202 extracts the right data string [c0, c1, c2, c3, c4, c5, c6, c7].
- the data strings for all pairs are extracted, the data strings of the extracted pairs are added, multiplied by the filter coefficient, and cumulatively added.
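The three-stage register selection for the 48-tap case can be sketched as follows. This is an illustrative model under stated assumptions: seven consecutive registers RA to RG are represented as a list of seven 8-element lists (indices 0 to 6), the tap count is assumed to be a multiple of 16, overflow is ignored, and the names `stage_pairs` and `symmetric_fir_even` are hypothetical.

```python
LANES = 8  # elements per register, as in the 8-bit pixel example


def stage_pairs(taps, lanes=LANES):
    """Register index pairs (left pair, right pair) used in each stage.

    For 48 taps this yields (RA,RB)/(RF,RG), (RB,RC)/(RE,RF), (RC,RD)/(RD,RE).
    """
    count = taps // lanes
    return [((s, s + 1), (count - 1 - s, count - s)) for s in range(count // 2)]


def symmetric_fir_even(regs, k, lanes=LANES):
    """Even-tap symmetric filter; k holds the left half of the coefficients."""
    q = [0] * lanes
    for s, ((ll, lh), (rl, rh)) in enumerate(stage_pairs(2 * len(k), lanes)):
        first = regs[ll] + regs[lh]     # first data string of this stage
        second = regs[rl] + regs[rh]    # second data string of this stage
        for n in range(lanes):          # shuffle patterns 0..7
            left = first[n:n + lanes]
            right = second[lanes - 1 - n:2 * lanes - 1 - n]
            coeff = k[s * lanes + n]
            for j in range(lanes):
                q[j] += coeff * (left[j] + right[j])
    return q
```

In each stage the left pair moves one register to the right and the right pair moves one register to the left, so the two pairs meet at the data center between RC and RD.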
- FIG. 19 is a diagram illustrating an operation for performing a symmetric filter calculation when the number of taps is 49 in the filter calculation device 11 according to the second embodiment of the present invention.
- when the number of taps of the filter operation is an odd number, 49, the filter operation device 11 extracts the left data string and the right data string, two data strings whose first data are symmetric about the center between the first data of the third data string and the first data of the sixth data string. Since the number of taps of the filter operation is an odd number, data is extracted according to the execution control signals “16” to “23” shown in FIG. 13C.
- the filter calculation device 11 extracts the left data string and the right data string in three stages (a), (b), and (c) in FIG. 19 and performs a symmetric filter calculation.
- the first data shuffle unit 201 reads the first data string composed of the consecutive third data string RA and fourth data string RB from the register file 140, and extracts the left data string starting from data on the left side of the data center.
- the first data shuffle unit 201 extracts the left data sequence [a0, a1, a2, a3, a4, a5, a6, a7] starting from the data “a0” of the third data sequence RA.
- so that the data located at the center (the first data of the data string RD) is the data multiplied by the center coefficient of the symmetric filter coefficients, the second data shuffle unit 202 reads the second data string composed of the consecutive fifth data string RF and sixth data string RG from the register file 140 and extracts the right data string.
- the second data shuffle unit 202 extracts the right data string that starts with the data on the right side of the data center that is symmetric, with respect to the data center, to the first data of the left data string.
- the second data shuffle unit 202 extracts the right data string [d0, d1, d2, d3, d4, d5, d6, d7] starting from the first data “d0” of the sixth data string RG.
- next, the first data shuffle unit 201 extracts the left data string [a1, a2, a3, a4, a5, a6, a7, b0], and the second data shuffle unit 202 extracts the right data string [c7, d0, d1, d2, d3, d4, d5, d6].
- data extraction continues in the same way until, finally, the first data shuffle unit 201 extracts the left data string [a7, b0, b1, b2, b3, b4, b5, b6]
- and the second data shuffle unit 202 extracts the right data string [c1, c2, c3, c4, c5, c6, c7, d0].
- in the second stage, (b), the first data shuffle unit 201 reads the first data string composed of the consecutive third data string RB and fourth data string RC from the register file 140 and extracts, for example, the left data string [a0, a1, a2, a3, a4, a5, a6, a7] starting from the data “a0” of the third data string RB.
- the second data shuffle unit 202 reads the second data string composed of the consecutive fifth data string RE and sixth data string RF from the register file 140 and extracts, for example, the right data string [d0, d1, d2, d3, d4, d5, d6, d7] starting from the first data “d0” of the sixth data string RF.
- extraction continues until the first data shuffle unit 201 extracts the left data string [a7, b0, b1, b2, b3, b4, b5, b6] and the second data shuffle unit 202 extracts the right data string [c1, c2, c3, c4, c5, c6, c7, d0].
- in the third stage, (c), the first data shuffle unit 201 reads the first data string composed of the consecutive third data string RC and fourth data string RD from the register file 140 and extracts, for example, the left data string [a0, a1, a2, a3, a4, a5, a6, a7] starting from the data “a0” of the third data string RC.
- the second data shuffle unit 202 reads the second data string composed of the consecutive fifth data string RD and sixth data string RE from the register file 140 and extracts, for example, the right data string [d0, d1, d2, d3, d4, d5, d6, d7] starting from the first data “d0” of the sixth data string RE.
- extraction continues until the first data shuffle unit 201 extracts the left data string [a7, b0, b1, b2, b3, b4, b5, b6] and the second data shuffle unit 202 extracts the right data string [c1, c2, c3, c4, c5, c6, c7, d0]. Further, the first data shuffle unit 201 extracts the data string [b0, b1, b2, b3, b4, b5, b6, b7] to be multiplied by the center filter coefficient.
- the data strings for all pairs are extracted, the data strings of the extracted pairs are added, multiplied by the filter coefficient, and cumulatively added.
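For an odd tap count the pairing shifts by one element and a single unpaired data string remains for the center coefficient. The sketch below illustrates this on a flat pixel list rather than on named registers, which is a simplification; the function name `symmetric_fir_odd` and the convention that `k` holds the left-half coefficients followed by the center coefficient are assumptions, and overflow is ignored.

```python
def symmetric_fir_odd(p, k, lanes=8):
    """Odd-tap symmetric filter.

    p: flat list of pixels (at least taps + lanes - 1 elements).
    k: left-half coefficients followed by the centre coefficient,
       so a 49-tap filter uses len(k) == 25.
    """
    half = len(k) - 1
    taps = 2 * half + 1
    q = [0] * lanes
    for i in range(half):
        left = p[i:i + lanes]               # left data string
        mirror = taps - 1 - i               # start of the symmetric partner
        right = p[mirror:mirror + lanes]    # right data string
        for j in range(lanes):
            q[j] += k[i] * (left[j] + right[j])
    centre = p[half:half + lanes]           # data string for the centre tap
    for j in range(lanes):
        q[j] += k[half] * centre[j]
    return q
```

For 49 taps the first right data string starts at pixel 48, the first element of the seventh register (RG), and the center data string starts at pixel 24, the first element of RD, which matches the extraction order described above.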
- when the number of taps of the filter operation is an even number, the filter operation device 11 extracts the left data string from the first data string composed of the consecutive third data string and fourth data string, and the right data string from the second data string composed of the consecutive fifth data string and sixth data string, so that the data located at the center between the first data of the third data string and the last data of the fifth data string are the data multiplied by the center filter coefficients.
- when the number of taps of the filter operation is an odd number, the left data string and the right data string are extracted so that the data located at the center between the first data of the third data string and the first data of the sixth data string is the data multiplied by the center filter coefficient.
- the first data string is stored in one buffer and the second data string in the other buffer; the left data string is extracted from the one buffer and the right data string from the other buffer.
- the description has been made so far on the assumption that the buffer is composed of a plurality of registers, but the buffer is not limited to registers. For example, a partial area of the data memory may be used as a buffer.
- the left data string and the right data string are extracted so that the first data of the left data string and the first data of the right data string are symmetric with respect to the data center.
- as a result, a pair of data strings to be multiplied by the same filter coefficient can be extracted.
- symmetric filter calculations for various numbers of taps can therefore be performed with a processor.
- the third data string and the fourth data string, or the fifth data string and the sixth data string, are stored in consecutively numbered registers.
- each component may be configured by dedicated hardware, or may be realized by executing a software program suitable for each component.
- each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
- each component of the symmetric filter arithmetic apparatus shown in FIG. 1 or 10 may be realized by software.
- the software that realizes the symmetric filter arithmetic device of each of the above embodiments is a program that causes a computer to execute the steps included in the following symmetric filter calculation method. That is, this symmetric filter calculation method performs a filter calculation on a plurality of data stored in a storage unit using left-right symmetric filter coefficients, and includes:
- a left data string extraction step of reading a first data string, which is a plurality of consecutive data, from the storage unit and extracting from the first data string a left data string, which is a plurality of consecutive data to be multiplied by a left filter coefficient that is a filter coefficient on the left side of the center; and
- a right data string extraction step of reading a second data string, which is a plurality of consecutive data, from the storage unit and extracting from the second data string a right data string, which is a plurality of consecutive data to be multiplied by a right filter coefficient that is a filter coefficient on the right side of the center and has the same value as the left filter coefficient.
- Such a program can be distributed via a recording medium such as a CD-ROM and a transmission medium such as the Internet.
- the present invention can also be realized as an integrated circuit (LSI) including a characteristic processing unit included in such a symmetric filter arithmetic device.
- these processing units may each be made into an individual chip, or a single chip may include some or all of them.
- all of the functional blocks included in the symmetric filter arithmetic apparatus shown in FIG. 1 or 10 except for the memory may be integrated into one chip.
- LSI is used, but depending on the degree of integration, it may be called IC, system LSI, super LSI, or ultra LSI.
- the method of circuit integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor.
- an FPGA (Field Programmable Gate Array) or a reconfigurable processor that can reconfigure the connections and settings of circuit cells inside the LSI may be used.
- the case where the pixel data is 8 bits has been described as an example, but the pixel data may be other than 8 bits.
- the number of pixels stored in one register may be other than eight; such cases can also be supported.
- the filter operation device performs the symmetric filter operation on the pixel data.
- the data on which the symmetric filter operation is performed is not limited to pixel data; data other than pixel data may be used.
- in the mnemonic of the instruction for performing the symmetric filter operation, two consecutively numbered registers are written as “Rc:Rc+1”, “Ra,Ra+1”, and “Rb,Rb+1”, but they may be given an alias. For example, if two consecutively numbered registers are aliased as one register X, the 32 64-bit registers R0-R31 can be represented as 16 128-bit registers X0-X15. In this case, “Rc:Rc+1” can be expressed as “Xc”, “Ra,Ra+1” as “Xa”, and “Rb,Rb+1” as “Xb”.
- the valnpadd.8 instruction has been described as being used for symmetric filters with more than 9 taps, which cannot be handled by the valnadd.8 instruction. However, since the valnpadd.8 instruction can also be used for a symmetric filter of 9 taps or fewer, the valnpadd.8 instruction may be used in that case as well in the second embodiment.
- the symmetric filter arithmetic device is useful for performing symmetric filter arithmetic processing.
- the filter operation of image data is one of the basic operations of image processing, and the present invention can be used in various devices that perform image processing. For example, it can be used for information display devices and imaging devices such as televisions, digital video recorders, car navigation systems, mobile phones, digital cameras, and digital video cameras.
- Filter operation device (symmetric filter operation device); 110 instruction memory; 120 instruction fetch unit; 130 instruction decoder; 140 register file; 150 memory access unit; 160 data shuffler; 161 first data shuffle unit; 162 second data shuffle unit; 170 adder; 180 multiplier; 190 data memory; 200 data shuffler; 201 first data shuffle unit; 202 second data shuffle unit; 300 buffer; 310 selector; 321 to 324 filter operation units
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Complex Calculations (AREA)
- Image Processing (AREA)
Abstract
Description
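The present inventors found that the following problems arise with the conventional symmetric filter arithmetic device described in the "Background Art" section.
FIG. 1 is a diagram showing the configuration of the symmetric filter arithmetic device 10 (hereinafter referred to as the filter arithmetic device 10) according to Embodiment 1 of the present invention.
FIG. 10 is a diagram showing the configuration of the symmetric filter arithmetic device 11 (hereinafter referred to as the filter arithmetic device 11) according to Embodiment 2 of the present invention. The filter arithmetic device 11 is characterized in that it includes a data shuffler 200 in place of the data shuffler 160 of the filter arithmetic device 10 of Embodiment 1.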
Claims (9)
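- A symmetric filter arithmetic device that performs a filter operation on a plurality of data stored in a storage unit using left-right symmetric filter coefficients, the device comprising:
a left data string extraction unit that reads a first data string, which is a plurality of consecutive data, from the storage unit and extracts from the first data string a left data string, which is a plurality of consecutive data to be multiplied by a left filter coefficient that is a filter coefficient on the left side of the center; and
a right data string extraction unit that reads a second data string, which is a plurality of consecutive data, from the storage unit and extracts from the second data string a right data string, which is a plurality of consecutive data to be multiplied by a right filter coefficient that is a filter coefficient on the right side of the center and has the same value as the left filter coefficient.
- The symmetric filter arithmetic device according to claim 1, further comprising:
an addition unit that adds the extracted left data string and right data string to calculate an addition data string; and
a multiplication unit that multiplies the calculated addition data string by the left filter coefficient or the right filter coefficient to calculate a multiplication data string.
- The symmetric filter arithmetic device according to claim 1 or 2, wherein the left data string extraction unit reads the first data string, composed of a consecutive third data string and fourth data string, from the storage unit and extracts the left data string, and
the right data string extraction unit
(a) reads the second data string, composed of a consecutive fifth data string and sixth data string, from the storage unit and extracts the right data string so that the data located at a data center, which is the center between the first data of the third data string and the last data of the fifth data string, are the data to be multiplied by the center filter coefficients of the left-right symmetric filter coefficients, or
(b) reads the second data string, composed of a consecutive fifth data string and sixth data string, from the storage unit and extracts the right data string so that the data located at a data center, which is the center between the first data of the third data string and the first data of the sixth data string, is the data to be multiplied by the center filter coefficient of the left-right symmetric filter coefficients.
- The symmetric filter arithmetic device according to claim 3, wherein the left data string extraction unit extracts the left data string whose first data is data on the left side of the data center, and
the right data string extraction unit extracts the right data string whose first data is the data on the right side of the data center that is symmetric, with respect to the data center, to the first data of the left data string.
- The symmetric filter arithmetic device according to claim 3 or 4, wherein the third data string and the fourth data string, or the fifth data string and the sixth data string, are stored in a contiguous area of the storage unit,
the left data string extraction unit reads the first data string from the storage unit and extracts the left data string, and
the right data string extraction unit reads the second data string from the storage unit and extracts the right data string.
- The symmetric filter arithmetic device according to claim 1 or 2, wherein the left data string extraction unit reads the first data string, composed of a consecutive third data string and fourth data string, from the storage unit and extracts the left data string, and
the right data string extraction unit reads the first data string from the storage unit as the second data string and extracts the right data string so that the data located at a data center, which is the center of the third data string, are the data to be multiplied by the center filter coefficient of the left-right symmetric filter coefficients.
- The symmetric filter arithmetic device according to claim 6, wherein the data center is the center between the first data of the third data string and the last data of the third data string, or the center between the first data of the third data string and the first data of the fourth data string,
the left data string extraction unit extracts the left data string whose first data is data on the left side of the data center, and
the right data string extraction unit extracts the right data string whose first data is the data on the right side of the data center that is symmetric, with respect to the data center, to the first data of the left data string.
- A symmetric filter arithmetic method for performing a filter operation on a plurality of data stored in a storage unit using left-right symmetric filter coefficients, the method comprising:
a left data string extraction step of reading a first data string, which is a plurality of consecutive data, from the storage unit and extracting from the first data string a left data string, which is a plurality of consecutive data to be multiplied by a left filter coefficient that is a filter coefficient on the left side of the center; and
a right data string extraction step of reading a second data string, which is a plurality of consecutive data, from the storage unit and extracting from the second data string a right data string, which is a plurality of consecutive data to be multiplied by a right filter coefficient that is a filter coefficient on the right side of the center and has the same value as the left filter coefficient.
- A program for performing a filter operation on a plurality of data stored in a storage unit using left-right symmetric filter coefficients, the program causing a computer to execute:
a left data string extraction step of reading a first data string, which is a plurality of consecutive data, from the storage unit and extracting from the first data string a left data string, which is a plurality of consecutive data to be multiplied by a left filter coefficient that is a filter coefficient on the left side of the center; and
a right data string extraction step of reading a second data string, which is a plurality of consecutive data, from the storage unit and extracting from the second data string a right data string, which is a plurality of consecutive data to be multiplied by a right filter coefficient that is a filter coefficient on the right side of the center and has the same value as the left filter coefficient.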
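Claims 6 and 7 cover the case in which the right data string is taken from the same first data string as the left data string. A minimal sketch of that case follows, assuming two 8-element registers, an 8-tap (even) or 9-tap (odd) filter, and no overflow handling. The function name `fir_single_string` and the convention that `k` lists the left-half coefficients (followed by the center coefficient when the tap count is odd) are hypothetical.

```python
def fir_single_string(r_lo, r_hi, k, odd):
    """Symmetric filter where both data strings come from one register pair.

    odd=False: 8 taps, k = [k0, k1, k2, k3]
    odd=True:  9 taps, k = [k0, k1, k2, k3, k_centre]
    """
    data = r_lo + r_hi            # first data string, reused as the second
    mirror = 8 if odd else 7      # offset symmetric to offset 0
    q = [0] * 8
    for n in range(4):
        left = data[n:n + 8]                       # left data string
        right = data[mirror - n:mirror - n + 8]    # right data string
        for j in range(8):
            q[j] += k[n] * (left[j] + right[j])
    if odd:
        centre = data[4:12]       # data string for the centre coefficient
        for j in range(8):
            q[j] += k[4] * centre[j]
    return q
```

In the even case the data center lies between the fourth and fifth elements of the first register; in the odd case it is the fifth element itself, which is why an extra unpaired data string is needed.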
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201280002462.6A CN103098371B (zh) | 2011-09-02 | 2012-07-25 | Symmetric filter arithmetic apparatus and symmetric filter arithmetic method |
US13/818,198 US8989512B2 (en) | 2011-09-02 | 2012-07-25 | Symmetric filter arithmetic apparatus and symmetric filter arithmetic method |
JP2012555249A JP5903598B2 (ja) | 2011-09-02 | 2012-07-25 | Symmetric filter arithmetic apparatus and symmetric filter arithmetic method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011192060 | 2011-09-02 | ||
JP2011-192060 | 2011-09-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013031083A1 true WO2013031083A1 (ja) | 2013-03-07 |
Family
ID=47755617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/004729 WO2013031083A1 (ja) | 2011-09-02 | 2012-07-25 | Symmetric filter arithmetic apparatus and symmetric filter arithmetic method |
Country Status (4)
Country | Link |
---|---|
US (1) | US8989512B2 (ja) |
JP (1) | JP5903598B2 (ja) |
CN (1) | CN103098371B (ja) |
WO (1) | WO2013031083A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019088072A1 (ja) * | 2017-11-01 | 2019-05-09 | 日本電気株式会社 | Information processing apparatus, information processing method, and program |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9898286B2 (en) * | 2015-05-05 | 2018-02-20 | Intel Corporation | Packed finite impulse response (FIR) filter processors, methods, systems, and instructions |
US9977601B2 (en) | 2016-03-17 | 2018-05-22 | Ceva D.S.P. Ltd. | Data load for symmetrical filters |
US11092106B2 (en) * | 2019-03-26 | 2021-08-17 | Ford Global Technologies, Llc | System and method for processing cylinder pressures |
EP3994796A1 (en) * | 2019-07-18 | 2022-05-11 | Huawei Technologies Co., Ltd. | Advanced finite impulse response system and method for real coefficients and complex data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11266140A (ja) * | 1997-12-23 | 1999-09-28 | Koninkl Philips Electronics Nv | Programmable circuit for implementing a digital filter |
JP2000124773A (ja) * | 1998-10-19 | 2000-04-28 | New Japan Radio Co Ltd | Digital filter |
JP2006319941A (ja) * | 2005-04-15 | 2006-11-24 | Sanyo Electric Co Ltd | FIR filter arithmetic unit |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5034907A (en) * | 1989-09-12 | 1991-07-23 | North American Philips Corporation | Dynamically configurable signal processor and processor arrangement |
JPWO2006048958A1 (ja) * | 2004-11-05 | 2008-05-22 | 有限会社ニューロソリューション | Digital filter and design method thereof, design apparatus, and digital filter design program |
US7492848B2 (en) * | 2005-04-13 | 2009-02-17 | Texas Instruments Incorporated | Method and apparatus for efficient multi-stage FIR filters |
JP4824703B2 (ja) | 2005-12-19 | 2011-11-30 | パナソニック株式会社 | Two-dimensional filter arithmetic apparatus and method |
CN101163240A (zh) * | 2006-10-13 | 2008-04-16 | 国际商业机器公司 | Filtering device and method thereof |
-
2012
- 2012-07-25 WO PCT/JP2012/004729 patent/WO2013031083A1/ja active Application Filing
- 2012-07-25 CN CN201280002462.6A patent/CN103098371B/zh not_active Expired - Fee Related
- 2012-07-25 JP JP2012555249A patent/JP5903598B2/ja not_active Expired - Fee Related
- 2012-07-25 US US13/818,198 patent/US8989512B2/en not_active Expired - Fee Related
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019088072A1 (ja) * | 2017-11-01 | 2019-05-09 | 日本電気株式会社 | Information processing apparatus, information processing method, and program |
US11580194B2 (en) | 2017-11-01 | 2023-02-14 | Nec Corporation | Information processing apparatus, information processing method, and program |
Also Published As
Publication number | Publication date |
---|---|
CN103098371A (zh) | 2013-05-08 |
US20140219577A1 (en) | 2014-08-07 |
CN103098371B (zh) | 2016-04-13 |
US8989512B2 (en) | 2015-03-24 |
JPWO2013031083A1 (ja) | 2015-03-23 |
JP5903598B2 (ja) | 2016-04-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5903598B2 (ja) | | Symmetric filter arithmetic apparatus and symmetric filter arithmetic method |
JP4623963B2 (ja) | | Method and apparatus for efficiently filtering and convolving content data |
US7873812B1 (en) | | Method and system for efficient matrix multiplication in a SIMD processor architecture |
US9665540B2 (en) | | Video decoder with a programmable inverse transform unit |
EP3093757B1 (en) | | Multi-dimensional sliding window operation for a vector processor |
CN108073549B (zh) | | Convolution operation device and method |
US20120278591A1 (en) | | Crossbar switch module having data movement instruction processor module and methods for implementing the same |
EP3217289A2 (en) | | System and method for preventing cache contention |
JP4698242B2 (ja) | | Parallel arithmetic processor, control program and control method for controlling operation of the parallel arithmetic processor, and image processing apparatus equipped with the parallel arithmetic processor |
WO1997022938A1 (en) | | Manipulating video and audio signals using a processor which supports SIMD instructions |
US8352528B2 (en) | | Apparatus for efficient DCT calculations in a SIMD programmable processor |
US7412587B2 (en) | | Parallel operation processor utilizing SIMD data transfers |
JP4020804B2 (ja) | | Data processing device |
JP2000322235A (ja) | | Information processing device |
JP6687803B2 (ja) | | System and method for piecewise linear approximation |
JP2013239120A (ja) | | Image processing device |
JP4014486B2 (ja) | | Image processing method and image processing device |
US9330438B1 (en) | | High performance warp correction in two-dimensional images |
JP5499203B2 (ja) | | Block matching circuit and data update method |
JP7141401B2 (ja) | | Processor and information processing system |
JP4171319B2 (ja) | | Image and audio processing device |
JP3895031B2 (ja) | | Matrix-vector multiplier |
US8693796B2 (en) | | Image processing apparatus and method for performing a discrete cosine transform |
JP4451693B2 (ja) | | Method for filtering data |
JP5473507B2 (ja) | | Parallel processing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | WWE | WIPO information: entry into national phase | Ref document number: 201280002462.6; Country of ref document: CN |
 | ENP | Entry into the national phase | Ref document number: 2012555249; Country of ref document: JP; Kind code of ref document: A |
 | WWE | WIPO information: entry into national phase | Ref document number: 13818198; Country of ref document: US |
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 12828858; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 12828858; Country of ref document: EP; Kind code of ref document: A1 |