WO2006013839A1 - Array-type arithmetic device - Google Patents
Array-type arithmetic device
- Publication number
- WO2006013839A1 (PCT/JP2005/014077)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- control information
- array
- processor
- data
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/43—Hardware specially adapted for motion estimation or compensation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
Definitions
- The present invention relates to a signal processing LSI for video and audio equipment that handles digital signals, such as DVD recorders and digital TVs, and more particularly to an image signal processing LSI.
- MPEG: Moving Picture Experts Group.
- With the recent spread of digital AV devices, MPEG coding LSIs have been installed in a variety of products, for example stationary devices such as DVD recorders and mobile devices such as camcorders.
- On these devices, an application called MPEG2 encoding runs.
- A DVD recorder, which is a stationary device, is required to execute an algorithm that guarantees higher image quality than a camcorder, which is a mobile device.
- MPEG coding is briefly described below.
- MPEG coding stores a motion vector, which represents how far and in which direction the subject of the current frame has moved in a past or future frame, together with the amount of change in luminance and color difference.
- The process of obtaining a motion vector is called motion vector search processing (see FIG. 1). Its processing load is very large compared with general image processing such as resizing filters and motion compensation.
- Patent Document 1: Japanese Patent Application Laid-Open No. 09-0222404
- The object of the present invention is to provide a two-dimensional array-type arithmetic device that achieves efficient parallel processing by letting software control the device more flexibly, without increasing the hardware area.
- To this end, an array-type arithmetic unit of the present invention includes: a processor array composed of a plurality of ordered processor elements; instruction acquisition means for acquiring one instruction per cycle; means for creating, in each cycle, operation control information for controlling the operation of the first-rank processor element, and for generating an instruction to the first-rank processor element based on the created operation control information and the one instruction acquired by the instruction acquisition means; and means for creating, in each cycle, operation control information for controlling the operation of each processor element of the second and subsequent ranks based on the operation control information created for the processor element of the preceding rank, and for generating an instruction to that processor element based on the created operation control information and the one instruction acquired by the instruction acquisition means.
- With this configuration, the array-type arithmetic unit according to the present invention lets a plurality of processor elements (hereinafter "PEs") perform different operations in response to a single instruction, so flexible processing can be performed using multiple PEs.
- The processor array may be composed of a plurality of processor elements connected by signal lines, with the operation result of each processor element transmitted to the next-rank processor element via the signal line every cycle.
- This allows the array-type arithmetic unit to pass the operation result of one PE to the subsequent PE.
- The array-type arithmetic unit may further include basic control information generating means for generating basic control information every cycle, and the operation control information for controlling the operation of the first-rank processor element may be created based on the basic control information generated by that means.
- Each of the processor elements may include data acquisition means for acquiring a plurality of types of data, the operation control information may include designation information that specifies the type of data each processor element uses when executing an instruction, and each processor element may use the data acquired in accordance with the designation information during execution.
- This makes it possible to change the data each PE uses for execution, so more flexible processing can be performed.
- The operation control information may be information that specifies whether or not to execute the one instruction acquired by the instruction acquisition means. If the operation control information indicates that the instruction is to be executed, the processor element executes it; if it indicates that the instruction is not to be executed, the power supply to the corresponding processor element may be suppressed.
- As a result, power is not supplied to a PE that does not execute the operation, so power consumption can be reduced.
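As a rough illustration of how one shared instruction combined with per-row operation control information could produce different behavior in each row, the following Python sketch may help. The `ControlInfo` record, its `execute` and `data_select` fields, the `NOP` encoding, and the tuple format of the generated instruction are all assumptions made for this example and are not taken from the specification.

```python
from dataclasses import dataclass

NOP = ("nop", None)  # assumed encoding for "do not execute"


@dataclass(frozen=True)
class ControlInfo:
    execute: bool      # whether the row should execute the shared instruction
    data_select: int   # which kind of acquired data the PEs should use


def generate_row_instruction(shared_instruction, ctrl):
    """Combine the single shared instruction with one row's control information."""
    if not ctrl.execute:
        # A row told not to execute gets a NOP; hardware could also gate its power.
        return NOP
    return (shared_instruction, ctrl.data_select)


rows = [ControlInfo(True, 0), ControlInfo(False, 0), ControlInfo(True, 1)]
issued = [generate_row_instruction("exec_array", c) for c in rows]
```

Here three rows receive the same shared instruction, yet the second row idles and the third uses a different data selection, which is the kind of per-row variation the control information is meant to allow.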
- The array-type arithmetic unit of the present invention may also comprise: a processor array having a two-dimensional array structure of M rows × N columns, in which M rows of N processor elements are connected by signal lines so that the operation result of each processor element can be transmitted to the processor element in the next row; basic control information generating means for generating basic control information every cycle; instruction acquisition means for acquiring one instruction per cycle; means for creating, in each cycle, operation control information for controlling the operation of the processor elements in the first row based on the basic control information generated by the basic control information generating means, and for generating an instruction for the first row based on the created operation control information and the one instruction acquired by the instruction acquisition means; and means for creating, in each cycle, operation control information for controlling the operation of the processor elements in each of rows 2 to M based on the operation control information created for the processor elements in the preceding row, and for generating instructions for the processor elements in rows 2 to M based on the created operation control information and the one instruction acquired by the instruction acquisition means, wherein the N processor elements arranged in each row of the processor array execute the instruction for that row.
- This allows a plurality of PEs to perform different operations when one instruction is issued to the two-dimensional array-type arithmetic unit, so more flexible processing can be performed.
- FIG. 1 is a diagram showing a search method for motion vector search processing.
- FIG. 2 (a) shows the configuration of the reference image 100
- FIG. 2 (b) shows the configuration of the target image 200.
- FIG. 3 is a diagram showing an example of a conventional array processor.
- FIG. 4 is a diagram showing a configuration of a peripheral portion related to the array type arithmetic unit 1000.
- FIG. 5 is a diagram showing details of the configuration of the array type arithmetic unit 1000.
- FIG. 6 is a diagram showing a method of supplying a reference image 100 to a PE array 1100.
- FIG. 7 is a diagram showing a method for supplying control information (tokens) to an instruction generation unit (3100, etc.).
- FIG. 8 is a diagram showing a transition of contents stored in a correlation storage unit 2400 of each PE.
- FIG. 9 is a flowchart showing processing for obtaining a correlation between a target image 200 and a reference image 100 in the array type arithmetic unit 1000 according to the first embodiment.
- FIG. 10 is a flowchart showing processing of “exec_array” in the first exemplary embodiment.
- FIG. 11 is a flowchart showing processing of a PE according to the first embodiment.
- FIG. 12 (a) is a flowchart showing processing of a control information generation unit of Embodiment 1.
- FIG. 12 (b) is a flowchart showing processing of the instruction generation unit of the first embodiment.
- FIG. 13 is a diagram showing the token and PE operations of Embodiment 1 on a time axis.
- FIG. 14 is a diagram showing an example of a program according to the first embodiment.
- FIG. 15 is a diagram showing the token and PE operations of Embodiment 2 on a time axis.
- FIG. 16 is a diagram showing an example of a program according to the second embodiment.
- FIG. 17 is a diagram showing details of the configuration of the array type arithmetic unit 1000 according to the third embodiment.
- FIG. 18 is a flowchart showing processing for obtaining a correlation between a target image 200 and a reference image 100 in the array type arithmetic apparatus 1000 according to the third embodiment.
- FIG. 19 is a flowchart showing processing of “exec_array” in the third embodiment.
- FIG. 20 is a flowchart showing processing of a PE according to the third embodiment.
- FIG. 21 (a) is a flowchart showing the processing of the control information generating unit 3000 of the third embodiment.
- FIG. 21 (b) is a flowchart showing the processing of the instruction generating unit (3100, etc.) of the third embodiment.
- FIG. 22 is a diagram showing the token and PE operations of the third embodiment on the time axis.
- FIG. 23 is a diagram showing an example of a program according to the third embodiment.
- FIG. 24 is a diagram showing an example of a target image of Embodiment 4 and a reference image supplied to the PE array.
- FIG. 25 is a flowchart showing processing for obtaining the correlation between the target image 200 and the reference image 100 in the array type arithmetic unit 1000 according to the fourth embodiment.
- FIG. 26 is a flowchart showing processing of “exec_array” in the fourth embodiment.
- FIG. 27 is a flowchart showing processing of a PE according to the fourth embodiment.
- FIG. 28 (a) is a flowchart showing the processing of the control information generating unit 3000 of the fourth embodiment.
- FIG. 28 (b) is a flowchart showing the processing of the instruction generating unit (3100, etc.) of the fourth embodiment.
- FIG. 29 is a diagram showing the token and PE operations of Embodiment 4 on the time axis.
- FIG. 30 is a diagram showing an example of a program according to the fourth embodiment.
- The array type arithmetic unit according to the present invention is a two-dimensional array type arithmetic unit that reduces the number of instruction memories and instruction decoders and keeps short the software instructions that control the arithmetic units arranged in the array, thereby suppressing growth in the scale of the instruction memory and the instruction decoder.
- For controlling such arrays, the SIMD (Single Instruction Multiple Data) method is widely known. In this method, a common instruction is issued to the arithmetic units along one direction of the array, which reduces the cost of software control. The method is particularly suitable for pixel processing in which every PE performs the same operation.
- This greatly reduces the number of instructions compared with issuing an independent operation instruction to each PE. However, the arrays used for the motion vector search processing and image recognition processing described above are large, and even instructions for several rows (or columns) have a large effect on the instruction memory size and the instruction decoder. Basically, as many instruction memories and instruction decoders as there are rows are required.
- In the present invention, one instruction memory and one instruction decoder are provided, and the instruction memory size is further reduced by shortening the instruction length, so that the increase in LSI area is suppressed.
- the two-dimensional array type arithmetic device of the present embodiment performs motion vector search processing in MPEG encoding processing.
- FIG. 1 is a diagram showing a search method for motion vector search processing.
- The target screen 20 is the frame currently being encoded, and the target image 200 is a so-called macroblock.
- The reference screen 10 is a past or future frame used for calculating a motion vector, and the reference image 100 is the range in which a portion similar to the target image 200 is searched for.
- The portion most similar to the macroblock is searched for by shifting the search position one pixel at a time from the upper left to the lower right (reference image 100-1, reference image 100-2).
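The search described above can be sketched in software as an exhaustive block-matching loop. The Python below is only an illustrative model, not the hardware method of this publication: it uses the sum of absolute differences (the measure the description adopts later) as the similarity score, and the function names, the list-of-rows image layout, and the tiny sample images are assumptions made for the example.

```python
def sad(target, reference, dx, dy):
    """Sum of absolute differences between the target and the reference window at (dx, dy)."""
    return sum(
        abs(target[y][x] - reference[y + dy][x + dx])
        for y in range(len(target))
        for x in range(len(target[0]))
    )


def full_search(target, reference):
    """Try every offset from upper left to lower right; return (best_offset, best_sad)."""
    th, tw = len(target), len(target[0])
    rh, rw = len(reference), len(reference[0])
    best = None
    for dy in range(rh - th + 1):
        for dx in range(rw - tw + 1):
            cost = sad(target, reference, dx, dy)
            # A smaller SAD means a closer match; keep the first minimum found.
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


reference = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
target = [[5, 6], [7, 8]]
best_offset, best_cost = full_search(target, reference)
```

In this toy case the target block sits one pixel right and one pixel down inside the reference area, so the best offset corresponds to that displacement. The inner double loop over offsets is the part whose cost grows quickly with the search range, which is why the publication maps it onto a PE array.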
- FIG. 2 is a diagram illustrating the configuration of the reference image 100 and the target image 200.
- FIG. 2 (a) shows the reference image 100, where the upper left pixel is “R (0,0)” and pixel positions are expressed in an xy coordinate system.
- FIG. 2 (b) shows the target image 200, whose upper left pixel is expressed as “T (0,0)” in the same way as in FIG. 2 (a).
- “R (x, y)” or “Rxy” represents a pixel of the reference image 100, and “T (x, y)” or “Txy” represents a pixel of the target image 200.
- In the figures, a square represents a pixel of the reference image 100, and a circle represents a pixel of the target image 200.
- Next, the configuration of a two-dimensional array type arithmetic unit 1000 according to the present invention will be described with reference to FIGS. 4 and 5.
- FIG. 4 is a diagram showing a configuration of a peripheral part related to the array type arithmetic unit 1000.
- In addition to the array type arithmetic unit 1000, FIG. 4 shows a program counter 1001, an instruction memory 1002, an instruction issuing unit 1003, a program storage unit 1004, and a memory cache 1005.
- These functional units, the PEs, and so on are connected by as many data signal lines as the number of bits to be transmitted.
- The program storage unit 1004 stores a software program consisting of an instruction stream that describes the operation of the arithmetic unit, and the program counter 1001 points to the instruction, stored in the program storage unit 1004, that is to be executed next.
- The instruction pointed to by the program counter 1001 is fetched into the instruction memory 1002. The instruction issuing unit 1003 decodes the instruction in the instruction memory 1002 and issues a control signal that serves as a common instruction to the entire array type arithmetic unit 1000.
- The memory cache 1005 stores data used by the array type arithmetic unit 1000.
- The array type arithmetic unit 1000 is composed of a PE array 1100 made up of 30 PEs (PE00, etc.) arranged in a two-dimensional array of 5 rows × 6 columns, a control information generation unit 3000, and instruction generation units (3100 to 3500).
- Each PE and each generation unit are connected by a bus so that signals can be transmitted.
- Each generation unit is composed of logic circuits.
- The array type arithmetic unit 1000 has two features. The first is that 30 PEs are controlled by one instruction (Inst0) input from the external instruction issuing unit 1003; for this purpose the unit has the control information generation unit 3000 described below and the instruction generation units (3100, etc.) that generate the instructions for the individual rows (Inst00 to Inst40). The second is that the PEs are connected not only in the row direction but also in the column direction by the bus 1009, so that data can be transmitted and received.
- FIG. 5 is a diagram showing details of the configuration of the array type arithmetic unit 1000. For convenience of explanation, only some PEs are shown here.
- The array type arithmetic unit 1000 includes a plurality of PEs 2000, a control information generation unit 3000, a plurality of instruction generation units (3100, etc.), and an addition unit 1200.
- A cycle is a fixed clock period that serves as the reference for processing (the same applies hereinafter).
- The control information generation unit 3000 includes a counter storage unit 3010, which stores a counter. Control information is generated according to the value of this counter.
- The counter storage unit 3010 also stores the most recently generated control information.
- The control information generated here is the basis for controlling the operation of each PE.
- The instruction generation unit 3100 receives the instruction information issued by the instruction issuing unit 1003 and the control information issued by the control information generation unit 3000, and generates an instruction that controls the arithmetic processing of one row of PEs (PE00 to PE05) of the PE array 1100.
- When generating an instruction, the instruction generation unit may create control information for its own use from the control information received from the control information generation unit 3000 and generate the PE instruction from that, or it may create new control information before sending it onward. In other words, the received control information may differ from the control information that is sent out.
- In the present embodiment, the received control information is the same as the control information that is sent out, and the control information is used as it is. In the fourth embodiment described later, the received control information is processed before being sent to the next instruction generation unit.
- The instruction generation unit 3100 includes a control information storage unit 3110, which stores the control information received from the control information generation unit 3000. Before new control information is stored, the control information held there is transmitted to the instruction generation unit 3200, which generates an instruction based on the received control information.
- The instruction generation units (3200, 3300, 3400, 3500) have the same function as the instruction generation unit 3100, except that they generate the instructions controlling the arithmetic processing of the PEs in their rows (PE10 to PE15, etc.) from the control information received from the preceding instruction generation unit rather than from control information issued by the control information generation unit 3000.
- The instruction generation units (3200 to 3500) store control information in their respective control information storage units (3210, 3310, etc.) and pass the stored control information on in sequence.
- PE00 (2000) includes a calculation unit 2100, a target data storage unit 2200, a reference data storage unit 2300, and a correlation storage unit 2400.
- The target data storage unit 2200 stores the data of one pixel of the target image 200. Specifically, the 30 PEs (see FIG. 4) together store the data of the 30 pixels of the target image 200 (see FIG. 2 (b)), one pixel each.
- For example, the target data storage unit 2200 of PE00 stores the data of the pixel “T (0,0)”, and the target data storage unit 2200 of PE10 stores the data of the pixel “T (1,0)”.
- The reference data storage unit 2300 stores the data of one pixel of the reference image 100 (see FIG. 2 (a)).
- Whereas the target data storage unit 2200 stores different pixel data in each PE, the contents of the reference data storage unit 2300 differ only from one PE column to the next. That is, the reference data storage units 2300 of the PEs in the same column store the same data.
- The reference data storage unit 2300 reads and stores reference data held in the memory cache 1005.
- For example, the reference data storage units 2300 of PE00 and PE10 store the data of the pixel “R00”, and those of PE01 and PE11 store the data of the pixel “R10”.
- The computing unit 2100 calculates the strength of the correlation between the target data stored in the target data storage unit 2200 and the reference data stored in the reference data storage unit 2300, and the correlation storage unit 2400 stores the result.
- The strength of the correlation is evaluated using the SAD (Sum of Absolute Differences).
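Written out for a W × H target block (W = 6 and H = 5 for the 30-pixel target image held on the 5 × 6 PE array), the SAD at a candidate offset can be expressed as follows. The offset symbols d_x and d_y are notation introduced for this illustration; T and R follow the pixel notation of FIG. 2.

```latex
\mathrm{SAD}(d_x, d_y) = \sum_{y=0}^{H-1} \sum_{x=0}^{W-1} \bigl| T(x, y) - R(x + d_x,\, y + d_y) \bigr|
```

A smaller SAD indicates a stronger correlation, so the search looks for the offset that minimizes this sum.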
- In addition to the result data of its own PE's computing unit 2100, the correlation storage unit 2400 stores the data received from the PE in the previous row. It also sends its stored data to the PE in the next row (see bus 1009). Details are described later with reference to FIG. 8.
- The addition unit 1200 adds together the outputs from the PEs of the last row and outputs the sum from the array type arithmetic unit 1000.
- This value is the correlation value between the target image 200 and the reference image 100. The motion vector is obtained from the position of the reference image at which the correlation is strongest, that is, at which the sum of absolute differences is smallest.
- FIG. 6 is a diagram showing a method of supplying the reference image 100 to the PE array 1100.
- The PE array 1100 shown here has the target image 200 of FIG. 2 (b) arranged on the PE array of the array type arithmetic unit 1000. Specifically, the pixels are stored in the target data storage units 2200 (see FIG. 5).
- The target image 200 is held on the PE array of the array type arithmetic unit 1000, and the reference image 100 is supplied one line of 6 horizontal pixels at a time.
- For example, the first line (R00 to R50) of the reference image 100 is supplied in cycle “Cyc 0” (101).
- The same reference data is supplied to the PE array 1100 column by column. Specifically, it is stored in the reference data storage units 2300 (see FIGS. 5 and 7).
- FIG. 7 is a diagram showing a method for supplying control information to an instruction generation unit (3100, etc.).
- The figure shows, in time series from “Cycle 0” to “Cycle 3”, how control information is supplied in the array type arithmetic unit 1000 and how the unit operates.
- Control information generated by the control information generation unit 3000 is denoted “token0”, “token1”, and so on.
- For the PEs and the instruction generation units (3100, etc.), each storage unit and its contents are shown.
- A dotted arrow indicates transmission of the contents of a storage unit.
- First, the control information “token0” generated by the control information generation unit 3000 is stored in the control information storage unit 3110 of the instruction generation unit 3100.
- The control information generated by the control information generation unit 3000 in the preceding cycle and stored in the counter storage unit 3010 is written as “token1” and so on.
- PE00 and PE01 perform an operation and store the result in the correlation storage unit 2400.
- Next, the control information “token1” generated by the control information generation unit 3000 is stored in the control information storage unit 3110 of the instruction generation unit 3100, and the control information storage unit 3210 of the instruction generation unit 3200 stores the control information “token0” that had been stored in the control information storage unit 3110 of the instruction generation unit 3100.
- Based on the control information (“token0”, etc.) and the instruction “Inst0” issued by the instruction issuing unit 1003, each instruction generation unit (3100, etc.) generates the instruction to be sent to its row of the PE array.
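The way control information moves from one instruction generation unit to the next, one row per cycle, resembles a shift register. The following Python sketch models that behavior under simplifying assumptions: the class and function names are hypothetical, tokens and instructions are plain strings, and a row that has not yet received a token is modeled as issuing nothing (`None`).

```python
class InstructionGenerator:
    """One per PE row; holds the control information (token) for that row."""

    def __init__(self):
        self.token = None  # contents of the control information storage unit


def step(generators, new_token, shared_instruction):
    """Advance one cycle: shift tokens down one row, then build per-row instructions."""
    # Walk from the last row upward so each row takes its predecessor's old token.
    for i in range(len(generators) - 1, 0, -1):
        generators[i].token = generators[i - 1].token
    generators[0].token = new_token
    # A row with no token yet has nothing to execute (assumed behavior).
    return [
        (shared_instruction, g.token) if g.token is not None else None
        for g in generators
    ]


gens = [InstructionGenerator() for _ in range(3)]
cycle0 = step(gens, "token0", "Inst0")
cycle1 = step(gens, "token1", "Inst0")
cycle2 = step(gens, "token2", "Inst0")
```

After three cycles the first token has reached the third row while newer tokens occupy the rows above it, so a single shared instruction is paired with a different token in each row.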
- FIG. 8 is a diagram showing transition of contents stored in the correlation storage unit 2400 of each PE.
- The figure shows, in time series, how the contents stored in the correlation storage unit 2400 of each PE are sent to the correlation storage unit 2400 of the PE in the next row.
- The contents of the correlation storage unit 2400 comprise two kinds of data: the computation result data 2410 of the PE's own computing unit 2100, and the received data 2420 sent from the PE in the previous row.
- The sum of the computation result data 2410 and the received data 2420 output by the PE in the last row (here, PE20) is the correlation value 2401 for one row of the target image and the reference image.
- The sum of the correlation values sent from the PEs in the last row is the correlation value 2402 for the reference image shifted by one pixel relative to the target image.
- In this way, correlation values are output one row at a time for the reference image 100 and the target image 200 as they are successively shifted by one pixel in the Y direction.
- The correlation value between the target screen and the reference screen is obtained by summing the outputs for the rows.
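To make the accumulation along the rows concrete, the following Python sketch simulates partial sums moving down the PE rows while one reference row is broadcast per cycle. It is a simplified model under stated assumptions, not the embodiment's exact schedule: only vertical offsets at which the target fully overlaps the supplied reference rows are evaluated, rows without a valid token simply idle, and the function name and data layout are invented for the example.

```python
def systolic_column_sads(target, ref_rows):
    """Simulate partial sums moving down the PE rows, one row per cycle.

    target[r][c] is the pixel held by the PE in row r, column c.
    ref_rows[k][c] is the reference pixel broadcast to column c in cycle k.
    Returns one summed value per vertical offset, in output order.
    """
    n_rows, n_cols = len(target), len(target[0])
    n_offsets = len(ref_rows) - n_rows + 1  # assumed: fully overlapping offsets only
    # sent[r][c]: value that row r passed downward at the end of the previous cycle
    sent = [[0] * n_cols for _ in range(n_rows)]
    outputs = []
    for cycle in range(len(ref_rows)):
        new_sent = [[0] * n_cols for _ in range(n_rows)]
        for r in range(n_rows):
            offset = cycle - r  # index of the token that has reached row r
            if not 0 <= offset < n_offsets:
                continue  # no valid token in this row during this cycle
            for c in range(n_cols):
                own = abs(target[r][c] - ref_rows[cycle][c])  # own computation result
                received = sent[r - 1][c] if r > 0 else 0     # data from the previous row
                new_sent[r][c] = own + received
        if 0 <= cycle - (n_rows - 1) < n_offsets:
            outputs.append(sum(new_sent[n_rows - 1]))  # adder over the last row
        sent = new_sent
    return outputs


target = [[1, 2], [3, 4]]
ref_rows = [[1, 2], [3, 4], [5, 6]]
result = systolic_column_sads(target, ref_rows)
```

Each value in `result` equals the sum of absolute differences for one vertical offset, even though every PE only ever sees its own pixel, the current reference pixel for its column, and the value handed down from the row above.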
- the functional unit that executes these processes is composed of a combinational sequential circuit and executes the following processes (the same applies to FIG. 18 and the like).
- FIG. 9 is a flowchart showing processing for obtaining the correlation between the target image 200 and the reference image 100 by the array type arithmetic unit 1000.
- the reference image 100 one macroblock, that is, a position having the highest correlation with the target image 200 is obtained by shifting the reference image 100 by one pixel.
- each target data is read from the target image 200 to the target data storage unit 2200 of each PE (step S100, see FIG. 6).
- a value is set in the counter storage unit 3010 of the control information generation unit 3000 (step S110).
- the value set here is the number of rows of the reference image 100. For example, “8” is set.
- the leading address for one row supplied to the array type arithmetic unit 1000 is loaded into the register 0 (step S120).
- the load destination is not limited to register 0, but depends on the system. For example, when supplying “R00”, “R10” to “R50” (see FIG. 6), the address of the pixel data “R00” stored in the memory cache 10 05 is loaded. Remember! If not, read the memory cache.
- step S130 a process for taking a correlation with the target image 200 is executed (step S130). This process is executed by the instruction issuing unit 1003 issuing “exec_array” as an instruction.
- the correlation between one row of the reference image 100 and all the rows of the target image 200 is obtained.
- the reference data supplied in the 0th cycle 101 of FIG. 6 and the target image 200 on the PE array 1100 are calculated.
- If the calculation has not been performed up to the last row of the reference image 100 (step S150: NO), the address of the next row, for example the address of the pixel data “R01”, is set in register 0 and the process is repeated (step S120 to step S140).
- the calculation up to the last row of the reference image 100 means that the calculation of the target image T (x, 0) and the reference image R (x, 8) is completed.
- “exec_array” is processed 13 times, which is the sum of the number of rows of the reference image and the number of rows of the target image.
- When the calculation is completed up to the last row of the reference image 100 (step S150: YES), the process moves to the next column and the calculation is performed (step S110 to step S150).
- Specifically, the address of the pixel data “R10”, the first of the six pixels “R10”, “R20” to “R60” of the reference image shifted one pixel to the right, is loaded into register 0 (step S120).
- When the calculation has been performed up to the last column of the reference image 100, the process is terminated. This completes the calculation for the target image 200, which is one macroblock, and the motion vector is calculated from the position with the strongest correlation among the calculation results output in step S140.
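- The search loop of FIG. 9 can be pictured in software as follows. This is only a minimal sketch, not the hardware described here: the function names `sad` and `best_motion_vector`, the nested-list image layout, and the use of the sum of absolute differences as the correlation value (a smaller sum meaning a stronger correlation) are assumptions made for the illustration.

```python
def sad(target, reference, dx, dy):
    """Sum of absolute differences between the target and one reference window."""
    return sum(
        abs(target[y][x] - reference[y + dy][x + dx])
        for y in range(len(target))
        for x in range(len(target[0]))
    )


def best_motion_vector(target, reference):
    """Try every window position and return (best_sad, dx, dy)."""
    rows, cols = len(target), len(target[0])
    best = None
    for dx in range(len(reference[0]) - cols + 1):   # move to the next column
        for dy in range(len(reference) - rows + 1):  # shift down one row at a time
            score = sad(target, reference, dx, dy)
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best


reference = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
]
target = [[5, 6], [7, 8]]
print(best_motion_vector(target, reference))
```

- In this sketch the inner loop plays the role of steps S120 to S150 (row by row) and the outer loop the move to the next column; the PE array performs the row-wise part in a pipelined fashion rather than sequentially.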
- FIG. 10 is a flowchart showing the processing of “exec_array”.
- The control information generation unit 3000 generates new control information (a token), and each instruction generation unit (3100, etc.) sends its stored control information to the next instruction generation unit (step S210).
- The instruction generation unit that has received the control information receives the “exec_array” issued from the instruction issuing unit 1003.
- It generates an instruction from that “exec_array” instruction and the control information stored in the control information storage unit (3110, etc.), and transmits it to the PEs of the corresponding row (step S220).
- Each PE that has received the generated instruction performs arithmetic processing (step S240).
- control information generation unit 3000 in step S210 and the instruction generation unit (such as 3100) in step 230 will be described later with reference to FIG.
- FIG. 11 is a flowchart showing the processing of the PE.
- When the received instruction is an execution instruction (step S300: execution), the memory cache 1005 is referred to at the location pointed to by register 0, and the reference data for the reference data storage unit 2300 of each PE is read (step S305). Specifically, data is read from the memory corresponding to each column of the PE array, into which the instruction issuing unit 1003 wrote the corresponding data at the time of instruction decoding.
- The calculation unit 2100 obtains the absolute difference between the target data in the target data storage unit 2200 and the reference data in the reference data storage unit 2300 (step S310), and stores the result in the calculation result data 2410 of the correlation storage unit 2400 (step S320, see FIG. 8).
- The calculation result data and the received data 2420 are then added and sent to the PE of the next row, and the PE of the next row that has received the data stores it in its own received data 2420.
- When the received instruction is a cancel instruction (step S300: cancel), the PE does not perform the operation.
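- The way partial sums travel down one column of PEs can be sketched as follows. The class `PE` and the function `run_column` are hypothetical names for this illustration, a single column is modeled with one integer per pixel, and tokens are ignored (every row simply starts one cycle after the row above it), so this is a simplified model under those assumptions rather than the circuit itself.

```python
from dataclasses import dataclass


@dataclass
class PE:
    target: int         # target data storage unit 2200
    reference: int = 0  # reference data storage unit 2300
    result: int = 0     # calculation result data 2410
    received: int = 0   # received data 2420

    def execute(self, reference):
        self.reference = reference                       # step S305
        self.result = abs(self.target - self.reference)  # steps S310 and S320
        return self.result + self.received               # value sent to the next row


def run_column(targets, reference_column):
    """Feed one reference value per cycle; return what the last row sends out."""
    pes = [PE(t) for t in targets]
    outputs = []
    for cycle, ref in enumerate(reference_column):
        # Row r only starts executing in cycle r, after the rows above it.
        sent = [pe.execute(ref) if row <= cycle else None
                for row, pe in enumerate(pes)]
        # Hand each partial sum to the row below for use in the next cycle.
        for row in range(len(pes) - 1):
            if sent[row] is not None:
                pes[row + 1].received = sent[row]
        if sent[-1] is not None:
            outputs.append(sent[-1])
    return outputs


print(run_column([1, 2], [1, 2, 5]))
```

- Each value leaving the last row is the column's contribution to the correlation value for one vertical offset; summing these across the PE columns of the last row gives the value for the whole row of the array.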
- FIG. 12 (a) is a flowchart showing the processing of the control information generation unit.
- If the value of the counter “Counter” is “0”, an “Invalid” token is generated (step S410).
- If the value of the counter “Counter” is not “0” (step S410: ≠0), a “Valid” token is generated (step S411).
- the generated token is sent to the instruction generation unit 3100 and stored in the control information storage unit 3110.
- Each of the instruction generation units (such as 3100) performs the same processing as described below.
- FIG. 12 (b) is a flowchart showing the processing of the instruction generation unit.
- The token stored in the control information storage unit 3110 is transmitted to the next instruction generation unit (step S450), and a token is received from the previous instruction generation unit or the control information generation unit (step S460).
- If the token is “Valid” (step S470: Valid), an instruction to execute the “exec_array” instruction is generated (step S471). If the token is “Invalid” (step S470: Invalid), a cancel instruction, which does not execute the “exec_array” instruction, is generated (step S472).
- the generated operation instruction is sent to each PE, and the token is stored in the control information storage unit 3110.
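- The token handling of FIG. 12 behaves like a shift register, which the following sketch tries to make concrete. The function name `simulate_tokens` is hypothetical, and the assumption that the counter is decremented once per generated “Valid” token is not stated for this embodiment in the text above.

```python
VALID, INVALID = "Valid", "Invalid"


def simulate_tokens(counter, num_rows, num_cycles):
    """Return, per cycle, the instruction each row's instruction generation unit emits."""
    stored = [INVALID] * num_rows  # control information storage units (3110, etc.)
    history = []
    for _ in range(num_cycles):
        # Control information generation unit 3000: Valid while the counter is not 0.
        if counter != 0:
            new_token = VALID
            counter -= 1           # assumption: one decrement per Valid token
        else:
            new_token = INVALID
        # Steps S450/S460: every unit passes its token on and takes the previous one.
        stored = [new_token] + stored[:-1]
        # Steps S470 to S472: Valid becomes an execute instruction, Invalid a cancel.
        history.append(["exec" if t == VALID else "cancel" for t in stored])
    return history


for cycle, row_instructions in enumerate(simulate_tokens(2, 3, 4)):
    print(cycle, row_instructions)
```

- With a counter of 2 and three rows, the two “exec” entries move down by one row per cycle, which is the staggered activation that the timing diagram of FIG. 13 depicts.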
- FIG. 13 is a diagram showing the operations of the token and the PE on the time axis.
- The horizontal axis is the time axis in cycle units, and the figure shows the operation of the control information generation unit 3000 and the operations of the first to fifth rows of the PE array 1100.
- Each entry is the token on which the instruction is based, here the token stored in the counter storage unit 3010 or in each control information storage unit (3110, etc.); “val” denotes Valid and “Iv” denotes Invalid. That is, the PEs in a “val” row perform the operation, and the PEs in an “Iv” row do not.
- the table below shows the calculation results (5200, 5210) and shows the pixels that correlate the target image with the reference image.
- In the first row of the PE array 1100, the absolute differences are calculated between the data “T00”, “T10”, “T20”, “T30”, “T40”, “T50” of the first row of the target image 200 arranged there and the data “R00”, “R10”, “R20”, “R30”, “R40”, “R50” of the first row of the reference image 100 supplied to the PE array, and the results are passed to the computation elements in the second row (see FIG. 6 and FIG. 7).
- The data R00 to R50 of the first row of the reference image 100 are also supplied to the second to fifth rows of the PE array, but no calculation is performed there.
- The absolute differences with “R01”, “R11”, “R21”, “R31”, “R41”, “R51” are then calculated, and the results are passed to the PEs in the second row through the output bus of the computation elements.
- The data “T01”, “T11”, “T21”, “T31”, “T41”, “T51” and the second row of the reference image are placed in the second row of the PE array.
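- A table in the style of FIG. 13 can be generated with a few lines of code. The function `timeline` is a hypothetical helper, and the cycle numbering is an assumption: cycle 0 is taken to be the cycle in which the control information generation unit creates the first token, so that row r of the PE array is active from cycle r onward.

```python
def timeline(ref_rows=8, pe_rows=5):
    """Return one list per PE row with "val" or "Iv" for every cycle."""
    last_cycle = ref_rows + pe_rows - 1
    table = []
    for row in range(1, pe_rows + 1):
        # Row `row` holds a Valid token for `ref_rows` consecutive cycles.
        table.append([
            "val" if row <= cycle < row + ref_rows else "Iv"
            for cycle in range(last_cycle + 1)
        ])
    return table


for row, tokens in enumerate(timeline(), start=1):
    print(f"row {row}: " + " ".join(f"{t:>3}" for t in tokens))
```

- Under this assumption, with 8 reference rows and 5 PE rows the fifth row is last active in cycle 12 and the table spans 13 cycles, which is consistent with the 13 executions of “exec_array” mentioned for FIG. 9.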
- FIG. 14 is a diagram illustrating an example of a program.
- the program instruction list 5300 describes the operation 5302 for each program instruction 5301.
- In addition, the operation according to the value of the control information (token) is shown.
- Token names are abbreviated in the figure; for example, “Valid” is described as “val” (the same applies to FIG. 16, FIG. 23, and FIG. 30).
- For the token “Invalid”, “exec_array” indicates “nop”, that is, no execution, and for “Valid” it indicates “exec”, that is, execution.
- “ld [addr], r0” 5400 is an instruction to load the address of the reference data into register 0.
- “exec_array r0” is an instruction to perform an operation on the reference data at the location pointed to by register 0.
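- The two instructions of FIG. 14 can be modeled by a toy interpreter. Everything here is illustrative: the function `run`, the tuple encoding of instructions, the dictionary used as memory, and the address 0x100 are assumptions, and a single token is applied to the whole program whereas in the device each row of the PE array has its own token.

```python
def run(program, memory, token):
    """Execute a list of (opcode, *operands) tuples and return a trace."""
    registers = {}
    trace = []
    for op, *args in program:
        if op == "ld":                # ld [addr], r0: load an address into a register
            addr, reg = args
            registers[reg] = addr
        elif op == "exec_array":      # exec_array r0: operate on the data r0 points to
            (reg,) = args
            if token == "Valid":
                trace.append(("exec", memory[registers[reg]]))
            else:
                trace.append(("nop", None))  # Invalid token: no execution
        else:
            raise ValueError(f"unknown instruction: {op}")
    return trace


memory = {0x100: ["R00", "R10", "R20", "R30", "R40", "R50"]}
program = [("ld", 0x100, "r0"), ("exec_array", "r0")]
print(run(program, memory, "Valid"))
print(run(program, memory, "Invalid"))
```

- The point of the sketch is that the program text is identical in both runs; only the token decides whether “exec_array” acts as “exec” or as “nop”.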
- Embodiment 2 is different from Embodiment 1 in that when the evaluation value of the correlation strength at a certain timing is equal to or greater than a predetermined value, the subsequent calculation is canceled as unnecessary.
- FIG. 15 is a diagram showing the operations of the token and the PE on the time axis.
- As the canceling method, a circuit that evaluates the SAD value outputs a signal to the instruction generation units (3100, etc.), which then generate a cancel instruction indicating that the operation is to be stopped.
- the unnecessary computation portion 6200 can be stopped and the power consumption can be reduced.
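- The effect of this cancellation can be sketched as an early exit from the accumulation. The function `sad_with_cancel` and its parameter `limit` (standing in for the predetermined value) are hypothetical, and the check is made after each row, which is an assumption about the timing of the evaluation.

```python
def sad_with_cancel(target_rows, reference_rows, limit):
    """Accumulate the SAD row by row and stop once it reaches `limit`.

    Returns (partial_or_full_sad, rows_processed, cancelled).
    """
    total = 0
    for done, (t_row, r_row) in enumerate(zip(target_rows, reference_rows), start=1):
        total += sum(abs(t - r) for t, r in zip(t_row, r_row))
        if total >= limit:
            # The evaluation value is already too large: skip the remaining rows.
            return total, done, True
    return total, len(target_rows), False


target = [[1, 1], [1, 1], [1, 1]]
reference = [[5, 5], [1, 1], [1, 1]]
print(sad_with_cancel(target, reference, limit=8))
print(sad_with_cancel(target, reference, limit=100))
```

- In the first call the limit is reached after one row, so the remaining two rows are never computed, which corresponds to the stopped portion 6200.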
- FIG. 16 is a diagram showing an example of a program.
- the program instruction list 5300 and the like can be executed in the same manner as in the first embodiment (see FIG. 14).
- the present embodiment performs the same calculation as that of the first embodiment, but differs in that the execution speed is increased.
- In FIG. 13, which shows the token and PE operations of the first embodiment on the time axis, the calculation of the target image with the first column of the reference image is performed in cycles “Cyc-1” to “Cyc 12”.
- During this period, there are PEs that are not performing any operation.
- In the present embodiment, the PEs that are not performing this calculation are also made to perform calculations.
- FIG. 22 is a diagram showing the token and PE operations of the third embodiment on the time axis. As shown in the figure, the calculation with the first column of the reference image is performed in cycles “Cyc-1” to “Cyc 12” as in the first embodiment (see FIG. 13), but the operation for the next column differs in that it starts from cycle “Cyc 8”.
- In Embodiment 1 there are two types of instructions to the PEs, execution and cancellation, whereas in this embodiment three types can be generated: “execute with the data of the first row”, “execute with the data of the second row”, and “cancel”.
- FIG. 17 is a diagram illustrating details of the configuration of the array type arithmetic unit 1000 according to the third embodiment.
- The difference from the configuration of Embodiment 1 (see FIG. 5) is that two memory caches are used. Of course, they do not have to be physically separate.
- Both memory cache 0 (1006) and memory cache 1 (1007) are connected to the reference data storage unit 2300, and each PE can select from which of them to read data.
- FIG. 18 is a flowchart showing processing for obtaining the correlation between the target image 200 and the reference image 100 by the array type arithmetic unit 1000. This process differs from the process in the first embodiment (see FIG. 9) in that two counters are set and that two lines of reference image data are used.
- each target data is read from the target image 200 into the target data storage unit 2200 of each PE (step S 100, see FIG. 6).
- a value is set in the counter storage unit 3010 of the control information generation unit 3000 (step S501).
- “Counter0” is set to “8”, the number of rows of the reference image 100, and “Counter1” is set to “0”. In this case, “Counter0” is the active counter. If “Counter1” is set to “8”, “Counter1” becomes the active counter.
- The leading addresses of the two rows supplied to the array type arithmetic unit 1000 are loaded into register 0 and register 1 (step S502).
- “exec_array” is executed (step S130), and the operation result is output (step S140).
- The calculation process is repeated until the last row of the reference image 100 (step S120 to step S150).
- FIG. 19 is a flowchart showing the processing of “exec_array”.
- This is almost the same as in Embodiment 1 (see FIG. 10), but the PE processing (step S503) is different.
- FIG. 20 is a flowchart showing the PE processing.
- This processing differs from the processing in Embodiment 1 (see FIG. 11) in that, when the reference data is read into the reference data storage unit 2300 of each PE, it is read from either memory cache 0 or memory cache 1.
- In FIG. 22, from cycle “Cyc 9” to “Cyc 11”, data for two rows of the reference image is required, so it is necessary to specify which data each row of the PE array reads.
- When the received instruction is an execution instruction (step S300: execution), the reference data is read into the reference data storage unit 2300 of each PE from either memory cache 0 (1006), pointed to by register 0, or memory cache 1 (1007), pointed to by register 1 (step S504).
- The calculation unit 2100 obtains the absolute difference between the target data in the target data storage unit 2200 and the reference data in the reference data storage unit 2300 (step S310), and stores the result in the calculation result data 2410 of the correlation storage unit 2400 (step S320).
- The calculation result data and the received data 2420 are then added and sent to the PE of the next row, and the PE of the next row that has received the data stores it in its own received data 2420.
- When the received instruction is a cancel instruction (step S300: cancel), the PE does not perform the operation.
- FIG. 21 illustrates the processing of the control information generation unit 3000 and the processing of the instruction generation unit (3100, etc.).
- FIG. 21A is a flowchart showing the processing of the control information generation unit 3000 of the third embodiment.
- three types of commands are generated with the three types of control information.
- the control information generation unit 3000 generates tokens as control information using two counters.
- a token is generated based on the values of the two counters “CounterO” and “Counterl” (step S510).
- The control information generation unit 3000 generates control information indicating execution of the computation for the period in which “Counter0” > 0 or “Counter1” > 0.
- The active counter may be determined by a signal from the instruction issuing unit 1003, or it may be arranged that when one counter finishes counting it becomes inactive and the other becomes active. The latter method is used here.
- FIG. 21 (b) is a flowchart showing the processing of the instruction generation unit (3100, etc.).
- the instruction generation unit (3100, etc.) performs the same processing as in the first embodiment.
- The token stored in the control information storage unit 3110 is transmitted to the next instruction generation unit (step S550), and a token is received from the previous instruction generation unit or the control information generation unit (step S560).
- Based on the received token (step S570), the instruction to be executed by the PEs is generated.
- If the token is “Invalid”, an instruction that does not execute the “exec_array” instruction is generated (step S571). If it is “Valid, sel0”, an instruction that executes the “exec_array” instruction using “data_sel0” is generated, and if it is “Valid, sel1”, an instruction that executes the “exec_array” instruction using “data_sel1” is generated (step S573).
- the generated operation instruction is sent to each PE (step S575), and the token is stored in the control information storage unit 3110 (step S580).
- Figure 22 shows the token and PE operations on the time axis.
- The PEs in the first and second rows execute the instruction generated from the “Valid, sel1” token.
- The PEs in the fourth and fifth rows execute the instruction generated from the “Valid, sel0” token.
- The PEs in the third row do not execute, and the correlation storage unit 2400 of each of these PEs is cleared. The PEs are canceled in sequence (7100), which serves as the delimiter between columns of the reference image.
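- The three-way choice each row makes in this embodiment can be sketched as a small selection function. The name `select_reference`, the string spelling of the tokens, and the sample cache contents are assumptions for the illustration; the token list reproduces the situation described above for FIG. 22.

```python
def select_reference(token, cache0_row, cache1_row):
    """Return the reference row a PE row reads, or None when it is cancelled."""
    if token == "Valid,sel0":
        return cache0_row   # data_sel0: memory cache 0, pointed to by register 0
    if token == "Valid,sel1":
        return cache1_row   # data_sel1: memory cache 1, pointed to by register 1
    if token == "Invalid":
        return None         # cancel: this row does not execute
    raise ValueError(f"unknown token: {token}")


# Rows 1-2 work on the new column, row 3 is the cancelled gap,
# rows 4-5 are still finishing the previous column.
tokens = ["Valid,sel1", "Valid,sel1", "Invalid", "Valid,sel0", "Valid,sel0"]
cache0 = ["R07", "R17"]  # illustrative contents only
cache1 = ["R11", "R21"]  # illustrative contents only
for row, token in enumerate(tokens, start=1):
    print(row, token, select_reference(token, cache0, cache1))
```

- Because the choice is carried by the token rather than by the program, rows that would otherwise be idle can already start on the next column.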
- FIG. 23 is a diagram illustrating an example of a program according to the third embodiment.
- The program instruction list 7300 describes the operation 7302 for each program instruction 7301, together with the operation according to the value of the control information (token). For example, for the token “Invalid” 7303, “exec_array” is “nop”, that is, it is not executed; for “Valid, sel0” 7304, it indicates “execute using data_sel0”; and for “Valid, sel1” 7305, it indicates “execute using data_sel1”.
- “Lofp [addr], r0, r1” 7400 is an instruction to load the addresses of the reference image rows into register 0 and register 1.
- Specifically, register 0 is loaded with the address indicated by [addr], and register 1 is simultaneously loaded with the address indicated by [addr] + offset.
- This offset is a difference value from the address of a certain row data, and may be given in advance or generated at an appropriate time. As an example given in advance, there is an address difference between the last row data of a certain column and the first row data of the next column in the reference image.
- “exec_array r0 r1” 7401 is an instruction to perform an operation using the two reference image rows indicated by register 0 and register 1.
- Rather than obtaining the correlation with the reference image using all the pixels of the target image 200, the evaluation is performed using thinned-out pixels.
- This method is effective for reducing the amount of calculation, and is particularly effective for battery-driven mobile devices, whose power is limited.
- the correlation is obtained by thinning the target image into a checkered pattern, that is, using every other pixel in a grid pattern.
- FIG. 24 is a diagram illustrating an example of the target image and the reference image supplied to the PE array according to the fourth embodiment.
- the target image 8200 and the target image 8210 are arranged on the PE array 1100, that is, stored in the target data storage unit 2200 of the PE.
- The target image 8200 and the target image 8210 are the same. Of the two target images (8200, 8210), only the pixel data to be calculated is arranged, creating the target images (8201, 8011) on the PE array 1100.
- From the reference image 100, two reference images (8011, 8021) are created by combining the odd-numbered and even-numbered reference data of the two rows (8010, 8020).
- One row consisting of 7 pixels is supplied as one row 8010 of 6 pixels and one row 8020 of 6 pixels shifted by 1 pixel. This makes it possible to search two horizontal positions at the same time.
- Two-stage reference data 8011 is created from the reference data 8010, and two-stage reference data 8021 is created from the reference data 8020, and an odd-numbered reference data 8100 and an even-numbered reference data 8101 are created.
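- The thinning idea can be illustrated in a few lines. The functions `checkered_sad` and `split_odd_even` are hypothetical, images are plain nested lists, and “odd-numbered” is taken to mean the 1st, 3rd, 5th, ... pixels of a row; none of this is meant to reproduce the exact data layout of FIG. 24.

```python
def checkered_sad(target, reference, dx=0, dy=0):
    """SAD that uses only every other pixel, arranged in a checkered pattern."""
    total = 0
    for y, row in enumerate(target):
        for x, t in enumerate(row):
            if (x + y) % 2 == 0:  # skip the other half of the pixels
                total += abs(t - reference[y + dy][x + dx])
    return total


def split_odd_even(row):
    """Split one reference row into its odd-numbered and even-numbered pixels."""
    return row[0::2], row[1::2]


target = [[1, 2], [3, 4]]
reference = [[0, 0], [0, 0]]
print(checkered_sad(target, reference))
print(split_odd_even(["R00", "R10", "R20", "R30", "R40", "R50"]))
```

- Only half of the absolute differences are computed, which is where the reduction in calculation comes from; splitting each reference row into two halves is what allows the two memory caches to feed alternating rows of the PE array.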
- the configuration of the array type arithmetic unit 1000 of the present embodiment is the same as that of the third embodiment (see FIG. 17).
- FIG. 25 is a flowchart showing processing for obtaining the correlation between the target image 200 and the reference image 100 by the array type arithmetic unit 1000.
- This processing differs from the processing in Embodiment 1 (see FIG. 9) in that the target data set in the PEs is thinned out, and in that the reference data is divided into odd-numbered and even-numbered data for two rows, which are set separately in memory cache 0 and memory cache 1.
- This is the same as Embodiment 3 in that two rows of reference data are used. However, in Embodiment 3, because there is a period in which the data for two rows is used at the same time, the second row is set in the other memory cache while the first row is in use.
- In this embodiment, the data in the two memory caches is used during the same period, so it differs in that both are set at the same time.
- The data in the two memory caches is used alternately.
- each target data is read from the target image 200 to the target data storage unit 2200 of each PE (step S601).
- Here, the target images (8201, 8011) in FIG. 24, obtained by thinning the target image 200 in a checkered pattern, are set.
- a value is set in the counter storage unit 3010 of the control information generation unit 3000 (step S110). For example, “8” is set in “Counter”.
- the addresses of two rows of data supplied to the array type arithmetic unit 1000 are loaded into the register 0 and the register 1 (step S602).
- the address of reference data 8100 in FIG. 24 is loaded into register 0, and the address of reference data 8101 is loaded into register 1.
- “exec_array” is executed (step S130), and the calculation result is output (step S140). The calculation process is repeated until the last row of the reference image 100 (step S120 to step S150).
- When the calculation has been performed up to the last column of the reference image 100 (step S160), the process is terminated.
- FIG. 26 is a flowchart showing the processing of “exec_array”.
- This is almost the same as in Embodiment 1 (see FIG. 10), but the PE processing (step S603) is different.
- FIG. 27 is a flowchart showing the PE processing.
- This processing differs from the processing in Embodiment 1 (see FIG. 11) in that the reference data storage unit 2300 of each PE reads the reference data alternately from memory cache 0 and memory cache 1.
- This is because the target data “T01”, “T20”, “T40”, and so on is set in the PEs of the first row of the PE array 1100, which therefore need the reference data “R0y”, “R2y”, and “R4y”, whereas the target data “T11”, “T31”, “T51” is set in the PEs of the second row, which need to operate on the reference data “R1y”, “R3y”, and “R5y”.
- When the received instruction is an execution instruction (step S300: execution), the reference data is read into the reference data storage unit 2300 of each PE from either memory cache 0 (1006), pointed to by register 0, or memory cache 1 (1007), pointed to by register 1 (step S604).
- The calculation unit 2100 obtains the absolute difference between the target data in the target data storage unit 2200 and the reference data in the reference data storage unit 2300 (step S310), and stores the result in the calculation result data 2410 of the correlation storage unit 2400 (step S320). After that, the calculation result data and the received data 2420 are added and sent to the PE of the next row, and the PE of the next row that has received the data stores it in its own received data 2420.
- When the received instruction is a cancel instruction (step S300: cancel), the PE does not perform the operation.
- FIG. 28 illustrates the processing of the control information generation unit 3000 and the processing of the instruction generation unit (3100, etc.).
- FIG. 28 (a) is a flowchart showing processing of the control information generation unit 3000 of the fourth embodiment.
- a token is generated based on the value of the counter “Counter” (step S610).
- the counter value is decremented by 1 (step S620).
- FIG. 28 (b) is a flowchart showing the processing of the instruction generation unit (3100, etc.).
- the instruction generation unit (3100, etc.) performs the same processing as in the first embodiment.
- The token stored in the control information storage unit 3110 is transmitted to the next instruction generation unit (step S650), and a token is received from the previous instruction generation unit or the control information generation unit (step S660).
- If the token is “Invalid” (step S670), an instruction that does not execute the “exec_array” instruction is generated (step S671); if it is “Valid, sel0”, an instruction that executes the “exec_array” instruction using “data_sel0” is generated (step S672); and if it is “Valid, sel1”, an instruction that executes the “exec_array” instruction using “data_sel1” is generated (step S673).
- the generated operation instruction is sent to each PE (step S685), and the token is stored in the control information storage unit 3110 (step S690).
- Figure 29 shows the token and PE operations on the time axis.
- The valid token issued by the control information generation unit 3000 is always “Valid, sel0”, but the PEs of each row of the PE array are passed an instruction generated from either “Valid, sel0” or “Valid, sel1”.
- The token in the first row of the PE array is “Valid, sel0”.
- The token in the second row is “Valid, sel1”, obtained by inverting the token of the first row, and the token in the third row is again “Valid, sel0”.
- The PEs of the first, third, and fifth rows execute instructions generated from “Valid, sel0” tokens, and the PEs of the second and fourth rows execute instructions generated from “Valid, sel1” tokens.
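- The row-by-row inversion of the token can be sketched as follows. The function `row_tokens` and the string spelling of the tokens are assumptions; the sketch only shows the alternation between “sel0” and “sel1” and leaves an “Invalid” token unchanged.

```python
def row_tokens(first_token, num_rows):
    """Tokens seen by rows 1..num_rows when each row inverts the selector it passes on."""
    flip = {
        "Valid,sel0": "Valid,sel1",
        "Valid,sel1": "Valid,sel0",
        "Invalid": "Invalid",
    }
    tokens = [first_token]
    for _ in range(num_rows - 1):
        tokens.append(flip[tokens[-1]])
    return tokens


print(row_tokens("Valid,sel0", 5))
```

- For five rows this yields sel0 for rows 1, 3, and 5 and sel1 for rows 2 and 4, matching the assignment described above.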
- FIG. 30 is a diagram illustrating an example of a program according to the fourth embodiment.
- The program instruction list 8600 describes an operation 8602 for each program instruction 8601. In addition, an operation according to the value of the control information (token) is shown.
- “ld [addr], r0, r1” 8700 is an instruction to load the address of the next reference image into register 0 and register 1. Specifically, register 0 is loaded with the address indicated by [addr], and register 1 is simultaneously loaded with the address indicated by [addr] + offset. For example, when the reference data 8100 and the reference data 8101 in FIG. 24 are stored contiguously in memory, [addr] is the address of the reference data 8100, and offset is the length of the reference data 8100.
- “exec_array r0 r1” 8701 is an instruction to perform an operation using the reference data at the location pointed to by register 0 or register 1.
- “exec_array r0 r1” 8701 and “exec_array r0 r1” 8702 are the same instruction, but which register is used depends on the token.
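- The address arithmetic of the two-register load, and the token-dependent choice of register, can be sketched as follows. The helpers `ld` and `exec_array_source`, the flat list used as memory, and the sample contents of the reference data 8100 and 8101 are all assumptions made for the illustration.

```python
def ld(addr, offset):
    """Model of "ld [addr], r0, r1": r0 gets addr, r1 gets addr + offset."""
    return {"r0": addr, "r1": addr + offset}


def exec_array_source(registers, token):
    """Address that "exec_array r0 r1" reads from, depending on the token."""
    if token == "Valid,sel0":
        return registers["r0"]
    if token == "Valid,sel1":
        return registers["r1"]
    return None  # Invalid: nothing is read


# Reference data 8100 and 8101 stored back to back (illustrative contents).
data_8100 = ["R00", "R20", "R40"]
data_8101 = ["R10", "R30", "R50"]
memory = data_8100 + data_8101

registers = ld(addr=0, offset=len(data_8100))
print(registers)
print(memory[exec_array_source(registers, "Valid,sel0")])
print(memory[exec_array_source(registers, "Valid,sel1")])
```

- With contiguous storage, the offset is simply the length of the first block, as in the example given for FIG. 24, and the same instruction text reads a different block depending on the token of the row.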
- The array type arithmetic unit according to the present invention has been described above based on the embodiments, but the device can of course be partially modified, and the present invention is not limited to the above-described embodiments. That is:
- In the embodiments, the PEs of the PE array are connected to the PEs adjacent in the row direction and operate with an instruction generation unit provided for each row, but the PEs may be connected to adjacent PEs not only in the row direction but also in the column and diagonal directions, with instruction generation units provided accordingly.
- An instruction generated based on a token can then be sent to any PE in the PE array.
- The data input source used by each PE can be changed dynamically by register settings and tokens, and the PEs that execute an instruction, that is, its range of application, can be determined, so that execution over a chosen range becomes possible.
- the PE array is realized by hardware, but may be realized by using dynamically reconfigurable hardware.
- Dynamically reconfigurable hardware means hardware whose logical structure can be changed dynamically by giving configuration information to the programmable wiring that connects its logic elements.
- In Embodiment 4, the reference data is switched by inverting the token.
- Instead, the conversion circuit may hold fixed information indicating whether it is in an even position or an odd position. That is, the register to be read is fixed for each row of the PE array.
- The array type arithmetic unit according to the present invention can realize flexible and high-performance processing with a simple device, and is particularly useful as an arithmetic unit of an image processing LSI.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Image Processing (AREA)
- Multi Processors (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/572,701 US7606996B2 (en) | 2004-08-04 | 2005-08-02 | Array type operation device |
JP2006531472A JP4213750B2 (ja) | 2004-08-04 | 2005-08-02 | アレイ型演算装置 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-227927 | 2004-08-04 | ||
JP2004227927 | 2004-08-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006013839A1 true WO2006013839A1 (ja) | 2006-02-09 |
Family
ID=35787124
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/014077 WO2006013839A1 (ja) | 2004-08-04 | 2005-08-02 | アレイ型演算装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US7606996B2 (ja) |
JP (1) | JP4213750B2 (ja) |
CN (1) | CN100458762C (ja) |
WO (1) | WO2006013839A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9965824B2 (en) * | 2015-04-23 | 2018-05-08 | Google Llc | Architecture for high performance, power efficient, programmable image processing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH01295335A (ja) * | 1988-05-23 | 1989-11-29 | Fujitsu Ltd | 負荷分散方式 |
JPH0218687A (ja) * | 1988-07-06 | 1990-01-22 | Nec Software Ltd | パイプラインプロセッサ制御方式 |
JPH03268054A (ja) * | 1990-03-19 | 1991-11-28 | Fujitsu Ltd | 高速並列処理システム |
JPH04120652A (ja) * | 1990-09-11 | 1992-04-21 | Matsushita Graphic Commun Syst Inc | 並列処理装置 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5768561A (en) * | 1992-06-30 | 1998-06-16 | Discovision Associates | Tokens-based adaptive video processing arrangement |
US5659785A (en) | 1995-02-10 | 1997-08-19 | International Business Machines Corporation | Array processor communication architecture with broadcast processor instructions |
JPH08297650A (ja) * | 1995-04-25 | 1996-11-12 | Nippon Steel Corp | アレイプロセッサ |
US7082516B1 (en) * | 2000-09-28 | 2006-07-25 | Intel Corporation | Aligning instructions using a variable width alignment engine having an intelligent buffer refill mechanism |
-
2005
- 2005-08-02 US US11/572,701 patent/US7606996B2/en active Active
- 2005-08-02 CN CNB2005800263326A patent/CN100458762C/zh not_active Expired - Fee Related
- 2005-08-02 WO PCT/JP2005/014077 patent/WO2006013839A1/ja active Application Filing
- 2005-08-02 JP JP2006531472A patent/JP4213750B2/ja active Active
Also Published As
Publication number | Publication date |
---|---|
US20080282061A1 (en) | 2008-11-13 |
JPWO2006013839A1 (ja) | 2008-05-01 |
CN101010671A (zh) | 2007-08-01 |
JP4213750B2 (ja) | 2009-01-21 |
CN100458762C (zh) | 2009-02-04 |
US7606996B2 (en) | 2009-10-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2006531472 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 200580026332.6 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase | ||
WWE | Wipo information: entry into national phase |
Ref document number: 11572701 Country of ref document: US |