WO2007116560A1 - Control method and apparatus for a parallel image processing system - Google Patents
- Publication number
- WO2007116560A1 (PCT/JP2006/324213)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- element processor
- instruction
- execution
- pixels
- register
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3005—Arrangements for executing specific machine instructions to perform operations for flow control
- G06F9/30065—Loop control instructions; iterative instructions, e.g. LOOP, REPEAT
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/345—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
Definitions
- The present invention relates to a parallel image processing system, and more particularly to a control method and apparatus for a parallel image processing system having a one-dimensional SIMD (Single Instruction Multiple Data) processor array structure.
- Examples of conventional parallel image processing systems are described in Japanese Patent No. 2839026 (Patent Document 1) and Japanese Patent Application Laid-Open No. 2002-7359 (Patent Document 2).
- Such a conventional parallel image processing system is composed of a PE array, in which a large number of processor elements (hereinafter referred to as “PEs”) are connected in parallel in one dimension, and a controller that controls them.
- Each PE comprises an arithmetic unit (ALU) that performs arithmetic processing, a local memory that stores the local pixel values of the processing target image, and registers that hold temporary arithmetic results.
- Japanese Patent Application Laid-Open No. 2004-362086 (Patent Document 3) describes a SIMD-type parallel processing system with a function that automatically executes PE instructions repeatedly, in order to improve program efficiency when the system does not have the optimal number of PEs for the image to be processed.
- In that system, the number of iterations is calculated from the degree-of-parallelism information specified by the program and the degree of parallelism of the SIMD computing unit, and each PE instruction is automatically repeated as many times as required by the number of PEs in the system.
- Patent Document 1: Japanese Patent No. 2839026 (FIG. 1, paragraph 0008, etc.)
- Patent Document 2: Japanese Patent Application Laid-Open No. 2002-7359 (FIG. 1, paragraphs 0014 to 0016, etc.)
- Patent Document 3: Japanese Patent Application Laid-Open No. 2004-362086 (paragraphs 0011 to 0021, FIG. 1, etc.)
Disclosure of Invention
- However, the conventional systems give no consideration to how pixel data is laid out in memory, or to automating data acquisition across PEs, when the number of PEs is smaller than the number of pixels in the width direction of the processing target image and processing is performed by assigning a plurality of pixels to each PE.
- An object of the present invention is to provide a novel parallel image processing system and a control method thereof that enable automation of processing that requires neighboring pixel values.
- Another object of the present invention is to provide a control method and apparatus for an image processing system capable of executing image processing on an entire image, even in processing that requires neighboring pixel values, with about the same amount of program code as a one-dimensional SIMD parallel image processing system having the same number of PEs as the number of pixels in the width direction of the processing target image.
- The present invention is a control method for a parallel image processing system that has fewer element processors than the number of data items to be processed, each element processor processing a plurality of data items. An instruction is automatically executed repeatedly according to the number of data items to be processed; when a given instruction is executed repeatedly, its operation code is replaced at each execution during the repetition according to the data item being processed, and the element processors are controlled by the replaced operation code.
- In another aspect, the invention is a control method for a one-dimensional image processing system that has fewer element processors than the number of pixels in the width direction of the processing target image, each element processor processing a plurality of pixels. An instruction is automatically repeated according to the number of pixels assigned to each element processor. When a pixel value acquisition instruction for an adjacent pixel is repeatedly executed, its operation code is replaced with a combination of a pixel value transfer instruction from the adjacent element processor and a pixel value acquisition instruction within the own element processor, and the element processors are controlled by the replaced operation code.
- The register group is divided and used in accordance with the number of pixels allocated to each element processor, and the operands are rewritten so as to switch the register group used at each repeated execution. When a memory access instruction is repeatedly executed, an offset value corresponding to the number of pixels allocated to the element processor is added to the address.
- In the parallel image processing system according to the present invention, the controller that controls the PE array according to the program automatically repeats the instruction at each address in the program according to the number of pixels assigned to each PE.
- A single instruction described in the program is thus automatically and repeatedly executed on the plurality of processing target pixels assigned to each PE, with its opcode converted as needed.
- As a result, processing that requires neighboring pixel values can be automated, and image processing can be performed on the entire image with the same amount of program code as a one-dimensional SIMD parallel image processing system that has the same number of PEs as the number of pixels in the width direction of the image to be processed.
- According to the present invention, an instruction is automatically and repeatedly executed according to the number of data items to be processed assigned to each element processor, and when a predetermined instruction is repeatedly executed, its opcode is replaced at each execution according to the data item being processed. The element processors are controlled by the replaced opcode.
- With the iterative execution means, the operand conversion means, and the memory address conversion means included in the controller, an instruction can be automatically and repeatedly executed from the program description of the instruction for a single pixel alone. By additionally converting the operation code, processing that requires neighboring pixel values can be automated without increasing the amount of program code, and image processing on the entire image can be executed.
- FIG. 1 is a block diagram showing a functional configuration of a parallel image processing system according to an embodiment of the present invention.
- The parallel image processing system according to this embodiment includes a PE array 1 that performs the instruction operations of image processing, a controller 2 that controls the operation of the PE array 1 according to a program, and a program memory 3 that stores the program describing the image processing to be executed.
- The PE array 1 has a configuration in which a large number of PEs 10 are arranged and connected in one dimension, and operates as a SIMD system in which every PE executes the same program. To keep the drawing simple, only three adjacent PEs are shown in FIG. 1.
- Each PE 10 includes a local memory 11, an arithmetic unit 12, and a register group 13. It is desirable that the number of PEs 10 in the PE array 1 be an integer fraction (1/integer) of the number of pixels in the width direction of the processing target image, because the same number of pixels can then be assigned to each PE 10. However, the present invention can be applied even when this condition is not satisfied, and the same effect is obtained.
- The controller 2 is a unit that controls the operation sequence of the PE array 1 and includes an instruction fetch/decode unit 21 and an iterative execution unit 22. The iterative execution unit 22 causes an instruction executed by the controller 2 to be executed repeatedly according to the number of processing target pixels assigned to each PE. It comprises an operand conversion unit 221, a memory address conversion unit 222, an opcode conversion unit 223, an iterative execution counter 224, an iterative execution designation register 225, and a processing target image height register 226.
- The iterative execution counter 224 holds the counter value CR, which is used when the iterative execution unit 22 repeatedly executes an instruction the number of times designated by the value of the iterative execution designation register 225.
- The iterative execution designation register 225 is a register that designates and holds the number of iterations NR used when an instruction is executed repeatedly. It is set to the number of iterations determined by the ratio between the number of pixels in the width direction of the processing target image and the number of PEs, i.e., the number of processing target pixels assigned to each PE.
- The processing target image height register 226 is a register that holds the number of pixels NH in the height direction of the processing target image. This value is used to calculate the address offset value when a memory access instruction is repeatedly executed.
- The PE array 1 performs image processing operations by assigning the pixels of the image to be processed to the PEs 10 and executing the same instructions in parallel on every PE.
- The PE 10 is an element processor that stores and calculates the assigned pixel values in the processing target image.
- The PE 10 has the instruction execution function of an ordinary processor and performs instruction operations according to control signals from the controller 2.
- Each PE 10 is connected to its adjacent PEs 10 to exchange data.
- The local memory 11 is a memory that each PE 10 has individually.
- The local memory 11 is closely coupled to the arithmetic unit 12 and stores the pixel values of the processing target image assigned to the PE, as well as calculation results and the like.
- The pixel values assigned to each PE are stored at addresses offset from one another by the number of pixels in the height direction of the processing target image.
- Let Om denote this address offset value, i.e., the number of pixels in the height direction of the processing target image. For example, when four pixels are assigned to each PE and the leftmost pixel value is stored at address A, the pixel values assigned to the PE are stored at addresses A, A + Om, A + 2 × Om, and A + 3 × Om, respectively.
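- The storage layout above can be sketched as follows. This is only an illustration under stated assumptions: the function name `pixel_location` is hypothetical, each PE is assumed to hold consecutive image columns, and successive rows of one column are assumed to occupy consecutive addresses (suggested by the later example, but not spelled out here).

```python
def pixel_location(x, y, nr, base, om):
    """Map image coordinate (x, y) to (PE index, local memory address).

    Assumptions (not stated explicitly in the source): each PE holds `nr`
    consecutive columns, and the rows of one column sit at consecutive
    addresses starting from that column's base address.
    """
    pe_index = x // nr               # PE that owns column x
    slot = x % nr                    # position of the column inside that PE
    address = base + slot * om + y   # A, A + Om, A + 2*Om, ... plus the row
    return pe_index, address


# Four pixels per PE, Om = 256 (image height), base address A = 1000.
A, OM = 1000, 256
column_bases = [pixel_location(x, 0, 4, A, OM)[1] for x in range(4)]
print(column_bases)  # [1000, 1256, 1512, 1768]
```

The four printed values correspond to A, A + Om, A + 2 × Om, and A + 3 × Om for the columns held by the first PE.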
- The arithmetic unit 12 is a unit that executes read/write instructions for the register group 13, read/write instructions for the local memory 11, arithmetic operation instructions, logical operation instructions, and the like. In image processing, it performs operations on the assigned pixel values.
- The register group 13 is the set of arithmetic registers that the arithmetic unit 12 in each PE 10 individually has; it stores the input values of operations and their output results.
- Each register is given a register number from R(0) to R(number of registers - 1).
- The registers in the group are divided according to the number of pixels assigned. For example, when two pixels are assigned to each PE 10, the registers in the register group 13 are divided into two parts, a first half and a second half; when four pixels are assigned to each PE 10, the registers in the register group 13 are divided into four parts.
- The controller 2 advances image processing by having the instruction fetch/decode unit 21 sequentially read and interpret the program stored in the program memory 3 and control the PE array 1.
- The instruction fetch/decode unit 21 is the core unit of the controller 2; it reads the program from the program memory 3, interprets the opcode and the operands, and controls the PE array 1.
- The opcode and operands read from the program memory 3 are delivered to the iterative execution unit 22, which decides whether iterative execution is required. If the iterative execution unit 22 determines that iterative execution is to be performed, reading of subsequent program code is suspended until the specified number of iterations has been completed.
- During iterative execution, the PE array 1 is controlled using the opcode and operands rewritten by the iterative execution unit 22 for each step of the repetition.
- The iterative execution unit 22 determines whether the opcode input from the instruction fetch/decode unit 21 is an instruction subject to iterative execution. If it is, the instruction is executed repeatedly, using the iterative execution counter 224, the number of times NR specified in the iterative execution designation register 225. Until the iterative execution is completed, the instruction fetch/decode unit 21 is instructed to stop reading subsequent instructions from the program memory.
- In each step of the repetition, the operand conversion unit 221, the memory address conversion unit 222, and the opcode conversion unit 223 convert the operands, memory address, and opcode according to the number of times NR specified in the iterative execution designation register 225, i.e., the number of processing target pixels assigned to each PE, and the pixel position within that group of processing target pixels, which can be calculated from the value CR of the iterative execution counter 224.
- The operand conversion unit 221 is a unit that converts the register number in order to switch the portion of the register group 13 that is used, according to the position of the pixel being processed during iterative execution.
- The number of registers in the register group 13 of each PE is divided by the number of iterations NR stored in the iterative execution designation register 225, and the quotient is multiplied by the value CR stored in the iterative execution counter 224, which runs from 0 to (NR - 1). The product is used as the offset value for switching the register position.
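- A minimal sketch of this offset calculation is given below. The function name `register_offset` and the register count of 32 are illustrative assumptions; the source leaves the number of registers as a parameter.

```python
def register_offset(num_registers, nr, cr):
    """Offset added to a register number during iteration `cr` of `nr`.

    The register group is split into `nr` equal parts and iteration `cr`
    uses the cr-th part, so the offset is (num_registers / nr) * cr.
    """
    if not 0 <= cr < nr:
        raise ValueError("cr must satisfy 0 <= cr < nr")
    return (num_registers // nr) * cr


# Hypothetical register count; not taken from the source.
N = 32
print([register_offset(N, 2, cr) for cr in range(2)])  # [0, 16]
print([register_offset(N, 4, cr) for cr in range(4)])  # [0, 8, 16, 24]
```

With two pixels per PE the second pixel uses the second half of the register group, and with four pixels per PE each pixel gets a quarter of it.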
- FIG. 2 is a block diagram functionally showing the configuration of the operand conversion unit in the parallel image processing system according to the present embodiment.
- The operand conversion unit 221 determines the offset value to be used for register number conversion from an offset calculation table, using the iterative execution designation register value NR and the iterative execution counter value CR as keys.
- Offset calculation tables 221.1 are prepared for each number of iterations that can be specified by the value NR of the iterative execution designation register 225, from 1 (no iterative execution) to 4.
- The offset calculation table 221.1 in FIG. 2 is written assuming that the number of registers is N. It is determined whether the opcode is an instruction subject to operand conversion (221.6), and based on the result a selection is made (221.7) between outputting the input register number as is and outputting the result of adding (221.5) the offset value determined by the offset calculation table 221.1. That is, if the instruction is subject to operand conversion, the sum of the input register number and the offset value determined by the offset calculation table 221.1 is output; otherwise, the input register number is output as is.
- The operand conversion unit 221 also has a counter value adjustment unit that adjusts the iterative execution counter value CR before the offset calculation table 221.1 is referenced. This unit is required when an acquisition instruction for adjacent pixel values is iteratively executed.
- When such an instruction is iteratively executed, there are cases, depending on the value CR of the iterative execution counter 224, in which the adjacent PE holds the required adjacent pixel value and the value is transferred from the adjacent PE to the own PE, and cases in which the own PE holds it.
- The register number specified in the input operand therefore needs to be offset differently depending on whether the instruction refers to the pixel value on the left or the right side and on whether the transfer source is the adjacent PE or the own PE.
- For this purpose, the iterative execution counter value CR is replaced with an adjusted value when the offset calculation table 221.1 is referenced.
- Specifically, it is determined from the opcode whether the counter value CR needs to be adjusted (221.2). If so, the counter value CR is adjusted (221.3), the adjusted result is selected (221.4), and the offset calculation table 221.1 is referenced with it. If the counter value CR does not need to be adjusted, the input counter value CR is selected (221.4) and used to reference the offset calculation table 221.1.
- The memory address conversion unit 222 is a unit that converts the address to be accessed when a memory access instruction is repeatedly executed.
- The opcode conversion unit 223 is a unit that converts the opcode executed by the PE array 1 when an acquisition instruction for a left or right adjacent pixel value is iteratively executed, since such an instruction requires different operations depending on the position of the pixel being processed. If the opcode input from the instruction fetch/decode unit 21 is an acquisition instruction for a left or right adjacent pixel value, the position of the pixel currently being processed within the plurality of pixels assigned to each PE is calculated from the values CR and NR of the iterative execution counter 224 and the iterative execution designation register 225, and it is determined whether the adjacent pixel value to be acquired is held in a register of the own PE or in a register of the adjacent PE. The opcode, converted either to read from a register of the own PE or to transfer the register value from the adjacent PE, is then passed to the instruction fetch/decode unit 21.
- FIG. 3 is a flowchart showing the overall operation of the parallel image processing system according to this embodiment.
- The instruction fetch/decode unit 21 reads the program code at the address to be executed from the program stored in the program memory 3 and supplies it to the iterative execution unit 22 (step A1).
- The iterative execution unit 22 determines whether the opcode of the supplied program code is an instruction subject to iterative execution (step A2). If it is not (NO in step A2), the program code is passed back unchanged to the instruction fetch/decode unit 21, which controls the PE array 1 to perform the instruction processing (steps A3, A4).
- If it is determined that the opcode is an instruction subject to iterative execution (YES in step A2), the value CR of the iterative execution counter 224 is initialized to 0 (step A5), and the program code is handed to the operand conversion unit 221, the memory address conversion unit 222, and the opcode conversion unit 223 (steps A6, A7, A8).
- The instruction fetch/decode unit 21 receives the program code converted by the operand conversion unit 221, the memory address conversion unit 222, and the opcode conversion unit 223 (step A9), interprets it, and controls the PE array 1 to perform the processing corresponding to the instruction (step A10).
- Next, the value CR of the iterative execution counter 224 is incremented by 1 (step A11) and compared with the value NR of the iterative execution designation register 225 (step A12). If CR is not yet equal to NR (NO in step A12), the process returns to steps A6, A7, and A8 to continue the iterative execution.
- When the value CR of the iterative execution counter 224 becomes equal to the value NR of the iterative execution designation register 225 (YES in step A12), the required number of iterations for that one step of the input program code is regarded as complete, and processing moves on to the next program code.
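- The control flow of steps A2 to A12 can be sketched as a small loop. This is a simplified model rather than the controller's actual logic: the function `run_instruction`, the tuple encoding of an instruction, and the callback parameters are all hypothetical names introduced for illustration.

```python
def run_instruction(instr, nr, is_iterative, convert, execute):
    """Model of steps A2-A12 for one program line; returns executions made."""
    if nr < 1:
        raise ValueError("nr must be at least 1")
    if not is_iterative(instr):          # step A2: NO
        execute(instr)                   # steps A3, A4: run unchanged
        return 1
    cr = 0                               # step A5: reset the counter
    while True:
        execute(convert(instr, nr, cr))  # steps A6-A10: convert, then run
        cr += 1                          # step A11
        if cr == nr:                     # step A12: YES, iterations done
            return cr


# Toy usage: the "conversion" here just tags the instruction with CR.
trace = []
count = run_instruction(
    ("LD", "MEM1", 0), 2,
    is_iterative=lambda i: i[0] == "LD",
    convert=lambda i, nr, cr: i + (cr,),
    execute=trace.append,
)
print(count, trace)
```

The next program line is only fetched after the function returns, which mirrors the way reading of subsequent program code is suspended during the repetition.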
- Steps A6, A7, and A8, executed by the operand conversion unit 221, the memory address conversion unit 222, and the opcode conversion unit 223, respectively, are described in detail below.
- FIG. 4 is a flowchart showing the operand conversion operation of the parallel image processing system according to this embodiment.
- The operand conversion unit 221 determines, from the opcode and the operand position, whether each operand serving as an input source or output destination in the program code input from the instruction fetch/decode unit 21 should be converted (step B1). If the operand is not subject to conversion (NO in step B1), the input register number is output without conversion.
- If the operand is subject to conversion (YES in step B1), it is further determined whether the opcode is an instruction to acquire an adjacent pixel value (step B2).
- When an adjacent pixel value acquisition instruction is iteratively executed, there are two cases depending on the value CR of the iterative execution counter 224: the case in which the adjacent pixel value to be acquired is held by the adjacent PE and is transferred from the adjacent PE to the own PE, and the case in which the own PE holds the adjacent pixel value and it is read from a register whose number has been offset.
- When the opcode is an instruction to acquire an adjacent pixel value (YES in step B2), it is further determined whether it acquires the adjacent pixel value on the left or on the right side (step B3).
- The offset calculation table 221.1 is then referenced using the value NR of the iterative execution designation register 225 and the value CR of the iterative execution counter 224, adjusted as necessary, to determine the operand offset value (step B6).
- The result of adding the offset value to the input register number is output as the operand conversion result (step B7).
- FIG. 5 is a flowchart showing the memory address conversion operation of the parallel image processing system according to this embodiment.
- The memory address conversion unit 222 determines whether the input opcode is an instruction subject to memory address conversion, such as a memory read instruction or a memory write instruction (step C1).
- If the input opcode is an instruction subject to conversion (YES in step C1), the memory address offset value is calculated from the value NH of the processing target image height register 226 and the value CR of the iterative execution counter 224, as their product (step C2). The value obtained by adding the calculated offset value to the input memory address is then output as the converted memory address (step C3). If the input opcode is not subject to conversion (NO in step C1), the address is output without conversion.
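- Steps C1 to C3 can be illustrated with the sketch below. The function name, the opcode set, and the numeric base address are assumptions made for the example: only the LD mnemonic appears in the text, "ST" stands in for a memory write instruction, and "ADD" for any other instruction.

```python
# Opcodes treated as memory accesses. "LD" appears in the source;
# "ST" is a hypothetical stand-in for a memory write instruction.
MEMORY_ACCESS_OPCODES = {"LD", "ST"}


def convert_memory_address(opcode, address, nh, cr):
    """Return the address used in iteration `cr` (steps C1-C3)."""
    if opcode not in MEMORY_ACCESS_OPCODES:  # step C1: NO
        return address                       # output without conversion
    offset = nh * cr                         # step C2: NH x CR
    return address + offset                  # step C3


MEM1 = 0x100  # hypothetical base address
NH = 256      # image height used in the example program
print(convert_memory_address("LD", MEM1, NH, 0) - MEM1)   # 0
print(convert_memory_address("LD", MEM1, NH, 1) - MEM1)   # 256
print(convert_memory_address("ADD", MEM1, NH, 1) - MEM1)  # 0
```

For CR = 0 the address is unchanged, which matches the first iteration of the LD instruction in the worked example.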
- FIG. 6 is a flowchart showing the operation code conversion operation of the parallel image processing system according to this embodiment.
- The opcode conversion unit 223 determines whether the opcode input from the instruction fetch/decode unit 21 is an acquisition instruction for a left or right adjacent pixel value, which is subject to conversion (step D1).
- If the input opcode is an acquisition instruction for a left or right adjacent pixel value (YES in step D1), one of the following operations is performed depending on whether the opcode is a right adjacent pixel value acquisition instruction or a left adjacent pixel value acquisition instruction (step D2).
- If the opcode is an instruction to acquire the right adjacent pixel value and the iterative execution counter value CR < (iterative execution designation register value NR - 1), a register of the own PE is referenced, so a register-to-register move instruction within the same PE is output as the converted opcode. If CR = (NR - 1), the register holding the leftmost pixel of the right adjacent PE is referenced, so the input opcode, i.e., the instruction that transfers a register value from the right adjacent PE, is output as is (step D3).
- If the opcode is an instruction to acquire the left adjacent pixel value and the iterative execution counter value CR > 0, a register of the own PE is referenced, so a register-to-register move instruction within the same PE is output as the converted opcode. If CR = 0, a register of the left adjacent PE is referenced, so the input opcode, i.e., the instruction that transfers a register value from the left adjacent PE, is output as is (step D4).
- If the input opcode is not an acquisition instruction for a left or right adjacent pixel value (NO in step D1), the opcode is not converted and is output unchanged to the instruction fetch/decode unit 21.
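- The decision in steps D1 to D4 can be sketched as follows. The mnemonics `GET_RIGHT`, `GET_LEFT`, and `MOVE_LOCAL` are hypothetical names for the right and left adjacent pixel value acquisition instructions and the register-to-register move within the same PE; the source does not give mnemonics for them. The sketch also assumes that CR = 0 is the leftmost pixel held by a PE and CR = NR - 1 the rightmost.

```python
def convert_opcode(opcode, nr, cr):
    """Pick the opcode actually issued in iteration `cr` (steps D1-D4)."""
    if opcode == "GET_RIGHT":        # step D3
        # Only the rightmost pixel of a PE (cr == nr - 1) has its right
        # neighbour in the adjacent PE; otherwise it is held locally.
        return "MOVE_LOCAL" if cr < nr - 1 else opcode
    if opcode == "GET_LEFT":         # step D4
        # Only the leftmost pixel of a PE (cr == 0) has its left
        # neighbour in the adjacent PE.
        return "MOVE_LOCAL" if cr > 0 else opcode
    return opcode                    # step D1: NO, no conversion


# Two pixels per PE, as in the example with 128 PEs and 256 columns.
print([convert_opcode("GET_RIGHT", 2, cr) for cr in range(2)])
print([convert_opcode("GET_LEFT", 2, cr) for cr in range(2)])
```

With NR = 1 neither branch ever produces a local move, which is consistent with a system that has one PE per image column.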
- Because the opcode conversion unit 223 is provided in the iterative execution unit 22, operations that require different instructions when multiple pixels are assigned to one PE in processing that passes left and right adjacent pixel values, namely the transfer between adjacent PEs and the use of pixel values held in the own PE, can be described with a single instruction.
- The amount of program code can thus be further reduced, and image processing can easily be realized on SIMD parallel image processing systems with different numbers of PEs without changing the program.
- In the following example, a parallel image processing system having a PE array 1 of 128 PEs arranged in one dimension, half the number of pixels in the width direction, is used to process an image 256 pixels wide by 256 pixels high, so that two pixels are assigned to each PE.
- FIG. 7 is a diagram showing an example of a program for operating the parallel image processing system according to the embodiment of the present invention.
- The program obtains, for each pixel, the absolute value of the difference from the pixel value adjacent on its right side.
- The local memory 11 stores the processing target image starting at address MEM1.
- The two pixels assigned to each PE are stored at addresses offset by 256, corresponding to the number of pixels in the height direction.
- The left pixel values are stored from address MEM1 to address (MEM1 + 255), and the right pixel values are stored from address (MEM1 + 256), i.e., with the offset value 256 added.
- The processing result image is stored from address MEM2 onward.
- The instruction fetch/decode unit 21 reads the program code on the first line, LD MEM1, R(0), from the program memory 3 and delivers it to the iterative execution unit 22 (step A1 in FIG. 3).
- The program code on the first line reads the value stored at address MEM1 of the local memory 11 and stores it in register R(0).
- Since the opcode part of the program code is a memory read instruction (LD), the iterative execution unit 22 determines that it is an instruction subject to iterative execution and sets the iterative execution counter 224 to 0 (step A5).
- The operand conversion unit 221 refers to the values NR and CR of the iterative execution designation register 225 and the iterative execution counter 224, and obtains 0 as the offset value from the offset calculation table 221.1.
- The result of adding this to the input register number 0 is passed to the instruction fetch/decode unit 21 as the converted register number (step A6).
- The memory address conversion unit 222 refers to the value CR of the iterative execution counter 224; since it is 0, no conversion is performed, and the memory address MEM1 is passed to the instruction fetch/decode unit 21 (step A7).
- Since the input opcode is a memory read instruction (LD) and not an acquisition instruction for a left or right adjacent pixel value, which would require opcode conversion, the opcode conversion unit 223 performs no conversion and passes the input opcode unchanged to the instruction fetch/decode unit 21 (step A8).
- The instruction fetch/decode unit 21 operates the PE array 1 based on the opcode, memory address, and operand input from the iterative execution unit 22, and stores the contents of address MEM1 of the local memory 11 in register R(0) (steps A9 and A10).
- In the next iteration, the operand conversion unit 221 refers to the values NR and CR of the iterative execution designation register 225 and the iterative execution counter 224, and obtains 18 as the offset value from the offset calculation table 221.1.
- The result of adding this to the input register number 0 is passed to the instruction fetch/decode unit 21 as the converted register number (step A6). Since the value CR of the iterative execution counter 224 is 1, the memory address conversion unit 222 delivers the address (MEM1 + 256), obtained by adding the offset 256 to the memory address MEM1, to the instruction fetch/decode unit 21 (step A7).
- Since the input opcode is a memory read instruction (LD) and not an acquisition instruction for a left or right adjacent pixel value, which would require opcode conversion, the opcode conversion unit 223 performs no opcode conversion and passes the input opcode directly to the instruction fetch/decode unit 21 (step A8).
- The instruction fetch/decode unit 21 operates the PE array 1 based on the opcode, memory address, and operand input from the iterative execution unit 22, and the contents of address (MEM1 + 256) in the local memory 11 are stored in register R(18) (steps A9 and A10).
- the repetitive execution unit 22 increases the value CR of the repetitive execution counter 224 by 1! And sets it to 2 (step All). After that, the value CR of the repeat execution counter 224 is compared with the value NR specified in the repeat execution specification register 225, and since this is the same value, it is determined that the necessary repeat execution has been completed and the program code in the first line is added. The corresponding process is terminated, and the next command process is started (step A12).
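The handling of the first program line can be sketched in a few lines of Python. This is only an illustrative model, not the patented circuit: the function names, the dictionary layout of the offset table, the placeholder value of `MEM1`, and the assumption that the address offset scales as `CR * 256` are all hypothetical. Only the offsets 0 and 18, the address offset 256 for CR = 1, and NR = 2 are taken from the walk-through above.

```python
# Illustrative model only; names are hypothetical.
# Offsets 0 and 18 and the address offset 256 come from the walk-through.
OFFSET_TABLE = {(2, 0): 0, (2, 1): 18}  # (NR, CR) -> register number offset
MEM_STRIDE = 256  # assumed per-iteration stride; the text confirms only CR = 1 -> +256


def convert_operand(reg, nr, cr):
    """Step A6: add the table offset for (NR, CR) to the register number."""
    return reg + OFFSET_TABLE[(nr, cr)]


def convert_address(addr, cr):
    """Step A7: no conversion when CR is 0, otherwise add an address offset."""
    return addr if cr == 0 else addr + cr * MEM_STRIDE


def expand_ld(addr, reg, nr=2):
    """Expand one LD instruction into NR PE-array operations."""
    ops = []
    cr = 0                      # step A5: reset the iteration counter
    while cr != nr:             # step A12: finish once CR equals NR
        ops.append(("LD", convert_address(addr, cr), convert_operand(reg, nr, cr)))
        cr += 1                 # step A11: increment the counter
    return ops


MEM1 = 0x1000  # placeholder address, not from the source
for op in expand_ld(MEM1, 0):
    print(op)
```

With NR = 2, the single `LD` line yields one load into R(0) from `MEM1` and one load into R(18) from `MEM1 + 256`, matching the two iterations described above.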
- The instruction fetch/decode unit 21 then reads the program code on the second line from the program memory 3 (step A1).
- The operation of the program code on the second line is to store in register R(1) the value of register R(0) corresponding to the right adjacent pixel.
- That is, the value stored in register R(0) of the right adjacent PE is transferred to the PE itself and stored in register R(1).
- In this example, however, two pixels are assigned to one PE, so the right adjacent pixel is not always held by the right adjacent PE.
- In some cases its pixel value is stored in a separate register within the same PE.
- The right adjacent pixel value acquisition operation therefore divides into two actions, depending on the pixel being processed: reading a register value within the PE itself, or transferring register R(0) of the right adjacent PE to the PE itself.
- Because the opcode is the right adjacent pixel value transfer instruction, the iterative execution unit 22 determines that the instruction is subject to iterative execution, sets the value of the iterative execution counter 224 to 0, and then operates the operand conversion unit 221 and the opcode conversion unit 223 (steps A2 and A5).
- The operand conversion unit 221 performs operand conversion for each of the input source register and the output destination register specified in the program code (step A6). Since the input opcode is the right adjacent pixel value acquisition instruction (MVL), a different conversion is applied to the input source register operand and the output destination register operand. For the input source register, the unit refers to the value NR of the iterative execution designation register 225 and the value CR of the iterative execution counter 224 and obtains an offset value of 18 using the offset calculation table 221.1. The result of adding this offset to the input register number 0, namely 18, is passed to the instruction fetch/decode unit 21 as the converted input source register number.
- For the output destination register, because the opcode is the MVL acquisition instruction, the iterative execution counter value CR used when referring to the offset calculation table is adjusted to 0, and the offset value obtained from the offset calculation is added to the input register number 1.
- The result, 1, is passed to the instruction fetch/decode unit 21 as the converted output destination register number.
- The opcode conversion unit 223 determines that the right adjacent pixel value acquisition instruction (MVL) is an instruction that requires opcode conversion. Since the value CR of the iterative execution counter 224 is not equal to (the value NR of the iterative execution designation register 225) - 1, it converts the opcode to a register-to-register move instruction within the same PE and passes it to the instruction fetch/decode unit 21 (step A8). The instruction fetch/decode unit 21 operates the PE array 1, and the contents of register R(18) are stored in register R(1) (steps A9 and A10).
- In the next iteration, the opcode is the right adjacent pixel value transfer instruction (MVL) and the value CR of the iterative execution counter 224 is equal to (the value NR of the iterative execution designation register 225) - 1, so the operand conversion unit 221 performs conversion so that the contents of register R(0) of the right adjacent PE are transferred. For the input source register, the iterative execution counter value CR used when referring to the offset calculation table 221.1 is therefore adjusted to 0, and the resulting offset value 0 is added to the input source register number 0. The result is passed to the instruction fetch/decode unit 21 as the converted input source register number.
- For the output destination register, even though the opcode is the right adjacent pixel value transfer instruction (MVL), the same conversion as for the first line is performed: an offset value is obtained from the offset calculation table 221.1 using the value NR of the iterative execution designation register 225 and the value CR of the iterative execution counter 224.
- The result of adding the offset value 18 to the input output destination register number 1, namely 19, is passed to the instruction fetch/decode unit 21 as the converted output destination register number (step A6).
- Although the instruction is subject to opcode conversion, the value CR of the iterative execution counter 224 is equal to (the value NR of the iterative execution designation register 225) - 1, so the opcode conversion unit 223 passes the input opcode to the instruction fetch/decode unit 21 without converting it (step A8).
- The instruction fetch/decode unit 21 operates the PE array 1, and the contents of register R(0) of the right adjacent PE are stored in register R(19) (steps A9 and A10).
- The iterative execution unit 22 increments the value CR of the iterative execution counter 224 by 1, setting it to 2 (step A11). Since this value equals the value NR specified in the iterative execution designation register 225, it determines that the required iterations have been completed, ends the processing corresponding to the program code on the second line, and moves on to the next instruction (step A12).
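The MVL handling described above combines operand conversion with opcode conversion, which a short Python sketch may make easier to follow. It is a model under stated assumptions, not the actual hardware: the function names and the `"MOV"` mnemonic for the intra-PE register-to-register move are hypothetical, and looking up the source offset with `CR + 1` is one assumed way to reproduce the offset 18 that the text reports for the first iteration.

```python
# Illustrative model only. Offsets 0 and 18 come from the walk-through;
# the table layout, the CR + 1 source lookup and the "MOV" mnemonic are assumptions.
OFFSET_TABLE = {(2, 0): 0, (2, 1): 18}  # (NR, CR) -> register number offset


def convert_mvl(src, dst, nr, cr):
    """Return (opcode, source register, destination register) for one MVL iteration."""
    # Destination: same conversion as for an ordinary instruction.
    # (For CR = 0 the text describes the counter as "adjusted to 0", which is the same lookup.)
    dst_conv = dst + OFFSET_TABLE[(nr, cr)]
    if cr == nr - 1:
        # Last pixel held by this PE: its right neighbour lives in the right adjacent PE.
        # The counter is adjusted to 0 for the source lookup and the opcode is kept.
        src_conv = src + OFFSET_TABLE[(nr, 0)]
        return ("MVL", src_conv, dst_conv)
    # Otherwise the right neighbour is the next pixel in the same PE:
    # the opcode becomes an intra-PE register-to-register move.
    src_conv = src + OFFSET_TABLE[(nr, cr + 1)]
    return ("MOV", src_conv, dst_conv)


def expand_mvl(src, dst, nr=2):
    """Expand one MVL instruction into NR PE-array operations."""
    return [convert_mvl(src, dst, nr, cr) for cr in range(nr)]


print(expand_mvl(0, 1))  # [('MOV', 18, 1), ('MVL', 0, 19)]
```

The first tuple corresponds to moving R(18) into R(1) within the same PE, and the second to transferring R(0) of the right adjacent PE into R(19).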
- The instruction fetch/decode unit 21 reads the program code on the third line (ABS R(0), R(1), R(2)) from the program memory 3 and passes it to the iterative execution unit 22.
- The operation of the program code on the third line is to calculate the absolute value of the difference between register R(0) and register R(1) and store the result in register R(2).
- The iterative execution unit 22 performs the same conversion as for the first line on each of the input source and output destination operands and passes the results to the instruction fetch/decode unit 21.
- The instruction fetch/decode unit 21 then operates the PE array 1.
- As a result, the absolute value of the difference between register R(0) and register R(1) is stored in register R(2), and the absolute value of the difference between register R(18) and register R(19) is stored in register R(20). Processing then moves on to the next instruction.
- The instruction fetch/decode unit 21 reads the program code on the fourth line (ST MEM2, R(2)) and passes it to the iterative execution unit 22.
- The operation of the program code on the fourth line is to read the value stored in register R(2) and write it to address MEM2 of the local memory 11.
- The iterative execution unit 22 performs the same memory address and operand conversion as for the first line, and the instruction fetch/decode unit 21 operates the PE array 1.
- As a result, the value of register R(2) is stored at address MEM2 of the local memory 11, and the value of register R(20) is stored at address (MEM2 + 256) of the local memory 11.
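To check that the eight expanded operations really compute the absolute difference between each pixel and its right neighbour, the four program lines can be replayed on a toy PE array. The sketch below is a hedged model: the array size of 4 PEs, the placeholder addresses `MEM1` and `MEM2`, the interleaved pixel layout (PE i holds pixel 2i at `MEM1` and pixel 2i + 1 at `MEM1 + 256`, inferred from the register transfers above), and the wrap-around at the right edge are all assumptions made for illustration.

```python
# Toy replay of the four-line program on a small PE array (illustrative only).
NUM_PES = 4          # assumed array size
REG_OFFSET = 18      # register offset for the second pixel, from the walk-through
MEM_OFFSET = 256     # address offset for the second pixel, from the walk-through
MEM1, MEM2 = 0, 512  # placeholder local memory addresses


def run_program(pixels):
    """Return |p[k] - p[k+1]| for every pixel, computed the way the PE array would."""
    assert len(pixels) == 2 * NUM_PES
    # Assumed layout: PE i holds pixel 2i at MEM1 and pixel 2i + 1 at MEM1 + 256.
    mem = [{MEM1: pixels[2 * i], MEM1 + MEM_OFFSET: pixels[2 * i + 1]}
           for i in range(NUM_PES)]
    regs = [{} for _ in range(NUM_PES)]

    for pe in range(NUM_PES):   # line 1: LD, expanded to two loads
        regs[pe][0] = mem[pe][MEM1]
        regs[pe][0 + REG_OFFSET] = mem[pe][MEM1 + MEM_OFFSET]

    for pe in range(NUM_PES):   # line 2, CR = 0: intra-PE move R(18) -> R(1)
        regs[pe][1] = regs[pe][0 + REG_OFFSET]
    for pe in range(NUM_PES):   # line 2, CR = 1: R(0) of right PE -> R(19)
        right = (pe + 1) % NUM_PES  # wrap-around at the edge is an assumption
        regs[pe][1 + REG_OFFSET] = regs[right][0]

    for pe in range(NUM_PES):   # line 3: ABS, expanded to two operations
        regs[pe][2] = abs(regs[pe][0] - regs[pe][1])
        regs[pe][2 + REG_OFFSET] = abs(regs[pe][18] - regs[pe][19])

    for pe in range(NUM_PES):   # line 4: ST, expanded to two stores
        mem[pe][MEM2] = regs[pe][2]
        mem[pe][MEM2 + MEM_OFFSET] = regs[pe][2 + REG_OFFSET]

    # Read the results back in pixel order.
    out = []
    for pe in range(NUM_PES):
        out += [mem[pe][MEM2], mem[pe][MEM2 + MEM_OFFSET]]
    return out


print(run_program([3, 7, 1, 10, 4, 4, 9, 2]))  # [4, 6, 9, 6, 0, 5, 7, 1]
```

Each of the four program lines drives two PE-array operations here, which is the code-size saving the description attributes to the iterative execution unit.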
- As described above, for each instruction of program code read from the program memory 3, the iterative execution unit 22 performs PE array control equivalent to two instructions, so the amount of program code can be reduced.
- In addition, because the opcode conversion unit 223 allows a different instruction to be executed in some cycles of the iteration, even processing that passes values between adjacent pixels, which cannot be handled by simply repeating the same instruction, can be performed automatically by iterative execution, further reducing the amount of program code.
- In this example, the ratio between the number of pixels in the width direction of the processing target image and the number of PEs is 2:1.
- The ratio is not limited to this, however. The invention is also applicable to ratios such as 3:1 and 4:1, and in general to N:1, whenever the number of pixels in the width direction of the processing target image is larger than the number of PEs in the system.
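For the general N:1 case, the rule that decides between an intra-PE move and an inter-PE transfer can be written down compactly. The following Python sketch extrapolates from the 2:1 walk-through and is not stated in this form in the source: the function name, the (PE, slot) representation, and the wrap-around at the right edge of the array are assumptions.

```python
def right_neighbor(pe, slot, n, num_pes):
    """Locate the pixel to the right of the pixel held in `slot` of processing element `pe`.

    Returns (pe, slot, needs_transfer). With N pixels per PE, only the last slot
    (slot == N - 1, i.e. CR == NR - 1) needs a transfer from the right adjacent PE.
    """
    if slot == n - 1:
        # Right neighbour is the first pixel of the right adjacent PE (wrap-around assumed).
        return ((pe + 1) % num_pes, 0, True)
    # Right neighbour is the next pixel held in the same PE.
    return (pe, slot + 1, False)


# 2:1 case from the walk-through, then a 3:1 case.
print(right_neighbor(0, 0, 2, 4))  # (0, 1, False)
print(right_neighbor(0, 1, 2, 4))  # (1, 0, True)
print(right_neighbor(3, 2, 3, 4))  # (0, 0, True)
```

For N = 2 this reproduces the two cases described earlier: slot 0 reads another register in the same PE, and slot 1 fetches R(0) from the right adjacent PE.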
- The present invention can be applied to image processing devices, image detection devices, image recognition devices, and similar equipment that take video images, sensor images, and the like as input.
- FIG. 1 is a block diagram showing a functional configuration of a parallel image processing system according to an embodiment of the present invention.
- FIG. 2 is a block diagram functionally showing the configuration of an operand converter in the parallel image processing system according to the present embodiment.
- FIG. 3 is a flowchart showing an overall operation of the parallel image processing system according to the present embodiment.
- FIG. 4 is a flowchart showing the operand conversion operation of the parallel image processing system according to the present embodiment.
- FIG. 5 is a flowchart showing a memory address conversion operation of the parallel image processing system according to the present embodiment.
- FIG. 6 is a flowchart showing an operation code conversion operation of the parallel image processing system according to the present embodiment.
- FIG. 7 is a diagram showing an example of a program for operating a parallel image processing system according to an embodiment of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Image Processing (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP06833968.8A EP2000973B1 (en) | 2006-03-30 | 2006-12-05 | Parallel image processing system control method and apparatus |
CN2006800541190A CN101416216B (zh) | 2006-03-30 | 2006-12-05 | Parallel image processing system control method and apparatus |
JP2008509691A JP5077579B2 (ja) | 2006-03-30 | 2006-12-05 | Parallel image processing system control method and apparatus |
US12/224,988 US8106912B2 (en) | 2006-03-30 | 2006-12-05 | Parallel image processing system control method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006-092831 | 2006-03-30 | ||
JP2006092831 | 2006-03-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2007116560A1 (ja) | 2007-10-18 |
Family
ID=38580862
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2006/324213 WO2007116560A1 (ja) | 2006-03-30 | 2006-12-05 | Parallel image processing system control method and apparatus |
Country Status (6)
Country | Link |
---|---|
US (1) | US8106912B2 (ja) |
EP (1) | EP2000973B1 (ja) |
JP (1) | JP5077579B2 (ja) |
KR (1) | KR20080100380A (ja) |
CN (1) | CN101416216B (ja) |
WO (1) | WO2007116560A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010134891A (ja) * | 2008-11-05 | 2010-06-17 | Toshiba Corp | Image processor |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2464292A (en) | 2008-10-08 | 2010-04-14 | Advanced Risc Mach Ltd | SIMD processor circuit for performing iterative SIMD multiply-accumulate operations |
US10733478B2 (en) * | 2016-08-31 | 2020-08-04 | Facebook, Inc. | Systems and methods for processing media content that depict objects |
CN110728364A (zh) * | 2018-07-17 | 2020-01-24 | 上海寒武纪信息科技有限公司 | Computing device and computing method |
US11182160B1 (en) * | 2020-11-24 | 2021-11-23 | Nxp Usa, Inc. | Generating source and destination addresses for repeated accelerator instruction |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04291659A (ja) * | 1991-03-20 | 1992-10-15 | Hitachi Ltd | Parallel computer system and operating method thereof |
JPH05174166A (ja) * | 1991-12-24 | 1993-07-13 | Sony Corp | SIMD-type parallel arithmetic unit |
JP2839026B1 (ja) | 1997-06-25 | 1998-12-16 | 日本電気株式会社 | Parallel image processing apparatus |
JP2002007359A (ja) | 2000-06-21 | 2002-01-11 | Sony Corp | SIMD-controlled parallel processing method and apparatus |
JP2004362086A (ja) | 2003-06-03 | 2004-12-24 | Matsushita Electric Ind Co Ltd | Information processing apparatus and machine language program conversion apparatus |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6741294B2 (en) * | 1997-12-08 | 2004-05-25 | Sony Corporation | Digital signal processor and digital signal processing method |
AU2362200A (en) * | 1998-12-15 | 2000-07-03 | Intensys Corporation | Digital camera using programmed parallel computer for image processing functionsand control |
US7506136B2 (en) * | 1999-04-09 | 2009-03-17 | Clearspeed Technology Plc | Parallel data processing apparatus |
GB2389689B (en) * | 2001-02-14 | 2005-06-08 | Clearspeed Technology Ltd | Clock distribution system |
JP4143302B2 (ja) * | 2002-01-15 | 2008-09-03 | キヤノン株式会社 | Image processing apparatus, image processing method, control program, and recording medium |
JP4136432B2 (ja) * | 2002-04-15 | 2008-08-20 | 松下電器産業株式会社 | Graphics drawing apparatus |
US20050188087A1 (en) * | 2002-05-28 | 2005-08-25 | Dai Nippon Printing Co., Ltd. | Parallel processing system |
WO2005069215A1 (en) * | 2004-01-14 | 2005-07-28 | Koninklijke Philips Electronics N.V. | Processor architecture |
US20050257026A1 (en) * | 2004-05-03 | 2005-11-17 | Meeker Woodrow L | Bit serial processing element for a SIMD array processor |
JP4478050B2 (ja) * | 2005-03-18 | 2010-06-09 | 株式会社リコー | SIMD-type microprocessor and data processing method |
-
2006
- 2006-12-05 CN CN2006800541190A patent/CN101416216B/zh not_active Expired - Fee Related
- 2006-12-05 EP EP06833968.8A patent/EP2000973B1/en not_active Not-in-force
- 2006-12-05 JP JP2008509691A patent/JP5077579B2/ja not_active Expired - Fee Related
- 2006-12-05 US US12/224,988 patent/US8106912B2/en not_active Expired - Fee Related
- 2006-12-05 KR KR1020087023995A patent/KR20080100380A/ko not_active Application Discontinuation
- 2006-12-05 WO PCT/JP2006/324213 patent/WO2007116560A1/ja active Application Filing
Non-Patent Citations (1)
Title |
---|
See also references of EP2000973A4 |
Also Published As
Publication number | Publication date |
---|---|
EP2000973A9 (en) | 2009-03-04 |
EP2000973A4 (en) | 2012-01-04 |
EP2000973A2 (en) | 2008-12-10 |
US8106912B2 (en) | 2012-01-31 |
EP2000973B1 (en) | 2013-05-01 |
KR20080100380A (ko) | 2008-11-17 |
JP5077579B2 (ja) | 2012-11-21 |
CN101416216A (zh) | 2009-04-22 |
JPWO2007116560A1 (ja) | 2009-08-20 |
US20090106528A1 (en) | 2009-04-23 |
CN101416216B (zh) | 2012-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10416999B2 (en) | Processors, methods, and systems with a configurable spatial accelerator | |
EP1028382B1 (en) | Microcomputer | |
US8161266B2 (en) | Replicating opcode to other lanes and modifying argument register to others in vector portion for parallel operation | |
WO2008112207A2 (en) | Software programmable timing architecture | |
JP2006313422A (ja) | Arithmetic processing device and method for executing data transfer processing | |
WO2021249054A1 (zh) | Data processing method and apparatus, and storage medium | |
US20070016760A1 (en) | Central processing unit architecture with enhanced branch prediction | |
WO2007116560A1 (ja) | Parallel image processing system control method and apparatus | |
JP2001290658A (ja) | Mapping circuit and method | |
JPH07104784B2 (ja) | Digital data processing device | |
EP3264261B1 (en) | Processor and control method of processor | |
JP2003501773A (ja) | Data processor comprising an arithmetic logic unit and a stack | |
US20060200648A1 (en) | High-level language processor apparatus and method | |
JP3614646B2 (ja) | Microprocessor, arithmetic processing execution method, and storage medium | |
JP2009507292A (ja) | Processor array with separate serial modules | |
Brand et al. | Orthogonal instruction processing: An alternative to lightweight VLIW processors | |
JP4444305B2 (ja) | Semiconductor device | |
AU2003204212A1 (en) | A Processor for Alpha-compositing | |
Fuertler et al. | Novel development tool for software pipeline optimization for VLIW-DSPs used in real-time image processing | |
Mehendale et al. | Low power code generation of multiplication-free linear transforms | |
Panis | Vliw dsp processor for high-end mobile communication applications | |
Kuo et al. | Digital signal processor architectures and programming | |
JP2002312178A (ja) | Memory access optimization method | |
JP2004303058A (ja) | Vector processor and data processing method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 06833968; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | WIPO information: entry into national phase | Ref document number: 12224988; Country of ref document: US |
| WWE | WIPO information: entry into national phase | Ref document number: 2006833968; Country of ref document: EP |
| WWE | WIPO information: entry into national phase | Ref document number: 2008509691; Country of ref document: JP |
| WWE | WIPO information: entry into national phase | Ref document number: 5252/CHENP/2008; Country of ref document: IN |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWE | WIPO information: entry into national phase | Ref document number: 200680054119.0; Country of ref document: CN |