WO2015125960A1 - Information processing device, digital camera, and processor - Google Patents

Information processing device, digital camera, and processor

Publication number
WO2015125960A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
memory
unit
register
marching
Application number
PCT/JP2015/055050
Other languages
French (fr)
Japanese (ja)
Inventor
武昭 杉村
風見 一之
Original Assignee
株式会社ニコン
Application filed by 株式会社ニコン
Priority to JP2016504209A (JP6319420B2)
Publication of WO2015125960A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098: Register arrangements
    • G06F9/3012: Organisation of register space, e.g. banked or distributed register file
    • G06F9/30134: Register stacks; shift registers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/60: Memory management

Definitions

  • the present invention relates to an information processing apparatus including a data memory that temporarily stores stream data including a plurality of unit data, and a processor that performs predetermined information processing on stream data read from the data memory.
  • the present invention also relates to a digital camera equipped with this information processing apparatus.
  • One such information processing apparatus is the image processing apparatus.
  • Image processing apparatuses are widely used in digital cameras and video cameras, and in computers, printers, and the like that handle image data recorded on the recording media of these cameras.
  • The volume of image data per image (one frame) has grown, and the number of frames recorded per unit time has increased with the demand for higher frame rates.
  • As a result, the amount of image data to be processed has increased remarkably.
  • The image processing apparatus is therefore required to increase its processing speed.
  • Image processing apparatuses having various configurations have been proposed as means for speeding up image processing. One example is an image processing apparatus that includes a data memory such as a DRAM, which temporarily stores predetermined image data (for example, one image or one frame) read from the image data recorded on an external recording medium such as a CompactFlash (registered trademark) card or an SD card (registered trademark), and an image processor that performs predetermined image processing on the image data temporarily stored in the data memory.
  • In this apparatus, a read cache memory that reads a part of the image data from the data memory and temporarily stores it, and a write cache memory that temporarily stores the part of the image data processed by the image processor, are provided between the data memory and the image processor.
  • An image processing apparatus that employs single instruction, multiple data stream (SIMD) processing has been proposed (see, for example, Patent Document 1).
  • In Patent Document 1, a plurality of arithmetic units that execute arithmetic processing are provided in parallel and operated simultaneously, thereby improving the processing speed.
  • the marching memory is configured with a column in which a plurality of storage areas (referred to as cells) are continuously arranged as one unit. Both ends of the column become data (or instruction) input ports and output ports according to the setting, and data is sequentially input to the end of the column set as the input port. A plurality of data and the like input to the input port are sequentially transferred to adjacent cells and temporarily stored in each cell.
  • For example, suppose the left end of the column is the input port and five data items, 1, 2, 3, 4, and 5, are input to it in that order.
  • When 1 is input as the first data item, it is temporarily stored in the leftmost cell.
  • At this point the other cells are in a reset state (or still hold the four data items that were input immediately before these five).
  • When the second data item, 2, is input, the data item 1 held in the leftmost cell moves to the second cell from the left end, and the newly input 2 is temporarily stored in the leftmost cell.
  • The input data are transferred and stored in this way cell by cell, and when all five have been input, the five data items 1, 2, 3, 4, and 5 are temporarily stored in the cells in that order from the right end of the column to the left end.
  • When the right end of the column is set as the output port, the temporarily stored data are output from the output port in the order 1, 2, 3, 4, 5, either when an output command for the five data items temporarily stored in the marching memory is issued or each time new data is input.
  • When the right end of the column serves as both the input port and the output port, the temporarily stored data are output in the order 5, 4, 3, 2, 1 when an output command for the five data items temporarily stored in the marching memory is issued.
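  • The two output orders described above can be illustrated with a small software model. This is only a minimal sketch of the behaviour, not the hardware design: the class name `MarchingColumn` and its methods are hypothetical, and the same-end case is modelled with the left end acting as both input and output port (the text describes the equivalent case for the right end).

```python
class MarchingColumn:
    """Toy model of one marching-memory column (hypothetical name)."""

    def __init__(self, n_cells):
        # cells[0] is the left end, cells[-1] is the right end; None means "reset".
        self.cells = [None] * n_cells

    def push_left(self, value):
        """Input one item at the left end; stored items march one cell to the right.

        Whatever sits in the right-end cell is shifted out and discarded.
        """
        self.cells = [value] + self.cells[:-1]

    def pop_right(self):
        """Output the item at the right end; the rest march one cell to the right."""
        out = self.cells[-1]
        self.cells = [None] + self.cells[:-1]
        return out

    def pop_left(self):
        """Output the item at the left end; the rest march one cell to the left."""
        out = self.cells[0]
        self.cells = self.cells[1:] + [None]
        return out


column = MarchingColumn(5)
for item in [1, 2, 3, 4, 5]:
    column.push_left(item)
stored = list(column.cells)                          # left to right: 5, 4, 3, 2, 1
fifo_order = [column.pop_right() for _ in range(5)]  # output at the opposite end

for item in [1, 2, 3, 4, 5]:
    column.push_left(item)
lifo_order = [column.pop_left() for _ in range(5)]   # output at the input end
```

  • In this sketch `fifo_order` is 1, 2, 3, 4, 5 and `lifo_order` is 5, 4, 3, 2, 1, and no cell is ever selected by address: data only move between neighbouring cells.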
  • the object of the present invention is to provide a suitable application utilizing the characteristic operation mode of marching memory, which is such a novel memory technology.
  • The information processing apparatus includes a data memory that temporarily stores stream data composed of a plurality of unit data (for example, image data in the embodiment), and a processor (for example, a digital signal processor in the embodiment) that has a computing unit and performs information processing on the stream data read from the data memory.
  • the information processing apparatus includes a marching memory as a register file of the processor.
  • The marching memory comprises a plurality of unit marching memories. Each unit marching memory has a column in which a plurality of storage areas are connected in series, sequentially transfers the unit data input from one end of the column to adjacent storage areas, and temporarily stores them in the respective storage areas.
  • The marching memory temporarily stores a plurality of input unit data in the storage areas of a first unit marching memory. Based on a batch calculation processing instruction, the processor sequentially processes, with the arithmetic unit, the unit data temporarily stored in the storage areas of the first unit marching memory, and temporarily stores the processed unit data in the storage areas of a second unit marching memory.
  • Stream data in this specification is a data group composed of a plurality of unit data in which adjacent unit data have a predetermined spatial and/or temporal relationship with each other.
  • Image data is one example of stream data.
  • image data of a still image taken by a digital camera is an aggregate (data group) of unit data generated based on the detection signal of each pixel constituting the image sensor.
  • the unit data constituting this data group is not random data having no relationship between data but data having a spatial relationship within a predetermined range. Based on this spatial relevance, adjacent unit data has a certain relevance with respect to feature quantities such as lightness and saturation, and the feature quantities change smoothly, for example.
  • In addition to the spatial relationship described above between unit data that are spatially adjacent within one frame, image data of a moving image, or still image data when still images are shot continuously, also has a temporal relationship between frames. The same applies to audio data recorded by an IC recorder or the like, which is an aggregate of unit data whose frequency and intensity change smoothly along the time axis.
  • the information processing apparatus includes a marching memory as a shared memory between the data memory and the plurality of processors.
  • The marching memory comprises a plurality of unit marching memories provided in parallel. Each unit marching memory has a column in which a plurality of storage areas are connected in series, sequentially transfers the unit data input from one end of the column to adjacent storage areas, and temporarily stores them in the respective storage areas.
  • An arrangement changing means (for example, the ring register in the embodiment) that changes the arrangement of the plurality of input unit data and/or the arrangement of the plurality of output unit data can be provided to constitute the information processing apparatus.
  • The arrangement changing means includes a ring register composed of a register group connected to the input port and/or the output port of each unit marching memory, and a sequencer that controls the operation of the ring register (for example, read/write in the embodiment).
  • the sequencer may be configured to control the operation of the ring register in accordance with the information processing mode executed by the information processing apparatus and change the arrangement of the plurality of unit data.
  • the information processing mode can be appropriately set according to the system to which the information processing apparatus is applied.
  • A plurality of known modes can be considered. One example is an image compression mode that changes the image data of all pixels stored in the data memory (stream data having a large data capacity) into image data compressed by thinning out the number of pixels (stream data with a reduced data capacity).
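  • As a rough illustration of thinning out pixels, the following minimal sketch keeps every n-th unit data of a line and every m-th line of an image. The function names `thin_line` and `thin_image`, the `step` parameters, and the regular every-n-th pattern are all assumptions made for illustration; the text does not specify the thinning pattern or how the ring register and sequencer realise it.

```python
def thin_line(units, step):
    """Keep every `step`-th element of a sequence (assumed thinning pattern)."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return units[::step]


def thin_image(lines, step_x, step_y):
    """Thin a 2-D image: drop lines first, then drop pixels within each kept line."""
    kept_lines = thin_line(lines, step_y)
    return [thin_line(line, step_x) for line in kept_lines]


# A 4 x 4 test image whose pixel value encodes its position (10 * y + x).
image = [[10 * y + x for x in range(4)] for y in range(4)]
reduced = thin_image(image, step_x=2, step_y=2)
```

  • With a step of 2 in both directions the 4 × 4 test image is reduced to 2 × 2, that is, to a quarter of the original data capacity.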
  • the information processing apparatus includes a marching memory as a buffer memory of the processor.
  • The marching memory comprises a plurality of unit marching memories provided in parallel. Each unit marching memory has a column in which a plurality of storage areas are connected in series, sequentially transfers the unit data input from one end of the column to adjacent storage areas, and temporarily stores them in the respective storage areas.
  • A digital camera includes the information processing apparatus according to any one of the above, an image input system that has an image sensor and inputs image data as stream data to the information processing apparatus, and an image output system that outputs the image data processed by the information processing apparatus.
  • the marching memory uses a column in which a plurality of storage areas (cells) are connected as a unit, and unit data input to the input port is sequentially transferred to adjacent storage areas and temporarily stored in each cell.
  • the input and movement speed of the unit data can correspond to, for example, a CPU reference clock, and a high-speed reading operation is possible.
  • the data group targeted by the information processing apparatus according to the aspect of the present invention is stream data composed of a plurality of unit data, and is a data group having a predetermined relationship between adjacent data. That is, unlike random data that requires addressing for each unit data to be read, it is a data group that matches the operation mode of the marching memory.
  • FIG. 1 is a block diagram illustrating the architecture of a signal processing system in a digital camera.
  • This signal processing system includes an image input/output unit 1, a data memory unit 2, a CPU core unit 3, a DSP array unit 4, a CPU/DSP synchronous communication mechanism unit 5, and buses 61 to 66 that connect these units.
  • The image input/output unit 1 is a control unit that controls the input and output of image data to and from an image input system, an image output system, and an external recording medium (not shown) such as a CF card or an SD card (registered trademark).
  • The image input/output unit 1 is provided with an I/O circuit 11 that exchanges image data with the above-described units, a function block (IP) 12 that executes encoding and decoding of image data, and the like, and is connected to the image input system, the image output system, and the external recording medium via a stream I/O bus 61.
  • the image input system is a system for inputting original data of images (still images or moving images) taken in various formats by a digital camera.
  • The image input system includes, for example, an image sensor such as a CMOS or CCD sensor and an imaging lens that forms an image.
  • the image output system is a system for outputting image data processed by the image processing apparatus to a liquid crystal display panel or an external output terminal (see FIG. 6).
  • The image input/output unit 1 is connected to the main bus 62 via the stream I/O bus 61 and a bus bridge.
  • The data memory unit 2, the CPU core unit 3, and the DSP array unit 4 are connected to the main bus 62. That is, in addition to the data memory unit 2 being connected to the main bus 62, the CPU data bus 63 of the CPU core unit 3 is connected to it via a bus bridge, and the DSP data bus (stream bus) 64 of the DSP array unit 4 is likewise connected to it via a bus bridge.
  • The stream I/O bus 61 and the DSP data bus 64 are also directly connected via a bus bridge.
  • the data memory unit 2 includes a data memory 21 and a memory controller 22.
  • the data memory 21 is a storage element that temporarily stores image data input via the image input / output unit 1 from an image input system or an external recording medium.
  • For example, a DRAM (Dynamic Random Access Memory) is used as the data memory 21.
  • the memory controller 22 is a control circuit that controls the writing of image data to the data memory 21 and the reading of image data temporarily stored in the data memory 21.
  • the CPU core unit 3 is a control unit that controls the operation of the digital camera based on a program set and stored in advance, and FIG. 1 shows a configuration example of a parallel arithmetic module in which two processing circuits are provided in parallel.
  • The CPU core unit 3 includes an instruction RAM 31, processing circuits connected in parallel to the instruction RAM 31 via a CPU instruction bus 65, and a DMAC (Direct Memory Access Controller) 35, an SRAM (Static Random Access Memory) 36, and the like connected to each processing circuit via a CPU data bus 63.
  • The instruction RAM 31 is a RAM (Random Access Memory) in which a program composed of a plurality of processing instructions is set and stored in advance.
  • Each of the processing circuits provided in parallel includes an instruction cache 32 that temporarily holds the processing instruction of each step of the program, a CPU 33 that executes the processing instruction, and a data cache 34 that temporarily holds arithmetic data referred to when the processing instruction is executed.
  • The instruction cache 32 is connected to the instruction RAM 31 via the CPU instruction bus 65, and the processing instructions of the program stored in the instruction RAM 31 are read as the processing steps progress and are temporarily held in the instruction cache 32.
  • The data cache 34 is connected to the CPU data bus 63, and the arithmetic data to be referred to when the CPU 33 executes processing is read from the DMAC 35, the SRAM 36, and the like connected to this bus as the processing steps progress, and is temporarily held in the data cache 34. Based on the processing instruction temporarily held in the instruction cache 32, the CPU 33 refers to the arithmetic data temporarily held in the data cache 34 and executes the processing.
  • the DSP array unit 4 is an image processing apparatus that performs predetermined image processing based on a program set and stored in advance.
  • FIG. 1 shows a configuration example of a parallel processing module in which a plurality of digital signal processors (DSPs) 43, 43... are provided in parallel.
  • FIG. 1 shows a configuration example in which a common shared memory 44 is provided for the plurality of digital signal processors 43, 43....
  • The DSP array unit 4 includes an instruction RAM 41 in which a program including a plurality of processing instructions is set and stored in advance, image processing circuits connected in parallel to the instruction RAM 41 via a DSP instruction bus 66, and a DMAC 45, an SRAM 46, and the like connected to the image processing circuits via a DSP data bus 64.
  • Each of the image processing circuits provided in parallel has an instruction cache (I$) 42 that temporarily holds processing instructions and a digital signal processor (hereinafter abbreviated as DSP) 43 that executes the processing instructions. The plurality of DSPs 43, 43... share the processing and execute image processing in parallel.
  • The instruction caches 42, 42... are connected in parallel to the instruction RAM 41 via the DSP instruction bus 66, and the processing instructions assigned to each DSP 43 are read and temporarily stored in its instruction cache 42.
  • The DSPs 43, 43... are connected to the shared memory 44, and each DSP can read image data from the shared memory 44 or write processed image data to the shared memory 44.
  • the shared memory 44 is connected to the DSP data bus 64 and is connected to the main bus 62 and the stream I / O bus 61 via a bus bridge. Therefore, the shared memory 44 can exchange image data with the data memory 21, the DMAC 45, the SRAM 46, the image input system, and the like via these buses.
  • the shared memory 44 reads predetermined image data from the above-described units (for example, the data memory 21) based on the processing instruction read from the instruction RAM 41, and temporarily stores it in the memory.
  • Based on the processing instructions read from the instruction RAM 41 and temporarily held in the instruction caches 42, 42..., the DSPs 43, 43... each read their assigned part of the image data from the shared memory 44 and execute the processing.
  • The image data processed by the DSPs 43, 43... is temporarily stored in the shared memory 44 and sequentially transferred to the data memory 21 for storage.
  • The CPU/DSP synchronous communication mechanism unit 5 is a mechanism that adjusts the timing of the processing executed between the CPU core unit 3 and the DSP array unit 4. It includes a synchronization control interrupt controller 51 provided between the CPU 33 and the DSP 43, and a shared RAM 52 provided between the CPU data bus 63 and the DSP data bus 64. For example, when a process that requires image processing occurs during execution of the program, the CPU 33 writes data such as the address of the image data to be processed and the processing content to the shared RAM 52, and outputs an interrupt processing execution command to the DSP 43 via the synchronization control interrupt controller 51.
  • The DSP 43 reads the designated image data, executes the image processing, writes the address of the processed image data into the shared RAM 52, and outputs a signal to the CPU 33 via the synchronization control interrupt controller 51 indicating that the interrupt processing has been completed. The CPU core unit 3 and the DSP array unit 4 thereby perform parallel processing efficiently.
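  • The request and completion handshake described above can be mimicked in software with two threads and two events. This is only an analogy under stated assumptions: the function `run_handshake`, the dictionary standing in for the shared RAM 52, the key names, and the use of `threading.Event` in place of the interrupt controller are all hypothetical and are not taken from the text.

```python
import threading


def run_handshake(data_memory, src_addr, operation):
    """Simulate one CPU-to-DSP request and the DSP-to-CPU completion signal."""
    shared_ram = {}                       # stands in for the shared RAM 52
    request_irq = threading.Event()       # CPU -> DSP "execute" interrupt
    done_irq = threading.Event()          # DSP -> CPU "completed" signal

    def dsp():
        request_irq.wait()                            # wait for the interrupt
        source = data_memory[shared_ram["src_addr"]]  # read the designated data
        result = [shared_ram["operation"](unit) for unit in source]
        data_memory["processed"] = result             # store processed data
        shared_ram["result_addr"] = "processed"       # publish its address
        done_irq.set()                                # signal completion

    worker = threading.Thread(target=dsp)
    worker.start()

    # CPU side: write the request to the shared RAM, then raise the interrupt.
    shared_ram["src_addr"] = src_addr
    shared_ram["operation"] = operation
    request_irq.set()

    done_irq.wait()                       # the CPU could do other work before this
    worker.join()
    return data_memory[shared_ram["result_addr"]]


memory = {"image": [1, 2, 3]}
doubled = run_handshake(memory, "image", lambda unit: unit * 2)
```

  • The point of the sketch is the ordering: the request data are written before the interrupt is raised, and the result address is written before completion is signalled, so each side only reads the shared area after the other has finished writing it.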
  • The image data recorded on an external recording medium (not shown), or the image data read from the external recording medium and temporarily stored in the data memory 21, is a data group composed of a large number of unit data constituting an image, and is an aggregate of data in which adjacent unit data have a predetermined spatial relationship, that is, stream data.
  • The DSP 43 that performs predetermined image processing on the image data has, within the processor, an arithmetic unit such as an arithmetic logic unit (ALU) or a floating-point unit (FPU).
  • A marching memory having columns in which a plurality of storage areas are connected to each other is provided between the external recording medium (not shown) or the data memory 21, in which image data is temporarily stored, and the arithmetic units of the DSPs 43, 43....
  • the image processing apparatus includes a marching memory in the DSP 43, that is, the marching memory is built in the DSP 43 and is integrally configured.
  • An image processing apparatus 100 according to a first aspect included in this embodiment will be described with reference to FIG.
  • FIG. 2 is a block diagram mainly showing a DSP (digital signal processor) 43 in the image processing apparatus 100.
  • In this figure, the memory controller 22 in the data memory unit 2, the instruction RAM 41 in the DSP array unit 4, the instruction cache 42, the shared memory 44, and the like are omitted.
  • the image processing apparatus 100 includes a data memory 21 that temporarily stores image data, and a DSP 43 that performs predetermined information processing on the image data read from the data memory 21.
  • The DSP 43 of this aspect includes a register file 110 that temporarily holds data necessary for arithmetic processing, an arithmetic unit 120 such as an ALU or FPU that executes arithmetic processing using the data held in the register file 110, an accumulator 130 that temporarily holds the arithmetic result, and a buffer memory 150 that is provided between the data memory 21 and the register file 110 and includes a load buffer 151 and a store buffer 152.
  • a marching memory is used as the load buffer 151 and the store buffer 152.
  • The load buffer 151 and the store buffer 152 are each configured by treating a unit marching memory, formed by a column in which a plurality of cells (storage areas) are connected in the row direction, as one buffer and providing a plurality of such buffers in parallel.
  • A unit marching memory formed by a column in which 256 cells are connected in the row direction is used as one buffer, and 256 of these are provided in parallel in the column direction to form the load buffer 151 and the store buffer 152, each including 256 buffers.
  • Each cell of the marching memory temporarily stores unit data constituting image data, so the load buffer 151 and the store buffer 152 can each temporarily store 256 × 256 unit data.
  • The load buffer 151 accesses the data memory 21 in response to a DMA (Direct Memory Access) transfer command from the instruction RAM 41, reads a predetermined range of image data, and temporarily holds it.
  • For example, image data for 256 pixels in the X direction and 256 pixels in the Y direction is read from image data of 4 megapixels in the horizontal direction (X direction) × 3 megapixels in the vertical direction (Y direction) and temporarily stored in the 256 buffers.
  • At this time, the unit data for 256 pixels of the X1 line, read first in the X direction, is sequentially sent from the first cell of the B1 buffer to the 256th cell and temporarily held; the unit data for 256 pixels of the X2 line, read next, is then sequentially sent from the first cell of the B2 buffer to the 256th cell and temporarily held. The read operation is repeated in the same manner, and the data is temporarily held in the B1 buffer through the B256 buffer in order.
  • Each buffer temporarily holds the unit data of 256 pixels adjacent to each other along an X-direction line of the image data.
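  • The line-by-line filling of the buffers can be sketched as follows. This is a minimal model under stated assumptions, using a small window instead of 256 × 256: the function `load_tile` and its parameters are hypothetical, and the image is modelled as a plain list of rows rather than a DMA-accessed memory.

```python
def load_tile(image, x0, y0, width, height):
    """Copy a width x height window of `image` into `height` line buffers.

    Each buffer models one unit marching memory: a new unit data enters at
    cell 1 (index 0) and the data already stored march one cell onward, so
    the pixel read first ends up in the last cell.
    """
    buffers = []
    for y in range(y0, y0 + height):
        column = []                        # B1, B2, ... in turn
        for x in range(x0, x0 + width):
            column.insert(0, image[y][x])  # enter at cell 1, older data shift
        buffers.append(column)
    return buffers


# A 3-line x 4-pixel test image whose pixel value encodes its position.
image = [[10 * y + x for x in range(4)] for y in range(3)]
tile = load_tile(image, x0=1, y0=0, width=2, height=2)
```

  • Here `tile[0]` plays the role of the B1 buffer and `tile[1]` the B2 buffer; within each buffer the first pixel of the line sits in the last cell, which is where it would leave first if the far end is used as the output port.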
  • the register file 110 reads unit data from the load buffer 151 in response to a load instruction from the instruction RAM 41 and stores it in a predetermined register, for example, the R1 register. Similarly, data corresponding to the image processing to be executed is read from the DMAC 45 or the SRAM 46 and stored in a predetermined register, for example, the R2 register.
  • The arithmetic unit 120 executes arithmetic processing using the unit data stored in the R1 register and the data stored in the R2 register in accordance with an arithmetic instruction from the instruction RAM 41, and outputs the arithmetic result to the register file 110 via the accumulator 130, where it is stored in a predetermined register, for example, the R3 register.
  • the register file 110 outputs the unit data after the arithmetic processing in accordance with the store instruction from the instruction RAM 41 and writes the unit data of the arithmetic result in the store buffer 152.
  • The unit data arranged in the column of each buffer (unit marching memory) in the load buffer 151 is sequentially loaded into the register file 110 and processed by the arithmetic unit 120. The unit data after the arithmetic processing is stored in the order in which it was processed and temporarily held in the column of each buffer (unit marching memory) in the store buffer 152.
  • The unit data arranged in the column of the B1 buffer in the load buffer 151 is read into the register file 110 sequentially from the first cell and processed by the arithmetic unit 120.
  • The unit data after the arithmetic processing is sequentially sent from the first cell to the 256th cell and written in the column of the B1 buffer in the store buffer 152. The same applies to the B2 buffer through the B256 buffer.
  • As the processing proceeds, the image data temporarily stored in the load buffer 151 decreases and the image data after the arithmetic processing (after image processing) increases in the store buffer 152.
  • the load buffer 151 and the store buffer 152 perform DMA transfer of image data to and from the data memory 21 in parallel with the progress of the arithmetic processing, that is, in the background of execution of the arithmetic processing.
  • Image data for one line in the X direction is temporarily held in the first cell through the 256th cell of each buffer, and image data from the X1 line through the X256 line is temporarily held in the B1 buffer through the B256 buffer (the first unit marching memory through the 256th unit marching memory).
  • The arithmetic unit 120 sequentially loads and processes the unit data temporarily held in the cells of each buffer, in the order of the first cell through the 256th cell of the B1 buffer, the first cell through the 256th cell of the B2 buffer, the first cell through the 256th cell of the B3 buffer, and so on up to the cells of the B256 buffer.
  • Image data is a collection of data in which the unit data temporarily stored in adjacent cells are spatially related to each other (for example, detected by adjacent pixels); that is, it is stream data. Unlike random data, therefore, there is no need to address each unit data individually for reading and writing; it is sufficient to send the unit data for one line sequentially through the buffer of each unit marching memory and store it in the cells, or to read it out. In addition, data transfer between adjacent cells, that is, the write and read operations for unit data, can be performed at high speed in synchronization with the clock pulse of the DSP 43.
  • Because the load buffer 151 and the store buffer 152 are physically separated from the data memory 21, exchanging image data between them takes time, which can be an obstacle to faster and more efficient image processing.
  • The transfer of image data between the load buffer 151 or the store buffer 152 and the data memory 21 is, however, performed in the background, in parallel with the period during which the arithmetic unit 120 executes the arithmetic processing.
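  • Overlapping transfer with computation is a general technique that can be sketched in software with a single background worker that fetches the next block while the current one is being processed. The function `process_tiles` and its `fetch` and `compute` callbacks are hypothetical names; a thread pool is used here only as a stand-in for a DMA engine and is not how the hardware described above is built.

```python
from concurrent.futures import ThreadPoolExecutor


def process_tiles(tile_ids, fetch, compute):
    """Process tiles in order while prefetching the next tile in the background."""
    results = []
    if not tile_ids:
        return results
    with ThreadPoolExecutor(max_workers=1) as dma:    # stand-in for the DMA engine
        pending = dma.submit(fetch, tile_ids[0])      # start the first transfer
        for index in range(len(tile_ids)):
            current = pending.result()                # wait until the tile has arrived
            if index + 1 < len(tile_ids):
                # Start the next transfer before computing on the current tile,
                # so that transfer and computation overlap in time.
                pending = dma.submit(fetch, tile_ids[index + 1])
            results.append(compute(current))
    return results


# Toy callbacks: "fetching" tile t yields two copies of t; "computing" sums them.
totals = process_tiles([1, 2, 3], fetch=lambda t: [t, t], compute=sum)
```

  • When fetching and computing take comparable time, this arrangement hides most of the transfer latency, since only the first fetch has to be waited for in full.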
  • The image processing apparatus 100 can therefore execute image processing at high speed and with high efficiency by utilizing the characteristic operation form of the marching memory.
  • Each of the B1 to B256 buffers can temporarily store the image data of an arbitrary line, so the buffers can be used freely according to the processing.
  • For example, the image data of the X1 line can be temporarily stored in the B35 buffer, and the image data of the X2 line can be temporarily stored in the B42 buffer.
  • FIG. 3 is a block diagram mainly showing a DSP (digital signal processor) 43 in the image processing apparatus 200. As in FIG. 2, the memory controller 22 in the data memory unit 2, the instruction RAM 41 in the DSP array unit 4, the instruction cache 42, the shared memory 44, and the like are omitted.
  • the image processing apparatus 200 includes a data memory 21 that temporarily stores image data, and a DSP 43 that performs predetermined information processing on the image data read from the data memory 21.
  • The DSP 43 according to this aspect includes a register file 210 that temporarily holds data necessary for arithmetic processing, an arithmetic unit 220 such as an ALU or FPU that performs arithmetic processing using the data held in the register file 210, an accumulator 230 that temporarily holds the arithmetic result, and an address register file 250 that temporarily holds the address information of the image data temporarily stored in each register of the register file 210.
  • a marching memory is used as the register file 210.
  • The register file 210 is configured by treating a unit marching memory, formed by a column in which a plurality of cells (storage areas) are connected in the row direction, as one register and providing a plurality of such unit marching memories in parallel.
  • The unit marching memories 211, 212, 213, ..., 21N provided in parallel constitute the registers R1, R2, R3, ....
  • Thirty-two registers, each capable of holding 256 unit data, are provided.
  • the address register file 250 is a register file that temporarily holds the address data of the image data temporarily held in each register of the register file 210.
  • Loading, storing, and arithmetic processing for one line of a register are each executed by one batch instruction. That is, one line of image data consisting of 256 unit data is loaded into a register by one batch load instruction, one line of image data is arithmetically processed by one batch operation instruction, and one line of image data is stored in the data memory 21 by one batch store instruction.
  • batch instructions and processing are executed as follows.
  • The register file 210 accesses the data memory 21 in response to a batch load instruction from the instruction RAM 41, reads a predetermined range of image data, and stores it in the registers. For example, image data of 256 pixels in the horizontal direction × 10 pixels in the vertical direction is read from image data of 4 megapixels in the horizontal direction × 3 megapixels in the vertical direction and stored in 10 registers. At this time, the unit data for 256 pixels of the X1 line, loaded by the first batch load instruction, is sequentially sent from the first cell of the R1 register to the 256th cell and stored.
  • The unit data for 256 pixels of the X2 line, loaded by the next batch load instruction, is sequentially sent from the first cell of the R2 register to the 256th cell and stored. The load operation based on the batch load instruction is repeated in the same way, and the data is stored in the R1 register through the R10 register in order.
  • Each of the registers R1 to R10 stores the unit data for 256 pixels adjacent along an X-direction line of the image data.
  • The address register file 250 stores the address information of the image data read based on the batch load instruction and stored in each register. For example, when the image data of the X1 line is stored in the first cell to the 256th cell of the R1 register, the address of the X1 line is stored in the A1 register. Further, when the image data of the X1 line after the arithmetic processing described below is stored in the R21 register, the address of the X1 line after the arithmetic processing is stored in the A21 register.
  • data corresponding to the image processing to be executed is read from the DMAC 45 or the SRAM 46 in response to the execution of the batch load instruction from the instruction RAM 41, and stored in each register.
  • For example, data corresponding to the image processing of the X1 line are read from the DMAC 45 on execution of a batch load instruction and stored in the R11 register, and on execution of the next batch load instruction, 256 pieces of data corresponding to the image processing of the X2 line are read and stored in the R12 register. Likewise, 256 pieces of data corresponding to the image processing of the X10 line are read and stored in the R20 register.
  • The arithmetic unit 220 performs batch processing on the image data stored in each register in response to a batch operation instruction from the instruction RAM 41.
  • For example, when the batch operation instruction from the instruction RAM 41 adds the image data stored in the R1 register to the data stored in the R11 register and stores the result in the R21 register, the arithmetic unit 220 executes the arithmetic processing as follows.
  • Based on the batch operation instruction, the arithmetic unit 220 adds the unit data stored in the first cell of the R1 register to the data stored in the first cell of the R11 register and stores the result in the first cell of the R21 register. Next, the unit data stored in the second cell of the R1 register and the data stored in the second cell of the R11 register are added and stored in the second cell of the R21 register. Similarly, the unit data stored in the nth cell of the R1 register and the data stored in the nth cell of the R11 register are added and stored in the nth cell of the R21 register. This process is executed from the first cell to the 256th cell with one instruction.
  • When a processing instruction for the image data stored in the R2 register (X2 line) is issued as the next batch operation instruction from the instruction RAM 41, the arithmetic unit 220 executes the arithmetic processing in the same way as for the image data stored in the R1 register. Thereafter, as batch operations are repeated, the same arithmetic processing is executed for, for example, the R3 to R10 registers.
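The cell-by-cell addition performed by one batch operation instruction can be sketched in software as follows. This is only an illustrative model, not the hardware described in the patent: the `batch_add` helper, the dictionary of registers, and the sample values are assumptions introduced for the example.

```python
LINE_WIDTH = 256  # unit data per register, as in the embodiment

def batch_add(regs, src_a, src_b, dst):
    """Model one batch operation instruction: add two registers cell by cell
    and store the 256 results in the destination register."""
    regs[dst] = [a + b for a, b in zip(regs[src_a], regs[src_b])]

# Hypothetical register file with registers R1..R32, all cells cleared.
regs = {f"R{i}": [0] * LINE_WIDTH for i in range(1, 33)}
regs["R1"] = list(range(LINE_WIDTH))   # stand-in for X1 line image data
regs["R11"] = [10] * LINE_WIDTH        # stand-in for the processing data

batch_add(regs, "R1", "R11", "R21")    # one call models one batch instruction
```

In this model, processing ten lines takes ten calls, which mirrors the ten batch operation instructions the text describes for the R1 to R10 registers.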
  • the unit data before the arithmetic processing forming the X1 line to the X10 line are sequentially stored in the first cell to the 256th cell of the R1 register to the R10 register, respectively.
  • Each register constituting the register file 210 is a unit marching memory, in which the data stored in the first cell to the 256th cell are moved forward while maintaining the data arrangement order. Therefore, simply forward-operating the unit marching memory during arithmetic processing sends the unit data stored in the first cell to the 256th cell to the arithmetic unit 220 in order, so that the arithmetic processing can be executed.
  • Likewise, the unit marching memory is operated sequentially so that the unit data are sent from the first cell to the second cell and onward, and the unit data are stored in order in the 256 cells.
  • As the arithmetic processing proceeds, the image data before arithmetic processing held in the R1 register to the R10 register and the data held in the R11 register to the R20 register decrease sequentially, while the image data after arithmetic processing stored in the R21 register to the R30 register increase sequentially.
  • The DSP 43 performs a DMA transfer of image data between the register file 210 and the data memory 21 by issuing a batch store instruction in parallel with the progress of the arithmetic processing by the arithmetic unit 220.
  • image data including 256 unit data for one X direction line is stored in the first cell to the 256th cell of each register by one batch load instruction.
  • Image data composed of 2560 unit data, from the X1 line to the X10 line, are stored in the R1 register to the R10 register by batch load instructions.
  • Image data for one line composed of 256 unit data is processed by one batch operation instruction, so the image data for 10 lines stored in the R1 register to the R10 register are processed by 10 batch operation instructions and the results are stored in R21 to R30.
  • Image data is a collection of data in which unit data stored in adjacent cells are spatially related to each other (for example, detected by adjacent pixels), and is stream data. Therefore, unlike the random data, it is not necessary to perform addressing for each unit data one by one and read and write, and the time required for addressing, data search, etc. can be eliminated.
  • The arithmetic unit 220 batch-processes each of the load, operation, and store steps for the unit data group corresponding to one X-direction line (a group of 256 unit data in the above embodiment) with a single instruction. Because the unit data group for one line is moved forward while maintaining the data arrangement order, simply forward-operating the unit marching memory at the time of loading, arithmetic processing, or storing is enough to load the unit data group in order, process it, and store the resulting data group in order. Data movement between adjacent cells in the unit marching memory, that is, the write and read operations on unit data, can be performed at high speed in synchronization with the reference clock pulse. Furthermore, because load, store, and arithmetic processing for one register line are each executed by one batch instruction, instruction efficiency can be greatly improved.
  • Because the register file 210 and the data memory 21 are physically separated, exchanging image data between them takes time, which can be an obstacle to faster and more efficient image processing.
  • the transfer of the image data between the register file 210 and the data memory 21 is performed in the background in parallel with the period during which the arithmetic unit 220 executes the arithmetic processing.
  • With the image processing apparatus 200, it is possible to execute image processing at high speed and high efficiency by utilizing the characteristic operation form of the marching memory.
  • The above illustrates a configuration in which the image data of the X1 to X10 lines are stored in the R1 to R10 registers of the register file 210, the processing data for the X1 to X10 lines are stored in the R11 to R20 registers, and the processed image data of the X1 to X10 lines are stored in the R21 to R30 registers.
  • The R1 to R32 registers can each store image data, processing data, or processed image data of an arbitrary line according to the allocation at that time, and unused registers can be allocated and used as the processing requires.
  • For example, the image data of the X1 line and the X2 line can be stored in the R8 register and the R13 register, the processing data for the X1 line and the X2 line in the R2 register and the R25 register, and the processed image data of the X1 line and the X2 line in the R5 register and the R32 register.
  • The A1 to A32 registers can each store the address of the image data of an arbitrary line or the address of processed image data according to the assignment at that time, and free registers can be allocated and used as the processing requires. For example, the address of the X3 line image data stored in the R9 register can be stored in the A2 register, and the address of the processed X3 line image data stored in the R22 register can be stored in the A11 register.
  • The image processing apparatus includes a data memory 21 and a plurality of DSPs (digital signal processors) 43, 43, ..., 43, and a shared memory 44 using a marching memory is provided between the data memory 21 and the DSPs 43, 43, ..., 43.
  • FIG. 4 is a block diagram mainly showing the shared memory 44 in the image processing apparatus 400; the memory controller 22 in the data memory unit 2 and the instruction RAM 41, the instruction cache 42, and the like in the DSP array unit 4 are omitted.
  • the image processing apparatus 400 includes a data memory 21 that temporarily stores image data, a DSP 43 that performs predetermined information processing on image data read from the data memory 21, and a shared memory 44.
  • the shared memory 44 includes a data storage unit 401 that temporarily stores image data and the like, and a data control unit 402 that controls the flow of image data and the like input to and output from the data storage unit 401.
  • The data storage unit 401 includes an MM array (marching memory array) 410 that temporarily stores image data, and an MM label management controller 420 that temporarily stores information on the image data held in the MM array 410.
  • The data control unit 402 includes a read / write adjustment circuit 430 provided between the MM array 410 and the data memory 21 and DSPs 43, 43, ..., 43, and a read / write sequencer 440 that controls the operation of the read / write adjustment circuit 430.
  • the MM array 410 includes a unit marching memory formed by a column in which a plurality of cells (storage areas) are connected in the row direction as a single memory, and a plurality of them are provided in parallel.
  • The unit marching memories 411, 412, 413, ..., 41N provided in parallel constitute the C1, C2, C3, ..., CN memories, respectively.
  • The capacity of the MM array is set according to the data capacity of the image data captured by the digital camera. For example, the array is configured with 256 × m unit marching memories, each having 256 × n cells in the row direction, provided in parallel in the column direction. In the present embodiment, a configuration in which one end of the column of each unit marching memory is used as a data input / output port is illustrated.
  • The read / write adjustment circuit 430 is a circuit that adjusts the flow of image data when image data are read and written between the MM array 410 and the data memory 21, and when image data are transferred between the MM array 410 and the DSPs 43, 43, ..., 43.
  • The read / write adjustment circuit 430 shown in FIG. 1 includes load / store units 431, 431, ..., 431 provided corresponding to the DSPs 43, 43, ..., 43, a port connection controller 432, and a ring register 433.
  • the load / store unit 431 has one end connected to the DSPs 43, 43,..., 43 and the data memory 21 via the global memory transfer bus 68 and the other end connected to the port connection controller 432.
  • the ring register 433 has one end of each of the first to Nth registers connected to an input / output port of the corresponding C1 to CN memory, and the other end connected to a port connection controller 432.
  • the read / write sequencer 440 controls the operations of the load / store unit 431, the port connection controller 432, and the ring register 433 of the read / write adjustment circuit 430 according to the contents of the image processing executed by the image processing apparatus 400.
  • Here, (1) a copy mode, (2) an XY conversion mode, and (3) a compression mode for reducing the number of image data stored in the data memory 21 will be described as representative examples of processing modes.
  • the copy mode is a processing mode in which image data stored in the data memory 21 is temporarily stored in the MM array 410 as it is.
  • the read / write sequencer 440 controls the read / write adjustment circuit 430 as follows. Now, it is assumed that the first load / store unit 431 reads the image data of the X1 line to the X5 line and temporarily stores (writes) them in the C1 memory to the C5 memory of the MM array 410.
  • the read / write sequencer 440 first causes the first load / store unit 431 to specify the address of the X1 line and read the image data for one X direction line.
  • The port connection controller 432 connects the first load / store unit 431 to the first register, which is the register in the ring register 433 corresponding to the C1 memory.
  • the ring register 433 is set to output data input from the port connection controller 432 to the memory without performing data movement between the registers.
  • The image data of the X1 line read from the data memory 21 pass through the first load / store unit 431, the port connection controller 432, and the first register of the ring register 433, in that order, and are temporarily stored in the C1 memory of the MM array 410.
  • the read / write sequencer 440 causes the first load / store unit 431 to specify the address of the X2 line and read the image data for one X direction line.
  • The port connection controller 432 connects the first load / store unit 431 to the second register of the ring register 433.
  • the ring register 433 maintains a setting that does not move data between registers.
  • The image data of the X2 line read from the data memory 21 pass through the first load / store unit 431, the port connection controller 432, and the second register of the ring register 433, in that order, and are temporarily stored in the C2 memory of the MM array 410.
  • Likewise, by sequentially switching the connection settings of the port connection controller 432, the image data of the X3 line to X5 line are temporarily stored in the C3 memory to C5 memory of the MM array 410.
  • In the MM label management controller 420, address information of the image data temporarily stored in the C1 memory to the C5 memory of the MM array 410 is temporarily stored as a label.
  • For example, for the image data temporarily stored in the C1 memory, a label is stored indicating that they are the X1 line image data of the original image data stored in the data memory 21.
  • Image data of an arbitrary line can be temporarily stored in any of the C1 to CN memories of the MM array 410. For example, the X1 line image data can be temporarily stored in the C3 memory and the X2 line image data in the C6 memory.
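The copy mode and its label bookkeeping can be sketched as follows. This is a minimal model under stated assumptions: the function `copy_lines`, the dictionary layout, and the sample data are hypothetical, and the label is reduced to the line name rather than real address information.

```python
from collections import deque

def copy_lines(data_memory, line_names, mem_names):
    """Copy whole X lines into unit marching memories and record, per memory,
    a label naming the line it holds."""
    mm_array, labels = {}, {}
    for line, mem in zip(line_names, mem_names):
        column = deque()
        for unit in data_memory[line]:
            column.appendleft(unit)   # new unit enters at the port, earlier units advance
        mm_array[mem] = list(column)  # index 0 is the cell next to the port
        labels[mem] = line
    return mm_array, labels

# Hypothetical image with two short lines; any line may go to any memory.
data_memory = {"X1": [1, 2, 3, 4], "X2": [5, 6, 7, 8]}
mm_array, labels = copy_lines(data_memory, ["X1", "X2"], ["C3", "C6"])
```

The label table is what lets a later read locate the X1 line in the C3 memory even though the memory number no longer matches the line number.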
  • the above has described the case where the image data stored in the data memory 21 is temporarily stored in the MM array 410.
  • the image data temporarily stored in the MM array 410 is transferred to the DSP 43 or written into the data memory 21.
  • the read / write sequencer 440 sequentially switches the connection between the memory of the MM array 410 to be read and the load / store unit 431 that performs transfer to the DSP 43, and transfers image data in a predetermined range to the DSP 43.
  • the read / write sequencer 440 first causes the first load / store unit 431 to read the image data for one line in the X direction by designating the address of the X1 line.
  • the port connection controller 432 is connected to the first load / store unit 431 and the first register in the ring register 433.
  • the ring register 433 is set so that data is sent and moved between registers each time unit data constituting the X1 line is input.
  • FIG. 5 shows the operation of the ring register 433 when one X-direction line is composed of four unit data.
  • The four unit data of the X1 line read from the data memory 21 are moved sequentially by the ring register 433: the first unit data of the X1 line is moved to the fourth register corresponding to the C4 memory, the second unit data to the third register corresponding to the C3 memory, the third unit data to the second register corresponding to the C2 memory, and the fourth unit data to the first register corresponding to the C1 memory.
  • the unit data stored in each register is written into the MM array 410.
  • the X1 line is XY-converted to the Y1 line.
  • the read / write sequencer 440 causes the first load / store unit 431 to specify the address of the X2 line and read the image data for one X direction line.
  • the settings of the port connection controller 432 and the ring register 433 are the same.
  • the four unit data of the X2 line read from the data memory 21 are sequentially moved by the ring register 433, the first unit data of the X2 line is transferred to the fourth register, the second unit data is transferred to the third register, The third unit data is moved to the second register, and the fourth unit data is moved to the first register.
  • the unit data stored in each register is written into the MM array 410.
  • the X2 line is XY converted to the Y2 line.
  • each unit data of the Y1 line temporarily stored in the first cell of the C4 memory to the C1 memory is sent and moved to the second cell together with the writing operation of each unit data of the Y2 line.
  • The X3 line to X5 line are XY-converted in the same way and temporarily stored in the MM array 410 as the Y3 line to Y5 line.
  • information of image data temporarily stored in the C1 memory to the C4 memory of the MM array 410 is temporarily stored as a label.
  • the label indicating that the image data temporarily stored in the C1 memory is the image data corresponding to the Y4 line in the original image data stored in the data memory 21 is temporarily stored. The same applies to the C2 memory to C4 memory.
  • Conversely, if the C1 memory to the C4 memory are forward-operated simultaneously so that their unit data are sent to the respective registers of the ring register 433, and the four unit data are then moved through the ring register 433 and output from one load / store unit 431, a Y-direction line can be XY-converted into an X-direction line and transferred to the DSP 43 or the like.
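The XY conversion described above amounts to a transposition built from two shifting steps, which the following sketch models in software. It assumes four unit data per X line, as in the FIG. 5 example; the function names and the sample image are hypothetical and the model ignores clocking and port control.

```python
from collections import deque

def ring_shift_in(line):
    """Shift one X line into the ring register: each new unit enters register 1
    and pushes earlier units toward higher-numbered registers."""
    regs = deque()
    for unit in line:
        regs.appendleft(unit)
    return list(regs)  # index 0 is register 1, which holds the last unit

def xy_convert(lines):
    """Write each shifted line into the memories C1..CN, one unit per memory."""
    width = len(lines[0])
    memories = [deque() for _ in range(width)]  # memories[0] models C1
    for line in lines:
        for mem, unit in zip(memories, ring_shift_in(line)):
            mem.appendleft(unit)  # write at the port; older units move to the next cell
    return [list(m) for m in memories]

# Hypothetical 5-line image: the value 23 is the 3rd unit of the X2 line.
image = [[row * 10 + col for col in range(1, 5)] for row in range(1, 6)]
mems = xy_convert(image)
```

After the five lines are written, `mems[0]` (C1) holds the fourth unit of every X line, which matches the statement that the C1 memory corresponds to the Y4 line.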
  • the compression mode is a processing mode in which image data in which the number of data is reduced by thinning out the image data stored in the data memory is temporarily stored in the MM array 410.
  • the read / write sequencer 440 controls the read / write adjustment circuit 430 as follows. Now, a case where the first load / store unit 431 reads the image data of the X1 line to X5 line and temporarily stores the image data in which the number of data is compressed to 1/4 in the MM array 410 will be described.
  • the read / write sequencer 440 first causes the first load / store unit 431 to specify the address of the X1 line and read the image data for one X direction line.
  • the port connection controller 432 is connected to the first load / store unit 431 and the first register in the ring register 433.
  • The ring register 433 is set so that, when the unit data of the X1 line input from the port connection controller 432 are the (4n-2)th to 4nth unit data (n is an integer of 1 or more), data are sent and moved between registers each time a unit data is input to the ring register 433.
  • the ring register 433 sequentially moves the data between the registers.
  • Next, the fifth unit data is moved to the fourth register, the sixth unit data to the third register, the seventh unit data to the second register, and the eighth unit data to the first register.
  • the unit data stored in each register is written into the MM array 410.
  • As a result, the first and fifth unit data of the X1 line are temporarily stored in the C4 memory, the second and sixth unit data in the C3 memory, the third and seventh unit data in the C2 memory, and the fourth and eighth unit data in the C1 memory.
  • By repeating this operation, the 1st, 5th, 9th, ..., (4m+1)th unit data of the X1 line are temporarily stored in the C4 memory of the MM array 410, the 2nd, 6th, 10th, ..., (4m+2)th unit data in the C3 memory, the 3rd, 7th, 11th, ..., (4m+3)th unit data in the C2 memory, and the 4th, 8th, 12th, ..., (4m+4)th unit data in the C1 memory (m is an integer of 0 or more).
  • The image data temporarily stored in each of the C1 memory to the C4 memory are all X1 line image data, but each is compressed image data in which the number of data is reduced to 1/4 by taking every fourth unit data.
  • The X-direction lines from the X2 line onward can also be processed simultaneously, in synchronization with the X1 line.
  • the read / write sequencer 440 causes the second load / store unit 431 to specify the address of the X2 line and read the image data for one X direction line.
  • the port connection controller 432 is connected to the second load / store unit 431 and the fifth register in the ring register 433.
  • The ring register 433 is set so that, when the unit data of the X2 line input from the port connection controller 432 are the (4n-2)th to 4nth unit data (n is an integer of 1 or more), data are sent and moved between registers each time a unit data constituting the X2 line is input.
  • As a result, of the 1st to 4th unit data in the X2 line, the first unit data is moved to the eighth register, the second unit data to the seventh register, the third unit data to the sixth register, and the fourth unit data to the fifth register.
  • the unit data stored in each register is written into the MM array 410.
  • The first unit data in the X2 line is temporarily stored in the C8 memory, the second unit data in the C7 memory, the third unit data in the C6 memory, and the fourth unit data in the C5 memory.
  • Next, the fifth unit data is moved to the eighth register, the sixth unit data to the seventh register, the seventh unit data to the sixth register, and the eighth unit data to the fifth register.
  • the unit data stored in each register is written into the MM array 410.
  • As a result, the first and fifth unit data of the X2 line are temporarily stored in the C8 memory, the second and sixth unit data in the C7 memory, the third and seventh unit data in the C6 memory, and the fourth and eighth unit data in the C5 memory.
  • By repeating this operation, the 1st, 5th, 9th, ..., (4m+1)th unit data of the X2 line are temporarily stored in the C8 memory of the MM array 410, the 2nd, 6th, ..., (4m+2)th unit data in the C7 memory, and likewise for the C6 memory and the C5 memory (m is an integer of 0 or more).
  • The image data temporarily stored in each of the C5 memory to the C8 memory are all X2 line image data, but each is compressed image data in which the number of data is reduced to 1/4 by taking every fourth unit data.
  • The read / write sequencer 440 performs the same processing for the X3 line to X5 line, with settings corresponding to those for the X1 line and the X2 line.
  • The 1st, 5th, 9th, ..., (4m+1)th unit data of the X3, X4, and X5 lines are temporarily stored in the C12 memory, C16 memory, and C20 memory, respectively, the (4m+2)th unit data of the X3, X4, and X5 lines are temporarily stored in the C11 memory, C15 memory, and C19 memory, respectively, and so on (m is an integer of 0 or more).
  • The MM array 410 thus temporarily stores X1 line compressed image data, each with the number of data reduced to 1/4, in the C1 to C4 memories, X2 line compressed image data in the C5 to C8 memories, X3 line compressed image data in the C9 to C12 memories, ..., and X5 line compressed image data in the C17 to C20 memories.
  • If the port connection controller 432 then connects the first load / store unit 431 to the C1 memory, the second load / store unit 431 to the C5 memory, the third load / store unit 431 to the C9 memory, ..., and the corresponding load / store unit 431 to the C17 memory, and the unit marching memories of the C1, C5, C9, ..., C17 memories are operated in order and their contents written to the data memory 21, compressed image data in which the number of data is reduced to 1/4 relative to the original image data can be created.
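The thinning performed in the compression mode can be sketched as a distribution of one X line over four memories, followed by reading back only one of them. This is an illustrative model: `distribute_line` and the sample line are assumptions, and each list keeps the units in arrival order rather than in physical cell order.

```python
def distribute_line(line, group=4):
    """Spread one X line over `group` memories as the ring register does:
    within each group of four, the 1st unit goes to the highest-numbered
    memory and the 4th unit to the lowest-numbered one."""
    memories = [[] for _ in range(group)]  # memories[0] models the lowest-numbered memory, e.g. C1
    for i, unit in enumerate(line):
        memories[group - 1 - (i % group)].append(unit)
    return memories

x1_line = list(range(1, 13))       # unit data numbered 1..12 for readability
mems = distribute_line(x1_line)    # mems[3] models C4, mems[0] models C1
compressed_x1 = mems[0]            # reading back only C1 keeps every fourth unit
```

Reading only the C1, C5, C9, ... memories, one per line, is what yields an image with a quarter of the original data per line.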
  • In the above, one end of each of the plurality of unit marching memory columns arranged in parallel is an input / output port, but one end may be an input port and the other end an output port, or both ends may be input / output ports.
  • In that case, the load / store unit 431 and the port connection controller 432 are provided on both sides of the MM array 410, but the ring register 433 may be provided on only one side.
  • FIG. 6 is a block diagram schematically showing a signal processing system in the image processing apparatus.
  • the illustrated signal processing system includes an image input system 510, an image processing system 520, an image output system 530, a DRAM 540 connected to the image processing apparatus, an external memory (MM) 550, a storage 560, an external processing system 570, and the like.
  • The image processing system 520 includes a CPU 521, a GPU (Graphics Processing Unit) 522, a codec 523, a DRAM controller 524, an external memory controller 525, a storage IP 526, an external processing system IP 527, and the like, in an SoC (System on Chip) configuration.
  • The CPU 521 generally corresponds to the CPU core unit 3, the GPU 522 corresponds to the DSP array unit 4, and the DRAM 540 corresponds to the DRAM 21.
  • The marching memory can be suitably applied to the image input system 510, the CPU 521 and the external processing system IP 527 in the image processing system 520, the image output system 530, the external memory 550, the storage 560, the external processing system 570, and the like. That is, the marching memory can be suitably applied in the image input system 510 to temporary storage of image data captured by the image sensor, and in the image output system 530 to temporary storage of image data output from the image processing system 520.
  • a marching memory can be suitably applied to temporary storage of image data when performing face recognition or tracking of a moving subject. The same applies to the external memory 550, the storage 560, the external processing system 570, and the like, and the marching memory can be suitably applied to temporary storage of image data.
  • the data group targeted by the present invention is stream data, and has a predetermined relationship between adjacent data. Therefore, unlike the random data, it is not necessary to perform addressing for each unit data one by one and read and write, and the time required for addressing, data search, etc. can be eliminated.
  • In the marching memory, a column in which a plurality of cells are connected is used as a unit, and unit data input to the column are transferred sequentially and temporarily stored in each cell. The input and movement speed of the unit data can be matched to the reference clock of the CPU, so that write and read operations can be performed at high speed. Therefore, according to the present invention, it is possible to provide a suitable application utilizing the characteristic operation form of the marching memory.

Abstract

An application of an embodiment showing an example of the present invention is an information processing device equipped with a data memory (21), which temporarily stores a stream of data comprising a plurality of unit data; and a processor (43) that has a computation unit (220) and performs prescribed information processing on stream data that has been read from the data memory. This information processing device (200) is configured such that a marching memory (210), which has columns wherein a plurality of storage regions are consecutively provided, is provided between the data memory (21) and the computation unit (220).

Description

Information processing apparatus, digital camera, and processor
The present invention relates to an information processing apparatus including a data memory that temporarily stores stream data including a plurality of unit data, and a processor that performs predetermined information processing on stream data read from the data memory. The present invention also relates to a digital camera equipped with this information processing apparatus.
An image processing apparatus is one such information processing apparatus. Image processing apparatuses are widely used in digital cameras, video cameras, and computers, printers, and the like that handle image data recorded on the recording media of these cameras. In recent years, demands for higher definition and higher image quality have greatly increased the volume of image data per image (one frame), and the added demand for higher frame rates has markedly increased the volume of image data recorded per unit time. To process such enormous image data without causing stress to the photographer or other users, image processing apparatuses are required to operate at higher processing speeds.
Image processing apparatuses of various configurations have been proposed as means for speeding up image processing. For example, consider an image processing apparatus that includes a data memory such as a DRAM, which reads predetermined image data (for example, one image or one frame) from a plurality of image data recorded on an external recording medium such as a CompactFlash (registered trademark) card or an SD card (registered trademark) and temporarily stores it, and an image processing processor that performs predetermined image processing on the image data temporarily stored in the data memory. For such an apparatus, a configuration is known in which a read cache memory, which reads part of the image data from the data memory and temporarily stores it, and a write cache memory, which temporarily stores that part of the image data after processing by the image processing processor, are provided between the data memory and the image processing processor.
In addition, as a technique for speeding up the arithmetic processing of image data itself, an image processing apparatus that employs a single instruction multiple data stream (SIMD: Single Instruction Multiple Data stream) operation scheme has been proposed (see, for example, Patent Document 1). In this technique, a plurality of arithmetic units that execute arithmetic processing are provided in parallel and operated simultaneously, thereby improving the processing speed.
JP 2002-358288 A
On the other hand, research on new memory technologies for storing data is progressing, and one of them is the marching memory. The marching memory was presented at UCAS-6, the 6th Workshop on Unique Chips and Systems, held in conjunction with the 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-43), an international symposium on semiconductor chip microarchitecture. The content of the presentation is described in the following document from the proceedings.
T. Watanabe and M. J. Flynn, "Marching Memory: designing computers to avoid the Memory Bottleneck", Workshop on Unique Chips and Systems, UCAS-6, December 2010, Atlanta, GA, pp. 44-47.
The basic configuration and operation of marching memory, as described in the above paper, can be outlined as follows. A marching memory is built from columns, each consisting of a plurality of contiguous storage areas (called cells), as its basic unit. The two ends of a column serve as an input port and an output port for data (or instructions) according to the configuration, and data is input sequentially at the end of the column configured as the input port. The data items input at the input port are shifted one after another into adjacent cells and temporarily stored in the cells.
For example, consider a marching memory having a column of five cells (storage areas) connected in the left-right direction, with the left end of the column as the input port, into which the five data items 1, 2, 3, 4, and 5 are input in that order. When the first data item, 1, is input, it is temporarily stored in the leftmost cell. The other cells are in the reset state (or hold the four data items that were input immediately before these five). When the second data item, 2, is input, the 1 previously stored in the leftmost cell is shifted to the second cell from the left, and the newly input 2 is stored in the leftmost cell. The remaining data items are shifted along in the same way, and once all five have been input, the cells hold 1, 2, 3, 4, and 5 in order from the right end of the column toward the left end.
Therefore, when the right end of the column is configured as the output port, the stored data is output from the output port in the order 1, 2, 3, 4, 5, either when an output command for the five stored data items is issued to the marching memory or each time new data is input. On the other hand, when the right end of the column serves as both the input port and the output port, the stored data is output from the output port in the order 5, 4, 3, 2, 1 when an output command for the five stored data items is issued to the marching memory.
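The two output orders described above can be illustrated with a minimal Python sketch. The class name `MarchingColumn` and its methods are hypothetical names chosen for illustration; they model only the shifting behaviour of a single column, not the hardware described in the cited paper.

```python
class MarchingColumn:
    """Illustrative model of one column of cells (index 0 is the left end)."""

    def __init__(self, n_cells):
        self.cells = [None] * n_cells  # None stands for a cell in the reset state

    def push_left(self, value):
        """Input at the left end; every stored item moves one cell to the right."""
        leaving = self.cells[-1]
        self.cells = [value] + self.cells[:-1]
        return leaving

    def push_right(self, value):
        """Input at the right end; every stored item moves one cell to the left."""
        leaving = self.cells[0]
        self.cells = self.cells[1:] + [value]
        return leaving

    def pop_right(self):
        """Output from the right end; the remaining items move one cell to the right."""
        leaving = self.cells[-1]
        self.cells = [None] + self.cells[:-1]
        return leaving


# Left end as input port, right end as output port: first in, first out.
fifo = MarchingColumn(5)
for item in [1, 2, 3, 4, 5]:
    fifo.push_left(item)
fifo_order = [fifo.pop_right() for _ in range(5)]

# Right end as both input port and output port: last in, first out.
lifo = MarchingColumn(5)
for item in [1, 2, 3, 4, 5]:
    lifo.push_right(item)
lifo_order = [lifo.pop_right() for _ in range(5)]
```

Under these assumptions, `fifo_order` reproduces the 1, 2, 3, 4, 5 case and `lifo_order` the 5, 4, 3, 2, 1 case.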
An object of the present invention is to provide a suitable application that takes advantage of the characteristic operating behaviour of marching memory, a novel memory technology.
An application according to an aspect exemplifying the present invention is an information processing apparatus including a data memory that temporarily stores stream data composed of a plurality of unit data (for example, the image data in the embodiments), and a processor that has an arithmetic unit and performs predetermined information processing on the stream data read from the data memory (for example, the digital signal processor in the embodiments). On that basis, the information processing apparatus of the first aspect includes a marching memory as the register file of the processor. The marching memory is composed of a plurality of unit marching memories, each of which has a column of contiguous storage areas and temporarily stores a plurality of input unit data in the storage areas by shifting them sequentially from one end of the column into adjacent storage areas. The marching memory temporarily stores a plurality of input unit data in the storage areas of a first unit marching memory. Based on a batch operation instruction, the processor has the arithmetic unit sequentially process the unit data stored in the storage areas of the first unit marching memory and temporarily stores each processed unit data in the storage areas of a second unit marching memory.
Here, "stream data" in this specification means a data group composed of a plurality of unit data in which adjacent unit data have a predetermined spatial and/or temporal relationship to one another. Specific examples of stream data include image data of still images and moving images captured by digital cameras, digital video cameras, and the like; audio data recorded by such cameras, IC recorders, and the like; and temporal or spatial measurement data obtained by various measuring instruments.
For example, the image data of a still image captured by a digital camera is an aggregate (data group) of unit data generated from the detection signals of the pixels that make up the image sensor. The unit data in this data group are not random data with no relationship between them but data that are spatially related within a certain range. Because of this spatial relationship, adjacent unit data have a certain relationship in feature quantities such as lightness and saturation; for example, the feature quantities vary smoothly. In image data from continuous shooting of still images, or in moving-image data, unit data that are spatially adjacent within one frame have the spatial relationship described above, and in addition there is a temporal relationship between preceding and following frames. The same applies to audio data recorded by an IC recorder or the like, which is an aggregate of unit data whose frequency and intensity vary smoothly along the time axis.
The information processing apparatus of the second aspect includes a marching memory as a shared memory between the data memory and a plurality of processors. The marching memory is composed of a plurality of unit marching memories arranged in parallel, each of which has a column of contiguous storage areas and temporarily stores a plurality of input unit data in the storage areas by shifting them sequentially from one end of the column into adjacent storage areas.
In this case, the information processing apparatus may be configured by providing, at the input port and/or the output port of each unit marching memory in the marching memory, arrangement changing means (for example, the ring register in the embodiments) that changes the arrangement of the plurality of input unit data and/or the arrangement of the plurality of unit data to be output.
The arrangement changing means may include a ring register, made up of a group of registers connected to the input port and/or the output port of each unit marching memory, and a sequencer (for example, the read/write sequencer in the embodiments) that controls the operation of the ring register. The sequencer may be configured to control the operation of the ring register according to the information processing mode executed by the information processing apparatus and thereby change the arrangement of the plurality of unit data.
Here, the information processing mode can be set as appropriate for the system to which the information processing apparatus is applied. Representative examples when the apparatus is applied to a digital camera include an image compression mode, a vertical-horizontal conversion mode, and a deconvolution mode. Several image compression modes are already known; one example is a mode that converts the image data of all pixels stored in the data memory (stream data with a large data volume) into image data compressed by thinning out pixels (stream data with a reduced data volume).
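As a rough illustration of compression by pixel thinning, the following Python sketch keeps every `step`-th pixel in both directions. The function name `thin_out` and the uniform thinning interval are assumptions made for illustration; the text does not specify how pixels are selected.

```python
def thin_out(image, step=2):
    """Keep every `step`-th row and, within each kept row, every `step`-th pixel.

    `image` is a list of rows, each row a list of pixel values.
    Uniform thinning is an assumed, simplified selection rule.
    """
    return [row[::step] for row in image[::step]]


# A 4 x 4 test image whose pixel value encodes its position (row * 4 + column).
full_image = [[r * 4 + c for c in range(4)] for r in range(4)]
reduced_image = thin_out(full_image, step=2)  # 2 x 2 result
```

With `step=2` the output holds one quarter of the original unit data, which corresponds to the reduction in data volume mentioned above.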
The information processing apparatus of the third aspect includes a marching memory as a buffer memory of the processor. The marching memory is composed of a plurality of unit marching memories arranged in parallel, each of which has a column of contiguous storage areas and temporarily stores a plurality of input unit data in the storage areas by shifting them sequentially from one end of the column into adjacent storage areas.
Another aspect exemplifying the present invention is a digital camera including the information processing apparatus described in any of the above, an image input system that has an image sensor and inputs image data, as stream data, to the information processing apparatus, and an image output system that outputs the image data processed by the information processing apparatus.
As described above, a marching memory takes as its unit a column of contiguous storage areas (cells), and the unit data input at the input port are shifted sequentially into adjacent storage areas and temporarily stored in the cells. The input and shift rate of the unit data can be matched to, for example, the reference clock of the CPU, which enables high-speed read operations. The data group handled by the information processing apparatus according to the aspects of the present invention is stream data composed of a plurality of unit data, in which adjacent data have a predetermined relationship. In other words, unlike random data, which requires addressing for every unit data read, it is a data group that matches the way marching memory operates.
Therefore, the information processing apparatus according to the aspects of the present invention, and a digital camera equipped with it, can provide a suitable application that takes advantage of the characteristic operating behaviour of marching memory.
A block diagram illustrating the architecture of the signal processing system in a digital camera, shown as an application example of the present invention.
A schematic block diagram of the image processing apparatus of the first aspect in the first embodiment.
A schematic block diagram of the image processing apparatus of the second aspect in the first embodiment.
A schematic block diagram of the image processing apparatus of the second embodiment.
An explanatory diagram illustrating the operation of the ring register in the image processing apparatus of the second embodiment.
A block diagram roughly summarizing the signal processing paths in an image processing apparatus, for explaining another application example of the present invention.
Embodiments of the present invention are described below with reference to the drawings. The present invention is applicable to information processing apparatuses that process various kinds of information; in this embodiment, a case where it is applied to the image processing apparatus of a digital camera is described as an example. FIG. 1 is a block diagram illustrating the architecture of the signal processing system in the digital camera.
This signal processing system includes an image input/output unit 1, a data memory unit 2, a CPU core unit 3, a DSP array unit 4, a CPU/DSP synchronous communication mechanism unit 5, and buses 61 to 66 that connect these units.
The image input/output unit 1 is a control unit that controls the input and output of image data to and from an image input system, an image output system, and an external recording medium such as a CF card or an SD card (registered trademark), none of which are shown. The image input/output unit 1 is provided with an I/O circuit 11 that exchanges image data with each of these, a functional block (IP) 12 that performs codec processing and decoding of image data, and the like, and is connected to the image input system, the image output system, and the external recording medium via a stream I/O bus 61.
The image input system inputs the original data of images (still images or moving images) captured by the digital camera in various formats and includes, for example, an image sensor such as a CMOS or CCD sensor and an imaging lens that forms an image of the subject on the image sensor. The image output system outputs the image data processed by the image processing apparatus to a liquid crystal display panel or an external output terminal (see FIG. 6).
The image input/output unit 1 is connected to a main bus 62 via the stream I/O bus 61 and a bus bridge. The data memory unit 2, the CPU core unit 3, and the DSP array unit 4 are connected to the main bus 62. That is, in addition to the data memory unit 2, the CPU data bus 63 of the CPU core unit 3 is connected to the main bus 62 via a bus bridge, and the DSP data bus (stream bus) 64 of the DSP array unit 4 is likewise connected via a bus bridge. The stream I/O bus 61 and the DSP data bus 64 are also connected directly to each other via a bus bridge.
The data memory unit 2 includes a data memory 21 and a memory controller 22. The data memory 21 is a storage element that temporarily stores image data input from the image input system, the external recording medium, or the like via the image input/output unit 1; for example, a DRAM (Dynamic Random Access Memory) with a storage capacity on the order of gigabytes is used. The memory controller 22 is a control circuit that controls the writing of image data to the data memory 21 and the reading of image data temporarily stored in the data memory 21.
The CPU core unit 3 is a control unit that controls the operation of the digital camera based on a program set and stored in advance; FIG. 1 shows a configuration example of a parallel arithmetic module in which two processing circuits are provided in parallel. The CPU core unit 3 includes an instruction RAM 31, processing circuits connected in parallel to the instruction RAM 31 via a CPU instruction bus 65, and a DMAC (Direct Memory Access Controller) 35, an SRAM (Static Random Access Memory) 36, and the like connected to each processing circuit via the CPU data bus 63. The instruction RAM 31 is a RAM (Random Access Memory) in which a program composed of a plurality of processing instructions is set and stored in advance.
Each of the processing circuits provided in parallel includes an instruction cache 32 that temporarily holds the processing instruction for each step of the program, a CPU 33 that executes the processing instructions, and a data cache 34 that temporarily holds the operation data referred to when a processing instruction is executed. The instruction cache 32 is connected to the instruction RAM 31 via the CPU instruction bus 65, and the processing instructions of the program stored in the instruction RAM 31 are read out as the processing steps progress and are temporarily held in the instruction cache 32. The data cache 34 is connected to the CPU data bus 63, and the operation data that the CPU 33 refers to when executing processing is read out from the DMAC 35, the SRAM 36, and the like connected to this bus as the processing steps progress and is temporarily held in the data cache 34. The CPU 33 executes processing based on the processing instructions temporarily held in the instruction cache 32, referring to the operation data temporarily held in the data cache 34.
The DSP array unit 4 is an image processing apparatus that performs predetermined image processing based on a program set and stored in advance. FIG. 1 shows a configuration example of a parallel processing module in which a plurality of digital signal processors (DSPs) 43, 43, ... are provided in parallel. FIG. 1 also shows a configuration example in which a common shared memory 44 is provided for the plurality of digital signal processors 43, 43, ....
The DSP array unit 4 includes an instruction RAM 41 in which a program composed of a plurality of processing instructions is set and stored in advance, image processing circuits connected in parallel to the instruction RAM 41 via a DSP instruction bus 66, and a DMAC 45, an SRAM 46, and the like connected to the image processing circuits via the DSP data bus 64.
Each of the image processing circuits provided in parallel has an instruction cache (I$) 42 that temporarily holds processing instructions and a digital signal processor (hereinafter abbreviated as DSP where appropriate) 43 that executes the processing instructions, and the plurality of DSPs 43, 43, ... share the work and execute image processing in parallel. The instruction caches 42, 42, ... are connected in parallel to the instruction RAM 41 via the DSP instruction bus 66, and the processing instructions assigned to each DSP 43 are read out and temporarily held in the corresponding instruction cache 42. The DSPs 43, 43, ... are connected to the shared memory 44, so that each DSP can read image data from the shared memory 44 and write processed image data to the shared memory 44.
The shared memory 44 is connected to the DSP data bus 64 and, via bus bridges, to the main bus 62 and the stream I/O bus 61. The shared memory 44 can therefore exchange image data with the data memory 21, the DMAC 45, the SRAM 46, the image input system, and the like via these buses.
Based on a processing instruction read from the instruction RAM 41, the shared memory 44 reads predetermined image data from one of the above units (for example, the data memory 21) and temporarily stores it. Based on the processing instructions read from the instruction RAM 41 and temporarily held in the instruction caches 42, 42, ..., the DSPs 43, 43, ... each read their assigned part of the image data from the shared memory 44 and execute image processing on it. The image data processed by the DSPs 43, 43, ... is temporarily stored in the shared memory 44 and then transferred successively to the data memory 21 and stored there.
The CPU/DSP synchronous communication mechanism unit 5 adjusts the timing of the processing that the CPU core unit 3 and the DSP array unit 4 execute with respect to each other. It includes a synchronization control interrupt controller 51 provided between the CPU 33 and the DSP 43, and a shared RAM 52 provided between the CPU data bus 63 and the DSP data bus 64. For example, when a process requiring image processing arises during execution of the program, the CPU 33 writes data such as the address of the image data to be processed and the processing content to the shared RAM 52 and outputs an interrupt processing execution command to the DSP 43 via the synchronization control interrupt controller 51. The DSP 43 reads the designated image data, executes the image processing, writes the address and other details of the processed image data to the shared RAM 52, and outputs a signal indicating completion of the interrupt processing to the CPU 33 via the synchronization control interrupt controller 51. In this way, the CPU core unit 3 and the DSP array unit 4 perform parallel processing efficiently.
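The request and completion exchange described above can be traced with a small, single-threaded Python sketch. All names here (`SyncInterruptController`, `run_handshake`, the dictionary keys) are hypothetical, and plain dictionaries stand in for the shared RAM 52 and the data memory 21; the sketch shows only the order of events, not real interrupt hardware.

```python
class SyncInterruptController:
    """Hypothetical stand-in for the synchronization control interrupt controller."""

    def __init__(self):
        self._handlers = {}

    def register(self, target, handler):
        self._handlers[target] = handler

    def notify(self, target):
        # Deliver an "interrupt" by calling the handler registered for the target.
        self._handlers[target]()


def run_handshake(pixels):
    data_memory = {"src": list(pixels)}  # stands in for the data memory
    shared_ram = {}                      # stands in for the shared RAM
    controller = SyncInterruptController()
    events = []

    def dsp_on_interrupt():
        request = shared_ram["request"]
        source = data_memory[request["address"]]
        data_memory["dst"] = [request["operation"](p) for p in source]
        shared_ram["response"] = {"address": "dst"}  # where the result was placed
        events.append("dsp: processing done")
        controller.notify("cpu")                     # completion signal to the CPU

    def cpu_on_interrupt():
        events.append("cpu: completion received")

    controller.register("dsp", dsp_on_interrupt)
    controller.register("cpu", cpu_on_interrupt)

    # CPU side: post the request to shared RAM, then interrupt the DSP.
    shared_ram["request"] = {"address": "src", "operation": lambda p: 255 - p}
    events.append("cpu: request posted")
    controller.notify("dsp")

    return data_memory[shared_ram["response"]["address"]], events
```

The inversion `255 - p` is only a placeholder for "the processing content"; any per-pixel operation could be posted in the request.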
In the signal processing system of the digital camera outlined above, the image data recorded on the external recording medium (not shown), and the image data read from the external recording medium and temporarily stored in the data memory 21, is a data group composed of a large number of unit data that make up an image. It is an aggregate of data in which adjacent unit data have a predetermined spatial relationship to one another, that is, stream data. The DSP 43, which performs predetermined image processing on the image data, has an arithmetic unit such as an arithmetic logic unit (ALU) or a floating-point unit (FPU) inside the processor.
In the signal processing system of this digital camera, a marching memory having columns of contiguous storage areas is provided between the external recording medium (not shown) or the data memory 21, in which image data is temporarily stored, and the arithmetic units of the DSPs 43, 43, .... Specific aspects of an image processing apparatus using marching memory are described in detail below.
The image processing apparatus of the first embodiment includes a marching memory in the DSP 43; that is, the marching memory is built into the DSP 43 and formed integrally with it. The image processing apparatus 100 of the first aspect included in this embodiment is described with reference to FIG. 2, which is a schematic block diagram. FIG. 2 is a block diagram centred on the DSP (digital signal processor) 43 in the image processing apparatus 100 and omits the memory controller 22 in the data memory unit 2 and the instruction RAM 41, the instruction cache 42, the shared memory 44, and the like in the DSP array unit 4.
The image processing apparatus 100 includes the data memory 21, which temporarily stores image data, and the DSP 43, which performs predetermined information processing on the image data read from the data memory 21. The DSP 43 of this aspect includes a register file 110 that temporarily holds the data needed for arithmetic processing, an arithmetic unit 120 such as an ALU or FPU that executes arithmetic processing using the data held in the register file 110, an accumulator 130 that temporarily holds the operation result, and a buffer memory 150 consisting of a load buffer 151 and a store buffer 152 provided between the data memory 21 and the register file 110. Marching memory is used for the load buffer 151 and the store buffer 152.
The load buffer 151 and the store buffer 152 are each configured by treating a unit marching memory, formed by a column of cells (storage areas) connected in the row direction, as one buffer and arranging a plurality of these buffers in parallel. For example, a unit marching memory formed by a column of 256 cells connected in the row direction is treated as one buffer, and 256 of these are arranged in parallel in the column direction to form the load buffer 151 and the store buffer 152, each consisting of 256 buffers. Each cell of the marching memory temporarily holds one unit data of the image data, so that the load buffer 151 and the store buffer 152 can each temporarily hold (temporarily store) 256 × 256 unit data.
The load buffer 151 accesses the data memory 21 in response to a DMA (Direct Memory Access) transfer instruction from the instruction RAM 41, reads a predetermined range of image data, and temporarily holds it. For example, from image data of 4 megapixels in the horizontal direction (the X direction) by 3 megapixels in the vertical direction (the Y direction), it reads image data for 256 pixels in the X direction by 256 pixels in the Y direction and temporarily holds it in the 256 buffers. The unit data for the 256 pixels of line X1, read first in the X direction, are shifted sequentially from the first cell toward the 256th cell of buffer B1 and temporarily held; the unit data for the 256 pixels of line X2, read next, are shifted sequentially from the first cell toward the 256th cell of buffer B2 and temporarily held. The read operation is repeated in the same way, filling buffers B1 through B256 in turn. Each register thus temporarily holds the unit data for 256 adjacent pixels along one X-direction line of the image data.
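The tile load described above can be sketched in Python as follows. The function name `load_tile`, the list-of-rows image layout, and the use of `collections.deque` to model each buffer are assumptions for illustration; a small tile size is used in place of 256 so the result is easy to inspect.

```python
from collections import deque


def load_tile(image, x0, y0, size):
    """Read a size x size tile starting at (x0, y0) into `size` row buffers.

    Each buffer models one unit marching memory: index 0 is the first cell
    (input end), and earlier pixels march toward the last cell as new ones arrive.
    """
    buffers = []
    for y in range(y0, y0 + size):
        buf = deque(maxlen=size)
        for x in range(x0, x0 + size):
            buf.appendleft(image[y][x])  # shift stored pixels along, store the new one
        buffers.append(buf)
    return buffers


# An 8 x 8 test image whose pixel value encodes its position (y * 8 + x).
image = [[y * 8 + x for x in range(8)] for y in range(8)]
tile = load_tile(image, x0=2, y0=1, size=4)
```

After loading, the pixel read first on each line sits in the last cell of its buffer, which is the position from which it would leave first if the far end is used as the output port.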
The register file 110 reads unit data from the load buffer 151 in response to a load instruction from the instruction RAM 41 and stores it in a predetermined register, for example, register R1. Similarly, it reads data appropriate to the image processing to be executed from the DMAC 45 or the SRAM 46 and stores it in a predetermined register, for example, register R2.
 In response to an arithmetic instruction from the instruction RAM 41, the arithmetic unit 120 executes arithmetic processing using the unit data stored in the R1 register and the data stored in the R2 register, outputs the result to the register file 110 via the accumulator 130, and stores it in a predetermined register, for example the R3 register.
 In response to a store instruction from the instruction RAM 41, the register file 110 outputs the processed unit data and writes it to the store buffer 152.
 At this time, the unit data lined up in the column of each buffer (unit marching memory) of the load buffer 151 is loaded into the register file 110 in order and processed by the arithmetic unit 120. The processed unit data is stored in the order in which it was processed and is temporarily held in the column of each buffer (unit marching memory) of the store buffer 152. For example, the unit data lined up in the column of the B1 buffer of the load buffer 151 is read into the register file 110 in order from the first cell and processed by the arithmetic unit 120, and the processed unit data is written into the column of the B1 buffer of the store buffer 152 by being fed sequentially from the first cell to the 256th cell. The same applies to the B2 buffer through the B256 buffer.
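 The load, compute, and store flow for one buffer can be modeled roughly as below. This is a sketch under the assumption that a unit marching memory behaves like a first-in, first-out queue; `process_line` and the doubling operation are hypothetical and chosen only for illustration.

```python
from collections import deque

def process_line(load_line, op):
    """Drain one load-side line buffer through `op` into a new store-side buffer.

    Unit data leave the load buffer in order and arrive in the store buffer
    in the same order, so no per-element address is needed.
    """
    store_line = deque()
    while load_line:
        unit = load_line.popleft()      # next unit datum marches out
        store_line.append(op(unit))     # processed datum marches in
    return store_line

load_b1 = deque([1, 2, 3, 4])
store_b1 = process_line(load_b1, lambda value: value * 2)
```

 After the call, the load-side buffer is empty and the store-side buffer holds the results in the original order, which mirrors the way the load buffer drains while the store buffer fills.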
 As the arithmetic processing by the arithmetic unit 120 proceeds, the image data temporarily held in the load buffer 151 steadily decreases, and the processed image data in the store buffer 152 steadily increases. In parallel with the progress of the arithmetic processing, that is, in the background of its execution, the load buffer 151 and the store buffer 152 perform DMA transfers of image data to and from the data memory 21.
 In the image processing apparatus 100 configured in this way, the image data for one X-direction line is temporarily held in the first through 256th cells of each buffer, and the image data of the X1 through X256 lines is temporarily held in the B1 through B256 buffers (the first through 256th unit marching memories). The unit data held in the cells of each buffer is then loaded and processed in sequence by the arithmetic unit 120: the first through 256th cells of the B1 buffer, the first through 256th cells of the B2 buffer, the first through 256th cells of the B3 buffer, and so on up to the first through 256th cells of the B256 buffer.
 Image data is stream data: a collection of data in which the unit data temporarily stored in adjacent cells is spatially interrelated (for example, detected by adjacent pixels). Unlike random data, therefore, there is no need to address each unit datum individually when reading or writing. The unit data for one line need only be fed in order through the buffer of each unit marching memory to be stored in, or read from, the cells. Moreover, the transfer of data between adjacent cells, that is, the writing and reading of unit data, can be performed at high speed in synchronization with the clock pulses of the DSP 43.
 Furthermore, because the load buffer 151 and the store buffer 152 are physically separated from the data memory 21, exchanging image data takes time, which can be an obstacle to faster and more efficient image processing. In the image processing apparatus 100, however, image data is transferred between the load buffer 151 and store buffer 152 and the data memory 21 in the background, in parallel with the period during which the arithmetic unit 120 executes arithmetic processing.
 Therefore, the image processing apparatus 100 can execute image processing at high speed and with high efficiency by exploiting the characteristic mode of operation of the marching memory.
 Although a configuration in which the B1 through B256 buffers temporarily hold the image data of the X1 through X256 lines has been described as an example, each of the B1 through B256 buffers can temporarily hold the image data of any line, and whichever buffers are free can be used according to the processing. For example, the image data of the X1 line can be temporarily held in the B35 buffer and the image data of the X2 line in the B42 buffer.
 Next, an image processing apparatus 200 according to a second aspect included in the first embodiment will be described with reference to FIG. 3. FIG. 3 is a block diagram showing mainly the DSP (digital signal processor) 43 of the image processing apparatus 200. As in FIG. 2, the memory controller 22 in the data memory unit 2 and the instruction RAM 41, the instruction cache 42, the shared memory 44, and the like in the DSP array unit 4 are omitted.
 The image processing apparatus 200 includes a data memory 21 that temporarily stores image data and a DSP 43 that performs predetermined information processing on the image data read from the data memory 21. The DSP 43 of this aspect includes a register file 210 that temporarily holds the data required for arithmetic processing, an arithmetic unit 220 such as an ALU or FPU that executes arithmetic processing using the data held in the register file 210, an accumulator 230 that temporarily holds the arithmetic result, and an address register file 250 that temporarily holds the address information of the image data temporarily stored in each register of the register file 210. In the image processing apparatus 200 of this aspect, a marching memory is used as the register file 210.
 The register file 210 treats a unit marching memory, formed by a column in which a plurality of cells (storage areas) are connected in the row direction, as one register, and is configured by providing a plurality of these in parallel. That is, the unit marching memories 211, 212, 213, ..., 21N provided in parallel constitute the R1, R2, R3, ..., RN registers of the register file 210. For example, a register file 210 in which 32 unit marching memories, each having 256 cells in the row direction, are provided in parallel in the column direction has 32 registers, each capable of holding 256 unit data.
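 As a rough software analogy for this organization, the sketch below models each register as a fixed-capacity first-in, first-out column and the register file as 32 such columns. The class `UnitMarchingMemory`, its methods, and `make_register_file` are hypothetical names introduced for illustration; they do not describe the actual circuit.

```python
from collections import deque

class UnitMarchingMemory:
    """One register: a column of `cells` storage cells accessed in order."""

    def __init__(self, cells=256):
        self.cells = cells
        self._data = deque()

    def push(self, unit):
        # Feed one unit datum into the column; reject it if every cell is full.
        if len(self._data) >= self.cells:
            raise OverflowError("all cells are occupied")
        self._data.append(unit)

    def pop(self):
        # Take out the oldest unit datum, preserving the stored order.
        return self._data.popleft()

    def __len__(self):
        return len(self._data)

def make_register_file(registers=32, cells=256):
    # R1 ... R32, each able to hold 256 unit data (values from the text).
    return {f"R{i}": UnitMarchingMemory(cells) for i in range(1, registers + 1)}

register_file = make_register_file()
register_file["R1"].push("pixel-a")
register_file["R1"].push("pixel-b")
first_out = register_file["R1"].pop()
```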
 The address register file 250 is a register file that temporarily holds the address data of the image data temporarily held in each register of the register file 210, and has A1, A2, A3, ..., AN address registers corresponding to the R1, R2, R3, ..., RN registers. For example, it is configured with 32 registers.
 In the image processing apparatus 200 of this aspect, a single batch instruction executes the load, store, or arithmetic processing for one register line. That is, one batch load instruction loads one line of image data consisting of 256 unit data into a register, one batch arithmetic instruction performs arithmetic processing on one line of image data, and one batch store instruction stores one line of image data in the data memory 21. Using a marching memory for the registers and issuing such batch instructions can greatly increase instruction efficiency. Specifically, the batch instructions and processing are executed as follows.
 The register file 210 accesses the data memory 21 in response to a batch load instruction from the instruction RAM 41, reads a predetermined range of image data, and stores it in the registers. For example, from image data of 4 megapixels in the horizontal direction × 3 megapixels in the vertical direction, it reads image data of 256 pixels in the horizontal direction × 10 pixels in the vertical direction and stores it in 10 registers. At this time, the unit data for the 256 pixels of the X1 line, loaded by the first batch load instruction, is fed sequentially from the first cell to the 256th cell of the R1 register and stored there. The unit data for the 256 pixels of the X2 line, loaded by the next batch load instruction, is fed sequentially from the first cell to the 256th cell of the R2 register and stored there. The load operation based on batch load instructions is repeated in the same manner, and the data is stored in the R1 register through the R10 register in order. Each of the R1 through R10 registers stores the unit data for 256 pixels that are adjacent along an X-direction line of the image data.
 At this time, the address register file 250 stores the address information of the image data that was read by the batch load instruction and stored in each register. For example, when the image data of the X1 line is stored in the first through 256th cells of the R1 register, the address of the X1 line is stored in the A1 register. Likewise, when the processed image data of the X1 line, described next, is stored in the R21 register, the address of the processed X1 line is stored in the A21 register.
 In addition, in response to the execution of a batch load instruction from the instruction RAM 41, data corresponding to the image processing to be executed is read from the DMAC 45 or the SRAM 46 and stored in the registers of the register file 210. For example, in response to the execution of a batch load instruction, 256 data items corresponding to the image processing of the X1 line are read from the DMAC 45 and stored in the R11 register, and in response to the execution of another batch load instruction, 256 data items corresponding to the image processing of the X2 line are read and stored in the R12 register. In the same manner, 256 data items corresponding to the image processing of the X10 line are read and stored in the R20 register.
 The arithmetic unit 220 batch-processes the image data stored in each register in response to a batch arithmetic instruction from the instruction RAM 41. For example, when the batch arithmetic instruction from the instruction RAM 41 is an instruction to add the image data stored in the R1 register to the data stored in the R11 register and store the result in the R21 register, the arithmetic unit 220 executes the arithmetic processing as follows.
 Based on the batch arithmetic instruction, the arithmetic unit 220 adds the unit data stored in the first cell of the R1 register to the data stored in the first cell of the R11 register and stores the result in the first cell of the R21 register. It then adds the unit data stored in the second cell of the R1 register to the data stored in the second cell of the R11 register and stores the result in the second cell of the R21 register. In the same manner, it adds the unit data stored in the nth cell of the R1 register to the data stored in the nth cell of the R11 register and stores the result in the nth cell of the R21 register. This processing is executed for the first through 256th cells with a single instruction.
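 The cell-by-cell addition performed by one batch arithmetic instruction can be sketched as follows. The function `batch_add` and the dictionary-of-deques register model are assumptions made for this example; the sample values are arbitrary.

```python
from collections import deque

def batch_add(regs, src_a, src_b, dst):
    """Model one batch add: dst[n] = src_a[n] + src_b[n] for every cell.

    Operands are consumed in cell order and results are appended in the
    same order, so a single call covers a whole register line.
    """
    a, b, out = regs[src_a], regs[src_b], regs[dst]
    while a and b:
        out.append(a.popleft() + b.popleft())

regs = {
    "R1": deque(range(256)),     # stand-in for the X1 line image data
    "R11": deque([10] * 256),    # stand-in for the processing data
    "R21": deque(),              # receives the processed X1 line
}
batch_add(regs, "R1", "R11", "R21")
```

 A single call here plays the role of one instruction that covers all 256 cells, in contrast to issuing one add per unit datum.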
 When the next batch arithmetic instruction from the instruction RAM 41 is a processing instruction for the image data stored in the R2 register (the X2 line), the arithmetic unit 220 executes the arithmetic processing in the same way as for the image data stored in the R1 register. When the batch operation is repeated thereafter, the same arithmetic processing is executed for, for example, the R3 through R10 registers.
 Here, the first through 256th cells of the R1 through R10 registers store, in order, the unprocessed unit data forming the X1 through X10 lines, respectively. Each register constituting the register file 210 is a unit marching memory, and the data stored in the first through 256th cells is moved forward step by step while its order is maintained. Therefore, simply by stepping each unit marching memory forward during execution of the arithmetic processing, the unit data stored in the first through 256th cells can be sent to the arithmetic unit 220 in order and processed. The same applies when the processed unit data is written to the first through 256th cells of the R21 through R30 registers: by stepping each unit marching memory forward and feeding in the unit data, the unit data is stored in order in the first through 256th cells.
 As the arithmetic processing by the arithmetic unit 220 proceeds, the unprocessed image data stored in the R1 through R10 registers and the data stored in the R11 through R20 registers steadily decrease, and the processed image data stored in the R21 through R30 registers steadily increases. In parallel with the progress of the arithmetic processing by the arithmetic unit 220, the DSP 43 issues batch store instructions, thereby causing DMA transfers of image data to be performed between the register file 210 and the data memory 21.
 In the image processing apparatus 200 configured in this way, one batch load instruction stores the image data for one X-direction line, consisting of 256 unit data, in the first through 256th cells of a register, and 10 batch load instructions store the image data of the X1 through X10 lines, consisting of 2560 unit data, in the R1 through R10 registers. In the arithmetic unit 220, one batch arithmetic instruction processes one line of image data consisting of 256 unit data, and 10 batch arithmetic instructions process the 10 lines of image data stored in the R1 through R10 registers, with the results stored in R21 through R30.
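 The instruction counts stated above can be checked with a few lines of arithmetic. The comparison against a design that issues one instruction per unit datum is an assumption added purely for illustration; the text itself gives only the batch figures.

```python
CELLS_PER_REGISTER = 256   # unit data per X-direction line (from the text)
LINES = 10                 # X1 through X10 (from the text)

# Total unit data covered by the ten batch loads.
unit_data_total = CELLS_PER_REGISTER * LINES

# One batch instruction handles one whole line.
batch_instructions_per_stage = LINES

# Assumption for comparison only: a hypothetical design that needs
# one instruction per unit datum for the same stage.
per_unit_instructions_per_stage = unit_data_total

reduction_factor = per_unit_instructions_per_stage // batch_instructions_per_stage
```

 Under that assumption, each stage (load, arithmetic, or store) needs 256 times fewer instructions, which is simply the number of cells per register.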
 Image data is stream data: a collection of data in which the unit data stored in adjacent cells is spatially interrelated (for example, detected by adjacent pixels). Unlike random data, therefore, there is no need to address each unit datum one by one when reading or writing, and the time required for addressing, data searches, and the like can be eliminated.
 Furthermore, in the image processing apparatus 200, the arithmetic unit 220 batch-processes each of the load, arithmetic, and store operations for a group of unit data corresponding to one X-direction line (a group of 256 unit data in the above example) with one instruction each. Because the unit data group for one line is moved forward step by step while its order is maintained, simply stepping the unit marching memory forward at load time, during arithmetic processing, and at store time makes it possible to load the unit data group in order, process it, and store the resulting data group in order. The transfer of data between adjacent cells in a unit marching memory, that is, the writing and reading of unit data, can be performed at high speed in synchronization with the reference clock pulses. In addition, because the load, store, or arithmetic processing for one register line is executed by a single batch instruction, instruction efficiency can be greatly increased.
 Also, because the register file 210 and the data memory 21 are physically separated, exchanging image data takes time, which can be an obstacle to faster and more efficient image processing. In the image processing apparatus 200, however, image data is transferred between the register file 210 and the data memory 21 in the background, in parallel with the period during which the arithmetic unit 220 executes arithmetic processing.
 Therefore, the image processing apparatus 200 can execute image processing at high speed and with high efficiency by exploiting the characteristic mode of operation of the marching memory.
 In the above, to simplify the description, a configuration was described as an example in which the R1 through R10 registers of the register file 210 store the image data of the X1 through X10 lines, the R11 through R20 registers store the processing data for the X1 through X10 lines, and the R21 through R30 registers store the processed image data of the X1 through X10 lines. However, depending on the allocation at the time, each of the R1 through R32 registers can store the image data, processing data, or processed image data of any line, and whichever registers are free can be allocated and used according to the processing. For example, the image data of the X1 and X2 lines can be stored in the R8 and R13 registers, the processing data for the X1 and X2 lines in the R2 and R25 registers, and the processed image data of the X1 and X2 lines in the R5 and R32 registers.
 The same applies to the address register file 250. Depending on the allocation at the time, each of the A1 through A32 registers can store the address of the image data of any line or the address of processed image data, and whichever registers are free can be allocated and used according to the processing. For example, the address of the X3 line image data stored in the R9 register can be stored in the A2 register, and the address of the processed X3 line stored in the R22 register can be stored in the A11 register.
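 One way to picture this flexible assignment is a simple free-list allocator, sketched below. The class `RegisterAllocator` and its methods are hypothetical; the text does not specify how free registers are tracked, so this is only one possible bookkeeping scheme.

```python
class RegisterAllocator:
    """Track which of the R1..R32 registers are free and what each one holds."""

    def __init__(self, count=32, prefix="R"):
        self.free = [f"{prefix}{i}" for i in range(1, count + 1)]
        self.contents = {}

    def allocate(self, description, preferred=None):
        # Use the requested register if it is free, else the first free one.
        if preferred is not None and preferred not in self.free:
            raise ValueError(f"{preferred} is already in use")
        name = preferred if preferred is not None else self.free[0]
        self.free.remove(name)
        self.contents[name] = description
        return name

    def release(self, name):
        # Return a register to the free list once its data has been stored.
        del self.contents[name]
        self.free.append(name)

allocator = RegisterAllocator()
image_x1 = allocator.allocate(("image data", "X1"), preferred="R8")
image_x2 = allocator.allocate(("image data", "X2"), preferred="R13")
any_free = allocator.allocate(("processing data", "X1"))
```

 The same structure with the prefix "A" could stand in for the address registers, since the text says they are allocated in the same way.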
 Next, an image processing apparatus according to a second embodiment will be described. The image processing apparatus of the second embodiment includes a data memory 21 and a plurality of DSPs (digital signal processors) 43, 43, ..., 43, and is configured with a shared memory 44 that uses a marching memory provided between the data memory 21 and the DSPs 43, 43, ..., 43.
 The image processing apparatus 400 of this embodiment will be described with reference to FIG. 4. FIG. 4 is a block diagram showing mainly the shared memory 44 of the image processing apparatus 400. The memory controller 22 in the data memory unit 2 and the instruction RAM 41, the instruction cache 42, and the like in the DSP array unit 4 are omitted.
 The image processing apparatus 400 includes a data memory 21 that temporarily stores image data, DSPs 43 that perform predetermined information processing on the image data read from the data memory 21, and a shared memory 44. The shared memory 44 consists of a data storage unit 401 that temporarily stores image data and the like, and a data control unit 402 that controls the flow of the image data and the like input to and output from the data storage unit 401.
 The data storage unit 401 includes an MM array (marching memory array) 410 that temporarily stores image data, and an MM label management controller 420 that temporarily holds information on the image data temporarily stored in the MM array 410. The data control unit 402 includes a read/write adjustment circuit 430 provided between the MM array 410 and the data memory 21 and DSPs 43, 43, ..., 43, and a read/write sequencer 440 that controls the operation of the read/write adjustment circuit 430.
 The MM array 410 treats a unit marching memory, formed by a column in which a plurality of cells (storage areas) are connected in the row direction, as one memory, and is configured by providing a plurality of these in parallel. That is, the unit marching memories 411, 412, 413, ..., 41N provided in parallel constitute the C1, C2, C3, ..., CN memories of the MM array 410. The capacity of the MM array is set according to the data size of the image data captured by the digital camera. For example, it is configured by providing 256 × m unit marching memories, each having 256 × n cells in the row direction, in parallel in the column direction. In this embodiment, a configuration in which one end of the column of each unit marching memory serves as the data input/output port is described as an example.
 The read/write adjustment circuit 430 is a circuit that adjusts the flow of image data when image data is read or written between the MM array 410 and the data memory 21 and when image data is transferred between the MM array 410 and the DSPs 43, 43, ..., 43. The illustrated read/write adjustment circuit 430 includes load/store units 431, 431, ..., 431 provided in correspondence with the DSPs 43, 43, ..., 43; a ring register 433 in which first through Nth registers, corresponding to the unit marching memories 411, 412, ..., 41N, are interconnected in a ring; and a port connection controller 432 that controls the connections between the load/store units 431 and the ring register 433.
 Each load/store unit 431 has one end connected to the DSPs 43, 43, ..., 43 and the data memory 21 via the global memory transfer bus 68, and the other end connected to the port connection controller 432. In the ring register 433, one end of each of the first through Nth registers is connected to the input/output port of the corresponding C1 through CN memory, and the other end is connected to the port connection controller 432.
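 The ring of registers sitting between the port connection controller and the memory ports can be pictured with the toy model below. The class `RingRegister` and its `write`, `rotate`, and `flush` methods are assumptions made for illustration; in particular, the rotation step is only one plausible reading of "interconnected in a ring" and is not taken from the text.

```python
class RingRegister:
    """N registers connected in a ring, register k paired with memory C(k+1)."""

    def __init__(self, size):
        self.slots = [None] * size

    def write(self, index, value):
        # Data arriving from the port connection controller.
        self.slots[index] = value

    def rotate(self, steps=1):
        # Assumed behavior: shift every register's content to its neighbor,
        # wrapping from the last register back to the first.
        steps %= len(self.slots)
        self.slots = self.slots[-steps:] + self.slots[:-steps]

    def flush(self, memories):
        # Output each register's content to the memory it is wired to.
        for index, value in enumerate(self.slots):
            if value is not None:
                memories[index].append(value)
                self.slots[index] = None

ring = RingRegister(4)
memories = [[], [], [], []]      # stand-ins for the C1..C4 memories
ring.write(0, "unit")
ring.rotate(1)                   # content moves from register 1 to register 2
ring.flush(memories)
```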
 The read/write sequencer 440 controls the operation of the load/store units 431, the port connection controller 432, and the ring register 433 of the read/write adjustment circuit 430 according to the content of the image processing executed by the image processing apparatus 400. Various kinds of image processing executed by the image processing apparatus 400 are already known, and the way in which the read/write sequencer 440 controls the read/write adjustment circuit 430 (the processing mode) differs according to the processing content. Here, the following are described as representative processing modes: (1) a copy mode, in which the image data stored in the data memory 21 is temporarily stored in the MM array 410 as is; (2) an X-Y conversion mode, in which the horizontal and vertical directions of the image data stored in the data memory 21 are interchanged; and (3) a compression mode, in which the number of data items of the image data stored in the data memory 21 is reduced.
(1) Copy mode
 The copy mode is a processing mode in which the image data stored in the data memory 21 is temporarily stored in the MM array 410 as is. In the copy mode, the read/write sequencer 440 controls the read/write adjustment circuit 430 as follows. Suppose that the first load/store unit 431 is to read the image data of the X1 through X5 lines and temporarily store (write) it in the C1 through C5 memories of the MM array 410.
 In this case, the read/write sequencer 440 first causes the first load/store unit 431 to specify the address of the X1 line and read the image data for one X-direction line. It causes the port connection controller 432 to connect the first load/store unit 431 to the first register of the ring register 433, which is the register corresponding to the C1 memory. The ring register 433 is set so that it does not move data between registers and outputs the data input from the port connection controller 432 to the corresponding memory. As a result, the image data of the X1 line read from the data memory 21 passes through the first load/store unit 431, the port connection controller 432, and the first register of the ring register 433, and is temporarily stored in the C1 memory of the MM array 410.
 Next, the read/write sequencer 440 causes the first load/store unit 431 to specify the address of the X2 line and read the image data for one X-direction line. It causes the port connection controller 432 to connect the first load/store unit 431 to the second register of the ring register 433. The ring register 433 keeps the setting in which data is not moved between registers. As a result, the image data of the X2 line read from the data memory 21 passes through the first load/store unit 431, the port connection controller 432, and the second register of the ring register 433, and is temporarily stored in the C2 memory of the MM array 410. The same applies to the X3 through X5 lines: by switching the connection setting of the port connection controller 432 in sequence, the image data of the X3 through X5 lines is temporarily stored in the C3 through C5 memories of the MM array 410.
The MM label management controller 420 temporarily stores, as labels, the address information of the image data temporarily stored in the C1 to C5 memories of the MM array 410. For example, for the C1 memory it temporarily stores a label indicating that the image data held there is the X1-line image data of the original image data stored in the data memory 21. The same applies to the C2 to C5 memories.
Note that image data of any line can be temporarily stored in the C1 to CN memories of the MM array 410. For example, the X1-line image data can be temporarily stored in the C3 memory and the X2-line image data in the C6 memory.
The above describes the case where image data stored in the data memory 21 is temporarily stored in the MM array 410; the same applies when image data temporarily stored in the MM array 410 is transferred to the DSP 43 or written to the data memory 21. That is, the read/write sequencer 440 sequentially switches the connection between the memory of the MM array 410 to be read and the load/store unit 431 that performs the transfer to the DSP 43, and thereby transfers a predetermined range of image data to the DSP 43.
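The line-by-line loading described for the normal mode can be pictured with a small software model. This is only an illustrative sketch: the function name `load_normal`, the dictionary layout, and the sample values are assumptions introduced here, not part of the disclosed hardware.

```python
# Hypothetical software model of the normal mode; all names are illustrative.

def load_normal(data_memory, line_to_column):
    """Copy whole X-direction lines into column memories without reordering.

    data_memory    -- dict mapping a line name ("X1", ...) to its unit data
    line_to_column -- dict mapping a line name to a column memory name ("C1", ...)
    Returns (mm_array, labels): the column contents and a label table
    recording which source line each column memory holds.
    """
    mm_array = {}
    labels = {}
    for line, column in line_to_column.items():
        # In this mode the ring register passes data straight through,
        # so the column receives the line in its original order.
        mm_array[column] = list(data_memory[line])
        # The label plays the role of the address information kept by
        # the MM label management controller.
        labels[column] = line
    return mm_array, labels


# Five lines of four unit data each; the values are arbitrary sample data.
data_memory = {f"X{k}": [10 * k + i for i in range(1, 5)] for k in range(1, 6)}
mapping = {f"X{k}": f"C{k}" for k in range(1, 6)}
mm_array, labels = load_normal(data_memory, mapping)
print(labels["C1"], mm_array["C1"])  # X1 [11, 12, 13, 14]
```

Passing a different mapping, for example `{"X1": "C3", "X2": "C6"}`, models the arbitrary line placement mentioned above.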
(2) X-Y conversion mode
The X-Y conversion mode is a processing mode in which the horizontal direction (X direction) and the vertical direction (Y direction) of the image data stored in the data memory 21 are interchanged before the data is temporarily stored in the MM array 410. In the X-Y conversion mode, the read/write sequencer 440 controls the read/write adjustment circuit 430 as follows. The following describes the case where the first load/store unit 431 reads the image data of the X1 to X5 lines and the MM array 410 temporarily stores it as the Y1 to Y5 lines.
At this time, the read/write sequencer 440 first causes the first load/store unit 431 to read the image data of one X-direction line by specifying the address of the X1 line. It causes the port connection controller 432 to connect the first load/store unit 431 to the first register of the ring register 433. The ring register 433 is set so that it shifts data between registers each time a unit data item of the X1 line is input.
FIG. 5 shows the operation of the ring register 433 when one X-direction line consists of four unit data. In this case, the four unit data of the X1 line read from the data memory 21 are shifted in sequence by the ring register 433, so that the first unit data of the X1 line is moved to the fourth register corresponding to the C4 memory, the second unit data to the third register corresponding to the C3 memory, the third unit data to the second register corresponding to the C2 memory, and the fourth unit data to the first register corresponding to the C1 memory. After the image data of one X-direction line has been stored in the fourth to first registers of the ring register 433, the unit data stored in each register is written to the MM array 410. The X1 line is thereby X-Y converted into the Y1 line.
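The shifting behaviour described for FIG. 5 can be traced with a short simulation. This is a simplified sketch under the assumption that each new unit data enters at the first register and pushes earlier data one register onward; the function name `ring_register_load` and the sample labels are hypothetical.

```python
# Simplified model of the ring register while one line is shifted in.
# Assumption: new data always enters at register 1.

def ring_register_load(units):
    """Shift `units` in one at a time and return the final register contents.

    Index 0 of the returned list is register 1, index 1 is register 2, etc.
    """
    registers = [None] * len(units)
    for unit in units:
        # Move existing data one register onward ...
        registers = [None] + registers[:-1]
        # ... then latch the newly input unit data into register 1.
        registers[0] = unit
    return registers


regs = ring_register_load(["a1", "a2", "a3", "a4"])
print(regs)  # ['a4', 'a3', 'a2', 'a1']
```

After four inputs the first unit data `a1` sits in the fourth register and the fourth unit data `a4` in the first register, which is the arrangement described for the X1 line.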
Next, the read/write sequencer 440 causes the first load/store unit 431 to read the image data of one X-direction line by specifying the address of the X2 line. The settings of the port connection controller 432 and the ring register 433 are unchanged. The four unit data of the X2 line read from the data memory 21 are shifted in sequence by the ring register 433, so that the first unit data of the X2 line is moved to the fourth register, the second unit data to the third register, the third unit data to the second register, and the fourth unit data to the first register. After the image data of one X-direction line has been stored in the fourth to first registers of the ring register 433, the unit data stored in each register is written to the MM array 410. The X2 line is thereby X-Y converted into the Y2 line. At this time, the unit data of the Y1 line previously held in the first cells of the C4 to C1 memories are shifted to the second cells as the unit data of the Y2 line are written.
The X3 to X5 lines are X-Y converted in the same way and temporarily stored in the MM array 410 as the Y3 to Y5 lines.
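The whole X-Y conversion of five lines of four unit data can be sketched in the same spirit. The function `xy_convert`, the string labels such as `"X2.1"` (line X2, first unit data), and the list representation of the cells are assumptions made for illustration only.

```python
# Illustrative model of the X-Y conversion mode; all names are hypothetical.

def xy_convert(lines):
    """Store each X-direction line across the column memories.

    lines -- list of lines, each a list with the same number of unit data
    Returns a dict "C1".."Cn" -> list of cells, where index 0 is the first
    cell (nearest the input port) and later indices are deeper cells.
    """
    width = len(lines[0])
    columns = {f"C{r}": [] for r in range(1, width + 1)}
    for line in lines:
        # After shifting, register 1 holds the last unit data of the line
        # and register `width` holds the first, i.e. the line reversed.
        registers = list(reversed(line))
        for r, unit in enumerate(registers, start=1):
            # Writing a new unit data pushes earlier data one cell deeper.
            columns[f"C{r}"].insert(0, unit)
    return columns


lines = [[f"X{k}.{i}" for i in range(1, 5)] for k in range(1, 6)]
columns = xy_convert(lines)
print(columns["C4"])  # ['X5.1', 'X4.1', 'X3.1', 'X2.1', 'X1.1']
print(columns["C1"])  # ['X5.4', 'X4.4', 'X3.4', 'X2.4', 'X1.4']
```

In this model each column memory ends up holding one position from every X-direction line, so a column read out in sequence yields data running in the Y direction of the original image.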
The MM label management controller 420 temporarily stores, as labels, information on the image data temporarily stored in the C1 to C4 memories of the MM array 410. For example, for the C1 memory it temporarily stores a label indicating that the image data held there corresponds to the Y4 line of the original image data stored in the data memory 21. The same applies to the C2 to C4 memories.
The above describes the case where image data stored in the data memory 21 is X-Y converted and temporarily stored in the MM array 410; the same applies when the X-Y conversion is performed as image data temporarily stored in the MM array 410 is transferred to the DSP 43 or written to the data memory 21. For example, when X-Y converted image data is held in the C1 to C4 memories as described above, the C1 to C4 memories can be shifted simultaneously so that one unit data from each is stored in the corresponding register of the ring register 433. If these four unit data are then shifted and output from a single load/store unit 431, a Y-direction line is X-Y converted into an X-direction line and can be transferred to the DSP 43 or elsewhere.
(3) Compression mode
The compression mode is a processing mode in which the image data stored in the data memory is thinned out and the resulting image data, with a reduced number of data, is temporarily stored in the MM array 410. In the compression mode, the read/write sequencer 440 controls the read/write adjustment circuit 430 as follows. The following describes the case where the first load/store unit 431 reads the image data of the X1 to X5 lines and the MM array 410 temporarily stores image data whose number of data has been compressed to 1/4.
At this time, the read/write sequencer 440 first causes the first load/store unit 431 to read the image data of one X-direction line by specifying the address of the X1 line. It causes the port connection controller 432 to connect the first load/store unit 431 to the first register of the ring register 433. The ring register 433 is set so that, when the unit data of the X1 line input from the port connection controller 432 is the (4n-2)th to 4n-th unit data (n is an integer of 1 or more), data is shifted between registers each time a unit data item is input.
Specifically, as the unit data of the X1 line are input, the 1st to 4th unit data (n = 1) are handled in the same way as in the X-Y conversion mode described above. That is, as the 2nd to 4th unit data are input, the ring register 433 shifts data between registers in sequence, so that the 1st unit data is moved to the fourth register, the 2nd unit data to the third register, the 3rd unit data to the second register, and the 4th unit data to the first register. After these four unit data have been stored in the fourth to first registers of the ring register 433, the unit data stored in each register is written to the MM array 410. As a result, of the X1 line, the 1st unit data is temporarily held in the C4 memory, the 2nd in the C3 memory, the 3rd in the C2 memory, and the 4th in the C1 memory.
The 5th to 8th unit data (n = 2) are handled in the same way. As the 6th to 8th unit data are input, the ring register 433 shifts data between registers in sequence, so that the 5th unit data is moved to the fourth register, the 6th unit data to the third register, the 7th unit data to the second register, and the 8th unit data to the first register. After these four unit data have been stored in the fourth to first registers of the ring register 433, the unit data stored in each register is written to the MM array 410. As a result, of the X1 line, the 1st and 5th unit data are temporarily held in the C4 memory, the 2nd and 6th in the C3 memory, the 3rd and 7th in the C2 memory, and the 4th and 8th in the C1 memory.
When this operation is carried out for the whole X1 line, the C4 memory of the MM array 410 temporarily stores the 1st, 5th, 9th, ..., (4m-3)th unit data of the X1 line, the C3 memory the 2nd, 6th, 10th, ..., (4m-2)th unit data, the C2 memory the 3rd, 7th, 11th, ..., (4m-1)th unit data, and the C1 memory the 4th, 8th, 12th, ..., 4m-th unit data (m is an integer of 1 or more). The image data held in each of the C1 to C4 memories is X1-line image data, but it is compressed image data in which only every fourth unit data has been kept, reducing the number of data to 1/4.
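The distribution of one line over four column memories can be checked with a short model. The function `decimate_line`, its `group` parameter, and the use of plain integers as unit data are assumptions for illustration; the lists are kept in arrival order rather than in physical cell order.

```python
# Illustrative model of the compression mode for a single line.

def decimate_line(line, group=4):
    """Distribute one line over `group` column memories.

    Returns a dict mapping a column number (1..group) to the unit data it
    receives, in arrival order. Trailing data that do not fill a complete
    group are ignored in this simplified model.
    """
    columns = {c: [] for c in range(1, group + 1)}
    usable = len(line) - len(line) % group
    for start in range(0, usable, group):
        chunk = line[start:start + group]
        # Within each group, the j-th unit data (1-based) ends up in
        # register group - j + 1, as in the X-Y conversion mode.
        for j, unit in enumerate(chunk, start=1):
            columns[group - j + 1].append(unit)
    return columns


line = list(range(1, 13))  # unit data numbered 1..12
cols = decimate_line(line)
print(cols[4])  # [1, 5, 9]
print(cols[3])  # [2, 6, 10]
print(cols[1])  # [4, 8, 12]
```

Each column holds a quarter of the line, so reading out any single column gives a 1/4-decimated version of that line.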
The X-direction lines from the X2 line onward can be processed in the same way as the X1 line, synchronized with it and at the same time. For example, the read/write sequencer 440 causes the second load/store unit 431 to read the image data of one X-direction line by specifying the address of the X2 line. It causes the port connection controller 432 to connect the second load/store unit 431 to the fifth register of the ring register 433. The ring register 433 is set so that, when the unit data of the X2 line input from the port connection controller 432 is the (4n-2)th to 4n-th unit data (n is an integer of 1 or more), data is shifted between registers each time a unit data item of the X2 line is input.
With these settings, the 1st to 4th unit data of the X2 line are moved as follows: the 1st unit data to the eighth register, the 2nd unit data to the seventh register, the 3rd unit data to the sixth register, and the 4th unit data to the fifth register. After these four unit data have been stored in the eighth to fifth registers of the ring register 433, the unit data stored in each register is written to the MM array 410. As a result, of the X2 line, the 1st unit data is temporarily held in the C8 memory, the 2nd in the C7 memory, the 3rd in the C6 memory, and the 4th in the C5 memory.
The 5th to 8th unit data (n = 2) are processed in the same way: the 5th unit data is moved to the eighth register, the 6th unit data to the seventh register, the 7th unit data to the sixth register, and the 8th unit data to the fifth register. After these four unit data have been stored in the eighth to fifth registers of the ring register 433, the unit data stored in each register is written to the MM array 410. As a result, of the X2 line, the 1st and 5th unit data are temporarily held in the C8 memory, the 2nd and 6th in the C7 memory, the 3rd and 7th in the C6 memory, and the 4th and 8th in the C5 memory.
When this operation is carried out for the whole X2 line, the C8 memory of the MM array 410 temporarily stores the 1st, 5th, 9th, ..., (4m-3)th unit data of the X2 line, the C7 memory the 2nd, 6th, 10th, ..., (4m-2)th unit data, the C6 memory the 3rd, 7th, 11th, ..., (4m-1)th unit data, and the C5 memory the 4th, 8th, 12th, ..., 4m-th unit data (m is an integer of 1 or more). The image data held in each of the C5 to C8 memories is X2-line image data, but it is compressed image data in which only every fourth unit data has been kept, reducing the number of data to 1/4.
The read/write sequencer 440 applies the same settings to the X3 to X5 lines as to the X1 and X2 lines and has them processed in the same way. In the MM array 410, the C12, C16, and C20 memories temporarily store the 1st, 5th, 9th, ..., (4m-3)th unit data of the X3, X4, and X5 lines respectively, and the C11, C15, and C19 memories temporarily store the 2nd, 6th, 10th, ..., (4m-2)th unit data of the X3, X4, and X5 lines respectively.
Through the above operation, the MM array 410 temporarily stores X1-line compressed image data, with the number of data reduced to 1/4, in each of the C1 to C4 memories; X2-line compressed image data, likewise reduced to 1/4, in each of the C5 to C8 memories; X3-line compressed image data, likewise reduced to 1/4, in each of the C9 to C12 memories; and so on up to X5-line compressed image data, likewise reduced to 1/4, in each of the C17 to C20 memories.
Then, the port connection controller 432 is set to connect the first load/store unit 431 to the C1 memory, the second load/store unit 431 to the C5 memory, the third load/store unit 431 to the C9 memory, ..., and the fifth load/store unit 431 to the C17 memory, and the unit marching memories of the C1, C5, C9, ..., C17 memories are shifted forward in sequence and their contents written to, for example, the data memory 21. Compressed image data with 1/4 the number of data of the original image data can thereby be created.
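The final read-out step can be modelled end to end for five lines. This sketch assumes the column layout described above (line k occupying column memories 4k-3 to 4k); the function `compress_image`, the tuple encoding `(line, unit)`, and the eight-unit line length are illustrative choices, not taken from the source.

```python
# Illustrative end-to-end model of the compression mode; names are hypothetical.

def compress_image(image, group=4):
    """Fill the column memories, then read one column per line.

    image -- list of lines, each a list of unit data whose length is a
             multiple of `group`
    Returns the decimated image: one list per line, taken from the
    lowest-numbered column memory of that line's group (C1, C5, C9, ...).
    """
    columns = {}
    for line_no, line in enumerate(image, start=1):
        for unit_no, unit in enumerate(line, start=1):
            # Line k uses columns group*(k-1)+1 .. group*k; within a group
            # of `group` unit data the first goes to the highest column.
            col = group * line_no - (unit_no - 1) % group
            columns.setdefault(col, []).append(unit)
    # One load/store unit per line reads the first column of each group.
    return [columns[group * (k - 1) + 1] for k in range(1, len(image) + 1)]


# Five lines of eight unit data, each encoded as (line number, unit number).
image = [[(k, i) for i in range(1, 9)] for k in range(1, 6)]
out = compress_image(image)
print(out[0])  # [(1, 4), (1, 8)]
print(out[4])  # [(5, 4), (5, 8)]
```

Choosing a different column from each group (for example C4, C8, ..., C20) would select a different phase of the same 1/4 decimation.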
In this embodiment, one end of each column of the unit marching memories arranged in parallel serves as an input/output port; however, one end may serve as an input port and the other end as an output port, or both ends may serve as input/output ports. In these cases, the load/store units 431 and the port connection controllers 432 are provided on both sides of the MM array 410, whereas the ring register 433 may be provided on only one side.
The above has described, with several application examples, the case where the marching memory is applied to an image processing apparatus. However, the marching memory is not limited to these application examples and can be widely applied to other parts of an image processing apparatus and to other information processing apparatuses with different forms of processing. Application to other parts of the image processing apparatus is described briefly with reference to FIG. 6. FIG. 6 is a block diagram outlining the signal processing system of the image processing apparatus.
The illustrated signal processing system includes an image input system 510, an image processing system 520, an image output system 530, a DRAM 540 connected to the image processing apparatus, an external memory (MM) 550, a storage 560, an external processing system 570, and the like.
The image processing system 520 includes a CPU 521, a GPU (Graphics Processing Unit) 522, a codec 523, a DRAM controller 524, an external memory controller 525, a storage IP 526, an external-processing-system IP 527, and the like, which are connected by a bus in an SoC (System on Chip) configuration. In comparison with the signal processing system described with reference to FIG. 1, the CPU 521 corresponds roughly to the CPU core unit 3, the GPU 522 to the DSP array unit 4, and the DRAM 540 to the DRAM 21.
In such a signal processing system, the marching memory can suitably be applied to the image input system 510, the CPU 521 and the external-processing-system IP 527 in the image processing system 520, the image output system 530, the external memory 550, the storage 560, the external processing system 570, and so on. Specifically, in the image input system 510 the marching memory can suitably be applied to the temporary storage of image data captured by the image sensor, and in the image output system 530 to the temporary storage of image data output from the image processing system 520. In the CPU 521 of the image processing system 520, the marching memory can suitably be applied to, for example, the temporary storage of image data during face recognition or tracking of a moving subject. The same applies to the external memory 550, the storage 560, the external processing system 570, and the like, where the marching memory can suitably be applied to the temporary storage of image data.
As described above, the data group targeted by the present invention is stream data, in which adjacent data have a predetermined relationship. Unlike random data, it is therefore unnecessary to address each unit data individually for reading and writing, and the time required for addressing, data searching, and the like can be eliminated. In addition, the marching memory takes as its unit a column of serially connected cells; unit data input to a cell is shifted along in sequence and temporarily stored in each cell. The input and shift rate of the unit data can be matched to, for example, the reference clock of the CPU, so that writing and reading can be performed at high speed. The present invention can therefore provide suitable applications that make use of the characteristic operation of the marching memory.
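The address-free, clocked shifting of a unit marching memory can be illustrated with a toy model. This sketch assumes the variant in which data enters at one end of the column and leaves at the other; the class name `UnitMarchingMemory` and the method `clock` are hypothetical and say nothing about the actual circuit.

```python
# Toy model of one unit marching memory (one column of cells).
# Assumption: input at one end, output at the other end.

class UnitMarchingMemory:
    def __init__(self, n_cells):
        # Index 0 is the cell at the input end.
        self.cells = [None] * n_cells

    def clock(self, unit=None):
        """Advance one clock: every cell passes its data to the next cell.

        `unit` is stored in the cell at the input end; the data leaving
        the cell at the output end is returned. No address is involved.
        """
        out = self.cells[-1]
        self.cells = [unit] + self.cells[:-1]
        return out


mm = UnitMarchingMemory(3)
outputs = [mm.clock(u) for u in ["d1", "d2", "d3", None, None, None]]
print(outputs)  # [None, None, None, 'd1', 'd2', 'd3']
```

The data come out in the order they went in, one per clock, which is why stream data needs no per-item addressing in this scheme.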
21 Data memory
43 Digital signal processor (DSP)
44 Shared memory
100 Image processing apparatus of the first aspect of the first embodiment
120 Arithmetic unit
150 Buffer memory
151 Load buffer (marching memory)
152 Store buffer (marching memory)
200 Image processing apparatus of the second aspect of the first embodiment
210 Register file (marching memory)
211, 212, 213, ..., 21N Unit marching memory
220 Arithmetic unit
400 Image processing apparatus of the second embodiment
410 MM array (marching memory)
411, 412, 413, ..., 41N Unit marching memory
433 Ring register (arrangement changing means)
440 Read/write sequencer (sequencer)
510 Image input system
520 Image processing system
530 Image output system

Claims (10)

  1.  An information processing apparatus comprising:
     a data memory that temporarily stores stream data consisting of a plurality of unit data; and
     a processor that has an arithmetic unit and performs predetermined information processing on the stream data read from the data memory, wherein
     a marching memory is provided as a register file of the processor,
     the marching memory comprises a plurality of unit marching memories provided in parallel, each unit marching memory having a column in which a plurality of storage areas are arranged in series and temporarily storing a plurality of input unit data in the respective storage areas by shifting the unit data sequentially from one end of the column to adjacent storage areas,
     the marching memory temporarily stores the plurality of input unit data in the respective storage areas of a first one of the unit marching memories, and
     the processor, based on a batch operation instruction, causes the arithmetic unit to sequentially process the plurality of unit data temporarily stored in the respective storage areas of the first unit marching memory and temporarily stores each processed unit data in the respective storage areas of a second one of the unit marching memories.
  2.  The information processing apparatus according to claim 1, further comprising a second marching memory as a shared memory between the data memory and the processor, wherein
     the second marching memory temporarily stores the plurality of input unit data.
  3.  The information processing apparatus according to claim 1 or 2, further comprising a third marching memory as a buffer memory of the processor.
  4.  An information processing apparatus comprising:
     a data memory that temporarily stores stream data consisting of a plurality of unit data; and
     a plurality of processors that each have an arithmetic unit and perform predetermined information processing on the stream data read from the data memory, wherein
     a marching memory is provided as a shared memory between the data memory and the plurality of processors, and
     the marching memory comprises a plurality of unit marching memories provided in parallel, each unit marching memory having a column in which a plurality of storage areas are arranged in series and temporarily storing a plurality of input unit data in the respective storage areas by shifting the unit data sequentially from one end of the column to adjacent storage areas.
  5.  The information processing apparatus according to claim 4, wherein an input port and/or an output port of each unit marching memory in the marching memory is provided with arrangement changing means for changing the arrangement of the plurality of input unit data and/or the arrangement of the plurality of unit data to be output.
  6.  The information processing apparatus according to claim 5, wherein
     the arrangement changing means has a ring register consisting of a group of registers connected to the input port and/or the output port of each unit marching memory, and a sequencer that controls the operation of the ring register, and
     the sequencer controls the operation of the ring register in accordance with the mode of information processing executed by the information processing apparatus, thereby changing the arrangement of the plurality of unit data.
  7.  An information processing apparatus comprising:
     a data memory that temporarily stores stream data consisting of a plurality of unit data; and
     a plurality of processors that each have an arithmetic unit and perform predetermined information processing on the stream data read from the data memory, wherein
     a marching memory is provided as a buffer memory of the processors, and
     the marching memory comprises a plurality of unit marching memories provided in parallel, each unit marching memory having a column in which a plurality of storage areas are arranged in series and temporarily storing a plurality of input unit data in the respective storage areas by shifting the unit data sequentially from one end of the column to adjacent storage areas.
  8.  A digital camera comprising:
     the information processing apparatus according to any one of claims 1 to 7;
     an image input system that has an image sensor and inputs image data, which is the stream data, to the information processing apparatus; and
     an image output system that outputs the image data processed by the information processing apparatus.
  9.  A processor that performs predetermined information processing on stream data consisting of a plurality of unit data, the processor comprising:
     a register file constituted by a marching memory; and
     an arithmetic unit that executes arithmetic processing using the data held in the register file, wherein
     the marching memory comprises a plurality of unit marching memories provided in parallel, each unit marching memory having a column in which a plurality of storage areas are arranged in series and temporarily storing a plurality of input unit data in the respective storage areas by shifting the unit data sequentially from one end of the column to adjacent storage areas.
  10.  The processor according to claim 9, further comprising a buffer memory constituted by a marching memory.
PCT/JP2015/055050 2014-02-24 2015-02-23 Information processing device, digital camera, and processor WO2015125960A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2016504209A JP6319420B2 (en) 2014-02-24 2015-02-23 Information processing apparatus, digital camera and processor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014032724 2014-02-24
JP2014-032724 2014-02-24

Publications (1)

Publication Number Publication Date
WO2015125960A1 (en) 2015-08-27

Family

ID=53878456

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/055050 WO2015125960A1 (en) 2014-02-24 2015-02-23 Information processing device, digital camera, and processor

Country Status (2)

Country Link
JP (1) JP6319420B2 (en)
WO (1) WO2015125960A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003346138A (en) * 2002-05-22 2003-12-05 Sony Corp Image processor and image processing method
JP2012159903A (en) * 2011-01-31 2012-08-23 Fujitsu Semiconductor Ltd Data processing system, data-processing device and data processing method
JP2012533784A (en) * 2009-07-21 2012-12-27 維男 中村 High-speed computer with low energy consumption and no memory bottleneck

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04100358A (en) * 1990-08-17 1992-04-02 Matsushita Electric Ind Co Ltd Cell transfer circuit
JPH04293151A (en) * 1991-03-20 1992-10-16 Fujitsu Ltd Parallel data processing system
JP4186561B2 (en) * 2002-04-25 2008-11-26 ソニー株式会社 Image processing apparatus and method
JP4264526B2 (en) * 2002-05-23 2009-05-20 ソニー株式会社 Image processing apparatus and method
JP4264527B2 (en) * 2002-05-23 2009-05-20 ソニー株式会社 Image processing apparatus and method
JP2004013873A (en) * 2002-06-03 2004-01-15 Sony Corp Image processor
JP4264529B2 (en) * 2002-07-19 2009-05-20 ソニー株式会社 Image processing apparatus and method
JP4264530B2 (en) * 2002-07-19 2009-05-20 ソニー株式会社 Image processing apparatus and method
JP2004118713A (en) * 2002-09-27 2004-04-15 Sony Corp Image processing apparatus
JP2004118822A (en) * 2002-09-27 2004-04-15 Sony Corp Image processing apparatus
JP2004127227A (en) * 2002-10-04 2004-04-22 Sony Corp Image processing apparatus
JP2004145838A (en) * 2002-10-25 2004-05-20 Sony Corp Image processor
JP2010054939A (en) * 2008-08-29 2010-03-11 Toshiba Corp Information processor and image signal processing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TETSUO HIRONAKA ET AL.: "Benchmarking a Vector-Processor Prototype Based on Multithreaded Streaming/FIFO Vector (MSFV) Architecture", PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS'92), 1992, pages 272-281, XP055221440 *

Also Published As

Publication number Publication date
JPWO2015125960A1 (en) 2017-03-30
JP6319420B2 (en) 2018-05-09

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 15752558; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2016504209; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 15752558; Country of ref document: EP; Kind code of ref document: A1)